Tribal SQL
By
Dave Ballantyne, John Barnett, Diana Dee,
Kevin Feasel, Tara Kizer, Chuck Lathrope,
Stephanie Locke, Colleen Morrow,
Dev Nambi, Bob Pusateri,
Mark S. Rasmussen, Wil Sisney,
Shaun J. Stuart, David Tate, Matt Velic
Copyright for each chapter belongs to the attributed author, 2013
ISBN – 978-1-906434-80-9
The right of Dave Ballantyne, John Barnett, Diana Dee, Kevin Feasel, Tara Kizer, Chuck Lathrope, Stephanie
Locke, Colleen Morrow, Dev Nambi, Bob Pusateri, Mark S. Rasmussen, Wil Sisney, Shaun J. Stuart, David
Tate and Matt Velic to be identified as the author of his or her attributed chapter has been asserted by each in
accordance with the Copyright, Designs and Patents Act 1988. All rights reserved. No part of this publication
may be reproduced, stored or introduced into a retrieval system, or transmitted, in any form, or by any means
(electronic, mechanical, photocopying, recording or otherwise) without the prior written consent of the
publisher. Any person who does any unauthorized act in relation to this publication may be liable to criminal
prosecution and civil claims for damages. This book is sold subject to the condition that it shall not, by way of
trade or otherwise, be lent, re-sold, hired out, or otherwise circulated without the publisher's prior consent in
any form other than that in which it is published and without a similar condition, including this condition, being
imposed on the subsequent purchaser.
Table of Contents
About This Book
The Tribal Authors
Computers 4 Africa
The Tribal Reviewers and Editors
The Tribal Sponsors
Introduction
SQL Server Storage Internals 101
Mark S. Rasmussen
The Power of Deductive Reasoning
Records
Pages
Investigating Page Contents Using DBCC Commands
DBCC PAGE
DBCC IND
Heaps and Indexes
Heaps
Indexes
Crunching the Numbers
Index design
Storage requirements
Summary
Performing test restores
The magic happens here
Summary
Policy Based Management
Change Data Capture and Change Tracking
The Basis of Auditing: Events and Event Classes
SQL Trace
Server-side trace for DDL auditing
Using the default trace
SQL Trace: pros and cons
SQL Audit
SQL Audit: how it works
SQL Audit: terminology
SQL Audit: creating the audit
SQL Audit: creating the audit specification
SQL Audit: viewing audit output
SQL Audit: pros and cons
Develop Your Own Audit: Event Notifications or Triggers
Event notifications: how it works
Event notifications: creating an event notification
DDL and Logon triggers: how they work
DDL and Logon triggers: creating triggers
Event notifications and triggers: pros and cons
Third-party Solutions
Conclusion
Configure an operator
Configuring notifications for Agent jobs
Provide a Real-time Notification System for SQL Server Alerts
Troubleshooting Database Mail
Common problems
Interrogating the Database Mail system views
Maintaining the Database Mail log table
The SSMS Database Mail log
Summary
Initial visualization
Designing tables and graphs
Choosing the Right Reporting Tools
Reporting in the Microsoft stack
Predicting the future
Outside the Microsoft stack
Who decides and, most importantly, who pays?
A personal story
Summary
Habit 4: Train to Gain an Edge
Habit 5: Stand on the Shoulders of Giants
Habit 6: Control the Headlines
Habit 7: Write a Self-Appraisal that Sparks Memory of Success
Habit 8: Use Your Review to Negotiate Rewards
Habit 9: Don't Rest on Past Success
About This Book
The Tribal Authors
Dave Ballantyne
Dave is a freelance SQL Server database developer/designer and has been working in
the IT field for over 20 years, during the past 15 of which he has been specializing
within the SQL Server environment.
Dave is a regular speaker at UK events such as SQL Bits and user groups and is
founder of the SQLLunch User Group.
John Barnett
John has been working in IT now for over 15 years. Over this time he's had a variety
of roles in the public and private sectors. Responsibilities have included database
administration, system administration and applications support and development. Most
recently this has been in the UK Higher Education sector, supporting and developing
with a variety of vendor-supported and in-house applications.
Diana Dee
Diana is a Microsoft Certified Trainer who has taught Microsoft courses to IT
professionals and college students since 1996. She has been a course developer
throughout her computer teaching career. She has revised and developed database
design, administration, and querying courses for an online University and for a major
training company. Diana has presented at five SQL Saturdays (so far).
Kevin Feasel
Kevin Feasel is a database administrator at American Health Holding, a subsidiary of
Aetna, where he specializes in SSIS development, performance tuning, and pulling
rabbits out of hats on demand. A resident of Durham, North Carolina, he can be found
cycling the trails along the triangle whenever the weather's nice enough.
Tara Kizer
Tara Kizer is a Database Administrator and has been using SQL Server since 1996.
She has worked at Qualcomm Incorporated since 2002 and supports several 24x7
mission-critical systems with very high performance and availability requirements.
She obtained her Bachelor of Science in Mathematics with emphasis on Computer
Science from San Diego State University in 1999. She has been a Microsoft MVP
since July of 2007 and is an active member at SQLTeam.com where she posts under
the name tkizer. Tara lives in San Diego, California with her husband and kids. You
can reach her via her blog, Ramblings of a DBA (http://weblogs.sqlteam.com/tarad).
Chuck Lathrope
Chuck Lathrope is a Seattle-based SQL Server administrator with over twenty years of
IT experience. He was a Top-5 nominee in the Red Gate Exceptional DBA Award in
2009. Currently, he manages a team of DBAs who support a very large SQL Server
replication environment across many datacenters in the USA. Chuck often speaks at
SQL Saturday events on replication. He blogs at www.sqlwebpedia.com, and
tweets via his account @SQLGuyChuck.
Stephanie Locke
Stephanie Locke (@SteffLocke) is an experienced analyst and BI developer within the
Finance and Insurance industries. Steph runs her local SQL Server User Group, is one
of the organizers of SQLRelay (http://sqlrelay.co.uk/) and contributes to the
community at large via blogging at http://steff.itsalocke.com/, presenting, and
forums.
Colleen Morrow
Colleen Morrow began her career in databases providing technical support for
Informix Software. After a brief foray into programming, she decided to try her hand
in database administration. She's been at it ever since. Colleen spent 12 years at her
previous position as a DBA for a large law firm, focusing primarily on SQL Server for
the last 9 years. In October of 2012 she moved to her current position as a Database
Engineer at a software company where she provides consulting services on database
administration and performance tuning. But a little bit of the programmer inside lives
on; you'll often find her writing PowerShell or T-SQL scripts to make life a little
easier.
Dev Nambi
Dev is a data geek, developer, and aspiring polymath. He works with databases,
statistics, and curiosity to solve problems using data. He is currently working in the
University of Washington's Decision Support group.
Bob Pusateri
Bob is a Microsoft Certified Master and has been working with SQL Server since
2006. He is currently a Database Administrator at Northwestern University where he
uses data compression to manage databases over 25 TB in size for their Medical
Enterprise Data Warehouse.
He lives near Chicago, Illinois with his wife, Michelle, and their big orange kitty,
Oliver. You can reach Bob on Twitter at @SQLBob, or through his blog, "The Outer
Join," at http://www.bobpusateri.com.
Mark S. Rasmussen
Mark, a SQL Server MVP, has worked extensively with SQL Server through his years
as a consultant as well as in his current position as the CTO of iPaper A/S. He has
primarily been working on performance tuning as well as diving into the internals of
SQL Server. Besides SQL Server, Mark is also proficient in the .NET development
stack, with a decade of experience. Fueled by his interest in development and the nitty-
gritty details, Mark has created OrcaMDF, an open source parser for SQL Server data
files, written in C#. He enjoys presenting, and frequently speaks at local user groups as
well as international events. Mark keeps a blog at http://improve.dk and tweets at
@improvedk.
Wil Sisney
Wil is a database administrator specializing in performance, core administration and
SQL Server Integration Services. He also spends a lot of time training on SQL Server
and blogs about new training opportunities. He teaches others about SQL Server
through corporate training, user groups and events like SQL Saturday.
Shaun J. Stuart
Shaun J. Stuart is the Senior SQL Server DBA at the largest credit union in Arizona.
He has a Bachelor's degree in electrical engineering, but has been working with
databases for over 15 years. He started his database career as a database developer
before moving into the database administrator role. He became a Microsoft Certified
Professional on SQL Server 7.0 and is a Microsoft Certified Technology Specialist
(MCTS) on SQL Server 2005 and 2008 and a Microsoft Certified IT Professional
(MCITP) on SQL Server 2008. Shaun blogs about SQL Server at
www.shaunjstuart.com and can occasionally be found on Twitter as @shaunjstu.
Shaun would like to thank Jen McCown for her work in creating and shepherding this
Tribal SQL Project and Thomas LaRock, whose article about sampling on Simple-
Talk.com inspired him to implement this process.
David Tate
David Tate is a full-stack software consultant but his family thinks that he fixes
wireless computer printers. David uses his work energy to create things that do not
compare to the simple elegance of the tree outside his window. He is consistently bad
at choosing reheating times for the microwave and his channel-changing algorithm is
despised by all that have lived with him.
In his spare time he likes to ride bicycles. He once saw a bird that looked like
Madonna. He blogs at HTTP://DAVIDTATE.ORG and tweets @mixteenth.
Matt Velic
Matt Velic is a Database Developer for Sanametrix in Washington, DC. He enjoys
helping others succeed by ensuring that they've got the proper resources and support.
Towards that goal, Matt is a co-leader of the official PASSDC User Group, helps to
organize DC SQL Saturday events, finds new speakers for PASS's “Oracle and SQL
Server” Virtual Chapter, blogs, presents, and loves hanging out on Twitter.
Computers 4 Africa
All the Tribal SQL authors have agreed to donate their royalties to Computers 4 Africa
(http://www.computers4africa.org.uk/index.php).
Computers 4 Africa is a registered charity operating as a social enterprise. We collect
redundant IT, which is refurbished and data-wiped before being sent out to African
schools, colleges, and selected community projects.
Our mission is to help lift the continent of Africa out of the poverty trap by equipping
the next generation to work in a global environment. This is the 21st Century version
of "…teach a man to fish…". We do this by supplying the best value computers in
the areas where we operate.
At our central processing unit in Kent we receive working redundant computers
through collections and local donations from around the UK. The donated equipment
is then treated and sent out to beneficiaries in Africa.
Beneficiaries pay a contribution towards the cost of preparing and shipping the
equipment – but at the best price available in their locality – that is our ambition. In
this way we make modern IT available to those that would otherwise never get to use a
computer in their years at school. Computers 4 Africa targets the poorest causes by
donating 10% of the computers we send out.
Volunteer Reviewers
Many people gave their time freely, technically reviewing chapters during the writing
of the initial chapter drafts. We'd like to say thank you to:
@SQLRUs, @BrentO
Craig Purnell (@CraigPurnell), Thomas Rushton (@ThomasRushton), Meredith Ryan (@coffegrl)
Ben Seaman (@thetornpage), Steve Stedman (@SQLEmt), Thomas Stringer (@SQLLife)
Jason Thomas (@de_unparagoned), Robert Volk (@sql_r), Ed Watson (@SQLGator)
Jason Yousef (@Huslayer), Melody Zacharias (@SQLMelody), James Zimmerman
Kat Hicks
During the second round of reviews, Kat technically reviewed and edited the
completed drafts of Chapters 6–10 and 15.
Kat Hicks, DBA extraordinaire, works at Rackspace in beautiful San Antonio, TX,
and she's been databasing for over 12 years. MS SQL has been her primary focus,
from 6.5 all the way through 2012. She has a slew of letters after her name too – but
she's still more certifiable than certified. Her favorites (apart from her wonderful
family) include horror movies, superheroes, great fiction, cramming as many words
and commas as possible into a single sentence and, last but not least, writing random
T-SQL scripts to make her life easier.
volunteer for the Professional Association for SQL Server, and that's just in our spare
time. We are happiest when we're making data systems run better, or showing
someone else how.
For years now, we've been putting out free technology tutorial videos on the
MidnightDBA website (www.midnightdba.com). Our favorite subject matter is
SQL Server and PowerShell, but we'll present on whatever strikes our fancy. You can
also find recordings of the classes we teach at user groups and international
conferences, and of our live weekly IT web show, DBAs@Midnight. All of these
videos are designed to help data professionals do their jobs more effectively, and to
serve as a reference for our own use from time to time!
Tribal SQL was a wonderful project, and we're thrilled that so many people were brave
enough to take part. The authors worked hard on this, because they had something to
say. The editors worked hard to help them say it. Red Gate worked very hard to make
sure we were all correct and presentable. To everyone involved: thanks for making
Tribal SQL a reality!
Introduction
A while back, I invited the unpublished masses to submit abstracts for a new-author-written SQL
book – called Tribal SQL – and the people spoke. The chosen ones are now feverishly writing their
first drafts! – Jen McCown, March 2012
In late 2011, Sean and Jen McCown finished their weekly web show, and relaxed with
the chat room audience. Talk ranged to the recently published SQL Server MVP Deep
Dives, Volume 2 , in which 64 SQL Server MVPs, Jen included, contributed a chapter
each, with proceeds going to charity.
Everyone loved the model, but lamented that only MVPs could take part. Wouldn't it
be nice if new voices in the SQL Community could contribute? Jen asked if anyone
would be interested in such a project, and four people immediately volunteered. Tribal
SQL was born.
When prospective authors asked, “What kind of book will this be? What should I write
about?” Jen's response was simple: This is a book for DBAs, for things you think they
really ought to know…so what do you think belongs in it?
Fifteen experienced SQL people, all previously unpublished authors, have contributed
a chapter each to share their hard-won knowledge. The result? We have insights into
how to reduce data size and optimize performance with compression, verify backups,
tune SQL Server with traces and extended events, audit SQL Server activity,
implement replication, and more. Side by side with these, we have chapters on the
importance to DBAs of communicating clearly with their co-workers and business
leaders, presenting data as useful information that the business can use to make
decisions, adopting a more Agile approach to their work, and learning sound project
management skills.
Tribal SQL is both a reflection of a DBA's core, long-standing responsibilities for
database security, availability, and performance, and a discussion of new ideas about
how the DBA role is evolving, and what it means to be a DBA in today's businesses.
Code Examples
We provide a code download bundle containing every script in this book, and a few
more that were too large to present in the text. You can find it on the Tribal SQL
website (http://tribalsql.com) or download it directly from the following URL:
http://www.simple-talk.com/RedGateBooks/TribalSQL/Tribal_Code.zip
Feedback and Errata
We've tried our very best to ensure that this book is useful, technically accurate, and
written in plain, simple language. If we've erred on any of these elements, or you just
want to let us know what you think of the book, we want to hear from you.
Please post your feedback and errata to the Tribal SQL book page, here:
http://tribalsql.com/feedback.
SQL Server Storage Internals 101
Mark S. Rasmussen
In this chapter, I offer a concise introduction to the physical storage internals behind
SQL Server databases. I won't be diving into every detail because my goal is to
provide a simple, clear picture of how SQL Server stores data.
Why is this useful? After all, the day-to-day routine of most SQL Server developers
and DBAs doesn't necessarily require detailed knowledge of SQL Server's inner
workings and storage mechanisms. However, armed with it, we will be in a much
better position to make optimal design decisions for our systems, and to troubleshoot
them effectively when things go wrong and system performance is suffering.
There are so many optimization techniques in a modern RDBMS that we're unlikely to
learn them all. What if, instead of striving to master every technique, we strive to
understand the underlying structures that these techniques try to optimize? Suddenly,
we have the power of deductive reasoning. While we might not know about every
specific feature, we can deduce why they exist as well as how they work. It will also
help us to devise effective optimization techniques of our own.
In essence, this is a simple manifestation of the ancient Chinese proverb:
Give a man a fish and you feed him for a day; teach a man to fish and you feed him for a lifetime.
It's the reason I believe every SQL Server DBA and developer should have a sound
basic understanding, not just of what storage objects exist in SQL Server (heaps and
indexes), but of the underlying data structures (pages and records) that make up these
objects, what they look like at the byte level and how they work. Once we have this,
we can make much better decisions about our storage requirements, in terms of
capacity, as well as the storage structures we need for optimal query performance.
Quick to fetch my calculator, I started crunching the numbers. Assuming each
magazine had an average of 50 pages, each page being viewed at least once an hour,
this would result in roughly half a million rows (50 pages * 365 days * 24 hours ) of
data in our statistics table in the database, per year, per magazine. If we were to end up
with, say, 1,000 magazines then, well, this was approaching more data than I could
think about comfortably and, without any knowledge of how a SQL Server database
stored data, I leapt to the conclusion that it would not be able to handle it either.
In a flash of brilliance, an idea struck me. I would split out the data into separate
databases! Each magazine would get its own statistics database, enabling me to filter
the data just by selecting from the appropriate statistics database, for any given
magazine (all of these databases were stored on the same disk; I had no real grasp of
the concept of I/O performance at this stage).
I learned my lesson the hard way. Once we reached that magic point of having
thousands of magazines, our database backup operations were suffering. Our backup
window, originally 15 minutes, expanded to fill 6 hours, simply due to the need to
back up thousands of separate databases. The DDL involved in creating databases on
the fly as well as creating cross-database queries for comparing statistics…well, let's
just say it wasn't optimal.
Around this time, I participated in my first SQL Server-oriented conference and
attended a Level 200 session on data file internals. It was a revelation and I
immediately realized my wrongdoing. Suddenly, I understood why, due to the way
data was stored and traversed, SQL Server would easily have been able to handle all of
my data. It struck me, in fact, that I'd been trying to replicate the very idea behind a
clustered index, just in a horribly inefficient way.
Of course, I already knew about indexes, or so I thought. I knew you created them on
some columns to make queries work faster. What I started to understand thanks to this
session and subsequent investigations, was how and why a certain index might help
and, conversely, why it might not. Most importantly, I learned how b-tree structures
allowed SQL Server to efficiently store and query enormous amounts of data.
Records
A record, also known as a row, is the smallest storage structure in a SQL Server data
file. Each row in a table is stored as an individual record on disk. It is not only table data
that is stored as records; indexes, metadata, database boot structures and so forth are
stored as records too. However, we'll concentrate on only the most common and
important record type, namely the data record, which shares the same format as the index record.
Data records are stored in a fixedvar format. The name derives from the fact that
there are two basic kinds of data types, fixed length and variable length. As the name
implies, fixed-length data types have a static length from which they never deviate.
Examples are 4-byte integers, 8-byte datetimes, and 10-byte characters (char(10)).
Variable-length data types, such as varchar(x) and varbinary(x) , have a length
that varies on a record-by-record basis. While a varchar(10) might take up 10 bytes
in one record, it might only take up 3 bytes in another, depending on the stored value.
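As a quick, hypothetical illustration (the table and column names are mine, not from the chapter), the following schema mixes both kinds of data type:

-- Hypothetical table mixing fixed-length and variable-length columns
CREATE TABLE dbo.FixedVarDemo
(
    ID        int         NOT NULL, -- fixed length: always 4 bytes
    CreatedAt datetime    NOT NULL, -- fixed length: always 8 bytes
    Code      char(10)    NOT NULL, -- fixed length: always 10 bytes
    Notes     varchar(50) NULL      -- variable length: size depends on the stored value
);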
Figure 1 shows the basic underlying fixedvar structure of every data record.
Every data record starts with two status bytes, which define, among other things:
• the record type – of which the data and index types are, by far, the most common
and important
• whether the record has a null bitmap – one or more bytes used to track whether
columns have null values
• whether the record has any variable-length columns.
The next two bytes store the total length of the fixed-length portion of the record. This
is the length of the actual fixed-length data, plus the 2 bytes used to store the status,
and the 2 bytes used to store the total fixed length. We sometimes refer to the fixed-
length length field as the null bitmap pointer , as it points to the end of the fixed-
length data, which is where the null bitmap starts.
The fixed-length data portion of the record stores all of the column data for the fixed-
length data types in the table schema. The columns are stored in physical order and so
can always be located at a specific byte index in the data record, by calculating the
size of all the previous fixed-length columns in the schema.
The next two areas of storage make up the null bitmap , an array of bits that keep
track of which columns contain null values for that record, and which columns have
non-null values in the record. As fixed-length data columns always take up their
allotted space, we need the null bitmap to know whether a value is null. For variable-
length columns, the null bitmap is the means by which we can distinguish between an
empty value and a null value. The 2 bytes preceding the actual bitmap simply store the
number of columns tracked by the bitmap. As each column in the bitmap requires a
bit, the required bytes for the null bitmap can be calculated by dividing the total
number of columns by 8 and then rounding up to the nearest integer: CEIL(#Cols /
8).
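As a quick worked example of that formula, a table with 12 columns needs two bytes for its null bitmap, since CEIL(12 / 8) = 2; the same arithmetic can be checked directly in T-SQL:

-- Null bitmap size for a 12-column table: one bit per column, rounded up to whole bytes
SELECT CEILING(12 / 8.0) AS NullBitmapBytes; -- returns 2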
Finally, we have the variable-length portion of the record, consisting of 2 bytes to store
the number of variable-length columns, followed by a variable-length offset array ,
followed by the actual variable-length data.
Figure 2 shows an expanded example of the sections of the data record relating to
variable-length data.
We start with two bytes that indicate the number of variable-length columns stored in
the record. In this case, the value, 0x0200 , indicates two columns. Next up is a series
of two-byte values that form the variable-length offset array, one for each column,
pointing to the byte index in the record where the related column data ends. Finally,
we have the actual variable-length columns.
Since SQL Server knows the data starts after the last entry in the offset array, and
knows where the data ends for each column, it can calculate the length of each
column, as well as query the data.
Pages
Theoretically, SQL Server could just store a billion records side by side in a huge data
file, but that would be a mess to manage. Instead, it organizes and stores records in
smaller units of data, known as pages. Pages are also the smallest units of data that
SQL Server will cache in memory (handled by the buffer manager).
There are different types of pages; some store data records, some store index records
and others store metadata of various sorts. All of them have one thing in common,
which is their structure. A page is always exactly 8 KB (8192 bytes) in size and
contains two major sections, the header and the body. The header has a fixed size of 96
bytes and has the same contents and format, regardless of the page type. It contains
information such as how much space is free in the body, how many records are stored
in the body, the object to which the page belongs and, in an index, the pages that
precede and succeed it.
The body takes up the remaining 8096 bytes, as depicted in Figure 3 .
At the very end of the body is a section known as the record offset array , which is an
array of two-byte values that SQL Server reads in reverse from the very end of the
page. The header contains a field that defines the number of slots that are present in
the record offset array, and thus how many two-byte values SQL Server can read. Each
slot in the record offset array points to a byte index in the page where a record begins.
The record offset array dictates the physical order of the records. As such, the very
last record on the page, logically, may very well be the first record, physically.
Typically, you'll find that the first slot of the record offset array, stored in the very last
two bytes of the page, points to the first record stored at byte index 96, which is at the
very beginning of the body, right after the header.
If you've ever used any of the DBCC commands, you will have seen record pointers in
the format (X:Y:Z) , pointing to data file X , page Y and slot Z. To find a record on
page Y , SQL Server first needs to find the path for the data file with id X. The file is
just one big array of pages, with the very first page starting at byte index 0, the next
one at byte index 8192, the third one at byte index 16384, and so on. The page number
correlates directly with the byte index, in that page 0 is stored at byte index 0*8192 ,
page 1 is stored at byte index 1*8192 , page 2 is stored at byte index 2*8192 and so
on. Therefore, to find the contents of page Y, SQL Server needs to read 8192 bytes
beginning at byte index Y*8192 . Having read the bytes of the page, SQL Server can
then read entry Z in the record offset array to find out where the record bytes are stored
in the body.
DBCC PAGE
By default, SQL Server sends the output from DBCC PAGE to the trace log and not as a
query result. To execute DBCC PAGE commands from SSMS and see the results
directly in the query results window, we first need to enable Trace Flag 3604, as
shown in Listing 1 .
--Enable
DBCC TRACEON (3604);
--Disable
DBCC TRACEOFF (3604);
The trace flag activates at the connection level, so enabling it in one connection will
not affect any other connections to the server. Likewise, as soon as the connection
closes, the trace flag will no longer have any effect. Having enabled the trace flag, we
can issue the DBCC PAGE command using the following syntax:
DBCC PAGE (<Database>, <FileID>, <PageID>, <Style>)
Database is the name of the database whose page we want to examine. Next, the
FileID of the file we want to examine; for most databases this will be 1 , as there will
only be a single data file. Execute Listing 2 within a specific database to reveal a list
of all data files for that database, including their FileID s.
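One way to get that list (the chapter's actual Listing 2 may differ slightly) is to query the sys.database_files catalog view:

-- Data files of the current database, with their file IDs
SELECT file_id,
       type_desc,
       name,
       physical_name
FROM   sys.database_files
WHERE  type_desc = 'ROWS';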
Next is the PageID of the page we wish to examine. This can be any valid PageID in
the database: for example, the special file header page (page 0), the equally special
boot page (page 9, which is only stored in the primary file with file_id 1), or any
other page that exists in the file. Typically, you won't see user data pages earlier than
about page 17.
Finally, we have the Style value:
• 0 – outputs only the parsed header values. That is, there are no raw bytes, only the
header contents.
• 1 – outputs the parsed header as well as the raw bytes of each record on the page.
• 2 – outputs the parsed header as well as the complete raw byte contents of the page,
including both the header and body.
• 3 – outputs the parsed header and the parsed record column values for each record
on the page. The raw bytes for each record are output as well. This is usually the
most useful style as it allows access to the header as well as the ability to correlate
the raw record bytes with the column data.
Listing 3 shows how you'd examine the rows on page 16 in the primary data file of the
AdventureWorks2008R2 database.
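Such a call would look something like this, with style 3 returning the parsed records alongside their raw bytes:

DBCC PAGE (AdventureWorks2008R2, 1, 16, 3);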
Looking at the output, you'll be able to see the page ID stored in the header
(m_pageId) , the number of records stored on the page (m_slotCnt) , the object ID
to which this page belongs (m_objId) and much more.
After the header, we see each record listed, one by one. The output of each record
consists of the raw bytes (Memory Dump), followed by each of the column values
(Slot 0 Column 1… , and so on). Note that the column values also detail how many
(physical) bytes they take up on disk, making it easier for you to correlate the value
with the raw byte output.
DBCC IND
Now that you know how to gain access to the contents of a page, you'll probably want
to do so for tables in your existing databases. What we need, then, is to know on which
pages a given table's records are stored. Luckily, that's just what DBCC IND provides
and we call it like this:
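DBCC IND (<Database>, <Object>, <IndexID>)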
We specify the name of the database and the name of the object for which we wish to
view the pages. Finally, we can filter the output to just a certain index; 0 indicates a
heap, while 1 is the clustered index. If we want to see the pages for a specific non-
clustered index, we enter that index's ID. If we use -1 for the IndexID , we get a list of
all pages belonging to any index related to the specified object.
Listing 4 examines the Person.Person table in the SQL Server 2008 R2
AdventureWorks database, and is followed by the first five rows of the results (your
output may differ, depending on the release).
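A call along the following lines produces that output; here, -1 requests the pages for all indexes on the table:

DBCC IND ('AdventureWorks2008R2', 'Person.Person', -1);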
There are a couple of interesting columns here. The PageType column details the type
of page. For example, PageType 10 is an allocation structure page known as an IAM
page, which I'll describe in the next section. PageType 2 is an index page and
PageType 1 is a data page.
The first two columns show the file ID as well as the page ID of each of those pages.
Using those two values, we can invoke DBCC PAGE for the first data page, as shown in
Listing 5 .
DBCC PAGE (AdventureWorks2008R2, 1, 19904, 3);
Listing 5: Using DBCC PAGE to view the contents of a data page belonging to the Person.Person table.
Heaps
Heaps are the simplest data structures, in that they're just “a big bunch of pages,” all
owned by the same object. A special type of page called an index allocation map
(IAM) tracks which pages belong to which object. SQL Server uses IAM pages for
heaps and indexes, but they're especially important for heaps as they're the only
mechanism for finding the pages containing the heap data. My primary goal in this
chapter is to discuss index structure and design, so I'll only cover heaps briefly.
Each IAM page contains one huge bitmap, tracking 511,232 pages, or about 4 GB of
data. For the sake of efficiency, the IAM page doesn't track the individual pages, but
rather groups of eight, known as extents. If the heap takes up more than 4 GB of data,
SQL Server allocates another IAM page to enable tracking the pages in the next 4 GB
of data, leaving in the first IAM page's header a pointer to the next IAM page. In order
to scan a heap, SQL Server will simply find the first IAM page and then scan each
page in each extent to which it points.
One important fact to remember is that a heap guarantees no order for the records
within each page. SQL Server inserts a new record wherever it wants, usually on an
existing page with plenty of space, or on a newly allocated page.
Compared to indexes, heaps are rather simple in terms of maintenance, as there is no
physical order to maintain. We don't have to consider factors such as the use of an
ever-increasing key for maintaining order as we insert rows; SQL Server will just
append a record anywhere it fits, on its chosen page, regardless of the key.
However, just because heap maintenance is limited, it doesn't mean that heaps have no
maintenance issues. In order to understand why, we need to discuss forwarded
records.
Unlike in an index, a heap has no key that uniquely identifies a given record. If a non-
clustered index or a foreign key needs to point to a specific record, it does so using a
pointer to its physical location, represented as (FileID:PageID:SlotID) , also
known as a RID or a row identifier. For example (1:16:2 ) points to the third slot in
the 17th page (both starting at index 0) in the first file (which starts at index 1).
Imagine that the pointer to record (1:16:2 ) exists in 20 different places but that, due
perhaps to an update to a column value, SQL Server has to move the record from page
16 as there is no longer space for it. This presents an interesting performance problem.
If SQL Server simply moves the record to a new physical location, it will have to
update that physical pointer in 20 different locations, which is a lot of work. Instead, it
copies the record to a new page and converts the original record into a forwarding
stub, a small record of just 9 bytes that stores a physical pointer to the new
record. The existing 20 physical pointers now resolve to the forwarding stub, which
SQL Server follows to find the wanted data.
This technique makes updates simpler and faster, at the considerable cost of an extra
lookup for reads. As data modifications lead to more and more forwarded records, disk
I/O increases tremendously, as SQL Server tries to read records from all over the disk.
Listing 6 shows how to query the sys.dm_db_index_physical_stats DMV to
find all heaps with forwarded records in the AdventureWorks database. If you do
have any heaps (hopefully not), then monitor these values to decide when it's time to
issue an ALTER TABLE REBUILD command to remove the forwarded records.
SELECT  o.name,
        ps.forwarded_record_count
FROM    sys.dm_db_index_physical_stats(DB_ID('AdventureWorks2008R2'), NULL, NULL,
            NULL, 'DETAILED') AS ps
        INNER JOIN sys.objects AS o ON o.object_id = ps.object_id
WHERE   ps.forwarded_record_count > 0;
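The rebuild itself is a one-line command (the table name below is just a placeholder):

-- Rebuild a heap to remove its forwarded records (SQL Server 2008 and later)
ALTER TABLE dbo.MyHeapTable REBUILD;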
Indexes
SQL Server also tracks which pages belong to which indexes through the IAM pages.
However, indexes are fundamentally different from heaps in terms of their
organization and structure. Indexes, clustered as well as non-clustered, store data
pages in a guaranteed logical order, according to the defined index key (physically,
SQL Server may store the pages out of order).
Structurally, non-clustered and clustered indexes are the same. Both store index pages
in a structure known as a b-tree. However, while a non-clustered index stores only the
b-tree structure with the index key values and pointers to the data rows, a clustered
index stores both the b-tree, with the keys, and the actual row data at the leaf level of
the b-tree. As such, each table can have only one clustered index, since the data can
only be stored in one location, but many non-clustered indexes that point to the base
data. With non-clustered indexes, we can include copies of the data for certain
columns, for example so that we can read frequently accessed columns without
touching the base data, while either ignoring the remaining columns or following the
index pointer to where the rest of the data is stored.
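For example, a non-clustered index along the following lines (a hypothetical index, not one that ships with AdventureWorks) lets queries that only need a person's name avoid touching the clustered index at all:

-- Hypothetical covering index: LastName is the key; FirstName and MiddleName are included copies
CREATE NONCLUSTERED INDEX IX_Person_LastName
ON Person.Person (LastName)
INCLUDE (FirstName, MiddleName);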
For non-clustered indexes, the pointer to the actual data may take two forms. If we
create the non-clustered index on a heap, the only way to locate a record is by its
physical location. This means the non-clustered index will store the pointer in the form
of an 8-byte row identifier. On the other hand, if we create the non-clustered index on
a clustered index, the pointer is actually a copy of the clustered key, allowing us to
look up the actual data in the clustered index. If the clustered key contains columns
already part of the non-clustered index key, those are not duplicated, as they're already
stored as part of the non-clustered index key.
Let's explore b-trees in more detail.
b-tree structure
The b-tree structure is a tree of pages, usually visualized as an upside-down tree,
starting at the top, from the root , branching out into intermediate levels, and finally
ending up at the bottom level, the leaf level. If all the records of an index fit on one
page, the tree only has one level and so the root and leaf level can technically be the
same level. As soon as the index needs two pages, the tree will split up into a root page
pointing to two child pages at the leaf level. For clustered indexes, the leaf level is
where SQL Server stores all the data; all the intermediate (that is, non-leaf) levels of
the tree contain just the data from the key columns. The smaller the key, the more
records can fit on those branch pages, thus resulting in a shallower tree depth and
quicker leaf lookup speed.
The b-tree for an index with an integer as the index key might look something like the
one shown in Figure 4 .
The bottom level is the leaf level, and the two levels above it are branches, with the
top level containing the root page of the index (of which there can be only one). In this
example, I'm only showing the key values, and not the actual data itself, which would
otherwise be in the bottom leaf level, for a clustered index. Note that the leftmost
intermediate level pages will always have an ø entry. It represents any value lower
than the key next to it. In this case, the root page ø covers values 1–16 while the
intermediate page ø covers the values 1–3.
The main point to note here is that the pages connect in multiple ways. On each level,
each page points to the previous and the next pages, provided these exist, and so acts
as a doubly-linked list. Each parent page contains pointers to the child pages on the
level below, all the way down to the leaf level, where there are no longer any child
page pointers, but actual data, or pointers to the actual data, in the case of a non-
clustered index.
If SQL Server wants to scan a b-tree, it just needs a pointer to the root page. SQL
Server stores this pointer in the internal sys.sysallocunits base table, which also
contains a pointer to the first IAM page tracking the pages belonging to a given object.
From the root page, it can follow the child pointers all the way until the first leaf-level
page, and then just scan the linked list of pages.
of the middle key. As there are only two keys, it will round up, look at the rightmost
key, holding the value 4, and follow the chain to the leaf-level page containing Keys 4
and 5, at which point it has found the desired key.
If, instead, the search is for the Key 22, SQL Server starts in the same way but this
time, after 17, inspects the middle key of all the keys higher than 17. Finding only Key
23, which is too high, it concludes that the page to which Key 17 points in the second
level contains the values 17–22. From here, it follows the only available key to the leaf
level, is unable to find the value 22 and concludes that no rows match the search
criteria.
After splitting the page, we now have three pages in the leaf level and three keys in the
root page. The page split is a costly operation for SQL Server in its own right,
compared to simply inserting a record on an existing page, but the real cost is paid in
terms of the fragmentation that arises from splitting pages. We no longer have a
physically contiguous list of pages; SQL Server might store the newly allocated page
in an entirely different area of disk. As a result, when we scan the tree, we'll have to
read the first page, potentially skip to a completely different part of the disk, and then
back again to continue reading the pages in order.
As time progresses and fragmentation gets worse, you'll see performance slowly
degrading. Insertions will most likely run at a linear pace, but scanning and seeking
will get progressively slower.
An easy way to avoid this problem is to avoid inserting new rows between existing
ones. If we use an ever-increasing identity value as the key, we always add rows to the
end, and SQL Server will never have to split existing pages. If we delete existing rows
we will still end up with half-full pages, but we will avoid having the pages stored
non-contiguously on disk.
It is surprisingly simple. Part of the beauty here was in designing a schema that doesn't
need any secondary indexes, just the clustered index. In essence, I'd designed a single
clustered index that served the same purpose as my thousands of separate databases,
but did so infinitely more efficiently.
Index design
There's one extremely high-impact choice we have to make up front, namely, how to
design the clustered index, with particular regard to the ViewDate column, an ever-
increasing value that tracks the date and hour of the page view. If that's the first
column of the clustered index, we'll vastly reduce the number of page splits, since
SQL Server will simply add each new record to the end of the b-tree. However, in
doing so, we'll reduce our ability to filter results quickly according to MagazineID; to
do that, we'd have to scan all of the data.
I took into consideration that the most typical query pattern would be something like
“Give me the total number of page views for magazine X in the period Y.” With such a
read pattern, the schema in Listing 7 is optimal since it sorts the data by MagazineID
and ViewDate .
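A schema along these lines captures the idea; the exact column data types are my assumption and may differ from the chapter's actual Listing 7:

-- Illustrative sketch of the MagazineStatistics schema; data types are assumptions
CREATE TABLE dbo.MagazineStatistics
(
    MagazineID int           NOT NULL,
    ViewDate   smalldatetime NOT NULL, -- date of the page view
    ViewHour   tinyint       NOT NULL, -- hour of the page view
    PageNumber smallint      NOT NULL,
    ViewCount  int           NOT NULL
);

CREATE CLUSTERED INDEX CX_MagazineStatistics
    ON dbo.MagazineStatistics (MagazineID, ViewDate, ViewHour, PageNumber);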
While the schema is optimal for reading, it's suboptimal for writing, since SQL Server
cannot write data contiguously if the index sorts by MagazineID first, rather than by
ViewDate column. However, within each MagazineID , SQL Server will store the
records in sorted order thanks to the ViewDate and ViewHour columns being part of
the clustered key.
This design will still incur a page split cost as we add new records but, as long as we
perform regular maintenance, old values will remain unaffected. By including the
PageNumber column as the fourth and last column of the index key, it is also
relatively cheap to satisfy queries like “Give me the number of page views for page X
in magazine Y in period Z.”
While you would generally want to keep your clustered key as narrow as possible, it's
not necessary in this case. The four columns in the key only add up to 9 bytes in total,
so it's still a relatively narrow key compared, for example, to a 16-byte
uniqueidentifier (GUID).
The presence of non-clustered indexes or foreign keys in related tables exacerbates the
issue of a wide clustering key, due to the need to duplicate the complete clustered key.
Given our schema and query requirements, we had no need for non-clustered indexes,
nor did we have any foreign keys pointing to our statistics data.
Storage requirements
The MagazineStatistics table has two 4-byte integers, a 2-byte smallint , a 4-
byte smalldatetime and a 1-byte tinyint. In total, that's 11 bytes of fixed-length
data. To calculate the total record size, we need to add the two status bytes, the two
fixed-length length bytes, the two bytes for the null bitmap length indicator, as well as
a single byte for the null bitmap itself. As there are no variable-length columns, the
variable-length section of the record won't be present. Finally, we also need to take
into account the two bytes in the record offset array at the end of the page body (see
Figure 3 ). In total, this gives us a record size of 20 bytes per record. With a body size
of 8096 bytes, that enables us to store 8096 / 20 = 404 records per page (known as the
fan-out ).
Assuming each magazine had visitors 24/7, and an average of 50 pages, that gives us
365 * 24 * 50 = 438,000 records per magazine, per year. With a fan-out of 404, that
would require 438,000 / 404 = 1,085 data pages per magazine, weighing in at 1,085 *
8 KB = 8.5 MB in total. As we can't keep the data perfectly defragmented (as the latest
added data will suffer from page splits), let's add 20% on top of that number just to be
safe, giving a total of 8.5 + 20% = 10.2 MB of data per magazine per year. If we
expect a thousand magazines per year, all with 24/7 traffic on all pages, that comes in
at just about 1,000 * 10.2 MB = 9.96 GB of data per year.
In reality, the magazines don't receive traffic 24/7, especially not on all pages. As such,
the actual data usage is lower, but these were my “worst-case scenario” calculations.
Summary
I started out having no idea how to calculate the required storage with any degree of
accuracy, no idea how to create a suitable schema and index key, and no idea of how
SQL Server would manage to navigate my data. That one session on SQL Server
Internals piqued my interest in understanding more and, from that day on, I realized
the value in knowing what happens internally and how I could use that knowledge to
my advantage.
If this chapter has piqued your interest too, I strongly suggest you pick up Microsoft
SQL Server 2008 Internals by Kalen Delaney et al., and drill further into the details.
SQL Server Data Compression
Bob Pusateri
Data compression, one of the many features introduced in SQL Server 2008, can help
decrease the size of a database on disk, by reducing wasted space and eliminating
duplicate data. Additionally, data compression can increase the speed of queries that
perform high amounts of I/O, to and from disk, since each I/O operation can read or
write more data.
Numbers will always vary from one environment to the next but, at my organization,
the strategic deployment of data compression cut our overall database size on disk by
nearly 50% and the execution time for key queries and processes by an average of
30%.
Nevertheless, DBAs must not take lightly the decision of when and where to deploy
data compression; while its syntax is rather simple, the operations performed behind
the scenes are not. This chapter aims to give you a good understanding of how data
compression works so you can make an informed decision when deploying it.
Compression Basics
Data file compression technology has existed for many years and provides a simple
way to reduce the size of data, save storage space, and minimize transfer time when
sending data between locations.
However, database compression is far more challenging than standard file
compression because with a database it is critical that the compression does not
compromise data access speed, for both queries and modifications.
Simply compressing a SQL Server data file (typically files with .mdf or .ndf
extensions) with a compression technology such as ZIP would yield a good
compression ratio in many cases, but would degrade the performance of both reads
and writes. In order to read a file compressed in this manner, SQL Server would have
to decompress the entire file. If we also make changes, then it would subsequently
have to recompress the entire file. These operations take considerable time and CPU
cycles, even if we only need to change a few bytes of the file. Most of the work that
SQL Server does involves making small changes to large files, so compressing the
entire file, as described, would be rather inefficient.
A more practical approach is to compress smaller units of data,
compressing data pages individually instead of entire files. In fact, that is exactly how
SQL Server data compression works.
Benefits
Compressing data does not reduce a page's 8 KB size. Instead, SQL Server compresses
records stored on the page, which allows more records per page. SQL Server reads
data as pages, i.e. one read will retrieve one 8 KB page from disk or memory.
Therefore, with compression enabled, SQL Server requires fewer reads to return the
same amount of data.
SQL Server caches data pages in memory, and a page's format in memory exactly matches
that used on disk. In other words, with data compression enabled, we have compressed
data, and so more data per page, both in the buffer cache and on disk. This makes both
logical I/O (reads and writes to memory) and physical I/O (reads and write to disk)
quicker and more efficient.
By fitting more data in memory, we can reduce the necessity for disk reads, since it
becomes more likely that the required data will already be in memory. Since logical
I/O is orders of magnitude faster than physical I/O, this can also boost query
performance relative to non-compressed data.
Typically, reads from disk account for a considerable portion of query execution time,
so fewer disk reads generally means much faster queries. This is particularly evident
for queries that involve large scans but is also advantageous for some seek operations.
Costs
Data compression and decompression takes place in an area of the SQL Server Storage
Engine known as the “Access Methods.” When a user issues a query, the Access
Methods code requests the required rows from the buffer manager, which will return
the compressed data pages from its cache, after loading them from disk if necessary
and will decompress these data rows before passing them off to the relational engine.
Conversely, when writing data, during an INSERT or UPDATE statement, the Access
Methods code will compress the rows it receives from the relational engine and write
the compressed pages to the buffer cache.
In other words, every time a query reads or writes a record, the Access Methods code
must decompress or recompress the record, on the fly. This requires additional CPU
resources and is the primary “cost” of using data compression.
For this reason, systems with high CPU usage may not be good candidates for data
compression. Nothing in computing is free and if you want the benefits of
compression then you will pay for them in terms of CPU utilization. Fortunately, the
reduced disk usage and improved I/O performance that accompany compression typically
more than compensate for this expense.
Types and granularity
SQL Server offers two compression settings or levels: row compression and page
compression . The former converts fixed width data types to variable width in order to
remove unused space from columns, and the latter (the more CPU-intensive of the
two) removes duplicate values across multiple rows and columns on a page.
We configure SQL Server data compression, not on tables or their indexes, but on the
partitions that comprise these tables and indexes. Of course, this doesn't mean we
must partition our tables or indexes in order to use compression, because a non-
partitioned object is simply one with a single partition. What it does mean is that we
can, if desired, have a different compression setting for each object partition, rather
than having to share a single setting for all. For example, a partition containing
infrequently accessed data can have a high level of compression (i.e. page), whereas a
partition containing frequently accessed records can have a lower setting (i.e. row), or
be uncompressed. We'll discuss this in more detail a little later in the chapter.
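For instance, assuming a table that has already been partitioned (the table and partition numbers here are purely illustrative), a single rebuild can apply different settings to different partitions:

-- Illustrative only: page-compress the older, rarely touched partitions
-- and row-compress the partition receiving current activity
ALTER TABLE dbo.SalesHistory
REBUILD PARTITION = ALL
WITH (DATA_COMPRESSION = PAGE ON PARTITIONS (1 TO 3),
      DATA_COMPRESSION = ROW ON PARTITIONS (4));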
Since compression occurs on a per-partition basis, we can view compression settings
in the data_compression and data_compression_desc columns of the
sys.partitions catalog view. Each partition will have a value of NONE , ROW , or
PAGE .
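A query such as the following lists the current setting for every partition of every user table in the database:

-- Current compression setting for each partition of each user table
SELECT OBJECT_NAME(p.object_id) AS table_name,
       p.index_id,
       p.partition_number,
       p.data_compression_desc
FROM   sys.partitions AS p
WHERE  OBJECTPROPERTY(p.object_id, 'IsMSShipped') = 0
ORDER BY table_name, p.index_id, p.partition_number;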
Data compression requires Enterprise Edition and, as of SQL Server 2012, the core-based
licensing model for Enterprise Edition means that, unfortunately, you'll be paying
more than for previous versions.
There is also the “gotcha” of not being able to restore a database utilizing Enterprise-
only features onto any lesser edition of SQL Server. The restore will fail and, worse
yet, this failure will occur at the very end of the restore process, meaning you might
waste significant time before arriving at that result.
Row Compression
The first of the two available data compression settings is row compression . As its
name suggests, row compression compresses data within individual rows on a page.
Row compression can offer excellent compression ratios in many situations, and is less
resource intensive than page compression.
The primary purpose of row compression is to squeeze unused space from columns in
each row, whenever possible. Variable width types such as VARCHAR only consume the
space necessary to store a value, but fixed-width types such as CHAR and INT are, by
definition, the same size regardless of the stored value.
Row compression is able to save more space by utilizing a different record structure
and storing fixed width data types as if they were variable width. The data types
presented to users and applications outside of the database remain unchanged, but the
storage engine treats them differently behind the scenes. The best way to demonstrate
this is with a few examples.
Imagine a table that contains just one column, an integer. Being a fixed-width data
type, an integer column for an uncompressed row takes up 4 bytes of space no matter
what value it stores. With row compression enabled, SQL Server adjusts column sizes
such that they only consume the space necessary to store the value in a particular row.
As shown in Figure 1 , the values NULL and 0 will consume no space at all with
compression enabled (more on how this works later). The value 1 requires only one
byte, saving three unused bytes. The value 40,000 requires three bytes, yielding a
saving of one byte.
Character types can also benefit from row compression. Let's envision an
uncompressed table with two columns, an ID column of type SMALLINT , and a Name
column of type CHAR(8) . Since a SMALLINT requires two bytes and CHAR(8)
requires eight, each row requires ten bytes for data, as shown in Figure 2 .
Figure 2: Two-column table, uncompressed.
With row compression enabled, our table looks very different, as shown in Figure 3 .
The ID values for Sam and Michelle are less than 128, meaning they would fit into a
single byte instead of two. Nick's ID is large enough that it requires both bytes. As for
the Name columns, Michelle requires all eight bytes, while Sam and Nick have unused
bytes that can be removed from the end.
Where all three rows previously required ten bytes each, with row compression
enabled the first row now consumes only four, the second row uses six, and the third
uses nine, a net space saving of over 36%, for the three rows combined.
As a side effect, the columns no longer begin at specific offsets. In the uncompressed
example, the first two bytes of each row are part of the ID column and the third byte is
the beginning of the name. In the compressed version, the name column might start at
the second byte if the ID value is small enough, or even the first byte if the ID value is
0 or NULL . Since fixed-width columns are now essentially variable width, no longer
can SQL Server use offsets to find these columns. Instead, the compressed record
structure contains column descriptors that store the exact size of each variable width
column, so that SQL Server knows where each column starts. For this service, column
descriptors consume 4 bits per column per row. The special cases of NULL and 0 values
are also stored in the column descriptor, which explains why those values can consume
zero bytes.
As demonstrated in the previous examples, compression ratios will vary, based on both
the value of the data stored and the data type of the column that is storing it. Storing
five characters in a column of type CHAR(10) will see 50% compression, while
storing those same five characters in a CHAR(100) column will achieve 95%
compression. If appropriate data types were chosen when designing the table, row
compression has less wasted space to reclaim, so the compression ratio will be lower.
Likewise, some values will consume all the space allocated to their column, and
compression will have no effect on them at all.
Another feature of row compression is Unicode compression , made available in SQL
Server 2008 R2. Unicode compression utilizes the Standard Compression Scheme for
Unicode (SCSU) to reduce the number of bytes required to store Unicode text. It
supports the NCHAR and NVARCHAR data types, with specified lengths, but not
NVARCHAR(MAX) . For most Western languages, it results in a 50% saving, so each
character will require only one byte instead of two. In the event that Unicode
compression does not provide any advantage, SQL Server will not apply it.
The majority of SQL Server data types can benefit from row compression when the
values stored in them allow for it. Notable exceptions are LOB (Large Object) types
such as TEXT/NTEXT , IMAGE , NVARCHAR(MAX) , XML , or FILESTREAM . A more
complete list of data types affected by row compression is available in Books Online
at HTTP://MSDN.MICROSOFT.COM/EN-US/LIBRARY/CC280576.ASPX .
Page Compression
While row compression squeezes unused space out of each column on a row-by-row
basis, page compression takes a more drastic approach and removes duplicate values
across multiple rows and columns on a page. This method can yield higher
compression ratios than row compression but, because of its relative complexity, it
also utilizes more CPU.
Page compression comprises three operations. First, SQL Server performs row
compression as described in the previous section. Second is column prefix
compression , and finally page dictionary compression .
Figure 4 shows a data page, containing three rows and three columns, which has
already been row compressed.
Figure 4: Data page before column prefix compression has been applied.
Let's see how the two page-compression operations affect the structure of this data
page.
After row compression completes, SQL Server applies the first page compression
operation (column prefix compression), which examines the values in a column and
tries to find the prefix shared by the greatest number of values. This operation looks at
the byte patterns stored, so it doesn't matter what data type each column contains. The
term “prefix compression” refers to the fact that SQL Server considers pattern matches
only when reading the bytes in order starting at the beginning. For example, consider
the following two values in hexadecimal notation:
• 0xAABC1234
• 0xAABCC234
The common prefix is “0xAABC” . Even though both values share the fourth byte
“34” , it is not part of the prefix because the third bytes “12” and “C2” differ.
If SQL Server finds a common prefix for the data in the column, the largest value
containing that prefix is stored in a special record known as the anchor record ,
located right after the page header in a region known as the Compression
Information structure , or CI structure for short. The anchor record is no different
from any other record on the page, with the exception that it is marked as being the
anchor, and so SQL Server does not return it with query results. If the column data has
no common prefix value, then a NULL value is stored in the anchor record. Figure 5
shows the anchor record for the first column populated with the value “DABBCA” ,
since this is the largest value containing the prefix, “DA” , found in two of the values
in this column.
Figure 5: Data page after the anchor record has been determined.
Having determined the anchor record, SQL Server adjusts the values stored in the data
rows so they are relative to the anchor record. For any data rows that share a common
prefix with the anchor row, it replaces the shared bytes with a number indicating how
many prefix bytes should come from the anchor. For example, in Figure 5 , the
compressed form of the first record, DAADA , is 2ADA .
If a row value does not match the prefix of the anchor value, SQL Server adds a “0” to
the stored value, indicating that the actual value is composed of the first 0 bytes of the
anchor record followed by the row value. Therefore, in the second row of the first
column, the value AB becomes 0AB .
For any rows that match the anchor record exactly, such as the third row in the first
column, SQL Server replaces the row value with a special “pseudo-null” value, which
is different from a standard NULL . SQL Server knows that this pseudo-null simply
means that it should copy the anchor value in its entirety.
Figure 6 shows the result of applying column prefix compression to all of the columns
in the table.
Figure 7: Data page with page dictionary compression completed.
There is no clear-cut choice between row and page compression; there is no single best practice for
deploying compression. Hardware resource constraints, datasets, and workloads can
profoundly affect the outcome and performance of both row and page compression,
even if all the basic litmus tests indicate it should work.
Ultimately, the safest method for rolling out compression involves rigorous testing in a
development environment followed by a careful, heavily monitored deployment to
production. It is not unusual to compress tables individually, applying row compression first and then escalating to page compression, if warranted.
When considering either type of compression, the three most important considerations
are as follows:
• Compression ratio vs. CPU overhead – if the data doesn't compress well, then
you'll waste extra CPU cycles on compressing and decompressing the data, only to
receive little or no space-saving benefit in return.
• Overall performance impact of compression – you'll want to confirm the impact
of the CPU overhead, via monitoring, and see some positive benefits in terms of
reduced disk reads and writes. Query response times should be equal to or better than
their pre-compression values. If compression reduces performance, it is not worth
the cost.
• Data usage patterns – the read-write ratio of a partition may dictate the best type
of compression.
The SQLCAT team state that, in their experience, row compression results in a CPU overhead of 10% or less, and suggest:
“…if row compression results in space savings and the system can accommodate a 10 percent increase in
CPU usage, all data should be row compressed. ”
While my findings for CPU overhead align broadly with SQLCAT's, I wouldn't go out
and blindly row-compress all my data; the space savings for some tables is so small
that it is not worth the trouble. However, if your testing suggests a table's size will
shrink by 20% or more, and the CPU capacity is available, I see no problem in
applying row compression, combined with careful monitoring to ensure the database
continues to perform as expected.
Page compression can result in greater space savings than row compression, but higher
CPU overheads. I don't even consider page compression unless it can result in at least
a 15 to 20% advantage over row compression. Even if this is the case, page
compression might still not be appropriate.
Due to the associated CPU overhead, SQLCAT's advice for page compression is more
cautious. They recommend first applying page compression to tables and indexes that
are utilized less frequently, in order to confirm system behavior, before moving to
more frequently accessed ones. They advise page-compressing all objects across the
board only for large-scale data warehouse systems with sufficient CPU resources. I
still prefer to estimate the benefits of compression and apply it where warranted rather
than enabling it everywhere. Even if your system has a tremendous amount of CPU
power to spare, the fastest operations are the ones you never perform.
Performance monitoring
There's a lot of great third-party software available for keeping an eye on CPU and
other performance-related aspects of your servers. Many of these tools keep statistics
over time, which is great for benchmarking and finding changes in performance
trends. If you don't have a tool like that, Perfmon is a good alternative that's included
with Windows and won't cost you anything. Brent Ozar has an excellent tutorial on his
website, explaining how to use Perfmon to collect SQL Server performance data over
time (see: HTTP://BRENTOZAR.COM/GO/PERFMON ).
When monitoring the effects of data compression, I would pay particular attention to
the following counters:
• % Processor Time – since CPU is the resource with which we're most concerned,
you'll want to watch out for anything more than a modest increase in processor
utilization.
• Disk Reads/sec and Disk Writes/sec – ideally, you'll see a decrease in
disk activity since SQL Server should be reading and writing fewer pages to and
from disk.
I hope that you will already have a benchmark of your database and application
performance with which to compare your post-compression results. If not, you can use
Brent's method to record performance metrics for a few days before starting to
compress objects and then compare these values to metrics recorded after
compressing, to see what changed.
Listing 1: Calculating index update and scan ratios.
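One way to calculate such ratios is to query the sys.dm_db_index_operational_stats DMV; a sketch along those lines, which may differ from Listing 1 itself, might look like this:
SELECT  OBJECT_NAME(s.object_id) AS TableName ,
        i.name AS IndexName ,
        s.partition_number ,
        s.leaf_update_count * 100.0
            / NULLIF(s.range_scan_count + s.leaf_insert_count
                     + s.leaf_delete_count + s.leaf_update_count
                     + s.leaf_page_merge_count + s.singleton_lookup_count, 0)
            AS UpdatePercent ,
        s.range_scan_count * 100.0
            / NULLIF(s.range_scan_count + s.leaf_insert_count
                     + s.leaf_delete_count + s.leaf_update_count
                     + s.leaf_page_merge_count + s.singleton_lookup_count, 0)
            AS ScanPercent
FROM    sys.dm_db_index_operational_stats(DB_ID(), NULL, NULL, NULL) AS s
        INNER JOIN sys.indexes AS i ON i.object_id = s.object_id
                                       AND i.index_id = s.index_id
WHERE   OBJECTPROPERTY(s.object_id, 'IsUserTable') = 1;
Broadly speaking, a high update percentage tends to favor row compression, while a high scan percentage makes page compression more attractive.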
SQL Server uses a different record format for compressed data, so the only way to
change compression settings on an existing table or index is to perform a rebuild.
Listing 3 rebuilds dbo.Sales to use page compression.
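The rebuild itself is a single statement; a sketch of what Listing 3 does (assuming dbo.Sales already exists):
ALTER TABLE dbo.Sales
REBUILD WITH (DATA_COMPRESSION = PAGE);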
Changing the compression setting for a table affects only its clustered index or heap.
Other objects such as non-clustered indexes never inherit compression settings. If we
create a non-clustered index on a compressed table, that index will be uncompressed,
unless we explicitly apply data compression.
SELECT OBJECT_NAME(i.object_id) AS TableName ,
i.name AS IndexName ,
partition_number ,
data_compression_desc
FROM sys.partitions AS p
INNER JOIN sys.indexes AS i ON i.object_id = p.object_id
AND i.index_id = p.index_id
WHERE i.object_id = OBJECT_ID('dbo.Sales');
GO
Listing 4: Non-clustered indexes do not inherit compression settings from the table.
We must specify the compression setting explicitly on creation, or rebuild the index.
/* re-run sys.partitions query from Listing 4
TableName IndexName partition_number data_compression_desc
-----------------------------------------------------------------------
Sales NULL 1 ROW
Sales IDX_Sales_Price 1 NONE
*/
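To bring the index into line, we can rebuild it with an explicit compression setting; for example (the Price key column is assumed from the index name, so adjust to your own schema):
-- Rebuild the existing non-clustered index with row compression
ALTER INDEX IDX_Sales_Price ON dbo.Sales
REBUILD WITH (DATA_COMPRESSION = ROW);

-- Or specify the setting at creation time:
-- CREATE NONCLUSTERED INDEX IDX_Sales_Price ON dbo.Sales ( Price )
--     WITH (DATA_COMPRESSION = ROW);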
Our data is now partitioned so that older, less frequently accessed data is page
compressed for maximum space savings, while the current data is row compressed, to
strike a balance between space savings and performance. We can confirm this by re-running our sys.partitions query from Listing 4 on dbo.Sales_Partitioned.
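A sketch of the kind of per-partition settings involved (the partition numbers here are illustrative):
ALTER TABLE dbo.Sales_Partitioned
REBUILD PARTITION = ALL
WITH (DATA_COMPRESSION = PAGE ON PARTITIONS (1 TO 3),
      DATA_COMPRESSION = ROW ON PARTITIONS (4));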
Encrypting the data first would eliminate repetitive patterns, meaning data
compression would have little left to work with. Conversely, compressing the data
before encryption means that data compression would be able to take advantage of
duplicates and repeating patterns before encryption eliminates them. SQL Server uses
the latter method so that data compression and TDE can coexist. SQL Server TDE
encrypts data only when writing it to disk, and decrypts it when reading from disk into
memory. Data compression, on the other hand, keeps data in a compressed state both
on disk and in memory, decompressing it only in the Access Methods as discussed
earlier. This design means that TDE is transparent to data compression.
The fact that data compression and TDE can coexist doesn't mean that combining
them is a simple decision. Doing so requires careful testing and monitoring to ensure
there are no negative impacts to end-users. Servers requiring both data compression and encryption should have ample CPU resources, as the load of performing both operations falls on the processor. In my opinion, we should only enable TDE when strictly required by business requirements and/or legislation. Transparent
Data Encryption is only available in the Enterprise Edition.
Summary
The data compression options available in SQL Server 2008 and later are a powerful
tool for reducing the size of a database and improving query performance in certain
situations. Data compression comes at the cost of increased CPU utilization. However,
if the appropriate form of compression is enabled on objects that can benefit from it,
the performance gains observed will greatly outweigh that cost.
Verifying Backups Using Statistical
Sampling
Shaun J. Stuart
Your database backups are only valid when you've proved you can restore the database.
This little nugget of wisdom should be foremost in the mind of every DBA. Yes, you
created the backup using the WITH CHECKSUM option and, once the backup was
complete, you ran a RESTORE VERIFYONLY to validate that the backup was good. The
problem is that this confirms only that no corruption occurred between the time SQL Server wrote the data to disk and the time it read the data back for your backup. It will not detect any corruption that happened to the data in memory after the data was updated, but before SQL Server hardened it to the database file. In addition, many mishaps can
befall your backup files, after you've taken and verified them. For example:
• A subsequent problem with the I/O subsystem corrupts the backup file.
• A developer makes a backup of a production database, to restore to a test system,
and forgets to use the COPY_ONLY option, thus breaking your backup chain.
• Even worse, a developer creates the backup on an existing backup set and uses the
WITH INIT option, wiping out all previous backups in that set.
• A mistake made when setting up the automated file deletion routine means it is
deleting full backups that you still need in order to restore later differential backups.
If a backup file is missing, won't restore because it is corrupt, or will restore but
contains corrupted data, then the DBA needs to find out about this before he or she
might need it for disaster recovery. The only way to do this, and simultaneously to
offer reasonable levels of confidence in the validity of the backups, is to restore the
database backups to a test server on a regular basis, and perform DBCC CHECKDB tests
on the restored copy.
Performing manual test restores might be manageable with five servers, but it doesn't
scale to tens or hundreds of servers. In this chapter, I offer a technique that, using the
power of statistical sampling , allows you to restore a small subset of your backups
and report to your boss that “with 95% confidence, our backups are valid. ” If that
percentage isn't good enough for your boss, you simply need to nudge that percentage
higher, by performing more test restores, and I'll show you how to determine how
many backups you need to test to obtain the desired confidence level.
Many DBAs tend to believe that the chances of database or backup corruption are
remote. Sixty-five billion neutrinos from the sun pass through every square centimeter of the Earth every second, but
most never touch a thing. The chances seem slim that one will hit a small magnetic
particle in your storage array, changing it from a one to a zero, and corrupting your
file. The chances, though, are somewhat higher that the deliveryman accidentally
knocks his hand truck into the SAN rack, while delivering your new blade server,
causing a read/write head to bounce across the surface of a couple of platters, wiping
out several sectors of bits.
Corruption is not a rare event. Consider Paul Randal's response to a reader who,
having never personally encountered the problem in ten years, questioned how often
corruption occurs in the real world:
“Hundreds to thousands of times every week across the world, in the tens of millions of SQL Server
databases…Every single week I receive multiple emails asking for some advice about corruption
recovery…I expect every DBA to see database corruption at some point during their career.
– Paul Randal (HTTP://WWW.SQLSERVERCENTRAL.COM/ARTICLES/CORRUPTION/96117/ )
You do not want to find out about a corruption problem with your backups at the time
you actually need to restore a database to a production system. Usually, this is a high-
pressure situation. Something bad has happened and you need to fix it. The clock is
ticking, you'll likely have at least one manager looking over your shoulder, and you
need to get the database back online now . If you can't, you may find yourself smack in
the middle of an R.G.E. (Résumé Generating Event).
To protect against this, you need to perform regular test restores of your backups.
need to restore, randomly, in order to attain that confidence level in the validity of all
backups on that server.
Armed with this magic number, there are many ways to implement an automated
restore routine, depending on the tools at your disposal. In my case, I simply run a
linked server T-SQL script to retrieve a list of all the databases and backup files from
my servers, and assign a number to each file to establish the correct restore sequence.
My automated restore routine then uses the T-SQL RAND function to select, randomly,
the required “magic number” of databases, restores them to the test server, runs DBCC
CHECKDB and then drops the restored databases.
Planning Considerations
In this section, I'll review the planning considerations such as necessary hardware,
impact of compression and encryption technologies and so on.
You'll need a test server, with adequate drive space, onto which to restore your
databases. I've found virtual machines make good candidates for this. You don't need a
powerful machine with tons of RAM or fast processors, but you will likely need lots of
disk space, at least enough to hold your biggest database. If you run an environment
with multiple versions of SQL Server, your test server should have the latest version,
since you can restore backups from older versions of SQL Server to a newer version,
but not the other way around.
Ideally, your backup file repository will be available on a network share, or some other
shared device that all your SQL Servers and your test server can access. If not, you'll
need some way to move those backup files to a location that the test server can access.
If you are using any third-party backup tools that provide compression or other
features, you'll need to make sure the test server has that software installed as well. If
some servers use third-party backup and others use native SQL Server backups, you'll
also need to devise some way to differentiate between the two (e.g. based on file
name, though there are other ways).
If you are using Transparent Data Encryption, then in order to restore a TDE-protected
database to a test server, you will need to restore the service master key to the test
server.
• Retrieve a list of databases and their most recent backup files.
• Select, randomly, the required “magic number” of databases to restore – this
number comes from the statistical sampling stored procedure.
• For each selected database, perform the test restore, run DBCC checks and drop the
restored database.
It sounds simple enough, but there are many different parts to consider. The code
download for this chapter includes an example backup verification stored procedure,
called up_Backup_Verification_Process , which performs all the above steps,
for verifying full backups only. I won't present all of the code in this chapter, but I will
explain all the major parts, and describe the considerations for extending it to include
testing of differential backups, as well as point-in-time restores. Included in the same
code file are the required supporting tables to store the server list, store databases we
don't wish to test, store configuration details such as the restore paths, and store the
results of the tests, including errors.
proper restore order. Obtaining any information from the file system is a weak point of
T-SQL and so this type of information gathering typically involves VB scripting or
PowerShell.
CMS and MOM
If you use a Centralized Management Server or Microsoft Operations Manager, you can pull database
information from multiple servers from its tables.
Currently, I use a T-SQL approach. I have a linked server query (SSIS is an alternative,
if you prefer to avoid linked servers) that cycles through the server list, grabs the
backup history of all the databases from each server, and writes it into a table.
The T-SQL script in Listing 1 demonstrates this approach, querying the backupset
table to retrieve the last full and incremental backups for each database. It provides the
name of the databases that have been backed up on that server (database_name
column), the type of backup (type column), and start and end times of the backup
(backup_start_date and backup_finish_date columns). I leave the code to
return the last several transaction logs as an exercise for the reader. As discussed, it
adds a little complexity, as we can't simply select the oldest one. We need to select all
log backups dated later than the last full or differential backup. This is easy enough to
do by searching the backupset table, ordering by the backup_finished_date
column and looking for entries with an “I” (for differential backup) or “L” (for log
backup) in the type column that were taken after the most recent entry with a “D” (for
full backup).
Note that this code specifically excludes the master database (you may also want to
exclude msdb ). If you want to include these, be sure to rename the databases during
your restore process to avoid overwriting the system databases on your test platform!
However, later I'll explain why creating a copy of master during a test restore will
thwart any DBCC checks you'd like to run on the restored database.
SELECT a.server_name ,
a.database_name ,
b.physical_device_name ,
a.[type] ,
a.backup_finish_date ,
LogicalFile = d.logical_name ,
d.File_number
FROM msdb.dbo.backupset a
JOIN msdb.dbo.backupmediafamily b ON a.media_set_id = b.media_set_id
JOIN ( SELECT backup_finish_date = MAX(c1.backup_finish_date) ,
c1.database_name ,
c1.server_name
FROM msdb.dbo.backupset c1
JOIN msdb.dbo.backupmediafamily c2
ON ( c1.media_set_id = c2.media_set_id )
WHERE c1.type IN ( 'D' )
GROUP BY c1.database_name ,
c1.server_name
) c ON a.backup_finish_date = c.backup_finish_date
AND a.database_name = c.database_name
JOIN msdb.dbo.[backupfile] d ON a.backup_set_id = d.backup_set_id
AND d.file_type = 'D'
WHERE a.type IN ( 'D' )
AND a.database_name NOT IN ( 'master' )
AND a.is_copy_only = 0
Listing 1: Retrieving full and differential database backups for a test restore.
One other peculiarity to note is that SQL Server writes to the backupset table when
we restore a database. This can lead to problems when we move a database from one
server to another using a backup/restore process. The restore process on the
destination server will result in an entry in the backupset table with a server_name
value of the source server. Therefore, you may want to modify Listing 1 to verify that
the server_name field matches the actual server to which you're connected.
A query such as this appears in the example up_Backup_Verification_Process
stored procedure, adapted to write the results into a temporary table, called
##Backup_Verification_Restore , from which we later choose random restores.
For each server, we create a linked server if one doesn't already exist. You may need to
create some logins on the various servers to allow your test server to connect and grab
the backup history from the system tables. Typically, I store the linked server login
details, along with other important details, such as the path to which to restore the
database and log file, in a dedicated table (see the table
Backup_Verification_Configuration in the code download).We then retrieve
the database and file details, as shown Listing 1 , and drop the linked servers. If any
errors occur in retrieving the data and file details, I write the error into the
Backup_Verification_Log table.
Finally, there may be databases that you wish to exclude from test restores, for some
reason. For example, a database might be too big to make a test restore viable with the
current disk space, so you want to exclude it until more space becomes available.
For such cases, I maintain an exclusion list in a table called
Backup_Verification_Exception_List . Having retrieved into
##Backup_Verification_Restore the full list of databases for each server, I
simply have code that deletes from this table any that appear in the current exclusion
list.
One way to pick, at random, the right number of databases to test, is to include a
column in the backup file list table that assigns a value to each file using the NEWID() function. You can then sort by that value, which effectively gives you a randomized list, and just select the number of files you need by using the TOP clause in
a SELECT statement (alternatively, we can generate random numbers between one and
the maximum number of entries, using the T-SQL RAND function, and select the
matching rows).
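A sketch of the NEWID() approach, with the column names assumed from the earlier backup history query rather than taken from the code download:
DECLARE @SampleSize INT = 30;   -- in practice, this comes from the sampling procedure described below

SELECT TOP ( @SampleSize )
        server_name ,
        database_name ,
        physical_device_name
FROM    ##Backup_Verification_Restore
ORDER BY NEWID();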
How do you determine how many to test? That's where the magic statistical formula
comes into play, and I'll get to that in the next section.
Your routine will then need to restore the correct backup files for the selected database
and restore them in the correct order. For example, if you want to perform a test
restore for the last differential backup, the routine must retrieve the last full backup
prior to that differential, and restore that full backup, with the NORECOVERY option,
before restoring the differential. Testing transaction log backups will require similar
logic, according to the following pseudo-code, in order to identify the prior full
backup, most recent differential backup, and subsequent transaction log backups up to
the one you want to test.
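In outline, that logic runs roughly as follows (expressed here as comments; a sketch rather than the exact routine):
-- 1. Identify the log backup you want to test (the "target").
-- 2. Find the most recent full backup (type 'D') taken before the target.
-- 3. Find the most recent differential backup (type 'I'), if any, taken
--    after that full backup and before the target.
-- 4. RESTORE DATABASE ... FROM the full backup WITH NORECOVERY.
-- 5. If a differential exists, RESTORE DATABASE ... FROM it WITH NORECOVERY.
-- 6. RESTORE LOG ... for each log backup (type 'L') taken after the
--    full/differential, in backup_finish_date order, up to the target,
--    finishing the final restore WITH RECOVERY.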
An important consideration during the test restore is handling failures. You have two
options:
• On failure, let the whole procedure fail and send out an alert.
• Trap the failure and continue to the next backup file.
The first method is easiest, but has a rather obvious problem. If you need to test 30
backups to reach your desired confidence level and the first one fails, it means that
your routine will not test the next 29 backups.
The second option is the more robust solution. You'll need to trap for any errors
encountered during the process. The TRY -CATCH block, introduced in SQL Server
2005, helps here. You can log any errors to a table, then move on to restoring the next
backup file.
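A minimal sketch of that pattern (the file paths, logical file names, and logging table columns here are assumptions, not taken from the code download):
BEGIN TRY
    RESTORE DATABASE [Verify_MyDatabase]
    FROM DISK = N'\\BackupShare\MyDatabase_Full.bak'
    WITH MOVE N'MyDatabase'     TO N'T:\Data\Verify_MyDatabase.mdf' ,
         MOVE N'MyDatabase_log' TO N'T:\Log\Verify_MyDatabase.ldf' ,
         RECOVERY , REPLACE;
END TRY
BEGIN CATCH
    -- Record the failure and move on to the next backup in the list
    INSERT INTO dbo.Backup_Verification_Log ( LogDate, DatabaseName, ErrorMessage )
    VALUES ( GETDATE(), N'MyDatabase', ERROR_MESSAGE() );
END CATCH;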
Having tested the required number of backups, you can then report off your logging
table. The next step, optional but highly recommended, is to run a DBCC CHECKDB test
against the restored database. SQL Server modifies data in memory and memory is
flushed to disk only periodically (during checkpoints), so it is possible that the data
can become corrupted in memory before it is written to the disk. In this case, the
database would have no corruption that a backup checksum could detect, but it would
have logical corruption, which the DBCC CHECKDB process would detect.
Having already restored the database, running an additional DBCC CHECKDB test is
relatively easy; it adds minimal steps to the process, and adds another level of
robustness to your verification process. This method is especially helpful when dealing
with very large production databases whose maintenance window might not be wide
enough to allow time for a full DBCC CHECKDB to complete.
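The check itself is a one-liner; for example, against a restored copy (the database name is hypothetical):
DBCC CHECKDB ( N'Verify_MyDatabase' ) WITH NO_INFOMSGS, ALL_ERRORMSGS;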
If you do decide to incorporate a DBCC check, you'll need to exclude the master
database from the check. Earlier, I advised either excluding the master database from
the list of databases to test, or restoring the backup of the master database as a user
database with a name other than master (otherwise you will overwrite the master
database on your test server and really mess things up).
If you go down the latter route, and then run DBCC against your newly restored and
renamed copy of master , it will report all kinds of errors. The master database has a
special page that contains configuration information specific to the SQL Server
instance from which it originated. This special page can only exist in the master
database on a server and DBCC will report errors if it finds it in any user databases.
In short, if you include the master database in your backup plans and your backup/restore testing, then exclude it from the DBCC checks on your test server. For the
master database, you'll always need to run DBCC on the live copy.
Finally, your routine should drop the restored databases from your test server.
Sample size
The sample size is the required number of people in the survey. In our case, the sample
size is the number of backups we need to test restore, in order to obtain a certain
confidence level, given a certain margin of error , response distribution , and
population size (all terms we'll discuss shortly).
Confidence level
This number represents how closely our sampling results represent the entire
population. When we say we are 95% confident that the majority of people prefer
vanilla ice cream to chocolate, our confidence level is 95%. This means we are
confident that the results of our sampling will mirror those of the entire population
95% of the time.
In other words, if we asked a sample size of 100 people if they preferred vanilla ice
cream or chocolate, and 70% said vanilla and we have a 95% confidence level, then if
we repeatedly ask another random 100 people the same question, 95% of the time the
respondents would favor vanilla 70% to 30%, over chocolate.
Margin of error
We are not sampling every person in the world, so we need to understand the margin
of error (or confidence interval) on our confidence level, due to sampling. In other
words, because our sampling is picking random people, there is the possibility that we
will pick randomly a group of people whose tastes do not mirror the tastes of the entire
population. The margin of error is a reflection of this possibility.
For example, if we say we have a margin of error of 1%, what we mean is that if we
ask many groups of people which ice cream flavor they prefer, 1% of the time we will
get a group whose response does not match the population as a whole. Using our ice
cream example and a margin of error of 1%, if we asked 500 groups of 100 people
which ice cream flavor they prefer, 5 of those groups might have 100% of the people
preferring vanilla over chocolate, instead of the average 70% of the other groups.
I use an error margin of 1% for my backup testing purposes, but you can adjust that to
suit your needs.
Response distribution
This one can be confusing. This number indicates how your results would vary if you
repeated the entire sampling process many, many times. Suppose your first poll of 100
people resulted in 10% of the people favoring vanilla over chocolate ice cream, but in
each of the next 9 polls, each with 100 different people, 70% preferred vanilla. If you
based your predictions only on the first poll, you would reach a much different
conclusion than if you looked at all 10 polls. You can look at response distribution as a
measure of the “spread” between your various poll results. If each poll returned
exactly the same results, your response distribution would be zero, but some polls
might return a 69% vanilla over chocolate ratio, some might return 72% vanilla over
chocolate, and so on. If, as in the example above, there are variations in results
between polls, response distribution is a measure of that variation. As you might
expect, this value, confidence level, and margin of error are closely related.
When it comes to sampling for quality checking, we can regard response distribution
as the failure rate (this isn't strictly true but, for our purposes, we can think of it this
way). The higher the response distribution, the larger the sampling size must be in
order to attain the required confidence level.
The most conservative setting for response distribution is 50% because that results in
the largest sample size (technically, 100% would give the largest, but if every one of
your samples is failing, you have larger problems than statistical sampling can fix).
Again, though, if our backup failure rate really were 50%, we would surely know
about it because backups would be failing left and right. If you're creating the backups
using the WITH CHECKSUM option, performing a RESTORE VERIFYONLY for basic
backup verification, and you have alerts configured for backup job failures, out of disk
space events, and other common causes of backup problems, then your true failure rate
is going to be much lower.
In my tests, I use a value of 0.1% (that's 0.001, or one failure in one thousand).
Population
This one is straightforward. Population is the total group size from which we are
pulling our random samples. If we are trying to figure out what percentage of a city's
residents prefer vanilla ice cream to chocolate, our population number is the city's
population.
In our backup testing case, the population is the total number of backups from which
we are pulling our random samples.
I don't know about you, but I wouldn't want to translate that into T-SQL. Luckily, we
can avoid almost all of that stuff. Shanti Rao and Potluri Rao have created a website
(HTTP://WWW.RAOSOFT.COM/SAMPLESIZE.HTML ) that lets us experiment with all of our
four variables (confidence level, margin of error, response distribution, and
population) to see how they affect our sample size.
Their page uses Java and, if we examine the web-page source code, we can see they
are performing their calculations using a numerical approximation. If you remember
your early calculus classes, this is similar to approximating integrals using Riemann
sums or trapezoidal approximations. I'm not going to go into an explanation of how
the approximation was derived because that is outside our scope and, really, it's not
knowledge that is needed by the typical DBA (also, I don't want to put anyone to
sleep).
In Listing 2 , I've translated their Java source into a T-SQL stored procedure. Java
supports arrays whereas T-SQL doesn't, and that makes things a bit uglier.
Example usage:
DECLARE @SampleSize FLOAT
EXEC up_GetSampleSize @ResponseDistribution = .1,
@Population = 3000,
@ConfidenceLevel = 95,
@MarginOfError=1,
@SampleSize=@SampleSize OUTPUT
-- SJS
Shaunjstuart.com
9/7/11
*/
SET NOCOUNT ON
DECLARE @Y FLOAT
DECLARE @Pr FLOAT
DECLARE @Real1 FLOAT
DECLARE @Real2 FLOAT
DECLARE @P1 FLOAT
DECLARE @P2 FLOAT
DECLARE @P3 FLOAT
DECLARE @P4 FLOAT
DECLARE @P5 FLOAT
DECLARE @Q1 FLOAT
DECLARE @Q2 FLOAT
DECLARE @Q5 FLOAT
DECLARE @ProbCriticalNormal FLOAT
-- Standard sample size formula for a finite population:
-- n = N * z^2 * p * (100 - p) / ( (N - 1) * E^2 + z^2 * p * (100 - p) )
-- where N = population, z = critical value for the confidence level,
-- p = response distribution (%), and E = margin of error (%)
SET @d1 = @ProbCriticalNormal * @ProbCriticalNormal
* @ResponseDistribution * ( 100.0 - @ResponseDistribution )
SET @d2 = ( @Population - 1.0 ) * ( @MarginOfError * @MarginOfError )
+ @d1
IF @d2 > 0.0
BEGIN
SET @SampleSize = CEILING(@Population * @d1 / @d2)
END
SELECT @SampleSize;
The procedure takes our first four variables as inputs and returns the resulting sample
size as an output parameter.
Of course, if you have a reasonably small population size (i.e. a small number of
databases), then a single failure will have a greater impact on your confidence level,
and so attaining a higher confidence level will mean testing a higher percentage of
your population. In other words, you won't avoid as many test restores as you might
hope!
Nevertheless, by incorporating this procedure into your automated backup testing
routine, you can always be sure you are sampling enough backups to meet your
required confidence level, no matter how your backup population grows.
Summary
From using them to populate test and development servers to safeguarding against
disasters, a company relies on backups for its operation and survival. They are perhaps
the most critical files DBAs work with. Give them the respect they deserve.
Performance Tuning with SQL Trace
and Extended Events
Tara Kizer
We can gather information about activity on our SQL Servers in numerous ways, such
as using Dynamic Management Views (DMVs), Performance Monitor, Extended
Events and SQL Trace. Due to the level of detail it provides and its relative ease of use, the latter is one of my most-used tuning tools. SQL Trace allows us to collect and analyze data for
different types of events that occur within SQL Server. For example, SQL Server
acquiring a lock is an event, as is the execution of a statement or stored procedure, and
so on. We can find our worst performing queries, see how often a query executes,
check what exceptions and other errors are occurring, and much more.
The SQL Trace feature comes with a dedicated GUI called SQL Server Profiler, which we can launch from SSMS and use for client-side tracing. We'll cover Profiler
only briefly, mainly to explain the overhead associated with event data collection with
this tool, and how this can degrade the performance of the server on which you're
running the trace.
For this reason, I perform all my tracing server-side, using SQL Trace, accessing the
tracing system stored procedures directly from scripts, rather than the GUI, and storing
the event data to a file on the server. We'll dig into SQL Trace in detail, covering the
SQL Trace system stored procedures, how to create, start, and stop a server-side trace,
how to import the collected data, and some tips on how to analyze the collected data.
The chapter wraps up by offering a first “stepping stone” in the process of migrating
from SQL Trace to Extended Events, the successor to SQL Trace. It summarizes some
of the advantages and challenges of Extended Events, compared to SQL Trace, and
demonstrates a way to migrate existing traces over to Extended Events.
Nevertheless, SQL Trace is still in SQL Server 2012, and will be in SQL Server 2014,
and I consider it a viable and valuable tool for performance diagnostics, at least over
the next 3–4 years. I still find it one of the simplest diagnostic tools to use and, if you
follow the practices in this chapter, you can minimize overhead to tolerable levels for a
wide range of traces. In addition, SQL Trace is the only supported way to replay
workloads (see HTTP://MSDN.MICROSOFT.COM/EN-US/LIBRARY/MS190995.ASPX) and,
unlike Extended Events (currently at least), it integrates well with other tools such as
PerfMon. With SQL Trace, we can synchronize a captured and saved trace with
captured and saved PerfMon data.
Of course, in the meantime, we will need to start learning Extended Events and
planning for the eventual migration of our existing tracing capabilities.
When we run a Profiler trace, SQL Server stores the event data temporarily in the trace buffers for the rowset I/O provider, from
where the Profiler client consumes it. Each active SPID on the instance requires an
exclusive latch to write the event data. The Profiler client often consumes the data at a
slow rate, row by row, over a network. As a result, if a trace produces many events,
queues can form as SPIDs wait for a latch, or simply some free space in the trace
buffer, to write their event data, and this can affect the overall performance of the SQL
Server instance.
We can minimize Profiler's impact by following these five rules or best practices:
1. Never run Profiler on the database server – run it on another machine that has
the SQL client tools installed, making sure that this machine has a fast network
connection.
2. Save the trace data to a file, not a table – writing the data to a table is much
slower than to a file and simply compounds the previously described problem. In my
early days, I once brought the performance of a SQL Server to its knees simply by
running a Profiler trace that wrote directly to a table.
3. Filter the results carefully, such as “Duration > 1000” – events that fail to
meet the filter will still fire, but we avoid the overhead of saving the data and writing
it over to the client.
4. Limit traces to only the most relevant events – for example, when investigating
slow queries, you may only need SP:Completed and SQL:BatchCompleted .
5. Try to avoid collecting events that will fire a lot – such as the lock- and latch-
related events.
However, even if you follow all of these guidelines, the intrinsic overhead of writing
event data to Profiler can still be too high and we can avoid it if, instead of using
Profiler, we run a server-side trace . Rather than operating SQL Trace, and consuming
the data through the GUI, we script out the trace definitions and run them directly on
the server.
Of course, Profiler still has its uses. We can use it to generate the trace definitions
quickly and easily and then export them to server-side traces that we can run on our
production instances. Let's use Profiler to create a simple trace definition for capturing
events relating to SQL batch and stored procedure execution that we can use to
investigate slow and expensive queries. We'll script out and save the trace definition as
a .sql file.
• Start Profiler and connect to the instance you wish to trace (using a login with
sysadmin permissions).
• From the Windows Start menu:
• select Program Files | Microsoft SQL Server 2008 R2 | Performance Tools
• select File | New Trace and connect to the target server.
• From SSMS:
• select Tools | SQL Server Profiler
• connect to the target server.
• Set the required trace properties ( General tab).
• Give the trace a name, such as SlowQueries .
• Choose the required template, the Blank template in this case.
• Pick an output destination, if desired. This is not necessary in this case, but if you
were planning to run this as a Profiler trace:
• always choose a file destination, and set the properties as required (more on this
in the next section)
• always select the Server processes trace data box so that SQL Server rather than
Profiler processes the trace data. This helps reduce trace overhead and avoids
possible event loss when running Profiler on an already heavily loaded instance.
• Check the Enable trace stop time box, but don't worry about its value right now
(more on this in the next section). Setting a stop time allows us to schedule a trace
to run via a job.
• Select the event classes ( Events Selection tab).
• In the Stored Procedures category, select the RPC:Completed event class.
• In the T-SQL category, select the SQL:BatchCompleted event class.
• Uncheck the Show all events box so Profiler displays only these two events.
• Select the data columns.
• By default, the template will collect all available data columns for each of the
event classes. To minimize the overhead of running the trace, collect only those
columns that you really need to diagnose the problem. For the purpose of this
trace, select only the following eight columns: CPU , Duration , LoginName , Reads
, SPID (a required column), StartTime , TextData , Writes . I normally also add
DatabaseName , ObjectName , ApplicationName , EndTime , and HostName ,
but I omitted those here for simplicity.
• Uncheck the Show all columns box so Profiler displays only the selected columns.
• Click Run to start the trace and then stop it immediately (you do not need to wait
for any events to appear in the trace).
• In the File menu, click Export , Script Trace Definition , and then For SQL Server
2005–2008 R2 .
• Save the trace definition to a file (e.g. SlowQueries.sql ) on the server that you
wish to trace.
We're now ready to examine the trace definition, modify it as required, and run it as a
server-side trace.
Server-side trace
Event data collection with server-side trace is still not “zero overhead.” For example,
the late-filtering problem hinted at in Rule 3 in the previous section means that SQL
Server still incurs the overhead of firing an event, and collecting all the necessary
event data, even if that event does not pass our trace's filter criteria. Extended Events
(discussed towards the end of the chapter) addresses this overarching problem with the
SQL Trace architecture.
However, if we use server-side tracing, and apply Rules 2–5 above, we can lessen the
observer overhead dramatically. If you need further convincing, Linchi Shea has done
some testing to prove this (HTTP://TINYURL.COM/33K489).
We implement a server-side trace using four system stored procedures, prefixed with
sp_trace_ :
• sp_trace_create – creates a new trace, defining the path to the trace file,
maximum file size, the number of rollover files, and other options.
• sp_trace_setevent – adds an event to the trace, or a data column to an event.
Our trace script must call this stored procedure once for every event and column
combination; four events, with ten columns each, means we call it 40 times.
• sp_trace_setfilter – adds a filter to the trace; called once for every filter we
wish to add.
• sp_trace_setstatus – sets the trace status (start, stop, close/delete).
We created the trace definition in the previous section so open the .sql file in SSMS
and it should look as shown in Listing 1 .
/****************************************************/
/* Created by: SQL Server 2008 R2 Profiler */
/* Date: 08/10/2013 08:00:58 PM */
/****************************************************/
-- Create a Queue
declare @rc int
declare @TraceID int
declare @maxfilesize bigint
declare @DateTime datetime
set @DateTime = '2012-08-10 21:00:34.207'
set @maxfilesize = 5
error:
select ErrorCode=@rc
finish:
go
Let's walk through the various sections of the script, focusing on those sections we
need to modify to suit our requirements.
sp_trace_create section
We need to modify the script according to our requirements for the trace definition, as
specified in the call to sp_trace_create .
We can skip past @TraceID, since SQL Server automatically assigns a value to this output
parameter.
The second parameter is @Options , and we're currently using a value of 0, meaning
that we're not specifying any of the available options, most significantly the number of
trace rollover files. This means that SQL Server will only create one trace file, sized
according to the value of @maxfilesize , and when the trace file reaches that size the
trace will stop running. That's not usually the behavior I want, so I change the value of
@options to 2, enabling the TRACE_FILE_ROLLOVER option. Other options are
available, such as SHUTDOWN_ON_ERROR , but I typically don't use them (see Books
Online for details, HTTP://TECHNET.MICROSOFT.COM/EN-US/LIBRARY/MS190362.ASPX).
Next, we come to the @tracefile argument, and this is the only part of the script that
we must modify, in order to run it, supplying the path and name for the trace file, as
indicated by 'InsertFileNameHere' in the script. 'InsertFileNameHere' can be
a path and file name that is local or remote to the server. Ensure that you have
sufficient free disk space at this location. How much space you'll need depends on
your system and on what you select in the trace definition. Notice that we exclude the
.trc extension from the file name because SQL Server adds it automatically. If you
do add .trc , you'll end up with a file named MyTrace.trc.trc .
Next, we have the @maxfilesize argument, set to 5 MB by default. I set the
@maxfilesize to 1024 for a maximum file size of 1 GB, but you should set this as
appropriate for your system.
Finally, we come to the @stoptime argument, where we specify a specific time for
the trace to stop running. Notice that the Profiler-generated script names the local
variable @DateTime , rather than @stoptime and passes it to sp_trace_create via
its input parameter position (5th). Profiler defaults to running a trace for an hour, but
maybe we want it to run for a shorter or longer period, or maybe we want it to run
until we stop it manually. In the latter case, we pass NULL to @DateTime . My servers
are located in a different time zone to me, so I like to modify the script to use DATEADD
instead, such as dateadd(hh, 1, getdate()) .
Listing 2 summarizes all of these modifications.
set @maxfilesize = 1024
…
exec @rc = sp_trace_create @TraceID output, 2, N'\\Server1\Share1\MyTrace',
@maxfilesize, @Datetime
There is one further input parameter, @filecount , that we didn't add to our script.
It's the final argument for sp_trace_create and it allows us to specify the
maximum number of trace files to maintain. We can only use @filecount if the script
also specifies the value 2 (TRACE_FILE_ROLLOVER ) for @options , as we did here.
Once the specified number of rollover files exists, SQL Trace will delete the oldest
trace file before creating a new one.
In this example, we don't need to modify the events, columns, or filters that we set up
in Profiler, so we don't need to touch the sp_trace_setevent and
sp_trace_setfilter sections.
Tracing a workload
Having created the trace, we need to run it at a time, and for a period, that will enable
us to capture the necessary workload. For a general performance problem we will need
to run it over the period that will capture a representative workload for the server. If
the problem is isolated to a specific user process, we can start the trace, ask the user to
rerun the process to reproduce the problem, and then stop tracing.
In a development or testing environment, we may wish to simulate a realistic workload
to evaluate performance and the impact of our query tuning efforts. Options here
include the use of SQL Server Profiler's Replay Traces feature
(HTTP://MSDN.MICROSOFT.COM/EN-US/LIBRARY/MS190995.ASPX), or the SQL Server
Distributed Replay feature, for bigger workloads. Alternatively, various third party
tools will allow us to simulate a workload on the server, for example Apache JMeter
(HTTP://JMETER.APACHE.ORG/) or LoadRunner (HTTP://TINYURL.COM/9KZAT2T). For
the purposes of this chapter, you might consider simply running a random workload
against your AdventureWorks2008R2 database (see, for example, Jonathan
Kehayias's article at HTTP://TINYURL.COM/NMLOJWR ).
Having captured the workload, we simply stop the trace. If the value of @DateTime
(see Listing 2 ) is non-NULL , the trace will stop automatically when the system time
on the server reaches that value. If we want to stop the trace earlier, or if we set
@DateTime to NULL , then we need to stop and close/delete the trace manually. We
can do this using sp_trace_setstatus , supplying the appropriate TraceID and
setting its status to 0 (zero). If our user-defined trace is the only one running on the
server, and assuming no one disabled the default trace, which has a TraceID of 1,
then our trace will have a TraceID of 2.
The default trace
If you are unfamiliar with the default trace, it might be because Books Online documents it so poorly. You
can find some better information about it here: HTTP://TINYURL.COM/PCU8LFP. In Chapter 7, What
changed? – Auditing Solutions in SQL Server, Colleen Morrow discusses its use for auditing purposes.
We would use Listing 3 to stop, and then close and delete, our trace.
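The calls involved look something like this, assuming our trace has a TraceID of 2 as discussed above (a sketch, which may differ slightly from Listing 3):
-- Stop the trace
EXEC sp_trace_setstatus @traceid = 2, @status = 0;

-- Close the trace definition and delete it from the server
EXEC sp_trace_setstatus @traceid = 2, @status = 2;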
To confirm the TraceID, we can query sys.traces, which returns one row for every trace defined on the system.
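A minimal query for that purpose might be:
SELECT  id ,
        path ,
        max_size ,
        max_files ,
        start_time ,
        status
FROM    sys.traces;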
You should be able to recognize your TraceId from the result set as it contains
information we passed to sp_trace_create , like the name of your trace file and a
value of 1024.
USE MyDatabase
GO
-- import all the files
SELECT *
INTO MyTrace
FROM FN_TRACE_GETTABLE('\\Server1\Share1\MyTrace.trc', DEFAULT);
You may have numerous files to import that total several gigabytes, and remember,
these imports will create many log records in the transaction log, possibly causing it to
grow considerably. If you have limited free space on the transaction log drive of the
database, you may want to break up the import into multiple batches. Typically, I will
import the trace data into a database that is using SIMPLE recovery model. However, if
your database uses FULL or BULK_LOGGED recovery model, you will want to run a
transaction log backup after each batch. In Listing 6 , we break the import of 20 trace
files into two batches of ten.
SELECT *
INTO MyTrace
FROM FN_TRACE_GETTABLE('\\Server1\Share1\MyTrace.trc', 10)
If we specify a value for the number_files parameter that is higher than the number
of trace files, it won't cause an error. The function is smart enough to read only the
files that exist.
We can also use fn_trace_gettable to query the active trace file; we don't need to
stop the trace to view or even import the data. This means we can view the trace data
while the trace is running, just as we can in Profiler.
Using ClearTrace
I start my analysis at the summary level, using ClearTrace, a tool written by Bill
Graziano that summarizes trace data information. Not only is it a great tool, it's free!
You don't even need to wait for the import to complete to run ClearTrace.
fn_trace_gettable and ClearTrace can access your trace file(s) at the same time,
which gives you an opportunity to start your analysis early. I sometimes even skip
importing the data into a table because often ClearTrace has already revealed the
problem.
After you've downloaded ClearTrace onto a client computer, double-click on the exe to
launch it. You will need to specify a server and a database where it can store the
summary data. Next, click the first Browse button next to the First File field and
select the first file in your trace. By default, it will import all of the trace files in the
sequence if you have multiple files. If you only want to import the first file, uncheck
Process All Trace Files in Sequence . Click the Import Files button at the bottom for
ClearTrace to start the import.
Depending on the size of your trace file(s), the import could take a few minutes. You
can watch its progress in the Import Status tab. Once it has finished importing your
trace file(s), it will show you the summarized results in the Query the Imported Files
tab. It'll show you the top 15 summarized results, sorted by CPU . There are a lot of
things we can change to see different results, but the most common change I make is to sort by Reads, AvgReads, or Duration.
Figure 1 shows sample output from our previous trace.
I selected the Display Averages option and, in the Order By drop-down, sorted the
data by AvgReads . Note that ClearTrace strips all of the stored procedures of their
parameters in order to group together the common SQL statements. The CPU , Reads
, Writes , and Duration values are all cumulative. The Average columns show values
averaged over the number of calls to the common SQL statement.
The first row in this example shows that USP_GETLOGS (ClearTrace uppercases the
data) was called three times and had 827,163 reads. Doing some math, we get 275,721
for the average reads which matches what ClearTrace shows.
When performance tuning at a high level, I start my analysis with the “worst
performers,” which I define loosely as those statements that average more than 5,000
reads per call, and investigate the possibility of missing indexes. Depending upon how
well tuned your system is, your value for “high” may be different.
In this example, the top seven rows qualify as my “starting points” for analysis, but I'd
look first at USP_GETLOGS , due to the very high average reads. This stored procedure
also has the highest AvgCPU , as confirmed by sorting the results on this column.
Let's take a quick look at our trace data. The fn_trace_gettable function returns a
table with all the columns that are valid for that specified trace. One such column is
EventClass , which corresponds to the event IDs (see Listing 1 ). We can convert the
event ID values to proper event names by joining to the sys.trace_events , a
catalog view that lists the trace events, on the trace_event_id column. I always
start my investigation by looking for the read-intensive queries.
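A sketch of that kind of query against our imported MyTrace table:
SELECT  e.name AS EventName ,
        t.TextData ,
        t.Duration ,
        t.CPU ,
        t.Reads ,
        t.Writes
FROM    MyTrace AS t
        INNER JOIN sys.trace_events AS e
            ON t.EventClass = e.trace_event_id
ORDER BY t.Reads DESC;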
On an OLTP system, a high number of reads often indicates missing indexes, resulting
in too many index or table scans. Therefore, I can frequently resolve performance
issues just by looking at the Reads column and investigating tuning and indexing
possibilities.
If you're analyzing a system that has queries taking a few seconds or longer, simply
convert Duration to seconds (Duration/1000000.0 as Duration ), rather than
milliseconds. Use 1,000,000.0 instead of just 1,000,000 because the decimal points
matter when dealing with seconds. There's a big difference between 1.01 seconds and
1.9 seconds and if you divide by 1,000,000, both values will display as 1.
Of course, it is easy to adapt Listing 8 as required. For example, to focus on specific
events, we simply add the appropriate filter (e.g. where e.name =
'RPC:Completed' ), or look for queries that breach a certain read or duration
threshold (where Reads > 5000 for example), and so on.
Listing 9 shows a different query that calculates the proportion of CPU usage
attributable to each query in the workload (that has a duration of over 1,000 ms, since
we added the filter to our trace).
Unless a single process is hogging the CPU, this data may not be too interesting. It
gets more exciting if we group together queries based on the object accessed. The
easiest way to do this is if our system uses stored procedures for data access, so that
we can group by ObjectName.
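A sketch of the general shape of such a query follows, grouping by ObjectName and
breaking the total CPU time out into a variable. The trace file path is illustrative and,
as the next paragraph notes, the ObjectName column must have been collected in the
trace for it to run.

DECLARE @TotalCPU BIGINT;

SELECT  @TotalCPU = SUM(CPU)
FROM    sys.fn_trace_gettable('C:\Traces\SlowQueries.trc', DEFAULT);

SELECT  ObjectName ,
        COUNT(*) AS Executions ,
        SUM(CPU) AS TotalCPU ,
        100.0 * SUM(CPU) / @TotalCPU AS PercentOfTotalCPU
FROM    sys.fn_trace_gettable('C:\Traces\SlowQueries.trc', DEFAULT)
WHERE   ObjectName IS NOT NULL
        AND ObjectName <> 'sp_reset_connection'
GROUP BY ObjectName
ORDER BY PercentOfTotalCPU DESC;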
I broke out the total CPU time to make the query easier to read. For simplicity reasons,
we didn't collect the ObjectName data column in our server-side trace. You will need
to add it if you want to run the above query, or you can try parsing the procedure name
out of the TextData column (see HTTP://TINYURL.COM/PF9KZMR for example).
If your system isn't using stored procedures for data access, then grouping isn't as easy
unless you have the same queries running over and over again. You can try grouping
by TextData , as shown in Listing 11 , but the problem is that different query
structures and hard-coded parameter values can make it hard to identify “similar”
queries. We could normalize the data by stripping out the parameter values, but the T-
SQL code becomes complex. It's easier to use ClearTrace, which does this
automatically, in order to group together common queries.
There may be times when the problem isn't related to high reads, duration or CPU
alone but rather a combination of the duration of a statement and the number of
executions of that statement. For example, a stored procedure that takes 5 minutes to
run but is called only once per day is much less problematic than one that averages
200 milliseconds per execution but is called 50 times per second by multiple sessions.
In the latter case, I'd certainly look for ways to tune the stored procedure to bring
down the duration.
You'll notice that I removed objects with a name of sp_reset_connection from the
result set. If you have a system that is using connection pooling, you'll see a lot of
these in your trace. You won't generally notice it for the other trace queries that we've
run, but it might show up in a query like this where we are finding out how often
things are being run. It doesn't provide anything useful here, hence the exclusion. You
could also add this exclusion as a filter in the trace.
Resolution
Having identified through tracing the most problematic queries, our attention turns to
resolution. It is not the intent of this chapter to discuss the details of query
performance tuning; there are many whole books on this topic, but I will just review a
few common causes from my own experience.
There are two broad reasons why code performs badly. Firstly, it could be bad code,
for example, code that uses inefficient logic in calculations, contains non-SARGable
predicates, or scans through the data more times than necessary, and so on. If so, we
need to refactor the code, rewriting it to be more efficient.
If you're convinced the code is sound, then it could be that, for some reason, the SQL
Server query optimizer is choosing a suboptimal execution plan. Let's review briefly
some of the common causes of suboptimal plans.
Missing indexes
This is by far the Number One cause of performance issues in the environments that I
support. Without the right index, queries will scan the clustered index or table, which
will show up in our trace data as high reads. The most common resolution is to add an
index, or to alter an existing index to make it a covering index and so avoid a key
lookup.
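For example, a sketch of that second resolution, widening a hypothetical single-column
index so that it covers the query and avoids the key lookup (all names here are
illustrative):

CREATE NONCLUSTERED INDEX IX_Orders_CustomerID
    ON dbo.Orders ( CustomerID )
    INCLUDE ( OrderDate, TotalDue )   -- columns the query needs, to avoid the key lookup
    WITH ( DROP_EXISTING = ON );      -- replaces the existing, narrower index of the same name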
If you have run a trace to identify the most expensive queries in your workload in
terms of reads, duration, and frequency of execution, and so on, then you've already
identified the ones most likely to benefit from a different indexing strategy. If not, then
you can have Database Engine Tuning Advisor (DTA) analyze your workload. As
discussed previously, you can use the Tuning template to create a workload that the
DTA will then analyze to make index recommendations. Alternatively, and many
DBAs prefer this route, you can use the missing indexes report
(HTTP://TINYURL.COM/K8PALPB) to help you identify the "low-hanging fruit."
In either case, make sure that you analyze the suggested indexes. Don't just blindly
create them, as each index comes with additional overhead, in terms of both space and
the need for SQL Server to maintain each index in response to data modifications. I
recommend Aaron Bertrand's article on what to consider before creating any
"missing" indexes (see HTTP://TINYURL.COM/M9935UA).
Getting the right indexes on your system takes experience. For example, I recently had
to drop an index to fix a bad execution plan problem. The query optimizer kept
picking the “wrong” index for a table in a particular stored procedure, executed several
times per second. We tried updating statistics and providing index hints, but it was still
picking what we believed to be the wrong index. After reviewing other stored
procedures that access the problem table, we realized that the “wrong” index was
superfluous. We dropped it, and the query optimizer was now picking the “right” index
and the stored procedure returned to running fast.
Out-of-date statistics
Statistics are metadata used by the query optimizer to “understand” the data, its
distribution, and the number of rows a query is likely to return. Based on these
statistics, the optimizer decides the optimal execution path. If the statistics are wrong,
then it may make poor decisions and pick suboptimal execution plans. You need to
make sure that for your SQL Server instances you've enabled both the Auto Update
Statistics and Auto Create Statistics database options. For certain tables,
especially large tables, you may in addition need to run a regular (such as daily)
UPDATE STATISTICS job. Erin Stellato covers this topic at great length
(HTTP://TINYURL.COM/PSF8G66).
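As a sketch of those two steps (the database and table names are illustrative):

-- Let SQL Server create and update statistics automatically
ALTER DATABASE MyDatabase SET AUTO_CREATE_STATISTICS ON;
ALTER DATABASE MyDatabase SET AUTO_UPDATE_STATISTICS ON;
GO
-- Scheduled job step for a large table whose statistics go stale between automatic updates
UPDATE STATISTICS dbo.MyLargeTable WITH FULLSCAN;
GO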
why you might want to migrate sooner rather than later.
Extended Events is available only in SQL Server 2008 and later, and it is only in SQL
Server 2012 that we have a built-in UI, allowing users to analyze the collected event
data without recourse to XQuery and T-SQL, although it's possible to analyze SQL
Server 2008 extended event data in the SQL Server 2012 UI.
trigger. This is a lot harder to do with SQL Trace.
Extended Events also offers a lot more flexibility in terms of output targets for the
event data. SQL Trace can only output to a file, to a table or to the GUI in SQL
Profiler. As well as providing basic in-memory (ring_buffer ) and file
(event_file ) targets, Extended Events offers a set of advanced targets that perform
automatic aggregations: for example, the Event Bucketizer target produces a
histogram, and the Event Pairing target matches up beginning and ending events. It
can also output to the SQL Server Error Log and the Windows Event Log.
There is no room in this chapter for more than this brief overview of just a few of the
potential advantages of Extended Events. I highly recommend that you refer to
Jonathan Kehayias's blog at HTTP://TINYURL.COM/OQ7OZ23 for much more in-depth
coverage of this topic.
Click on the Configure button to move to the Configure view, which contains the
configuration options for the selected events. We'll start with the Event Fields tab,
which shows, for each event, the data columns that form part of its base payload. For
example, Figure 3 shows the base payload for the sql_batch_completed event.
Notice that its base payload includes the equivalent of five out of eight of the data
columns we included in the equivalent trace. The batch_text field is optional but
included by default. Missing are the SPID , LoginName and StartTime .
Switch to the Global Fields (Actions) tab to see what data columns we can add as
actions to our event session. In this case, we wish to add the session_id (equivalent
to SPID ) and server_principal_name (equivalent to LoginName ) fields to both
events. The StartTime column from SQL Trace is not available, though when the
events fire, the event data will automatically include the collection time of the data.
Finally, we want to add a filter, so move to the Filter (Predicate) tab. In Extended
Events, unlike in SQL Trace, we can apply filters separately to individual events. We
can create a filter on any event field or global field available to the event. For both our
events, we want to add a filter on the Duration event field (which is common to both
events), so CTRL-click to highlight both events, and define the filter as shown in
Figure 5 (the duration is in microseconds).
Next, move to the Data Storage page to define a target. To mimic our trace, we'll
choose the event_file target with a maximum size of 1 GB.
Having done this, simply click OK to create the event session definition and then, in
SSMS, right-click on the new event session to start the session. Run an appropriate
workload and then right-click on the event_file target (under the SlowQueries
event session) and select View Target Data , to see the event data collected.
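If you prefer T-SQL to the grid viewer, the same event data can be read from the file
target with sys.fn_xe_file_target_read_file and a little XQuery. A minimal sketch,
assuming the session writes its files to C:\Temp (adjust the path to wherever the
event_file target actually writes):

-- Read the event_file target with T-SQL rather than the SSMS viewer
SELECT  CAST(event_data AS XML).value('(event/@name)[1]', 'nvarchar(60)') AS event_name ,
        CAST(event_data AS XML) AS event_data_xml
FROM    sys.fn_xe_file_target_read_file('C:\Temp\SlowQueries*.xel', NULL, NULL, NULL);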
Figure 7: Viewing event data from the target.
CLR: HTTP://TINYURL.COM/CPB5387.
• Jonathan Kehayias's comprehensive converter: HTTP://TINYURL.COM/O36HKAO.
Manually converting traces will be a lot of work if you have many traces. The SQL
CLR converter sounds better, but you have to compile the assembly and then load it
into your SQL instance. Jonathan Kehayias's converter uses a T-SQL stored procedure
and works reliably. With his tool, I can convert my trace in a matter of minutes.
Download his converter and then open the script, which includes the code to create the
stored procedure and mark it as a system object, and to convert an existing trace. It
will convert a specified server-side trace that is running, or whose definition is loaded
into SQL Server.
Run the trace script from Listing 1 with the modifications from Listing 2 and then stop
it (@status = 0 from Listing 3 ) but don't close it. We can now run the converter, as
shown in Listing 13 , supplying the value of @TraceID for the trace to be converted.
EXECUTE sp_SQLskills_ConvertTraceToExtendedEvents
@TraceID = 2,
@SessionName = 'XE_SlowQueries2',
@PrintOutput = 1,
@Execute = 1;
Specifying a value of 1 for @PrintOutput , the default value, prints out the T-SQL
that creates the XE session. Listing 14 shows the output.
IF EXISTS ( SELECT 1
FROM sys.server_event_sessions
WHERE name = 'XE_SlowQueries2' )
DROP EVENT SESSION [XE_SlowQueries2] ON SERVER;
GO
CREATE EVENT SESSION [XE_SlowQueries2] ON SERVER
ADD EVENT sqlserver.rpc_completed
( ACTION
  ( sqlserver.server_principal_name -- LoginName from SQLTrace
  , sqlserver.session_id            -- SPID from SQLTrace
  -- BinaryData not implemented in XE for this event
  ) WHERE ( duration >= 1000000 )
),
ADD EVENT sqlserver.sql_batch_completed
( ACTION
  ( sqlserver.server_principal_name -- LoginName from SQLTrace
  , sqlserver.session_id            -- SPID from SQLTrace
  ) WHERE ( duration >= 1000000 )
)
ADD TARGET package0.event_file
( SET filename = '\\MyDirectory\XE_SlowQueries2.xel',
      max_file_size = 1024,
      max_rollover_files = 0
)
sqlserver.sql_batch_completed (sqlserver is the name of the parent package
that contains these events). As noted previously, these events contain in their base
payloads most of the event columns we defined in our server-side trace. The converter
adds, as actions, the sqlserver.server_principal_name (equivalent to
LoginName ) and sqlserver.session_id (equivalent to SPID ) global fields.
The filter (“duration >= 1000000” ) applies to each event, individually. As noted
earlier, this is a big difference between server-side traces and XE sessions. We can
define multiple filters in a server-side trace, but they apply to all events in the trace
that contain the filtered column. With XE, each event can have its own predicate.
The converter created the XE session, since we specified 1 for @Execute , but did not
start it. We can start it from SSMS or via an ALTER EVENT SESSION command.
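For example, to start it with T-SQL:

ALTER EVENT SESSION [XE_SlowQueries2] ON SERVER STATE = START;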
Of course, it is never a good idea to convert code blindly, so we need to analyze the
converted trace definition and test it. In this case, we can script out the event session
we created manually in the previous section, and compare it to the converted trace
definition.
Once we start to understand fully the Extended Events architecture, and how it works,
we can start to build optimized event sessions.
Conclusion
Traces provide a wealth of information, including events that can help you diagnose a
performance problem. By learning to perform a server-side trace instead of running a
trace through Profiler, we can ensure that our traces do not impede performance on the
server we are analyzing. Alongside this, we can also start to experiment with the
migration of our existing trace portfolio over to Extended Events.
Windows Functions in SQL Server 2012
Dave Ballantyne
If you are a SQL Server 2012 developer, you may not be so interested in Always-on
High Availability and “Big Data,” the focus of the marketing hype for SQL Server
2012, but you'll love the enhancements to the SQL Windows functions. This is the real
big-ticket item.
SQL Server 2005 ushered in the first ANSI-standard Windows ranking and aggregate
functions, along with the OVER clause, which allowed us to apply the functions across
each row in a specified window/partition of data. The use of OVER alongside the new
Common Table Expressions (CTEs) made for a revolutionary change in the way that
one could develop T-SQL solutions. Still, certain procedures, such as the rolling
balance calculation, remained difficult and required the use of cursors or inefficient
sub-selects.
SQL Server 2012 takes support for Windows functions to the next of many logical
steps, by allowing us to use a true sliding window of data. With a sliding window, we
can perform calculations such as the rolling balance using a faster, more efficient, and
readable statement.
This chapter will walk through how to use the OVER clause and sliding windows, with
aggregate and analytic functions. It will also demonstrate the performance overhead
associated with different techniques for defining the frame extent of the window for
our calculations.
Sliding Windows
Let's see what sorts of calculations are possible when using sliding windows. First, we
need to generate some sample data from AdventureWorks2012 , as shown in Listing
1.
USE AdventureWorks2012;
GO
SELECT SALESPERSONID ,
MIN(ORDERDATE) AS ORDERMONTH ,
SUM(TOTALDUE) AS TOTALDUE
INTO #ORDERS
FROM SALES.SALESORDERHEADER SOH
WHERE SOH.SALESPERSONID IS NOT NULL
GROUP BY SALESPERSONID ,
DATEDIFF(MM, 0, ORDERDATE);
GO
In SQL Server 2005 and later, we can use the OVER clause to partition the data and
perform calculations over the rows in each partition. For example, we can partition the
data by SALESPERSONID in order to calculate the total sales attributable to each sales
person, as shown in Listing 2 . For each row (i.e. order) in each partition, it calculates
the total sales for that sales person. This allows the developer to perform comparisons
and calculations on the present row's value, relative to the total value (for example,
calculate the percentage).
SELECT * ,
SUM(TOTALDUE) OVER ( PARTITION BY SALESPERSONID )
FROM #ORDERS
ORDER BY SALESPERSONID;
In SQL Server 2012, we now have the means to look at specific rows within that
partition of data relative to the “current” row i.e. to perform calculations across a
sliding window of data. For example, Listing 3 calculates a simple rolling monthly
balance for each sales person, simply by adding an ORDER BY clause to the OVER
clause, and specifying that the order is ORDERMONTH .
SELECT SALESPERSONID ,
ORDERMONTH ,
TOTALDUE ,
SUM(TOTALDUE) OVER ( PARTITION BY SALESPERSONID ORDER BY ORDERMONTH )
AS ROLLINGBALANCE
FROM #ORDERS
ORDER BY SALESPERSONID ,
ORDERMONTH;
Listing 3: Using OVER with a sliding window to calculate a rolling monthly balance for each sales person.
By adding ORDER BY ORDERMONTH to the OVER clause, we changed the scope of the
calculation from "total sales value across the entire partition," as in Listing 2, to
“total sales value for this row and every row preceding.” However, watch out for
missing data that can cause errors when we're trying, for example, to aggregate data
over the previous financial quarter. If there has been no financial activity for a month
within that financial quarter, then we will need a “blank” row to ensure that rows will
still be equivalent to “months.” Otherwise, there will be no entry for that month.
When using these functions, it is important to remember that the rows are relative to
the current row within the partition, and the calculation is performed in the ORDER
specified in the OVER clause. This ordering can be entirely different from the order of
the result set returned by the statement and, indeed, from the order specified for any
other window function. All partitions and their ORDER BY clauses operate and
calculate entirely independently of each other. In some cases, this can lead to
suboptimal execution plans. For example, consider Listing 4 , which uses two
windowing functions, each establishing a different order for the ORDERMONTH column.
SELECT SALESPERSONID ,
ORDERMONTH ,
TOTALDUE ,
SUM(TOTALDUE) OVER ( PARTITION BY SALESPERSONID
ORDER BY ORDERMONTH ) AS ROLLINGBALANCE ,
SUM(TOTALDUE) OVER ( PARTITION BY SALESPERSONID
ORDER BY ORDERMONTH DESC ) AS ROLLINGBALANCEDESC
FROM #ORDERS
ORDER BY SALESPERSONID ,
ORDERMONTH;
GO
The execution plan for Listing 4 contains three Sort operations, one for each of the
windowing functions and then a third for the final ORDER BY clause of the statement.
However, if we simply swap the positions of the ROLLINGBALANCE and
ROLLINGBALANCEDESC calculations, the plan contains only two sorts; the final ORDER
BY clause requires no sort because the previous sort guarantees the data is now in the
required order.
The optimizer cannot change the sequence of functions, and therefore the sorting
operations required, but it can exploit the fact that data is already guaranteed to be in
the required order.
The OVER clauses in Listings 3 and 4 “hide” two default values, and are equivalent to:
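Making both defaults explicit (a sketch, written against the rolling balance calculation
from Listing 3):

SUM(TOTALDUE) OVER ( PARTITION BY SALESPERSONID
                     ORDER BY ORDERMONTH
                     RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW )
    AS ROLLINGBALANCE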
The first “hidden” default is use of the RANGE clause, and the second is UNBOUNDED
PRECEDING , which defines the window frame extent . The use of RANGE means that,
by default, the function works with each distinct value in the column used to order the
rows, in each partition. The alternative is to specify ROWS , which means that the
function works with each row in each partition. Use of UNBOUNDED PRECEDING
simply means that, by default, the windows start with the first row in the partition.
However, we have other options to define the window frame extent. Let's look at these
first, before investigating the behavioral and performance differences between using
RANGE versus ROWS .
data, in which we can do our aggregations.
The RANGE option supports only UNBOUNDED PRECEDING AND CURRENT ROW and
CURRENT ROW AND UNBOUNDED FOLLOWING , the latter meaning simply that the
window starts at the current value and extends to the last row of the partition. An
attempt to use any other values will result in the following error message:
If we're using ROWS , we have more flexibility. Continuing our #ORDERS example,
Listing 7 specifies that we wish to SUM the TotalDue and get the MIN of OrderMonth
for the present row and the four rows preceding it, inclusive, so five rows in total.
SELECT * ,
SUM(TOTALDUE) OVER ( PARTITION BY SALESPERSONID
ORDER BY ORDERMONTH ROWS 4 PRECEDING )
AS SUMTOTALDUE ,
MIN(ORDERMONTH) OVER ( PARTITION BY SALESPERSONID
ORDER BY ORDERMONTH ROWS 4 PRECEDING )
AS MINORDERDATE
FROM #ORDERS
WHERE SALESPERSONID = 274
ORDER BY ORDERMONTH;
Behavioral differences between RANGE and ROWS
In our previous #ORDERS example, the values in the ORDERMONTH column, used to
order the partitions, are unique. In such cases, our windowing function will return the
same results regardless of whether we use ROWS or RANGE .
Differences between the two clauses arise when the values in the column specified in
the ORDER BY clause of the OVER clause are not unique. If we use RANGE , then our
windowing function will produce an aggregate figure for any rows in each partition
with the same value for the ordering column. If we specify ROWS , then the windowing
function will calculate row by row.
To appreciate these differences clearly, let's look at an example.
USE AdventureWorks2012;
GO
CREATE TABLE #SIMPLESALES
(
SIMPLESALESID INTEGER PRIMARY KEY ,
SALEID INTEGER NOT NULL ,
LINEID INTEGER NOT NULL ,
SALES MONEY NOT NULL
)
GO
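The listings that follow need some rows in this table. The sample data below is
illustrative rather than the chapter's original dataset; it is chosen so that the values
discussed later (the running total of 16, the tied 5.00 and 10.00 first values, and the
107.5 percentile for SALEID 2) fall out of the queries, and any data with duplicate
LINEID values per SALEID would demonstrate the same points.

INSERT INTO #SIMPLESALES
        ( SIMPLESALESID, SALEID, LINEID, SALES )
VALUES  ( 1, 1, 1, 1.00 ),
        ( 2, 1, 2, 5.00 ),
        ( 3, 1, 2, 10.00 ),
        ( 4, 2, 1, 100.00 ),
        ( 5, 2, 2, 105.00 ),
        ( 6, 2, 2, 110.00 ),
        ( 7, 2, 3, 120.00 );
GO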
The query in Listing 7 performs two SUM calculations, in each case partitioning the
data by SALEID and ordering data within each partition according to LINEID . The
only difference is that the first calculation uses RANGE and the second uses ROWS .
SELECT SALEID ,
LINEID ,
SALES ,
SUM(SALES) OVER ( PARTITION BY SALEID ORDER BY LINEID
RANGE UNBOUNDED PRECEDING ) AS UNBOUNDEDRANGE ,
SUM(SALES) OVER ( PARTITION BY SALEID ORDER BY LINEID
ROWS UNBOUNDED PRECEDING ) AS UNBOUNDEDROWS
FROM #SIMPLESALES
ORDER BY SALEID ,
LINEID
Notice that for ROWS , for each row in each partition we get a running total for the
current row and all previous rows in the window, in other words a row-by-row rolling
balance. However, when we use RANGE , the function produces a single sales value for
all rows within each partition that have the same value for LINEID . So for those rows
where SaleID=1 and LineID=2 , the UnboundedRange values for both are 16.
Similarly for rows with SaleID=2 and LineID=2 .
There are deep-underlying operational differences between ROWS and RANGE
calculations, and this can lead to a considerable difference in performance. We'll look
at this in more detail shortly, after a brief examination of options for defining the
window frame extent.
differences might have on performance. The RANGE and ROWS queries in Listing 8
return identical results (due to the uniqueness of ORDERMONTH ).
SELECT /* RANGE */ * ,
SUM(TOTALDUE) OVER ( PARTITION BY SALESPERSONID ORDER BY ORDERMONTH )
FROM #ORDERS;
GO
SELECT /* ROWS */ * ,
SUM(TOTALDUE) OVER ( PARTITION BY SALESPERSONID ORDER BY ORDERMONTH ROWS
UNBOUNDED PRECEDING )
FROM #ORDERS;
If we were to examine the execution plans, we would see that SQL Server reports the
Query cost (relative to the batch) as being an even 50/50 split, as shown in Figure 2 .
Unfortunately, this is simply not true. The correct phrasing should be estimated query
cost (relative to total estimate for batch) ; the cost figures used by the Optimizer are
estimated costs, not actual costs. SQL Server estimates that the ROWS and RANGE
operations cost the same in terms of I/O, though the actual execution costs are vastly
different due to the Window Spool operator for the RANGE operation writing to disk
but the one for the ROWS operation working with the faster in-memory spool.
The estimated I/O costs are only of use to the optimizer to enable it to pick between
different candidate plans for a single query. For example, when joining two tables in a
query, two candidate plans may be considered as viable, logically equivalent (i.e. they
will produce the same output) alternatives. Let's say one of these plans uses a Nested
Loop to join the tables, the other a Hash Match join. For each viable plan, the
optimizer will calculate an estimated I/O figure and use it to make an educated choice
as to the cheapest alternative.
However, to the optimizer, the Window Spool operator is a black box and in order to
produce the user's expected results it has to execute it; there are no alternative
execution plans that it can consider. In that context, it doesn't matter to the Optimizer
that the estimated cost is unrepresentative of the actual execution cost.
A Profiler trace of the two queries highlights how much more expensive the RANGE
query is over the ROWS query, and how wrong the estimate of a 50/50 split is.
There is, as you can see, a major difference between the reads and duration of the two,
so you should always use ROWS unless you really need the functionality that RANGE
offers. In light of this, it may seem odd that the default is RANGE but it is the ANSI
standard and Microsoft is obliged to follow it.
Analytic Functions
SQL Server 2012 supports a collection of new analytic functions that we use with the
OVER clause. These are LAG /LEAD , FIRST_VALUE / LAST_VALUE and then a set of
functions for statistical analysis, PERCENTILE_CONT , PERCENTILE_DISC ,
PERCENT_RANK and CUME_DIST .
SELECT * ,
LAG(ORDERMONTH, 1) OVER ( PARTITION BY SALESPERSONID
ORDER BY ORDERMONTH ) AS LASTORDERMONTH
FROM #ORDERS
ORDER BY SALESPERSONID ,
ORDERMONTH;
GO
Listing 9: Using the LAG analytic function.
For the first row, there is no previous row, of course, so the LAG function attempts to
reference a row outside of the partition boundary, and therefore returns the default
value (NULL, unless a different default is supplied as the optional third argument to LAG).
Listing 10 shows the logically equivalent, and considerably more verbose, pre-SQL
Server 2012 version.
SELECT #ORDERS.* ,
LASTORDERMONTH.ORDERMONTH AS LASTORDERMONTH
FROM #ORDERS
OUTER APPLY ( SELECT TOP ( 1 )
INNERORDERS.ORDERMONTH
FROM #ORDERS INNERORDERS
WHERE INNERORDERS.SALESPERSONID = #ORDERS.SALESPERSONID
AND INNERORDERS.ORDERMONTH < #ORDERS.ORDERMONTH
ORDER BY INNERORDERS.ORDERMONTH DESC
) AS LASTORDERMONTH
ORDER BY SALESPERSONID ,
ORDERMONTH;
Not only is the intent of the code harder to fathom, but performance is very poor in
comparison to LAG .
The LEAD function is the mirror image of LAG and it works in exactly the same way
with the same arguments, but processes rows ahead in the partition rather than behind.
SELECT * ,
FIRST_VALUE(ORDERMONTH) OVER ( PARTITION BY SALESPERSONID
ORDER BY ORDERMONTH )
FROM #ORDERS
ORDER BY SALESPERSONID ,
ORDERMONTH ,
TOTALDUE;
GO
As discussed, the default of a windowing function is RANGE , so we can write the same
query more formally by making the RANGE clause explicit.
FIRST_VALUE(ORDERMONTH) OVER
( PARTITION BY SALESPERSONID
ORDER BY ORDERMONTH
RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW )
As we saw earlier, there is a big performance difference between RANGE and ROWS , so
if we change the previous query to use ROWS and then compare it to the RANGE query
we can see a huge difference in the performance data, using Profiler.
FIRST_VALUE(ORDERMONTH) OVER
( PARTITION BY SALESPERSONID
ORDER BY ORDERMONTH
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW )
As you can see, RANGE performs poorly compared to ROWS . Remember, though, that
performance is not the only consideration in choosing between the two; as discussed
earlier, they also behave differently if the ordering column is non-unique. As long as
the ordering column for the OVER clause is unique, as is the case with the #ORDERS
table, FIRST_VALUE queries will return the same data, regardless of whether we use
ROWS or RANGE . If it's not, then the sort order will be ambiguous and they may return
different data.
Listing 12 for the #SIMPLESALES table uses FIRST_VALUE twice, once with RANGE
and once with ROWS .
SELECT * ,
FIRST_VALUE(SALES) OVER ( PARTITION BY SALEID, LINEID ORDER BY LINEID
RANGE UNBOUNDED PRECEDING ) AS FIRSTVALUERANGE ,
FIRST_VALUE(SALES) OVER ( PARTITION BY SALEID, LINEID ORDER BY LINEID
ROWS UNBOUNDED PRECEDING ) AS FIRSTVALUEROWS
FROM #SIMPLESALES
ORDER BY SALEID ,
LINEID;
In fact, the values returned are the same in each case, in this example, but only the use
of ROWS guarantees those values. When using RANGE , for rows of equal LINEID
values, such as the rows where SALEID=1 and LINEID=2 , the FIRST_VALUE
function, in theory at least, can return either 5.00 or 10.00. The fact that we ordered
the partition by LINEID means that, in the case of ties (two rows with the same value),
we do not care which value the function returns. If we do care then we need to make
the sort order unambiguous by adding a “tie-breaker” to the sort; potentially,
SIMPLESALESID would be a good choice. This is the only way to guarantee ROWS and
RANGE will always return the same results.
Overall, it's easy to reach the conclusion that the RANGE functionality is rather
superfluous and can only serve to soak up some of our precious machine resources
without adding any functional value.
LAST_VALUE is functionally similar to FIRST_VALUE , but returns data for the last row
in the partition.
SELECT * ,
CUME_DIST() OVER ( PARTITION BY SALEID
ORDER BY SALES ) AS CUME_DIST ,
PERCENT_RANK() OVER ( PARTITION BY SALEID
ORDER BY SALES ) AS PERC_RANK
FROM #SIMPLESALES
ORDER BY SALEID ,
LINEID
Both functions return the relative position of the row within the partition, according to
the order specified by the partition function's ORDER BY clause. The difference is that
PERCENT_RANK returns 0 for the first row, and then evenly spaced values up to 1, for
each partition, whereas the starting value for CUME_DIST is the step value.
If there are duplicate values in the dataset, then these functions return the same value
for the duplicate rows, as demonstrated by Listing 14 and subsequent output, after
rerunning Listing 13 .
We see that for the rows WHERE SIMPLESALESID IN(2,9) , each of the functions
returns the same values.
will answer “What value is x% through the partition? ”
The PERCENTILE_DISC function returns a value that exists in the result set, but
PERCENTILE_CONT will return a calculated value that is exactly the specified
percentage through the result set. The value may exist in the result set, but only by
chance.
Listing 15 uses each function to return the value that is .5 (or 50%) of the way through
the data, partitioned by SALEID and ordered by SALES .
SELECT * ,
PERCENTILE_DISC
( 0.5 )
WITHIN GROUP ( ORDER BY SALES )
OVER ( PARTITION BY SALEID ) AS PERC_DISC ,
PERCENTILE_CONT
( 0.5 )
WITHIN GROUP ( ORDER BY SALES )
OVER (PARTITION BY SALEID ) AS PERC_CONT
FROM #SIMPLESALES
ORDER BY SALEID ,
LINEID;
Notice how, as there are an even number of rows for SALEID 2 , the PERC_CONT
function returns a value midway between the two middle values (107.5), which is not a
value that exists in the result set.
We also add a supporting index.
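One plausible definition for that index (a sketch; the exact index used in the original
test may differ) keys on SALESPERSONID and ORDERMONTH and includes TOTALDUE, so
that both the CROSS APPLY lookup and the window function's sort are supported:

CREATE NONCLUSTERED INDEX IX_ORDERS_SALESPERSONID_ORDERMONTH
    ON #ORDERS ( SALESPERSONID, ORDERMONTH )
    INCLUDE ( TOTALDUE );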
In the test in Listing 17 , we have two logically equivalent statements that return all of
the orders along with the value of the first order made by the relevant sales person.
The first statement uses a 2005-and-later compatible CROSS APPLY clause, and the
second uses the new FIRST_VALUE analytic function.
I use SELECT INTO simply to suppress the returning of results to SSMS. In my
experience, the time it takes SSMS to consume and present results can skew the true
performance testing results (an often-overlooked factor in performance testing).
-- CROSS APPLY
DECLARE @T1 DATETIME = GETDATE()
SELECT #ORDERS.* ,
FIRSTORDERVALUE.TOTALDUE AS FIRSTTOTDUE
INTO #1
FROM #ORDERS
CROSS APPLY ( SELECT TOP ( 1 )
INNERORDERS.TOTALDUE
FROM #ORDERS INNERORDERS
WHERE INNERORDERS.SALESPERSONID = #ORDERS.SALESPERSONID
ORDER BY INNERORDERS.ORDERMONTH
) AS FIRSTORDERVALUE
ORDER BY SALESPERSONID ,
ORDERMONTH;
-- report elapsed time for the CROSS APPLY version
SELECT DATEDIFF(MS, @T1, GETDATE()) AS 'CROSS APPLY TIME';
GO

--FIRST_VALUE
DECLARE @T1 DATETIME = GETDATE()
SELECT #ORDERS.* ,
FIRST_VALUE(TOTALDUE) OVER ( PARTITION BY SALESPERSONID
ORDER BY ORDERMONTH ROWS UNBOUNDED PRECEDING )
AS FIRSTTOTDUE
INTO #2
FROM #ORDERS
ORDER BY SALESPERSONID ,
ORDERMONTH;
SELECT DATEDIFF(MS, @T1, GETDATE()) AS 'FIRSTVALUE TIME (ROWS)';
Listing 17: Comparing logically equivalent CROSS APPLY and FIRST_VALUE queries.
If we were to make a naïve judgment based only on these figures, we'd conclude that
FIRST_VALUE runs a little slower and so, in terms of duration at least, there seems
little reason to use it. However, don't leap to any conclusions because we've yet to gain
the full performance picture. Let's start a profiler trace and then rerun Listing 17 .
Although the duration of the FIRST_VALUE query exceeds that for the CROSS APPLY
query, the number of reads for CROSS APPLY is vastly greater. You'll see, as well, that
the CPU time is much greater than the duration in the case of the CROSS APPLY query.
What's happened here is that the CROSS APPLY query exceeded SQL Server's cost
threshold for parallelism and so the optimizer has produced a
parallel plan . However, SQL Server did not parallelize the FIRST_VALUE query,
simply because the threshold cost was not crossed. Accidentally comparing parallel
and serial plans is another common problem in performance tests, and it complicates
the answer to the question “which is faster?”
If you run this test on a system with only one available CPU, or with parallelism
disabled, you should see that FIRST_VALUE query is at least twice as fast. Ultimately,
which one will perform better at scale with many concurrent requests of this and other
queries? The answer is FIRST_VALUE .
Summary
With support for sliding windows of data, SQL Server 2012 offers a way to perform
complex analytical calculations in a more succinct and high-performance fashion. I
hope that this chapter offers you some ideas for how we can exploit them to solve
more business problems within the SQL Server engine than has ever previously been
possible.
SQL Server Security 101
Diana Dee
In theory, SQL Server security sounds simple. We grant permissions to the people and
processes that need access to our SQL Server instances and databases, while
controlling which data they can access and what actions they can perform.
However, behind this simple statement lies much complexity. In practice, it is easy for
the novice database administrator (DBA) to get lost in the hierarchy of principals ,
securables and permissions that comprise the SQL Server security architecture. The
most common result is the path of least resistance, where people end up with “too
many privileges.” They can access data they shouldn't be able to see; they can perform
actions they have no need to perform. Most such people are probably unaware of their
over-privileged status. However, if an attacker gains access to their credentials, it
will, of course, compromise the security and integrity of your databases and their data.
My primary goals in this chapter are to provide a simple, concise description of how
the SQL Server permission hierarchy works, and then to demonstrate, step by step,
how to implement the “Principle of Least Privilege” in securing SQL Server.
According to this principle, any person or process that we allow to access a SQL
Server instance, and any database in it, should have just enough privileges to perform
their job, and no more.
Any DBA tasked with implementing basic security within a SQL Server instance
should refer first to their written security plan to find a detailed specification of who
should be allowed to do what. After reading this chapter, they should then be able to
implement that plan, ensuring access to only the necessary objects, and the ability to
perform only essential actions, within that instance and its databases.
• Auditing of actions and object access within the SQL Server instance –
government regulations mandate many of these auditing requirements.
• Encrypting certain data, or entire databases, as necessary – never store or
transmit personal and sensitive data in plain text form.
• Protecting backups from unauthorized access – unless we encrypt the backup
file directly or use transparent data encryption, anyone who obtains the file can read it.
• Securing the SQL Server instance – granting permissions to perform actions or
access objects.
This chapter covers only the very last of those bullet points.
We can think of the SQL Server Security hierarchy as a set of nested containers,
similar to a folder hierarchy in Windows Explorer, or similar to a city block containing
buildings, which in turn contain rooms, which in turn contain objects, or “content.”
Books Online provides a good illustration of this hierarchy, which I reproduce here.
For the original, see: HTTP://TECHNET.MICROSOFT.COM/EN-
US/LIBRARY/MS191465.ASPX .
Figure 1: The permission hierarchies.
Imagine that the city block represents the Windows operating system, the outermost
level of the hierarchy. On the city block sits a building that represents the SQL Server
instance (an installation of SQL Server). A principal needs permission to enter the
building. Within the building's walls are a few objects, and the principal may be
granted permissions on these objects.
The building also contains rooms that represent databases, and a principal needs
permission to enter each room. Within each room, there exist objects, organized into a
set of containers. This is the database level of the hierarchy and a principal who can
enter a database may be granted permission on the database and/or its schema and
objects.
The hierarchy of principals
The outermost container in the principal hierarchy is the Windows operating system
(OS), on the computer host. Principals at the OS container level are:
• Windows local groups and users.
• Active Directory groups and users (if the computer host is in an Active Directory
domain).
Within this hierarchy, a SQL Server instance is an application installed within the
Windows operating system. A SQL Server instance, referred to as “the server,” is a
complete database environment created by one installation of SQL Server. At the SQL
Server instance or server container level, the principals are:
• SQL Server login – there are two types of login, a SQL-authenticated login within
the server, associated with a single SQL Server principal, or a Windows login based on
a Windows operating system principal (Windows group or Windows user).
• Server-level role – a collection of SQL Server logins. Each member of a role
receives all of the server-level permissions granted to that role.
Figure 2 shows the server-level principals in the SQL Server Management Studio
(SSMS) Object Explorer.
Contained within the server are databases. The principals within a database are:
• A database user – an account defined within a specific database, and usually
associated with a SQL Server login. Certain database user accounts appear by default
in every database (though many are disabled by default). We can also create our own
database users.
• A database role – a collection of database users. Each member of a database role
receives all of the permissions granted to that role, for that database. SQL Server
provides a number of fixed database roles that exist in every database, as well as the
ability to create user-defined database roles.
• An application role – after an application connects to a database using a login
mapped to a user, the application may change context to the application role by
supplying the name of the role and its password. After that, the connection to the
database has the permissions of the application role, and can connect to any other
database only as the guest user in the other database. Application roles are not
covered in further detail; see Books Online, Application Roles , for more information.
Figure 3 shows the database-level principals in SSMS object explorer, under the
Security folder of the AdventureWorks2012 database.
Within the SQL Server instance (the server), the securable hierarchy follows the four-
part SQL Server object naming convention:
server.database.schema.object
We will refer to each level (“nested container”) within the hierarchy as a “level”.
Books Online refers to the first three levels as “scopes.” Each of these four levels
forms the major focus of this chapter. We assign permissions at each level.
A permission granted at the database scope is inherited by all contained schemas and
objects lower in the hierarchy. For example, granting EXECUTE permission on the
database will, in effect, grant EXECUTE permission on every stored procedure, user-
defined function, and user-defined data type in the database.
Although we may grant permissions to a principal on each individual object, if the
principal needs the same permission on more than one object, it is easier to group
those objects in a schema . If we grant, to a principal, permissions at the schema
scope, then that principal inherits the same permissions on all relevant objects in the
schema. For example, granting SELECT permission on a schema to a principal will, in
effect, grant SELECT permission on every table and view in that schema to the
principal.
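As a sketch of both examples (the schema name and the principal, SalesReaders, are
illustrative):

-- Every table and view in the Sales schema becomes readable by SalesReaders
GRANT SELECT ON SCHEMA::Sales TO SalesReaders;

-- Granted at the database scope, EXECUTE covers every stored procedure and
-- user-defined function in the database
GRANT EXECUTE TO SalesReaders;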
• DENY – denies an action; overrides a GRANT that the principal may have from a role
membership or from an inherited GRANT at a higher scope.
• REVOKE – removes a GRANT or a DENY . Think of REVOKE as an eraser, removing the
GRANT or DENY permission. This is the same as unchecking a box in the GUI.
computer host and in the Active Directory domain of which the computer host is a
member.
It is a best practice to create a login from a Windows group, as opposed to one login
for each individual Windows user. This allows permission control by adding or
removing Windows users from the Windows group.
Following are instructions for creating a login from Windows for SQL Server, in
SSMS Object Explorer. In these instructions, angle brackets < > indicate a place for
you to enter a value; the angle brackets are not typed.
Expand the server's Security folder, right-click Logins and then select New Login to
bring up the Login – New dialog. In the Login name box, either:
• Type the Windows group name in the format <domain>\<group>. The domain is
either the Active Directory domain or the name of the computer host (to use the local
computer's security database).
• Alternatively, click the Search button to the right of the Login name box to find the
appropriate, recognized Windows group through the Select User or Group dialog.
Ensure that Object Types includes Group . In Locations , choose the domain or the
local computer. In the Enter the object name to select box, type the group name, and
then click the Check Names button. If the group is recognized, the name will appear
in <domain>\<group> format with an underline. Click OK to have the <domain>\
<group> item placed into the Login Name box of the Login – New dialog. The
resulting dialog should look as shown in Figure 4 .
Back at the Login – New dialog, we're obviously going to use Windows
authentication, and we can leave all the other options at their default settings.
At the bottom of Figure 5 , you'll see that the default database is master . The login
will connect automatically to this database when it establishes a connection to the
server. We may choose any user-defined database as the login's default database, but
keep in mind that the login must be mapped to a user in that database in order for the
login to successfully connect to the server. The master database allows guest access
and so we don't need to map the login to a user in master .
Figure 5: The Login – New dialog for the TribalSQL/SQL Users Windows group.
Before clicking OK to create the login, click the Script button to capture the CREATE
LOGIN script to a new query window.
Save the query as part of your documentation. Alternatively, if you are creating a
Solution in SSMS to contain all your CREATE statements, you can choose Script
Action to Clipboard and paste it into the Solution's appropriate SQL script window.
The Transact-SQL (T-SQL) code that performs the action can then be saved together in
a Project within the Solution.
Finally, in the Login – New dialog, click OK to create the login. The login name will
appear under the Logins for the server, in SSMS Object Explorer (see Figure 2 ).
Alternatively, of course, we can create the login directly from T-SQL code, as
demonstrated in Listing 1 .
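A sketch of such a statement, using the TribalSQL\SQL Users group shown in Figure 5:

USE [master];
GO
-- Create a Windows-authenticated login for the TribalSQL\SQL Users group
CREATE LOGIN [TribalSQL\SQL Users] FROM WINDOWS
    WITH DEFAULT_DATABASE = [master];
GO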
Listing 1: Create a login for a Windows group named SQL Users , in a domain named TribalSQL .
SQL-authenticated logins
A login created within the SQL Server instance is a SQL-authenticated login (“SQL
login”), because the server creates and stores a user name and a (hashed) password for
that login. A SQL login is a “singleton” and not a group login.
We can only use SQL logins to connect to a server that is using “SQL Server and
Windows Authentication mode.” The other server authentication mode is “Windows
Authentication mode” and is more secure, as it allows only trusted connections.
However, it may be necessary to allow SQL Server-authenticated connections, because
some front-end applications can use only SQL logins to connect. We can change the
server authentication mode on the Security page of the Server Properties window.
Restart the SQL Server service for the change to take effect.
To create a SQL login, use the Login – New dialog as explained for Windows logins,
but select the SQL Server authentication radio button and enter a name for the login
and the password details.
Figure 7: Login – New dialog for a SQL login.
SQL logins in SQL Server 2005 or later can use one of three password enforcement
options. These enforcement options may be used if the computer host's operating
system is Windows Server 2003 or later, or Windows Vista or later. The password
enforcement options take the Windows account policies that apply to the computer
host and apply them to the SQL login, according to which password enforcement
policies are selected for the login.
• Enforce password policy (T-SQL: CHECK_POLICY=ON ) enforces settings of the
following Windows Account Policy settings: Password must meet complexity
requirements; Minimum password length; Minimum password age; Enforce
password history.
• Enforce password expiration (T-SQL: CHECK_EXPIRATION=ON ) enforces the
following account policies: Maximum password age; Account lockout threshold;
Account lockout duration; Reset account lockout counter after. CHECK_POLICY must
be ON to enable this option.
• User must change password at next login (T-SQL: MUST_CHANGE ) enforces the
Windows user account property of that name. CHECK_EXPIRATION must be ON to
enable this option.
If the computer host is a member of a Windows Active Directory domain, the
Windows account policy settings linked to the Organizational Unit (OU) to which the
computer host belongs will take precedence over the settings in any parent OUs.
Those, in turn, will take precedence over any policy settings linked to the domain,
which will take precedence over any policy settings set on the local computer. If the
computer host is not a member of a domain, the policy settings set on the local
computer will be in effect. You can find a good explanation of all Windows Account
Policies at:
HTTP://TECHNET.MICROSOFT.COM/EN-
US/LIBRARY/DD349793(V=WS.10).ASPX .
Details for the default database for the login are as described previously, and we can
script out the login creation. Listing 2 creates the login named SQLAuth1 with a
password of Pa$$w0rd , and enables the three password policy options.
USE [master]
GO
CREATE LOGIN [SQLAuth1]
WITH PASSWORD = N'Pa$$w0rd' MUST_CHANGE,   -- MUST_CHANGE must directly follow the password
     DEFAULT_DATABASE = [master],
     CHECK_EXPIRATION = ON,
     CHECK_POLICY = ON;
GO
When a person or application wants to connect to the server with SQL authentication,
the login name and password must be presented to the SQL Server instance. If the
connection is over a network, this information is sent in clear, unless the transmission
is encrypted.
DENY permissions to a sysadmin member are ignored).
• securityadmin – members can manage logins. Members can grant server-level
and database-level permissions. Note: a security admin member may grant
CONTROL SERVER permission to a login other than itself.
• dbcreator – members can create, alter, drop, and restore databases.
With the exception of public , we cannot change the permissions granted to a built-in
server-level role. Starting with SQL Server 2012, we can create additional, user-
defined server-level roles.
Every login is automatically a member of the public server-level role, introduced in
SQL Server 2008; permissions assigned to the public role apply to all logins. By
default, the public role has the VIEW ANY DATABASE permission, meaning that any
login may see all the databases that exist on the server (although not what is in them).
• Full-text Filter Daemon Launcher service
• A service for each of the other components, if installed (Analysis Services,
Reporting Services).
Except for Browser and IS services, each installed SQL Server instance has its own set
of services, and these services are said to be “instance aware.” For example, the
database engine service for a default instance of SQL Server is MSSQLSERVER ,
whereas for an instance named DBENGINE2 , the database engine service is
DBENGINE2 .
When running SQL Server 2005 or later, you can see all the SQL Server-related
services running on the server using SQL Server Configuration Manager tool (Start >
All Programs > Microsoft SQL Server nnnn > Configuration Tools > SQL Server
Configuration Manager ).
Select SQL Server Services in the left pane and a list of services will appear on the
right. A service for the default instance of SQL Server will have (MSSQLSERVER) at
the end of its name whereas a named instance will have (instance_name) .
The Log On As column reveals the Windows user account under which the service is
running.
The two most important Windows user accounts are those for the SQL Server and SQL
Server Agent services. In SQL Server 2008, the SQL Server installation process
creates the following two Windows groups on the local “host” computer (the groups
may have different names in SQL Server 2005):
• SQLServerMSSQLUser$ computer $MSSQLSERVER – for the SQL Server service
• SQLServerMSSQLAgentUser$ computer $MSSQLSERVER – for the SQL Server
Agent service.
The character string computer will be the name of the computer host. A named
instance will have the instance name at the end of the group name, instead of
MSSQLSERVER .
The Windows user account for the SQL Server service is placed into the …MSSQLUser…
group; the Windows user account for the SQL Server Agent service is placed into
the …MSSQLAgentUser… group.
SQL Server automatically creates a login based on each of these Windows groups and
places them into the sysadmin role, meaning that the service Windows user accounts
are granted sysadmin privileges within the SQL Server instance.
SQL Server 2012 assigns new account types to the SQL Server services. In a Windows
domain, a Managed Service Account should be created in Active Directory (AD)
prior to installation, with the name having the format: DOMAIN\ACCOUNTNAME$ . If a
computer is not a member of a Windows AD domain, a Virtual Account is
automatically created, with the name NT SERVICE\<SERVICENAME> . For example,
for the default instance database engine: NT SERVICE\MSSQLSERVER will be the SQL
Server service account, and NT SERVICE\SQLSERVERAGENT will be the SQL Server
Agent service account.
Follow these steps to implement “least privileges” security for the service accounts:
• If possible, use different Windows user accounts for the SQL Server and SQL
Server Agent services. There are more circumstances requiring the SQL Server
Agent service account to be a Windows administrator than there are for the SQL
Server service account.
• Preferably, make the accounts in the AD domain. Make the accounts on the local
computer host if the computer does not belong to an AD domain.
• Set the account property Password never expires – an expired password for the
service's Windows user means the service will not start!
• Use complex passwords and store them securely.
• Create the Windows user accounts that will run the services before installing the
SQL Server instance that will use them.
• Do not place the Windows user accounts that will run the services in any group, to
avoid the account having privileges it does not need. The SQL Server installation
process (or changing the account using SQL Server Configuration Manager) will
give the account appropriate privileges.
• Never allow a real person to use the Windows user accounts for the SQL Server
services.
• A Windows user account under which a service runs – as explained previously.
Use SQL Server Configuration Manager, not the Windows Services applet, if you
need to change this account.
Server-level permissions
Following are the types of securable on which we may assign server-level
permissions:
• The server instance.
• Endpoints – connection points to SQL Server; not discussed further.
• Logins.
• Server role (SQL Server 2012 only).
• Availability groups (SQL Server 2012 only).
To see a list of permissions that apply to the server, see the Books Online topic,
Permissions (Database Engine) . For SQL Server 2012, you can find this at
HTTP://MSDN.MICROSOFT.COM/EN-US/LIBRARY/MS191291.ASPX .
Within this reference, scroll down to the table labeled SQL Server Permissions .
Within this SQL Server Permissions table, scroll down to the point at which the first
column has the value SERVER . The second column will show all the permissions for
the base securable SERVER . The table also shows possible permissions for endpoints,
logins, and server roles.
Formatting tip for the SQL Server Permissions table
If you want to copy the Permissions table to Excel, set the page to landscape and, before the copy, pre-set the
first five column widths to: 25.22, 33.22, 8.11, 17.00, and 34.44.
It may not be easy to discover what permissions you need to grant at the server level to
conform to the Principle of Least Privilege. Sometimes, permissions granted to a fixed
server role might allow more actions than you want to permit the login to perform.
If you look up, in Books Online, the Transact-SQL (T-SQL) command of an action
you want to grant a principal permission to perform, the Permissions section of that
command will tell you what permissions are required for that action. For example, the
CREATE DATABASE action requires us to grant either CREATE ANY DATABASE or
ALTER ANY DATABASE server-level permission to the login that needs to create
databases. As another example, the ALTER TRACE permission is required to set up and
run a trace in SQL Profiler.
To grant a server-level permission in SSMS Object Explorer, right-click on the server
(the securable), select Properties and select the Permissions page. In the upper
Logins or roles pane, select the login or role to which you want to grant the
permissions. In the lower pane, on the Explicit tab, which shows the explicitly granted
permissions, select the check box in the Grant column, opposite each permission
required (the Effective tab shows the accumulated permissions from role membership,
plus explicit permissions).
Notice that the Connect SQL permission is already granted to the selected login.
Capture the script for documentation, and then click OK .
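The equivalent T-SQL is a simple GRANT statement. For example, a sketch using the
permissions and logins discussed in this chapter (server-level grants are issued from
the master database):

USE [master];
GO
-- Allow the Windows-group login to set up and run Profiler/server-side traces
GRANT ALTER TRACE TO [TribalSQL\SQL Users];

-- Allow the SQL-authenticated login to create databases
GRANT CREATE ANY DATABASE TO [SQLAuth1];
GO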
Database users
A database user (or user, for short) allows a person or application to authenticate to,
and connect to, the database in which the user is defined. The database user type
primarily described in this chapter is one that is mapped from a login (CREATE USER…
FOR LOGIN). We'll cover the special case of a contained database, introduced in SQL
Server 2012, a little later.
Users without logins and users created from a certificate or key
There are other users, created from a certificate or from a key (e.g. CREATE USER…FOR CERTIFICATE) , or
users created without logins, using CREATE USER…WITHOUT LOGIN. We won't cover these user types in this
chapter but Raul Garcia's blog, Quick Guide to DB users without logins in SQL Server 2005, explains how
to use a WITHOUT LOGIN user for impersonation:
HTTP://BLOGS.MSDN.COM/B/RAULGA/ARCHIVE/2006/07/03/655587.ASPX .
Note that a user without login may not connect to a database, but must be used via execution context
switching after a connection is made.
SQL Server stores, in the master database, a security identifier (SID) for each login.
The SID of a login from Windows is the same as the SID for the Windows group (or
Windows user) stored in the Windows operating system. Therefore, a login from a
particular Windows group (or user) on any SQL Server instance in the domain will
have the same SID for the login. On the other hand, the SID for a SQL-authenticated
login will be random, because it is created within the SQL Server instance. Two SQL-
authenticated logins named Bob on two different instances will have two different
SIDs.
When we map a login to a user in a database, SQL Server stores the SID of the login in
the database, as the SID of the user created from that login. In other words, the login
SID and the database user SID match (we'll deal with cases where they don't, called
orphaned users , a little later).
bring up the Database User – New dialog. Enter the details as shown in Figure 9
(using the ellipsis button to search for the correct, associated login). Notice that the
user name does not have to be the same as the login name.
Figure 10: Create a SQLUsers database user for the TribalSQL\SQL Users login.
Listing 4 shows the equivalent T-SQL script for creating this user (be sure you are
connected to the appropriate database).
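A sketch of that script, using the user and login names from Figure 10 (the target
database here is assumed to be AdventureWorks2012):

USE AdventureWorks2012;   -- be sure you are connected to the appropriate database
GO
CREATE USER [SQLUsers] FOR LOGIN [TribalSQL\SQL Users];
GO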
To see more of the CREATE USER command syntax, see the Books Online topic,
CREATE USER (Transact-SQL) .
we restore a database backup from one SQL Server instance to a different target
instance, we'll also restore, with that database, any associated users. Therefore, we
might restore to the target a database user called Bob but the new instance might have
no login for Bob , or there might be an existing login called Bob on the target instance,
but its SID will not match the SID for the restored user. In such cases, Bob is what we
call an “orphaned” user.
There are a couple of useful catalog views for exploring login and user SIDs, shown
in Listing 5.
We can synchronize the user's SID (user Bob in our example) to the login's SID by
executing Listing 6 , when connected to the database on the target instance.
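The views in question are sys.server_principals (for logins) and sys.database_principals (for users); the following is a sketch of both the comparison and the fix, using Bob from the example (ALTER USER…WITH LOGIN is available from SQL Server 2005 SP2 onwards):
-- Compare the login SID (stored in master) with the user SID (stored in the database)
SELECT name, sid FROM sys.server_principals WHERE name = 'Bob';
SELECT name, sid FROM sys.database_principals WHERE name = 'Bob';
-- Re-map the orphaned user to the login
ALTER USER Bob WITH LOGIN = Bob;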
We can back up a contained database and restore it to a different instance without
worrying about synchronizing user and login SIDs, therefore solving the orphaned
user problem. This helps, for example, in moving or copying a database from test to
production, or from a production version to other members of an Always On
Availability Group. A user that authenticates at the contained database may connect
directly to that database.
For a contained database, the Principle of Least Privilege steps are as follows:
1. Create a Windows group to contain Windows users who need the same privileges
within the database.
2. Create a user in the contained database for that group. If a database-authenticated
user is required (because a Windows principal cannot be used), create it.
3. Grant/deny permissions to the database user, or place the user in a fixed or
administrator-defined database role, as appropriate.
Database roles
A database role is a database-level principal that we use to group together database
users who need the same permission or permission set.
Every database has several “fixed” database roles that exist by default. Permissions
assigned to a fixed database role (except public ) cannot be altered. Some of the key
fixed database roles are:
• db_owner – Members can perform any action in the database and can drop the
database. This is equivalent to CONTROL DATABASE permission. Members of
sysadmin map to the dbo user in every database, and are thus a member of this
database role.
• db_backupoperator – Members can back up the database.
• db_datareader – Members can SELECT data from all tables and views.
• db_datawriter – Members can INSERT , UPDATE , and DELETE data from all user
tables and views. Note that UPDATE and DELETE statements usually target one or more
specific rows, requiring a WHERE clause. A WHERE clause requires SELECT permission!
• public – All users are automatically members of this database role. Permissions
granted to this role apply to all users in the database.
When a fixed database role does not have the permission set that fits the need (too
little or too much permission), we can create a user-defined database role.
To add a user (or another database role) to a database role in SSMS, open the
Properties window for that role and click the Add button near the bottom, to browse
and select from the list of database users and roles. This method is best if you want to
select multiple users/roles to add to the role.
If you have the Properties window for a user open, you may select a role to which to
add the user. Be careful! The upper pane has a list of Owned Schemas that are named
the same as database roles. Selecting one of these will give the user ownership of the
schema. This is probably not what you want. Use the lower list, Database role
membership , to select the role to which to add the user.
If you prefer the T-SQL route, Listing 7 provides the script to add the SQLUsers user
to the db_datareader role.
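A sketch of that script (ALTER ROLE requires SQL Server 2012; on earlier versions, sp_addrolemember does the same job):
USE [MyUserDatabase];   -- the database name is illustrative
GO
ALTER ROLE db_datareader ADD MEMBER [SQLUsers];
-- Pre-2012 equivalent: EXEC sp_addrolemember N'db_datareader', N'SQLUsers';
GO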
Remember that the security hierarchy is Server > Database > Schema > Object . At
the server scope, permissions are granted to logins (or server roles). The
server/database boundary represents a security boundary, because at the database
scope and below, permissions are granted to users (or database roles). A login does not
have any privileges within an existing database (unless the login is a sysadmin or has
CONTROL SERVER permission).
Figure 11: Granting SELECT permission on HumanResources to SQLUsers .
Figure 12: Granting SELECT permission on vEmployees to SQLAuth1 .
/*Database scope*/
GRANT SELECT TO [SQLUsers];
/*Schema Scope*/
DENY SELECT ON SCHEMA::[HumanResources] TO [SQLUsers];
/*Object level*/
GRANT SELECT ON [HumanResources].[vEmployee] TO [SQLAuth1];
The two colons following SCHEMA make SCHEMA:: a “scope identifier.” This tells
SQL Server that the [HumanResources] object is a schema.
Testing permissions
To test permissions for a Windows user, have the Windows administrator create a
Windows user and place that user in the appropriate Windows group. Use Run As (or
Switch user) to open SQL Server Management Studio as that Windows user, and
connect to the server using Windows authentication.
You can also use the SQLCMD application to check permissions. The following
command line code demonstrates a good way to open a command prompt window as
another Windows user. It will show the Windows user (Alice , in this example), in the
command prompt window title bar:
runas /noprofile /user:Alice cmd
You would then run SQLCMD as a “trusted” user in this command prompt window.
For a SQL Server-authenticated login, in SSMS, simply make a connection to the SQL
instance using SQL Server authentication. In a command prompt window, use the -U
option of SQLCMD.
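For example, something along these lines (the server name, login, and password are illustrative):
sqlcmd -S SQL1 -U SQLAuth1 -P <password> -Q "SELECT SUSER_SNAME();"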
Another way of checking permission is to switch execution context using EXECUTE
AS .
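A sketch of that approach, using principals from the earlier examples:
EXECUTE AS LOGIN = 'SQLAuth1';            -- or EXECUTE AS USER = 'SQLAuth1' within a database
SELECT * FROM HumanResources.vEmployee;   -- runs with that principal's permissions
REVERT;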
Example 1
Assume that the AdventureWorks company is running a Windows domain named AW
. Assume the computer host for the production SQL Server instance, SQL1 , is a
member of the AW domain.
The requirement in your permission plan is that a group of Windows users will be
responsible for creating new databases and backing up all user databases.
Example 2
In the AW domain, on the test SQL Server instance, SQLT1 , whose computer host is a
domain member, a group of Windows users will be responsible for creating and testing
stored procedures in the AdventureWorks database.
Answer 1
• Create a domain global group, AW\SQLMaintenance , and place the appropriate
Windows users into this group.
• Create a login on the SQL1 production instance from the Windows group
AW\SQLMaintenance .
• Place that login into the server-level role dbcreator . This will satisfy the
requirement for the capability of creating new databases.
• Map that login to a user named SQLMaintenance in each user database.
• In each database, make the SQLMaintenance user a member of the
db_backupoperator database role.
• The last two steps will satisfy the requirement for the capability to back up all user
databases. If the same task is to be performed in many databases, consider using the
undocumented sp_MSforeachdb procedure. (See, for example:
http://weblogs.sqlteam.com/joew/archive/2008/08/27/60700.aspx
.)
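In T-SQL, those steps might look roughly like the following sketch (group, login, and database names follow the example; ALTER SERVER ROLE and ALTER ROLE require SQL Server 2012, with sp_addsrvrolemember and sp_addrolemember as the older equivalents):
USE master;
CREATE LOGIN [AW\SQLMaintenance] FROM WINDOWS;
ALTER SERVER ROLE dbcreator ADD MEMBER [AW\SQLMaintenance];
GO
USE AdventureWorks;   -- repeat for each user database
CREATE USER SQLMaintenance FOR LOGIN [AW\SQLMaintenance];
ALTER ROLE db_backupoperator ADD MEMBER SQLMaintenance;
GO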
Answer 2
[Hint: CREATE PROCEDURE requires CREATE PROCEDURE permission in the database
and ALTER permission on the schema in which the procedure is being created.]
• Create a domain global group, AW\SQLDevelopers , and place the appropriate
Windows users into this group.
• Create a login on the SQLT1 test instance from the Windows group
AW\SQLDevelopers .
• Map that login to a user named SQLDevelopers in the AdventureWorks database.
• On the AdventureWorks database, grant CREATE PROCEDURE permission to
SQLDevelopers .
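In T-SQL, and following the hint above, the answer might look roughly like this sketch (the schema is assumed to be dbo):
USE master;
CREATE LOGIN [AW\SQLDevelopers] FROM WINDOWS;
GO
USE AdventureWorks;
CREATE USER SQLDevelopers FOR LOGIN [AW\SQLDevelopers];
GRANT CREATE PROCEDURE TO SQLDevelopers;
GRANT ALTER ON SCHEMA::dbo TO SQLDevelopers;
GO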
If a member of sysadmin , mapped to the dbo user in the database, creates a schema, that schema is, by
default, owned by the dbo user. Objects created in the schema are, by default, owned by the schema owner.
When a stored procedure owned by dbo is executed, the default execution context is the dbo user. Any action
performed by the stored procedure will be allowed.
Further exercise for the reader: How would you modify the above answer if
AdventureWorks were a partially contained database and the Test server is running
SQL Server 2012?
Further reading
• SQL Server Books Online (for whatever version you are using).
• Securing SQL Server, Second Edition, Cherry, Denny (2012), Waltham, MA:
Elsevier, Inc. ISBN: 978-1-59749-947-7
• SQL Server 2012 Security Best Practice Whitepaper:
http://tinyurl.com/6vxfh67
• Context Switching (Database Engine), and subtopics:
http://technet.microsoft.com/en-us/library/ms188268(v=sql.105).aspx
• Transparent Data Encryption (TDE):
http://technet.microsoft.com/en-us/library/bb934049.aspx
• Encrypt a Column of Data:
http://technet.microsoft.com/en-us/library/ms179331.aspx
• Creating logins and users from certificates for the purpose of module signing:
• SQL Server 2005: procedure signing demo, Laurentiu Cristofor, 2005:
http://blogs.msdn.com/b/lcris/archive/2005/06/15/429631.aspx
• Module Signing (Database Engine):
http://msdn.microsoft.com/en-us/library/ms345102.aspx
• Giving Permissions through Stored Procedures (Ownership chaining,
Certificates, and… EXECUTE AS), Erland Sommarskog, 2011:
http://www.sommarskog.se/grantperm.html
• Catalog views to list permissions, principals, and role members: see Security
Catalog Views (Transact-SQL):
http://technet.microsoft.com/en-us/library/ms178542.aspx
What Changed? Auditing Solutions in
SQL Server
Colleen Morrow
Picture this: it's a beautiful morning. The sun is shining; big, fluffy clouds dot the sky.
Perhaps it's early autumn and a light breeze is rustling the newly-changing leaves.
Traffic is light on your commute, so you arrive at work early enough to swing by
Starbucks for a non-fat latte. You feel great.
Until you arrive at your desk. Help Desk personnel and the odd manager or two are
circling like a pack of jackals. The WhatsIt application is performing dog-slow and
users are complaining. On top of that, the WhosIt application is suddenly throwing an
error every time a user tries to save a new record. They all look at you and ask, “What
changed?” (because, of course, it's the database, silly!).
If you're not auditing your databases, that can be a tough question to answer,
especially if “what changed” was that someone dropped a database object. However,
with a DDL audit in place, a quick scan of the results will allow you to explain the
situation, calmly. The WhatsIt application is slow because John dropped a critical
index, meaning that a very common query is now performing a full table scan. The
application error is due to a syntax error in a new insert trigger that Mary added to
the SuchAndSuch table in the WhosIt database. Recreate the index, fix the trigger bug,
and you're free to enjoy your latte.
Whether it's to log DDL changes, record who's logging in to a server, or track what
users are accessing sensitive data, there comes a time in just about every DBA's career
when we need to perform some sort of audit (or wish we had auditing in place). It's
important to know what options are available and the pros and cons of each, because
they are not all created equal.
SQL Server provides many ways to audit activity on your instances. In this chapter,
we'll focus on the following tools, discussing how each one works, along with their
strengths and weaknesses:
• SQL Trace – familiar, easy to implement, and available in all Editions.
• SQL Audit – robust, first-class auditing tool offering a wide range of covered
events.
• DIY auditing solutions using triggers and event notifications – when an out-of-
the-box solution won't cut it, why not develop your own?
C2 auditing
C2 is a security standard established by the National Computer Security Center
(NCSC) and intended for high-security environments. A C2 audit records every single
event that occurs in your SQL instance and logs it to an audit file in your default data
directory. It is not configurable in any way, except to turn it on or off, which requires
an instance restart. So what you have is a lot of audit information being written to your
default data directory. If that directory were to fill up, SQL Server would shut down
and would not start until it could write to the audit file again. To put it bluntly, don't
enable C2 auditing unless you absolutely need to, and if you need to, you'll know.
Just about everything of any significance that happens inside the SQL Server engine
generates an event. Logging in to the instance? That's an event. Creating a table?
Event. Even issuing a SELECT statement generates an event. Events aren't confined to
just user-related activities. Internal processes such as locking and allocating disk space
also generate events.
An event class is the definition of an event, including all of its related information. An
event is an instantiation of an event class. An analogy that works for me is that an
event class is a table and all its columns. It defines what data is stored inside. An event
would be a row inside that table. If you're more developer-minded, you might liken an
event class to a struct definition in a language like C (did I just date myself there?) and
the event itself to declaring a variable based on that struct .
Each event class has a different set of information that's relevant to that class. For
example, the Data File Auto Grow event class contains the column Filename ,
which stores the name of the data file that auto-expanded. The Object:Created
event class however, doesn't contain this column; it simply doesn't make sense for that
event. On the other hand, Object:Created does include columns like IndexID and
ObjectID , which are relevant to that event.
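One way to produce such an inventory is to query the trace catalog views; the following is a sketch, which may differ in detail from the book's exact Listing 1:
SELECT c.name AS CategoryName ,
       e.name AS EventName ,
       col.name AS ColumnName
FROM   sys.trace_categories c
       JOIN sys.trace_events e ON e.category_id = c.category_id
       JOIN sys.trace_event_bindings b ON b.trace_event_id = e.trace_event_id
       JOIN sys.trace_columns col ON col.trace_column_id = b.trace_column_id
ORDER BY CategoryName, EventName, ColumnName;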
Listing 1: SQL Trace – event categories, events and data columns that comprise each event.
That's quite a list, to be sure, with twenty-one event categories containing 180 event
classes. For auditing purposes, our primary focus will be on those events in the
Objects and Security Audit event categories. To help make sense of it all, BOL
provides the following resources:
• A description of all the event classes:
http://msdn.microsoft.com/en-us/library/ms175481.aspx
• http://msdn.microsoft.com/en-us/library/ms190762.aspx
The new Extended Events framework, introduced in SQL Server 2008, which forms
the basis for the SQL Audit tool, exposes even more events. We can query the
Extended Events metadata, stored in various Dynamic Management Views (DMVs)
for the Extended Events packages, and the 600+ events they contain (see, for example,
http://tinyurl.com/m6fu384). However, many of the auditing events are not
available to the Extended Events DDL, only through the SQL Audit tool (more on this
shortly).
Now that we know a little more about what we'll be auditing, let's look at some of the
tools we can use.
SQL Trace
SQL Trace, first introduced in SQL Server 6.5, is the oldest audit method still
available in SQL Server. The 6.5 version was limited in scope; we could monitor what
a client was executing or what stored procedures it was calling, but that was about it.
SQL Server 7.0 improved matters (and introduced Profiler), enabling us to gather
much more information about what was going on with regard to locking, blocking, and
activity on our database instances.
SQL Trace is now a mature and robust tool. Furthermore, as described in detail in
Chapter 4 , as long as we perform server-side traces, rather than using Profiler, and we
adopt general good practices in our trace definitions, we can capture traces with
minimal performance impact on even the most heavily loaded server.
Most DBAs use SQL Trace to monitor and troubleshoot performance issues in SQL
Server, but we can also use it as a tool for auditing activities in the database. Chapter 4
covers how to create trace definitions in Profiler, export them, and run server-side
traces, so refer there for those details. Here, I'll focus on:
• creating a simple server-side trace for DDL auditing
• using the default trace for auditing.
columns that don't apply to our selected events, such as ColumnPermissions and
FileName , appear in gray and aren't available for selection.
To hide irrelevant events and columns, uncheck the Show all events and Show all
columns check boxes.
By default, Profiler will collect 27 data columns for each of these event classes. If
there are any you don't require, it's good practice to deselect them, in order to reduce
the trace overhead. However, for simplicity here, we'll capture them all.
We want to audit DDL changes in the AdventureWorks database only, so click on the
Column Filters button, select the DatabaseName column, expand the Like group and
enter AdventureWorks . Wildcards are also acceptable here.
Having done this, click Run to start the trace, and verify that it's working as expected,
by running the DDL in Listing 2 .
USE [AdventureWorks2008R2];
GO
CREATE TABLE MyAuditTable
(
FirstName VARCHAR(20) ,
LastName VARCHAR(20)
);
GO
We should see that event recorded in our Profiler session. We see two rows for the
creation of our table, one each for the start and end of the transaction that created it. If
desired, we could add an EventSubClass = 1 filter to the trace definition to limit
the output to just the commit records.
Stop the trace. If we want to save this trace definition as a template, we can simply
select Save As | Trace Template , give it a name, such as
AdventureWorksDDLAudit , and the next time we need to create a trace, it will
appear as a custom template at the bottom of the list of available templates.
Having configured our audit trace using Profiler, we want to export the trace definition
to a file so that we can run it as a server-side trace. Navigate to File | Export | Script
Trace Definition | For SQL Server 2005–2008 R2 and save the .sql file with an
appropriate name and location.
Open the .sql file in SSMS to see the trace definition, and modify it as necessary, as
shown in Listing 3 . The full script, available in the code download, contains 81 calls
to sp_trace_setevent , one for each combination of 3 events and 27 data columns.
-- Create a Queue
DECLARE @rc INT
DECLARE @TraceID INT
DECLARE @maxfilesize BIGINT
SET @maxfilesize = 5
EXEC @rc = sp_trace_create @TraceID OUTPUT, 2,
N'D:\SQL2008\Audit\AdventureWorksDDLAudit.trc', @maxfilesize, NULL
IF ( @rc != 0 )
GOTO error
-- Set the Events
DECLARE @on BIT
SET @on = 1
EXEC sp_trace_setevent @TraceID, 164, 7, @on
EXEC sp_trace_setevent @TraceID, 164, 8, @on
EXEC sp_trace_setevent @TraceID, 164, 24, @on
--<…further events omitted for brevity…>
GOTO finish
error:
SELECT ErrorCode = @rc
finish:
go
Execute the script to start the server-side trace. Once started, it will continue to run
until a predefined stop time (we did not define one here), or until it is explicitly
stopped with a call to sp_trace_setstatus , or until the SQL Server instance is
stopped. Usually, we want our audit trace to run continuously and to resume on
startup. To make this happen, use the sp_procoption stored procedure (see Books
Online for more information on this procedure).
A server-side trace has only one output option, a trace (.trc ) file. Once we have the
audit trace up and running, we can simply open that trace file in Profiler to view the
output. However, for security or DDL auditing purposes, we'll want to be able to
process the output and report on it. For this, we use the fn_trace_gettable
function.
SELECT *
FROM fn_trace_gettable('D:\SQL2008\Audit\AdventureWorksDDLAudit.trc', NULL);
GO
This built-in function allows us to treat the trace file as a table, i.e . joining it with
other tables, sorting, grouping, filtering, and so on (see Chapter 4 for further details).
ColumnName;
GO
Listing 5: SQL Trace – event categories, events and data columns that comprise the default trace.
While certainly useful for investigating what just happened on an instance, the default
trace has some limitations that make it less than ideal for a rigorous auditing solution.
The default trace comprises five rollover files, each 20 MB in size. Unfortunately, we
cannot modify this configuration. Considering the number of events captured by the
default trace, you can imagine how quickly the oldest file might be aged out. When
auditing for later reporting purposes, you'll need to implement a procedure for saving
the trace data to a permanent repository on a regular basis in order to prevent audit
data loss (see, for example, http://tinyurl.com/kgb27u2).
SQL Audit
SQL Server 2008 introduced the SQL Audit feature, for the first time making Audit
objects first-class objects, meaning we can manage them via DDL statements, rather
than through stored procedures.
SQL Audit is built on the Extended Events framework, also introduced in SQL Server
2008. I can't delve into the details of Extended Events here but suffice it to say that
using Extended Events allows for synchronous or asynchronous processing of event
data with minimal performance overhead, even less than SQL Trace.
You can find further information on Extended Events in Chapter 4 of this book, and in
Books Online, which includes a query that provides a correlation of SQL Trace event
classes to Extended Events (http://tinyurl.com/kxnzhph). If you run it, you'll
notice that while some of the auditing events in which we're interested, such as the
object modification events, and a few of the security audit events, are available in the
Extended Events sqlserver package, many others are “missing.” These events
migrated to the SecAudit package, which is for exclusive use by SQL Audit (see
http://tinyurl.com/kskejwj).
In the SSMS Object Explorer (at the server level), expand the Security directory,
right-click the Audits folder and choose New Audit . Figure 4 shows the General
page, completed for our example, similar to our previous server-side trace for auditing
DDL actions on the AdventureWorks database, but more fine-grained.
complete until SQL Server hardens the audit records to the target. More likely to
impact user performance but guarantees no audit activity will be lost in the event
of a system failure.
• On Audit Log Failure – what should happen if SQL Server can't harden the audit
records for some reason?
• CONTINUE – SQL Server operations continue without audit records until SQL
Server is able to resume writing to the target (the default).
• SHUTDOWN – force a shutdown of the server.
• FAIL_OPERATION – SQL Server 2012 and later only. Fail the audited operation,
ensuring no audited events go unrecorded, but without the extreme measure of
taking the entire instance off line.
• Audit destination – the target for the audit records.
• File – a user-designated file, ideally in a central directory (encrypted and restricted
access) on a network available to all systems you wish to audit. This allows us to
query and process all audit files at once.
• Security log – the Windows Security log. High security; writing to the Security
log is a privileged operation and requires “Generate Security Audit” rights on the
server.
• Application log – the server's central Windows Application Log.
• Target file properties (relevant only if FILE is selected for Audit destination).
• File path – we specify a path, but SQL Server will automatically generate a file
name using the audit name and GUID, a sequential number and a time-stamp.
• Audit File maximum limit – maximum number of rollover files to create in
response to current file reaching maximum size. Keep in mind that multiple
smaller files are easier to manage than a single large file.
• Maximum file size – maximum size for the file.
In SQL Server 2012, there is also a Filter page, where we can add a filter to the Audit
object, in the form of a predicate. The type of filter we want to create might depend on
whether we intend to use a database- or server-level audit specification. In the former
case, we might want to limit the audit to just the Customers table, in which case we
could add an object_name='Customers' predicate to the Audit object definition.
For a server-level audit, we might want to add a predicate to limit it to specific
databases, or exclude certain databases. Our audit specification, later, will ensure we
only audit events on the required database, so there's no need to define a filter here, in
this case.
We can now select Script | New Query Edit Window to view the underlying
definition of our Audit object.
USE [master]
GO
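-- The rest of the generated script looks broadly like the following sketch (the file
-- path matches the audit-file examples later in the chapter; the audit name, sizes,
-- and option values are illustrative):
CREATE SERVER AUDIT [AdventureWorks_Audit]
TO FILE ( FILEPATH = N'D:\AuditLogs\' ,
          MAXSIZE = 100 MB ,
          MAX_ROLLOVER_FILES = 10 )
WITH ( QUEUE_DELAY = 1000 ,
       ON_FAILURE = CONTINUE );
GO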
Either run this script, or click OK in the Create Audit window, to create the Audit
object.
It's not always evident which option will record the event, or action, you wish to audit.
For example, when I first started working with SQL Audit, I wanted to audit DDL
events, so I chose the DATABASE_OBJECT_CHANGE_GROUP . It made sense to me.
After all, I was auditing changes to database objects, right? Not exactly. When I tested
my audit by creating and dropping a table, what I saw in my audit log was not a table
being created and dropped, but rather a schema being modified. What I really wanted
to audit was the SCHEMA_OBJECT_CHANGE_GROUP .
Fortunately, the sys.dm_audit_actions view contains a row for every auditable
action (name ) and the group to which it belongs (containing_group_name ). We
can combine that with sys.dm_audit_class_type_map to see the parent group for
any action on any object type.
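A query along the following lines combines the two views (a sketch; the join between class_desc and securable_class_desc is an assumption based on the view definitions):
SELECT m.class_type_desc ,
       a.name AS action_name ,
       a.containing_group_name
FROM   sys.dm_audit_actions a
       JOIN sys.dm_audit_class_type_map m ON m.securable_class_desc = a.class_desc
ORDER BY m.class_type_desc, a.name;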
Listing 7: SQL Audit - listing the parent group for all actions on all object types.
If I'd known about these views when I started, I would have seen that the “CREATE”
action of a “TABLE” object falls in the SCHEMA_OBJECT_CHANGE_GROUP . By adding
a simple WHERE clause to this query we can see every action associated with this
group.
You will find that for some action types the other columns (Object Class , Object
Schema , and so on) are sometimes configurable and sometimes not. In general, if
we're specifying a group as an Audit Action Type , we won't be able to narrow it
down to specific objects. In this example, we can't audit events in the
SCHEMA_OBJECT_CHANGE_GROUP for a particular object within the database, such as a
table. If we script out our audit specification, it looks as shown in Listing 8 .
USE [AdventureWorks2012]
GO
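-- A sketch of the specification script (the specification name is illustrative; the
-- audit name matches the Audit object created earlier):
CREATE DATABASE AUDIT SPECIFICATION [AdventureWorks_DDL_Spec]
FOR SERVER AUDIT [AdventureWorks_Audit]
ADD ( SCHEMA_OBJECT_CHANGE_GROUP )
WITH ( STATE = OFF );
GO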
Create the audit specification by running this script, or from the dialog. This
specification will mean that we'll audit all actions on all objects in the
AdventureWorks database. This is where a filter on the Audit object comes in
handy, as discussed earlier, if we wish to limit auditing to particular objects.
If we specify individual actions, like UPDATE or SELECT in the Audit Action Type ,
then we can refine our audit specification to, for example, track only SELECT
statements issued on the Employees table by any user.
Figure 6: SQL Audit – creating a database audit specification for a specific object.
Listing 9 shows the corresponding script (don't try to create this additional
specification; remember there can be only one database audit specification per
database for a given Audit object).
USE [AdventureWorks2012]
GO
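-- A sketch of an object-level specification (specification, schema, and table names
-- are illustrative):
CREATE DATABASE AUDIT SPECIFICATION [Employees_Select_Spec]
FOR SERVER AUDIT [AdventureWorks_Audit]
ADD ( SELECT ON OBJECT::[dbo].[Employees] BY [public] )
WITH ( STATE = OFF );
GO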
Listing 9: SQL Audit – creating a database audit specification for a specific object.
Having created both our Audit object and a database audit specification we can
enable them by right-clicking on each one in SSMS Object Explorer, and selecting the
Enable option. To test the audit, simply recreate our MyAuditTable (Listing 2) in the
AdventureWorks2012 database.
Scroll right and you'll see that the audit data includes a lot more detail than SQL Trace,
including the text of the SQL statement executed.
To view more than 1,000 records, or to carry out more sophisticated operations on the
audit data, use T-SQL and the fn_get_audit_file function. This built-in function
accepts three parameters: the file pattern, the path and file name of the first audit file to
read, and the starting offset within that initial file.
Listing 10 will return the contents of all audit files for the AdventureWorks_Audit .
SELECT *
FROM fn_get_audit_file('D:\AuditLogs\AdventureWorks_Audit*', DEFAULT,
DEFAULT)
Listing 10: SQL Audit – viewing the audit output using fn_get_audit_file .
Listing 11 will return the contents of all audit files for the AdventureWorks_Audit
starting with a certain file and offset.
SELECT *
FROM fn_get_audit_file('D:\AuditLogs\AdventureWorks_Audit*',
'D:\AuditLogs\AdventureWorks_Audit_C3D7D531-0A9C-40FB-B07B-E315BD72174D_0_129745881197320000.sqlaudit',
1024)
Listing 11: SQL Audit – viewing audit output beginning at a specific location within a file.
This option is handy if you've already processed some records and just want to pick up
where you left off.
The bottom line is that, if you're running Standard Edition, you'll need to find another
way to audit DDL.
If you do work with a SQL Server edition that supports the full SQL Audit feature,
then it offers some great benefits. By making audit objects first-class objects, SQL
Audit offers a very holistic approach to auditing that really hasn't been available until
now. The solution is less susceptible to tampering. The audit output records all
changes to the audit definition. We can send audit output to the Windows Security log
or to a directory with very restricted access. There's also the option to shut down the
instance in the event that SQL Server can't write the audit output.
However, there are also limitations even with the full SQL Audit feature. In SQL
Server 2008, we have the ability to include specific objects in some audit
specifications, but it's difficult to exclude objects or events. An example of this would
be the SCHEMA_OBJECT_CHANGE_GROUP . It will certainly record all CREATE , ALTER
, and DROP statements we issue on database objects, but it will also record UPDATE
STATISTICS statements, which you may not want to include in your audit. The
problem is that there is no choice in the matter. Nor can we choose to audit DDL on a
specific object or in a specific schema. Certainly, we can filter out unwanted events
when reading the audit file, but this can create a lot of “noise” cluttering the audit files.
SQL Server 2012 improved this situation somewhat with the introduction of the filter
functionality, but even that isn't perfect. If we only have a handful of exclusions, we
can list them out easily enough, but this soon becomes cumbersome and difficult to
maintain. It would help to be able to store filter criteria in a table and refer to that table
in the filter predicate. Unfortunately, that's not an option at present.
Develop Your Own Audit: Event Notifications or
Triggers
Looking to get back in touch with your inner developer? Are the built-in auditing
methods offered by SQL Server not meeting your needs? Think you can do better?
You can always code your own audit using event notifications or triggers. If you're
willing to do a little more work up front, each of these methods offers some
advantages over traditional solutions.
SQL Server instance, we don't have to create a separate event notification for each
one.
The second decision to make is what to audit. Event notifications respond to a number
of events. We can track individual events, such as CREATE_TABLE or
ALTER_PROCEDURE , or we can track predefined groups, such as
DDL_DATABASE_LEVEL_EVENTS , which will capture any database-level DDL event.
The sys.event_notification_event_types system catalog view offers a
complete listing of available events, or we can refer to Books Online (http://tinyurl.com/2cv27cg).
USE Audit
GO
-- NOTE: the first half of this listing was lost; the leading columns below are
-- inferred from the INSERT statement in the service program that follows.
CREATE TABLE dbo.auditLog
(
SQLInstance VARCHAR(100),
DatabaseName VARCHAR(100),
EventTime DATETIME,
EventType VARCHAR(100),
LoginName VARCHAR(100),
DatabaseUser VARCHAR(100),
ClientHostName VARCHAR(100),
NTUserName VARCHAR(100),
NTDomainName VARCHAR(100),
SchemaName VARCHAR(100),
ObjectName VARCHAR(50),
ObjectType VARCHAR(100),
Success INT,
FullSQL varchar(max),
FullLog XML,
Archived BIT NOT NULL
)
GO
ALTER TABLE dbo.auditLog ADD CONSTRAINT
DF_auditLog_Archived DEFAULT 0 FOR Archived
GO
CREATE PROCEDURE dbo.auditQueueReceive_usp
AS
BEGIN
-- NOTE: this procedure header and the variable declarations are inferred; the
-- opening lines of the original listing were lost.
DECLARE @message XML ,
    @messageName NVARCHAR(256) ,
    @dialogue UNIQUEIDENTIFIER;
BEGIN TRY
--Continuous loop
WHILE ( 1 = 1 )
BEGIN
BEGIN TRANSACTION;
--Retrieve the next message from the queue
SET @dialogue = NULL;
WAITFOR (
GET CONVERSATION GROUP @dialogue FROM dbo.auditQueue
), TIMEOUT 2000;
IF @dialogue IS NULL
BEGIN
ROLLBACK;
BREAK;
END;
RECEIVE TOP(1)
@messageName=message_type_name,
@message=message_body,
@dialogue = conversation_handle
FROM dbo.auditQueue
WHERE conversation_group_id = @dialogue;
IF ( @message.value('(/EVENT_INSTANCE/EventType)[1]',
'VARCHAR(100)') )
NOT LIKE '%STATISTICS%'
AND (
@message.value('(/EVENT_INSTANCE/TSQLCommand)[1]',
'VARCHAR(2000)') )
NOT LIKE 'ALTER INDEX%REBUILD%'
AND (
@message.value('(/EVENT_INSTANCE/TSQLCommand)[1]',
'VARCHAR(2000)') )
NOT LIKE 'ALTER INDEX%REORGANIZE%'
BEGIN
INSERT INTO auditLog
( SQLInstance ,
DatabaseName ,
EventTime ,
EventType ,
LoginName ,
DatabaseUser ,
ClientHostName ,
NTUserName ,
NTDomainName ,
SchemaName ,
ObjectName ,
ObjectType ,
Success ,
FullSQL ,
FullLog
)
VALUES (
ISNULL(@message.value('(/EVENT_INSTANCE/SQLInstance)[1]',
'VARCHAR(100)'),
@@SERVERNAME) ,
ISNULL(@message.value('(/EVENT_INSTANCE/DatabaseName)[1]',
'VARCHAR(100)'),
'SERVER') ,
@message.value('(/EVENT_INSTANCE/PostTime)[1]',
'DATETIME') ,
@message.value('(/EVENT_INSTANCE/EventType)[1]',
'VARCHAR(100)') ,
@message.value('(/EVENT_INSTANCE/LoginName)[1]',
'VARCHAR(100)') ,
@message.value('(/EVENT_INSTANCE/UserName)[1]',
'VARCHAR(100)') ,
@message.value('(/EVENT_INSTANCE/HostName)[1]',
'VARCHAR(100)') ,
@message.value('(/EVENT_INSTANCE/NTUserName)[1]',
'VARCHAR(100)') ,
@message.value('(/EVENT_INSTANCE/NTDomainName)[1]',
'VARCHAR(100)') ,
@message.value('(/EVENT_INSTANCE/SchemaName)[1]',
'VARCHAR(100)') ,
@message.value('(/EVENT_INSTANCE/ObjectName)[1]',
'VARCHAR(50)') ,
@message.value('(/EVENT_INSTANCE/ObjectType)[1]',
'VARCHAR(50)') ,
@message.value('(/EVENT_INSTANCE/Success)[1]',
'INTEGER') ,
@message.value('(/EVENT_INSTANCE/TSQLCommand)[1]',
'VARCHAR(max)') ,
@message
);
END
COMMIT;
END
END TRY
BEGIN CATCH
DECLARE @errorNumber INT ,
@errorMessage NVARCHAR(MAX) ,
@errorState INT ,
@errorSeverity INT ,
@errorLine INT ,
@errorProcedure NVARCHAR(128)
SET @errorNumber = ERROR_NUMBER();
SET @errorMessage = ERROR_MESSAGE();
SET @errorState = ERROR_STATE();
SET @errorSeverity = ERROR_SEVERITY();
SET @errorLine = ERROR_LINE();
SET @errorProcedure = ERROR_PROCEDURE();
IF NOT ( XACT_STATE() = 0 )
ROLLBACK;
RAISERROR('%s:%d %s (%d)',
@errorSeverity,@errorState,@errorProcedure,@errorLine,
@errorMessage,@errorNumber) WITH log;
END CATCH
END
GO
Listing 12: Event notifications – creating the Service Broker service program.
Once we have the service program in place, we can create the rest of the Service
Broker objects. For a detailed description of these objects and the syntax used to create
them, please refer to Books Online.
USE Audit
GO
--CREATE QUEUE
CREATE QUEUE auditQueue
WITH ACTIVATION (
STATUS = ON,
PROCEDURE_NAME = audit.dbo.auditQueueReceive_usp ,
MAX_QUEUE_READERS = 2, EXECUTE AS SELF)
GO
--CREATE SERVICE
CREATE SERVICE auditService
ON QUEUE [auditQueue]
([http://schemas.microsoft.com/SQL/Notifications/PostEventNotification])
GO
--CREATE ROUTE
CREATE ROUTE auditRoute
WITH SERVICE_NAME = 'auditService',
ADDRESS = 'Local'
GO
Listing 13: Event notifications – creating the Service Broker queue, service, and route.
Finally, we can create the event notification itself. Here, we're creating an event
notification at the server level (so across all databases), tracking all database-level
DDL events, and we're sending that information to the auditService service we just
created.
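A sketch of that statement (the notification name is illustrative; the event group and service name come from the discussion above):
CREATE EVENT NOTIFICATION auditDDLEvents
ON SERVER
FOR DDL_DATABASE_LEVEL_EVENTS
TO SERVICE 'auditService', 'current database';
GO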
We've attached our service program to the auditQueue , so that it will process
messages as they arrive. Alternatively, by not attaching the procedure to the queue and
simply executing it manually or via a job, we could process messages in batches, at
scheduled intervals. This is sometimes preferred to save processing for a period when
the server isn't as busy.
Before we move on, take a last look at the CREATE ROUTE statement above and notice
the ADDRESS clause. In this example, we're specifying Local , meaning the target
service is located on the local instance. We could also send messages to a remote
service by specifying the DNS name or IP address of that server here. By doing this,
we can potentially log audit information from many servers to a central audit database.
Let's see our event notification in action. Run the statements in Listing 15 to produce
some audit output.
USE AdventureWorks2012
GO
-- NOTE: the start of this listing was lost; the table, index, and INSERT below are
-- reconstructed to match the three DDL events described in the text.
CREATE TABLE EventNotificationTest ( name VARCHAR(50) );
GO
CREATE INDEX IX_EventNotificationTest ON EventNotificationTest ( name );
GO
INSERT INTO EventNotificationTest ( name )
VALUES ( 'Noelle' ),
( 'Tanner' ),
( 'Colin' ),
( 'Parker' ),
( 'Conner' )
GO
SELECT name
FROM EventNotificationTest
WHERE name = 'Conner';
GO
DROP TABLE EventNotificationTest;
GO
Now, we can view the audit data by reading from the auditLog table.
USE Audit
GO
SELECT EventTime ,
DatabaseName ,
EventType ,
LoginName ,
SchemaName ,
ObjectName ,
ObjectType ,
FullSQL
FROM dbo.auditLog;
GO
You should see three rows, one recording the creation of the table, one for the index
creation, and one for dropping the table. We don't see a record for the UPDATE
STATISTICS command, nor for the statistics that SQL Server automatically created
when we ran the SELECT statement, because we filtered out STATISTICS commands
in our service program.
custom procedural code. Unlike event notifications, however, triggers respond to a
more limited range of events and they execute synchronously, meaning that DDL or
Logon triggers execute within the scope of the firing transaction, and could affect the
performance of that transaction. Of course, we can turn this potential drawback into a
benefit, if the intention is to prevent unwanted activities, like preventing a user from
dropping a table.
Oftentimes, we use generic application logins in our databases, logins that a number of
individuals might share. These shared logins make it more difficult to track DDL
changes, or other audited activities, back to a real person. However, synchronous
execution has another advantage: it captures the client information of the firing
connection, so we can see the client host name or network login of the person who
performed the activity. That's something that SQL Audit and event notifications can't
provide.
USE AdventureWorks2012;
GO
-- NOTE: the opening of this listing was lost; the trigger header, EVENTDATA() capture,
-- and INSERT column list below are inferred from the VALUES clause and from Listing 19.
CREATE TRIGGER ddl_audit_trigger ON DATABASE
    FOR DDL_DATABASE_LEVEL_EVENTS
AS
DECLARE @data XML = EVENTDATA();
DECLARE @WorkStation VARCHAR(256) = HOST_NAME();
INSERT INTO Audit.dbo.changeLog
        ( ServerName, DatabaseName, Workstation, EventType, ObjectSchema,
          ObjectName, ObjectType, SQLCommand, LoginName )
VALUES ( @@SERVERNAME ,
@data.value('(/EVENT_INSTANCE/DatabaseName)[1]', 'varchar(256)') ,
@WorkStation ,
@data.value('(/EVENT_INSTANCE/EventType)[1]', 'varchar(50)') ,
@data.value('(/EVENT_INSTANCE/SchemaName)[1]', 'varchar(256)') ,
@data.value('(/EVENT_INSTANCE/ObjectName)[1]', 'varchar(256)') ,
@data.value('(/EVENT_INSTANCE/ObjectType)[1]', 'varchar(25)') ,
@data.value('(/EVENT_INSTANCE/TSQLCommand)[1]', 'varchar(max)') ,
@data.value('(/EVENT_INSTANCE/LoginName)[1]', 'varchar(256)')
)
GO
Listing 17: DDL triggers – creating a DDL trigger in the AdventureWorks2012 database.
USE AdventureWorks2012;
GO
Viewing the audit output is as easy as querying the changeLog table (see Listing 19 ).
Although, in this example, we're logging our audit data to a local database, we could
also send this information to a remote database by using Service Broker. People often
forget this possibility in the context of triggers, because triggers don't go hand-in-hand
with Service Broker in the way that event notifications do.
USE Audit;
GO
SELECT EventDate ,
DatabaseName ,
EventType ,
Workstation ,
LoginName ,
ObjectSchema ,
ObjectName ,
ObjectType ,
SQLCommand
FROM ChangeLog
GO
Ultimately, any self-made auditing solution is only as good as your coding abilities.
However, as long as you're comfortable writing your own T-SQL procedures, then
event notifications and DDL/Logon triggers offer a great alternative or supplement to
conventional auditing methods. It's easy to filter out unwanted events before they
make it to our target destination, and we have the option of writing to a database table,
rather than having to deal with external audit or trace files.
It's not all rainbows and unicorns, however. If the intention is to audit user access on a
particular table, we can't do that using either of these methods. They also won't record
failed events, in the manner of SQL Audit. Also, a word of warning: a busy system
generating many events, combined with an inefficient service program, can quickly
overwhelm a Service Broker queue. A poorly designed trigger can directly affect
users. Unless you're prepared to take ownership of the entire auditing process, perhaps
one of the other audit methods would be a better choice.
Third-party Solutions
If your boss is willing to open the purse strings, there are a number of third-party
auditing solutions on the market. The best ones require little or no change to your
existing SQL Server environment, and have minimal impact on overall performance.
Idera's SQL Compliance Manager, for example, uses a lightweight agent to capture
SQL Trace events. Other options are IBM Guardium and Imperva SecureSphere.
Third-party auditing solutions offer some distinct advantages over homegrown audits.
The biggest of these is scalability. If you have to audit many SQL Server instances,
then third-party products generally make it very easy to deploy and manage multiple
audits from a central location. You're also off the hook when it comes to supporting the
audit code. And keep in mind that, whereas you as a DBA probably have a number of
responsibilities, the folks that produce this software have one: auditing. They know it
inside and out, and are much more likely to produce a thorough and efficient solution.
Just know that licensing such software can quickly become an expensive proposition.
Conclusion
So there you have it: SQL Trace, SQL Audit, event notifications, and triggers; four
viable options for auditing many different activities in your SQL Server database.
Each offers its own unique set of benefits and comes with its own limitations. Which
one you choose will depend on what actions you want to audit, what data you want to
capture, and how you want to handle the output. Obviously, personal preference will
be an important factor, too. And don't think your options end here, either. Though I
don't go into it in this chapter, with the increased exposure of Extended Events in SQL
Server 2012, we get yet another opportunity to develop our own basic auditing
solutions. Indeed, with so many different options available, there's really no excuse for
not knowing who dropped that index .
SQL Injection: How It Works and How
to Thwart it
Kevin Feasel
Imagine waking up one morning and, while surfing the Internet at breakfast, seeing
news articles describing how a hacker stole your company's proprietary data and put it
up on Pastebin. Millions of customer records, perhaps Protected Health Information,
credit card numbers, or even names of sources or undercover agents, all out there for
the world to see. This may be roughly the time when you crawl back into bed and start
planning that long-dreamed-of trip to the Australian outback to start a new life as a
nomad.
Before you reach the point where you are trying to explain to an interviewer the
circumstances behind your sudden departure from your previous employment, perhaps
I should divulge the single most pernicious résumé-updating experience (security flaw
division) out there: SQL injection.
In September of 2011, Imperva's Hacker Intelligence Initiative released a report
stating that SQL injection was responsible for 83% of successful data breaches from
2005 through to the report's release (http://tinyurl.com/py8ltlf). It should be
no surprise that the Open Web Application Security Project (OWASP) rated injection
attacks, and especially SQL injection, as the number one threat vector in 2010 (http://tinyurl.com/pzkrlds).
In every instance of SQL injection, the flaw is the same: an attacker injects SQL code
in a manner the application's developers did not anticipate, allowing the attacker to
perform unexpected and unauthorized actions. For a very simplistic example of this,
imagine a basic web page with a search box on it. The web page developer expects
users to search for a product and look up rows in a table based upon the contents of
that search box.
If the developer is not careful, the user may be able to craft a search string that returns
all products – or something much worse, like returning a list of all of the tables in the
database, or a list of patients' medical records in another database.
Now that we have a common understanding of the gravity of the problem, as well as a
basic definition of SQL injection, the rest of this chapter will go into further detail on
how to perform SQL injection attacks, how to defend against them, and how to keep
your Chief Information Security Officer from appearing on the nightly news.
If you know the enemy and know yourself, you need not fear the result of a hundred battles. If you know
yourself but not the enemy, for every victory gained you will also suffer a defeat. If you know neither the
enemy nor yourself, you will succumb in every battle .
Sun Tzu, The Art of War (http://gutenberg.org/cache/epub/132/pg132.html)
You cannot truly defend against a threat you do not understand, so the first step in
defending against a SQL injection attack is to understand precisely how one works.
Before I begin, some standard provisos. Firstly, don't perform a SQL injection attack
on any application without express, written permission. Second, don't put known
unsafe code on a production machine. Finally, don't put your testing code in a publicly
accessible location, as some bad person somewhere will probably find it eventually.
With that out of the way, the first example will be a very simple web page on top of
the ubiquitous AdventureWorks database, specifically an ASP.NET web forms
application talking to a SQL Server 2008 instance, but SQL injection is relevant across
all application and database platform combinations.
Imagine a very basic grid showing a list of product subcategories from the
Production.ProductSubcategory table in AdventureWorks . In addition to this
list, we have a name filter in which a user can type a partial name, and a grid displays
matching items. The SQL for such a query could look as shown in Listing 1 .
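A sketch of such a query, consistent with the data access code shown later in the chapter:
DECLARE @Filter NVARCHAR(50);
SET @Filter = 'Bike';
SELECT Name, ProductSubcategoryID, ProductCategoryID
FROM Production.ProductSubcategory
WHERE Name LIKE '%' + @Filter + '%'
ORDER BY ProductSubcategoryID;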
The query returns rows of products that include “Bike” in their name. This is the
expected behavior, and all is well. To simulate a SQL injection attack, we can try
changing the filter value from 'Bike' to 'Bike'' OR 1=1--' . Our goal as
attackers is to get “outside” the bounds of the parameter, at which point we can
manipulate the query itself or run some completely different SQL statement. In this
particular case, our goal is to extend the search surreptitiously, to return rows where
the Name is like “%Bike” or all of the results (because 1 equals 1 is always true). We
then comment out the rest of the query to prevent any syntax errors.
Running this first attempt at an attack will show no results, meaning our attack failed.
The reason is that we were not able to get “outside” of the parameter, so instead of
searching for “%Bike” or where 1=1 , we are actually searching for a product
subcategory whose name is like “Bike' OR 1=1--” and naturally, there are no
product subcategories which match that name.
What we did was attempt to perform SQL injection against a static SQL query, which
simply is not possible. SQL injection is only possible against dynamic SQL, either
through an ad hoc statement put together by an application, which communicates with
SQL Server, or through SQL Server's built-in dynamic SQL capabilities. Listing 2
constructs a basic dynamic SQL query that returns the same results as Listing 1 , when
used as intended.
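A sketch of one way to build such a query (it may differ in detail from the book's Listing 2; the statement is assembled on a single line so that the comment-based attacks described below behave as expected):
DECLARE @Filter NVARCHAR(100);
SET @Filter = 'Bike';
DECLARE @sql NVARCHAR(MAX);
SET @sql = N'SELECT Name, ProductSubcategoryID, ProductCategoryID FROM Production.ProductSubcategory WHERE Name LIKE ''%' + @Filter + N'%'' ORDER BY ProductSubcategoryID;';
EXEC (@sql);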
However, try the same attack (substituting in SET @Filter = 'Bike'' OR 1=1--
'; ), and we see very different results. The query returns all of the subcategories,
including entries such as Handlebars and Brakes . This is certainly not something
that the procedure writer expected, and can have considerable negative ramifications.
For example, if we change the filter as shown in Listing 3, we can see all of the table
schemas and names.
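For instance, a filter value along these lines terminates the original statement and appends a query against INFORMATION_SCHEMA.TABLES (a sketch, not necessarily the book's exact Listing 3):
SET @Filter = 'Bike''; SELECT TABLE_SCHEMA, TABLE_NAME FROM INFORMATION_SCHEMA.TABLES;--';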
From there, we can perform reconnaissance on various tables and even do entirely
unexpected things like inserting our own product subcategories or, even worse,
dropping tables. In Listing 4 , we take advantage of our reconnaissance work to insert
a new product subcategory. We know that a product subcategory has four non-nullable,
non-identity attributes: ProductCategoryID (of type integer ), Name (a varchar
), rowguid (a uniqueidentifier ), and ModifiedDate (a datetime ). The code
in Listing 4 fills in all four columns with appropriate values, so that our malicious
insert statement succeeds.
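A sketch of such a filter value (the inserted values are illustrative):
SET @Filter = 'Bike''; INSERT INTO Production.ProductSubcategory ( ProductCategoryID, Name, rowguid, ModifiedDate ) VALUES ( 1, ''Injected Subcategory'', NEWID(), GETDATE() );--';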
If the account running the query has elevated privileges, it could possibly have access
to other databases, allowing an attacker to collect information far more valuable than
product subcategories.
Attacking Websites
In the previous section, we envisioned a basic site. In this section, we will build the
site and attack it directly, applying what we learned in Management Studio. The full
code for this project is available in .zip format, as part of the code download for this
book.
Our website will be a rather simple ASP.NET web forms application. Listing 5 shows
the base HTML.
<div>
<asp:TextBox ID="txtSearch" runat="server" Text="Enter some text here." />
<asp:Button ID="btnClickMe" runat="server" Text="Click Me" OnClick="btnClickMe_
Click" />
</div>
<br />
<div>
You searched for: <asp:Label ID="lblSearchString" runat="server" />
</div>
<br />
<div>
<asp:GridView ID="gvGrid" runat="server" AutoGenerateColumns="true">
<Columns>
<asp:TemplateField>
<HeaderTemplate>Name</HeaderTemplate>
<ItemTemplate><%# Eval("Name") %></ItemTemplate>
</asp:TemplateField>
</Columns>
</asp:GridView>
</div>
Create a project in Visual Studio, and copy and paste this into a new page. Then, go to
the code-behind. This code-behind is simplistic with a number of major errors, so
please do not think of this as production-worthy code; neither should any of this be run
in a production environment. With that said, Listing 6 provides sample C# code to
query SQL Server based on an input search string.
if (!IsPostBack)
{
//Using .NET 4.0 here. If you want to use 2.0/3.5, change to
//IsNullOrEmpty.
if (!String.IsNullOrWhiteSpace(Request.QueryString["search"]))
LoadData(Request.QueryString["search"]);
}
}
Using this code, we now have a functional website we can use for SQL injection
attacks. This website's flaws offer us two basic means for injection: through the
querystring , or through a text-box and button. Both of these eventually call the
LoadData function in the code-behind, which runs a SQL query to pull back a list of
product subcategories, given a filter. Fire up a debug session in Visual Studio and start
attacking the site through either mechanism.
Before performing an attack, it is typically a good idea to understand normal behavior.
Type “bike” into the text box and click the button. You should see a grid with a list of
five bike-related subcategories appear, as per Listing 1 , previously. Unfortunately,
with just a few more characters, we can begin to perform unexpected actions. Now
search for the following:
'bike' OR 1=1-- '
After entering this code snippet into our text-box, we can see the entire list of product
subcategories. From there, the world, or at least the database, is our oyster. We can use
this page to perform reconnaissance on the SQL Server instance, searching for
databases, tables, and even data. For example, the following query will show all of the
databases on the instance, as well as their IDs and SQL Server compatibility level:
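A search string along these lines would do it (a sketch; the column list is chosen to match the schema requirements discussed next):
bike' UNION ALL SELECT name, database_id, compatibility_level FROM sys.databases--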
The latter two pieces of information are not necessarily important for an attack, but we
needed something to ensure that the second half of our UNION ALL statement has a
column structure that is compatible with the first half. This particular column structure
includes one varchar and two integer fields, and so we follow along to match the
schema and prevent an error from being returned.
Aside from running select statements that match the “expected” query's schema, we
can perform other types of queries, simply by typing them into the available text box.
For example, it would be possible to enable xp_cmdshell , get the server to connect
via FTP to another server and download malicious software, and then use
xp_cmdshell again to install this malicious software, thereby giving an attacker full
control over a database server.
Alternatively, we could use xp_cmdshell to open a web browser and navigate to a
specific page that attempts to install malicious software on the server using JavaScript
or an ActiveX component. Both of these attack mechanisms would have the same
effect in terms of potentially compromising a database server. What follows is an
example, showing the first step of this attack, turning on xp_cmdshell (in case it is
not already enabled).
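A sketch of such a search string (everything after the first apostrophe is standard sp_configure syntax):
bike'; EXEC sp_configure 'show advanced options', 1; RECONFIGURE; EXEC sp_configure 'xp_cmdshell', 1; RECONFIGURE;--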
Defending Websites
Now that we have an idea of how to attack a website, we can start to formulate ideas
for how to defend one. We'll consider three:
• Blacklists , and why they cause problems and don't work.
• Whitelists , and why they are essential in some cases, but don't work in others.
• Parameterized Queries , the best and most effective form of protection from SQL
Injection.
Blacklists
The first line of defense that many developers come up with is a blacklist: we know
that keywords like “select,” “insert,” and “drop” are necessary to perform a SQL
injection attack, so if we just ban those keywords, everything should be fine, right?
Alas, life is not so simple; this leads to a number of problems with blacklists in
general, as well as in this particular case.
The second-biggest problem with blacklists is that they could block people from
performing legitimate requests. For example, a user at a paint company's website may
wish to search for “drop cloths,” so a naïve blacklist, outlawing use of the word “drop”
in a search would lead to false positives.
The biggest problem is that, unless extreme care is taken, the blacklist will still let
through malicious code. One of the big failures with SQL injection blacklists is that
there are a number of different white-space characters: hex 0x20 (space), 0x09 (tab),
0x0A, 0x0B, 0x0C, 0x0D, and 0xA0 are all legitimate white-space as far as a SQL
Server query is concerned. If the blacklist is looking for “drop table,” it is looking for
the word “drop,” followed by a 0x20 character, followed by the word “table.” If we
replace the 0x20 with a 0x09, it sails right by the blacklist.
Even if we do guard against these combinations, there is another avenue of attack.
parameters into a list, ignore all but the first instance of a parameter, ignore all but the
last instance of a parameter, display an error on the web page, or take any of a number
of other actions. ASP.NET puts the different terms into a comma-delimited string.
Thus, if we change the above URL to http://[website].com/SomePage.aspx?SearchTerm=Elvis&SearchTerm=Evil&MemorabiliaType=Clock, then
Request.QueryString[“SearchTerm”] would return “Elvis,Evil” instead of
just “Elvis” or just “Evil” .
Armed with this knowledge, an attacker can use HTTP Parameter Pollution in a SQL
injection attack to get around basic blacklist filtering. For example, suppose that the
website does in fact filter out the word “drop” followed by any of the seven white-
space characters, followed by the word “table.” In that case, we could still perform a
query-string-based attack and drop a table by putting /* and */ around our HTTP
query parameters to comment out the commas.
This leaves us with a valid SQL statement:
SomePage.aspx?SearchTerm= drop /*&SearchTerm= */table dbo.Users;--
There is no white space after the “drop” keyword, or before the “table” keyword,
so this attack gets right around the blacklist rule described above; best of all, because
HTTP Parameter Pollution is not well known among developers, they probably have
not even thought of this particular behavior, leaving the website exposed while
creating a false sense of security.
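A related evasion technique is to encode an entire statement as a hexadecimal string and have SQL Server execute the decoded text. A sketch of the encoding step (the table name follows the earlier example):
SELECT CAST('DROP TABLE dbo.Users;' AS VARBINARY(8000));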
This returns a long hexadecimal string. Copy and paste that string in Listing 8 and you
have an attack.
DECLARE @i VARCHAR(8000);
SET @i = CAST([long hex string] AS VARCHAR(8000));
EXEC(@i);
This attack, combined with HTTP Parameter Pollution, means that attackers can
bypass more blacklists. In this particular case, the result would not be interesting,
because this query would run separately from the SQL query that .NET runs, so the
results would not be in the same data set and would not appear on our grid. With that
said, however, there have been several attacks that used the execution of binary
representations of SQL queries to update all textual columns on databases, inserting
code into each row that tries to open malicious JavaScript on rogue servers.
Whitelists
Blacklists are an untenable option, so the next thought might be to switch to whitelists.
A whitelist is the opposite of a blacklist: instead of expressly prohibiting specified
elements, and allowing anything not on the list, a whitelist allows specified elements
and prohibits everything else.
In some cases, whitelists are essential. For example, with a radio button list, the server
side should check whether the form value returned is valid. If the only possible values
are “0,” “1,” and “2,” but the value “monkey” is received back, somebody has, well,
monkeyed with the form; from there, the server could either use a default value or
throw an exception. Also, if the structure of a text field is known, such as a Social
Security number, credit card number, amount of money, or date, a whitelist can accept
certain patterns, rejecting all others. This helps detect data entry errors and also
protects against SQL injection in these particular form fields, at least until “'; drop
table Users-- ” becomes a valid Social Security Number.
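For instance, a minimal pattern check of this kind, sketched in T-SQL (the variable and
pattern are illustrative; in practice this validation usually lives in the application layer):

DECLARE @ssn VARCHAR(20) = '123-45-6789';
IF @ssn LIKE '[0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9][0-9][0-9]'
    PRINT 'Pattern OK - treat as a candidate Social Security number';
ELSE
    PRINT 'Reject - not a well-formed Social Security number';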
This kind of whitelist breaks down for general searches, though. With free-form text
fields, there is no necessary pattern, and so we cannot create a regular expression
against which to check the data. Thus, a whitelist is not a valid option in all
circumstances.
Parameterized queries
Forget about blacklists, and move whitelisting into the world of data validation rather
than SQL injection prevention. There is only one effective method for preventing SQL
injection through an application: query parameterization.
In our sample code, we used a SqlCommand object to execute a SQL statement and
return the results in a SqlDataReader . This worked but, as we have subsequently
learned, it is not a safe way of doing things. The reason is that the Filter variable
was not parameterized correctly. Instead, our data access code concatenated various
strings, including the Filter parameter, to create one SQL query. The correct
alternative to developer-led concatenation is to build a SqlParameter , put the
contents of the Filter variable into the parameter, and pass that off to SQL Server as
a contained parameter rather than simply being part of the text. This change is
relatively simple, making our data access code look as shown in Listing 9 .
using (SqlConnection conn = new SqlConnection(
    "server=localhost;database=AdventureWorks;trusted_connection=yes"))
{
    string sql = "select Name, ProductSubcategoryID, ProductCategoryID " +
                 "from Production.ProductSubcategory " +
                 "where Name like '%' + @Filter + '%' " +
                 "order by ProductSubcategoryID;";
    using (SqlCommand cmd = new SqlCommand(sql, conn))
    {
        //create a parameter for @Filter
        SqlParameter filter = new SqlParameter();
        filter.ParameterName = "@Filter";
        filter.Value = Filter;
        //attach our parameter to the SqlCommand
        cmd.Parameters.Add(filter);
        cmd.CommandTimeout = 30;
        conn.Open();
        SqlDataReader dr = cmd.ExecuteReader(CommandBehavior.CloseConnection);
        gvGrid.DataSource = dr;
        gvGrid.DataBind();
        gvGrid.Visible = true;
    }
    //Continue along with our code
}
With this minor change in data access code, we eliminate the possibility of a SQL
injection attack on this particular query. We can try any combination of characters or
attack techniques but it will be to no avail: we are unable to get “outside” the
parameter.
The way this works is that, when we have a SqlCommand object with associated
SqlParameter objects, the query sent to SQL Server actually changes somewhat.
Open up an instance of SQL Server Profiler and start a new trace using the T-SQL
template, and then return to the original sample web page, the one that simply builds a SQL
string and does not parameterize the query. Using our classic “bike' or 1=1--”
SQL injection code, we can see the end result as follows:
This is just as we would expect; the query runs, and the user's unexpected SQL code
changes its basic structure. In comparison, on a web page that parameterizes the query
correctly, there is quite a different result.
In the trace, the statement now arrives wrapped in a call to sp_executesql , and the
apostrophe in the attack string, now simply part of the value
of the parameter, has been doubled up, thus making it safe. There is absolutely no way
to perform a SQL injection attack here; each apostrophe is doubled up and the query
will remain safe. Despite that, here are a few more words of wisdom for added
protection and perhaps even better system performance.
Be sure to match field sizes to the size of the character data types, wherever possible.
For example, suppose there is a column on a table defined as a varchar (10). In that
case, the text box should only allow 10 characters of text. Even if the query were still
susceptible to SQL injection, there are only a limited number of attacks possible with
just 10 characters. You should also use matching field sizes in SqlParameter objects,
setting a fixed size. As seen above, because we did not use a fixed size for the filter,
the SqlParameter object's size was set to the length of the string: 14 characters. If
somebody else types in 15 characters-worth of text, this creates a new execution plan
and both end up in the plan cache. This is potentially a waste of precious plan cache
space so, by having one specific size, SQL Server generates one plan in the cache,
saving room for the plans of other queries.
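The effect is easy to observe in the plan cache; the two calls below (a sketch against
AdventureWorks) differ only in the declared parameter length, and SQL Server caches a
separate plan for each:

EXEC sp_executesql
    N'select Name from Production.ProductSubcategory where Name like ''%'' + @Filter + ''%'';',
    N'@Filter varchar(14)', @Filter = 'Mountain Bikes';
EXEC sp_executesql
    N'select Name from Production.ProductSubcategory where Name like ''%'' + @Filter + ''%'';',
    N'@Filter varchar(15)', @Filter = 'Mountain Bikes';
-- sys.dm_exec_cached_plans joined to sys.dm_exec_sql_text now shows two cached plans
-- whose text differs only in the varchar(14) / varchar(15) declaration.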
Stored procedure parameters are protected in the same way: all of the attack attempts
shown in Listing 11 will fail when run in Management Studio.
There is no way to “break out” of the parameter and so our query is safe from SQL
injection. This is not a trivial result; it means that we can replace ad hoc SQL in our
data layer with calls to stored procedures. This does not mean that ad hoc SQL is
necessarily unsafe. The parameterized query in Listing 9 uses ad hoc SQL, and is
immune from SQL injection. For this reason, I would not use the threat of SQL
injection as a core reason for supporting stored procedures over ad hoc SQL in a .NET
environment.
However, stored procedures do force you to use parameterization, whereas a developer
might accidentally forget to parameterize an ad hoc SQL query. Unfortunately, there
are ways to abuse stored procedures, such as using a .NET 1.1 SqlDataAdapter .
This is an old and terrible method for getting data, and yet there is sample code out on
the Internet that still uses it. Listing 12 shows an example of a dataset populated by a
SqlDataAdapter .
sda.Fill(ds);
gvGrid.DataSource = ds;
gvGrid.DataBind();
}
Code like that is just as susceptible to SQL injection as ad hoc SQL, because this
stored procedure is essentially a vessel for executing ad hoc SQL. In fact, it is actually
worse, because at least with legitimate ad hoc SQL, we can still parameterize the
queries in .NET and turn them into good dynamic SQL calls, whereas with this
procedure, we do not even get that benefit.
The stored procedure in Listing 13 is, admittedly, ridiculous, but more complicated
stored procedures often make the same basic mistake. Take, for example, a stored
procedure that retrieves a set based on some partial search criteria, or a procedure
which takes as input a list of values rather than a single item, and returns all elements
in that list. Listing 14 shows an example of the former.
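A procedure of the kind being described looks roughly like this sketch (the name and query
are illustrative, not the chapter's actual listing):

CREATE PROCEDURE Production.SearchSubcategories_Unsafe
    @Filter NVARCHAR(50)
AS
BEGIN
    DECLARE @sql NVARCHAR(500);
    -- The input is pasted straight into the statement text and then executed,
    -- so anything the caller supplies becomes part of the query.
    SET @sql = N'select Name, ProductSubcategoryID
                 from Production.ProductSubcategory
                 where Name like ''%' + @Filter + N'%'';';
    EXEC (@sql);
END
GO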
In general, attacking dynamic SQL is the same as attacking ad hoc SQL through a
website: escape out of the current query, perform the attack, and comment out the rest
of the expected query. We can perform a similar attack on this procedure through SQL
Server Management Studio (or any other client that connects to the database), except
this time, we have to double up the apostrophe used to escape out of the parameter, as
shown in Listing 15 .
--try it out
EXEC Production.NotSecure N'''Mountain Bikes'', ''Road Bikes''';
Listing 16: Bad list search using dynamic SQL and sp_executesql .
Unlike the correctly parameterized queries that ASP.NET generated before, this
particular way of using sp_executesql is not safe. The reason is that, even though
we pass @NameList as a query parameter, we actually make use of @NameList before
sp_executesql has a chance to run. The sp_executesql procedure will not throw
an exception or generate any type of error if a parameter is listed that is not actually
used in the SQL query, meaning that developers and DBAs must remain vigilant when
it comes to using sp_executesql correctly. The consequences are potentially
devastating.
In this case, we can pass in a specially crafted string and get around the built-in SQL
injection protection mechanism of sp_executesql . On the web form, we would need to
double up the apostrophe after “Mountain Bikes” and add a closing parenthesis before
using our “OR 1=1” statement. Getting the correct syntax may take a few tries for an
attacker who does not have access to source code, but it is fundamentally no more
difficult than the basic examples we have covered already. Once the attacker has the
correct syntax, the results are the same, sp_executesql or no sp_executesql .
The fact that we were able to exploit an SQL injection vulnerability in a dynamic SQL
statement that uses sp_executesql , however, is certainly not the fault of the system
stored procedure; rather, it is the fault of the developer who did not use
sp_executesql correctly in the first place. The sp_executesql system stored
procedure translates parameters and makes them safe for use, but we did not actually
pass any parameters into the command; instead, we translated the parameters first and
passed in a basic string of text to the sp_executesql procedure. Listing 18 shows a
more appropriate use of sp_executesql .
-- procedure header reconstructed; the name is illustrative
CREATE PROCEDURE Production.SearchSubcategories_Safe
    @Filter VARCHAR(50)
AS
DECLARE @sql NVARCHAR(250);
SET @sql = N'select * from Production.ProductSubcategory
             where Name like ''%'' + @Filter + ''%'';';
EXEC sp_executesql @sql, N'@Filter varchar(50)', @Filter;
GO
Listing 18: Good list search using dynamic SQL and sp_executesql .
Before wrapping up this chapter, I would like to touch upon a few additional areas of
interest with regard to SQL injection. It would be easy to write entire chapters on
these, but I will have to settle for a few sentences on each topic.
• Bad Habits to Kick: Using EXEC() instead of sp_executesql
HTTP://TINYURL.COM/P5UJW9X
In this particular debate, I side with Bertrand, which is why I tended to use
sp_executesql above instead of exec with QUOTENAME and REPLACE . Although
sp_executesql is often slower than simply running exec with appropriate use of
QUOTENAME and REPLACE , it is also a lot easier to get everything right. This is
especially true for application developers writing stored procedures that use dynamic
SQL. In that case, I would not automatically trust them (or even myself!) to get it right
and, instead, would focus on the easier safe method, at least until performance simply
is not good enough.
Appropriate permissions
Throughout this chapter, I have assumed that the account running these stored
procedures and ad hoc SQL statements has some hefty rights, probably db_owner ,
and maybe even sysadmin . Sadly, this is usually a safe assumption.
If a procedure runs a SELECT statement that only hits three tables in the Adventure-
Works database, it does not need permission to insert records, create user accounts,
run xp_cmdshell , or view the INFORMATION_SCHEMA or sys schemas! Erland
Sommarskog's outstanding essay, Giving Permissions through Stored Procedures
(HTTP://WWW.SOMMARSKOG.SE/GRANTPERM.HTML ), describes in detail various methods
available for granting appropriate permissions for stored procedures (see also, the
previously referenced Kimberly Tripp blog post on the EXECUTE AS functionality in
SQL Server, and how to use that to allow only limited access to SQL stored
procedures).
Even if the resources are not available to create appropriate permissions and signed
certificates for each stored procedure, at least create limited-functionality accounts and
run stored procedures through those accounts. Reader and editor logins that use the
db_datareader and db_datawriter fixed database roles, respectively, would at least
protect against certain shenanigans, such as attempts to drop tables, or to use what a
developer intends to be a SELECT statement to insert or delete records.
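A minimal sketch of such an account (all names and the password are illustrative):

CREATE LOGIN ReportReader WITH PASSWORD = 'Ch4nge-Me-Please!';
USE AdventureWorks;
CREATE USER ReportReader FOR LOGIN ReportReader;
-- Read-only membership: the login can run the intended SELECTs, but injected
-- INSERT, DELETE or DROP statements will fail with a permissions error.
EXEC sp_addrolemember 'db_datareader', 'ReportReader';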
Automated tools
A number of tools can help discover possible weaknesses in code, and so prevent
attacks. One of my favorites is sqlmap (HTTP://SQLMAP.ORG/), which allows SQL
injection attack attempts against a number of database vendors' products, not just SQL
Server. It also lets the user perform advanced SQL injection techniques not covered in
this chapter, such as timing-based injection attacks.
In addition to those features, it can tie into Metasploit (HTTP://METASPLOIT.COM), an
outstanding penetration-testing suite. Inside Metasploit, there are SQL Server-specific
modules such as one that tries to guess a SQL Server instance's sa password. If it is
successful, the module automatically tries to create a Metasploit shell, giving the
attacker full access to the database server. Both tools come with thorough
documentation and can be automated for enterprise-level penetration tests.
Additional Resources
Thanks to the prevalence of SQL injection attacks, finding examples of attacks is a
simple exercise. As of this chapter's publication, a few recent attacks include:
• Stratfor – HTTP://TINYURL.COM/STRATFORSQLI
• IRC Federal – HTTP://TINYURL.COM/IRCFEDERALSQLI
• Sony – HTTP://TINYURL.COM/SONYSQLI
• Arthur Hicken of Parasoft – a curated list of high-profile SQL injection attacks,
located at HTTP://CODECURMUDGEON.COM/WP/SQL-INJECTION-HALL-OF-SHAME/.
In terms of thwarting SQL injection, I recommend the following additional resources:
• Bobby Tables – a compendium of ways to parameterize queries in different
programming languages, including technologies outside of ASP.Net and SQL Server
HTTP://BOBBY-TABLES.COM/
Using Database Mail to Email Enable
Your SQL Server
John Barnett
The job of a DBA team is, in essence, to maintain a “steady ship,” ensuring that the
database systems in their care are available, reliable and responsive. When problems
do arise, the team must respond quickly and effectively. One of the keys to a fast
response time is swift notification of problems as they arise and, for this purpose, we
have a solution, Database Mail, built directly into SQL Server.
We can use Database Mail to:
• send email from T-SQL containing, for example, the results of a T-SQL statement
and including reports and other necessary files as attachments
• report on the success or failure of SQL Server agent jobs
• provide a real-time notification system for SQL Server alerts.
This chapter describes how to enable, configure and use Database Mail in your
applications. I'll also provide advice on troubleshooting Database Mail, and
maintaining the msdb log table, where SQL Server retains details of email status
messages, errors, and so on.
The examples in this chapter will show how to send automated emails from SQL Server to your staff,
customers, and suppliers. This could have legal implications. Refrain from sending near-identical content
to large numbers of people, unless absolutely necessary. Even if you only send these emails within your
organization, get agreement with staff, and offer opt-outs. Exercise special care if you use Database Mail
to send to external recipients. If not handled correctly, the company hosting the server could be blacklisted.
To set up Database Mail, and work through the examples in this chapter, you'll need
SQL Server 2005 or later, in any edition except the Express editions.
Database Mail uses SMTP (Simple Mail Transport Protocol), the Internet standard for
transmission of email, so you'll need access to an SMTP server, along with its fully
qualified DNS name for setting up the Database Mail profile. It will look something
like smtp.example.com . Depending on how security is configured on the SMTP server,
the email administrator may also need to:
• supply a username and password for connecting
• grant permissions for connections to come from your SQL Server's IP address.
You can download the code examples as part of the code bundle for this chapter, all
tested against SQL Server 2005 and 2008 Developer Editions with the
AdventureWorks sample database for that version.
All screenshots were taken using SQL Server 2008 Developer SP3 running on
Windows 7 Professional; minor changes may exist due to differences in operating
system or SQL Server version. Lastly, in the walk-through, I assume the reader has a
basic understanding of both SQL Server and email client configuration.
Alternatively, we can enable Database Mail using the SQL Server Surface Area
Configuration tool from the Start menu group, use the sp_configure system stored
procedure, or enable it through the configuration wizard, as discussed next.
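The sp_configure route looks like this:

-- Enable the Database Mail extended stored procedures at the instance level.
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'Database Mail XPs', 1;
RECONFIGURE;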
Click Add to create a database mail account to add to your new profile. You need to
provide an account name and description (narrative text for future reference) along
with details about the SMTP server, obtained from the email administrator.
Figure 3: Configuring Database Mail – creating a new Database Mail account.
In the E-mail address field, enter the sender (From: ) address. In the Display name
field, enter the name for the sender that will appear alongside their address. Optionally,
we can provide a Reply e-mail to populate the “Reply to” email header.
Test email accounts
If you need some test email addresses, example.com is a safe domain. IANA maintains the site specifically for
documentation and training materials (see HTTP://WWW.EXAMPLE.COM for more information).
In the Server name box, enter the DNS name of the server. For the remaining options,
port number, SSL and authentication, accept the defaults, unless instructed otherwise
by your email administrator. Click OK to close the New Database Mail account
screen and you will see the newly created email account added to the profile (Figure 4
).
Figure 4: Configuring Database Mail – an email account added to a profile.
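If you prefer to script the account and profile rather than click through the wizard, the
equivalent calls look roughly like this (the names, addresses and server are illustrative):

EXEC msdb.dbo.sysmail_add_account_sp
    @account_name    = 'testaccount',
    @description     = 'Test Database Mail account',
    @email_address   = 'dba@example.com',
    @display_name    = 'SQL Server DBA Team',
    @mailserver_name = 'smtp.example.com';
EXEC msdb.dbo.sysmail_add_profile_sp
    @profile_name = 'testprofile',
    @description  = 'Test Database Mail profile';
EXEC msdb.dbo.sysmail_add_profileaccount_sp
    @profile_name    = 'testprofile',
    @account_name    = 'testaccount',
    @sequence_number = 1;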
Click Next to move to the Manage Profile Security screen. Security in Database Mail
uses the concept of profiles, and on this screen are two tabs, Public Profiles and
Private Profiles .
Public profiles are accessible by any SQL user with permissions within the msdb
database, whereas a private profile is accessible only to specific named users within
msdb . Members of the sysadmin role automatically have access to all profiles.
The guest database user has access to a public email profile, which means that any
SQL Server Login can access the public profile and send email, without the DBA
configuring a database user account for its explicit use. For obvious security reasons, I
advise against this; SQL Server is a database server and should only send email to
known recipients, not act as an email relay for unauthorized traffic.
Therefore, leave the Public Profiles tab untouched and switch to Private Profiles , to
create our testprofile as a private profile. You can use the User name drop-down
to grant access to the profile to specific users within the msdb database.
You can tick the Access box on the left-hand side to grant access to specific profiles to
users within the msdb database. Optionally, you can set the Default Profile to Yes to
designate a maximum of one default profile per user. Setting the default profile option
to Yes means that if a database user within msdb attempts to send an email without
specifying the @profile_name option, SQL Server will use the default private
profile. Setting it to No , as in Figure 5 , stipulates the use of a specific profile to send
a notification.
Figure 5: Configuring Database Mail – granting the test user access to testprofile .
Click Next to advance to the Configure System Parameters screen, shown in Figure
6.
The following system parameters for Database Mail can be configured. Mostly, I use
the default options for each of these, but there may be good reasons to change them:
• Account Retry Attempts – number of attempts Database Mail will take before
recording a message failure. If you have a slow network or a heavily loaded SMTP
server, you might consider increasing this value.
• Account Retry Delay (seconds) – time Database Mail will wait between send
retries.
• Maximum File Size (Bytes) – largest file size it will attempt to transmit; increase
the default only if you really need to send large files.
• Prohibited Attachment File Extensions – list of file extensions it won't send;
some secure environments will have tighter restrictions, and you may need to add
extensions such as ps1 , com and bat to the prohibited list.
• Database Mail Executable Minimum Lifetime (seconds) – shortest time the
Database Mail process will remain running before closing down.
• Logging Level – the amount of information recorded in the logs; you may
occasionally need to use a more verbose level when troubleshooting.
Clicking Reset All will revert settings to the default.
When you click Next , and then Finish , SQL Server will configure Database Mail as
specified, including security, access rights, and parameters.
Security requirements
Once configured, any user with sysadmin privileges can utilize Database Mail.
Otherwise, the user must be a member of the DatabaseMailUserRole role in the
msdb database. This grants it access to the sp_send_dbmail stored procedure as well
as the logs and maintenance stored procedures, all detailed later in the chapter.
To add to the DatabaseMailUserRole a user that exists in msdb :
• Log in as a user with sysadmin rights.
• In SSMS, go to the msdb database and select Security | Roles | Database Roles .
• Double-click DatabaseMailUserRole and use the Add button at the bottom to add
the database user to the role.
To add to DatabaseMailUserRole a user that is not present in msdb (a scripted equivalent follows these steps):
• Log in as a user with sysadmin rights.
• In SSMS, go to the msdb database, navigate Security | Users , right-click and select
New User .
• On the General page, create a new user from a login.
• On the Membership page, tick the DatabaseMailUserRole box.
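Both routes can also be scripted; a minimal T-SQL equivalent (the login name is illustrative):

USE msdb;
-- Create the msdb user for an existing login, if it isn't already present...
CREATE USER MailSender FOR LOGIN MailSender;
-- ...and grant it the right to use sp_send_dbmail and the Database Mail log views.
EXEC sp_addrolemember 'DatabaseMailUserRole', 'MailSender';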
The code in Listing 1 calls the sp_send_dbmail system stored procedure to send an
email from SQL Server, using our new testprofile database mail profile. It is the
T-SQL equivalent of clicking the Send Test E-Mail button in the previous step.
If successful, the output, “Mail queued” will appear in the results pane, and the
message will arrive in your mailbox a few seconds later.
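A minimal call of the kind Listing 1 demonstrates looks like this (the recipient address is
illustrative):

EXEC msdb.dbo.sp_send_dbmail
    @profile_name = 'testprofile',
    @recipients   = 'dba@example.com',
    @subject      = 'Database Mail test',
    @body         = 'This is a test message sent using sp_send_dbmail.';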
In this manner, we can send users the results of a pre-generated report.
To send multiple file attachments, simply separate the file paths with a semicolon.
Obviously, each file must exist at the specified location. In addition, ensure that there
are no leading or trailing spaces between the end of one path, the semicolon and the
start of the next as, otherwise, SQL Server tries to interpret it as part of the pathname
for the relevant file and you will get “Attachment file invalid” error messages.
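For example, a call with two attachments might look like this (the paths are illustrative):

EXEC msdb.dbo.sp_send_dbmail
    @profile_name     = 'testprofile',
    @recipients       = 'dba@example.com',
    @subject          = 'Nightly reports',
    @body             = 'Both reports are attached.',
    -- semicolon-separated, with no spaces around the semicolon
    @file_attachments = 'C:\Reports\Sales.csv;C:\Reports\Returns.csv';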
There are a few other caveats to this:
• Database Mail uses impersonation to determine if the Windows network account of
the connected user is able to access the files specified. Therefore, users authenticated
to SQL Server using SQL logins won't be able to use the @file_attachments
option. This also ensures that Windows authenticated logins can only send files that
they have permission to read using normal permissions.
• As discussed earlier, in relation to Database Mail system parameters, we can
configure Database Mail to:
• prohibit file attachments with specific extensions; files must not have an
extension that is on this blocked list
• prohibit file attachments over a maximum file size; files must be this size or
smaller.
In addition, it is worth noting that email client configurations may impose further
restrictions on access to file attachments, on the receiving end. This normally relates to
files that could contain executable code or scripts and pose a risk to your computer's
security.
This is particularly useful for monitoring processes that return data with error statuses that may indicate further action is required.
Listing 4 shows how to send the results of a SQL statement in a message body. Note
that there is no @body parameter and that the added
@attach_query_result_as_file parameter is set to 0 .
Listing 4: Sending an email with a message body containing the results of executing a T-SQL statement.
Listing 5: Sending an email with a file containing the results of executing a T-SQL statement.
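A call of the kind Listings 4 and 5 describe looks roughly like this (the query and recipient
are illustrative); switching @attach_query_result_as_file to 1 delivers the result set as an
attached file instead of in the message body:

EXEC msdb.dbo.sp_send_dbmail
    @profile_name = 'testprofile',
    @recipients   = 'dba@example.com',
    @subject      = 'Product subcategories',
    @query        = 'SELECT TOP (10) ProductSubcategoryID, Name
                     FROM AdventureWorks.Production.ProductSubcategory
                     ORDER BY ProductSubcategoryID;',
    @attach_query_result_as_file = 0;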
In both cases, you will notice that the results look untidy when viewed using a variable
width font (where different characters take up different widths of the page, for
example, the letter “I” takes less space than the letter “W”). While this may be
acceptable for a quick process completion spot check, sent to IT staff, it won't be
acceptable for emails to end-users or customers.
To get around this, we can use HTML email with a layout using CSS or tables, or
build up the result set for the recipient using text concatenation, then set the message
body to the result set. Both techniques are complex, but shortly I'll demonstrate the use
of the latter to send customized email alerts.
Of course, all recipients get the same email. In some cases, we need to customize our
emails depending on the recipient. In the next section, I show how to do that. An
alternative to this is to send to a distribution list.
USE AdventureWorks
GO
/* Declare variables */
DECLARE @empid INT -- Current employee ID
DECLARE @manager_id INT -- Current manager ID
DECLARE @message NVARCHAR(MAX) -- Message body text
/* Initialise the manager ID - can be replaced with DECLARE and initialise above
   if running under SQL Server 2008 or newer */
SELECT @manager_id = 0
/* The outer loop over managers below is a reconstruction; the original listing's
   opening lines are not shown here. */
WHILE ( @manager_id IS NOT NULL )
BEGIN
    SELECT @manager_id = MIN(ManagerID)
    FROM HumanResources.Employee
    WHERE ManagerID > @manager_id
    IF @manager_id IS NULL
        BREAK
    -- Reset the per-manager working variables
    SELECT @empid = 0
    SELECT @message = 'Signed timesheets are required for the following members of your staff:'
           + CHAR(13) + CHAR(10)
    WHILE ( @empid IS NOT NULL )
    BEGIN
        SELECT @empid = MIN(EmployeeID)
        FROM HumanResources.Employee
        WHERE ManagerID = @manager_id
              AND EmployeeID > @empid
        -- Get employee details (joined through ContactID rather than assuming it matches EmployeeID)
        SELECT @message = @message + ' '
               + c.FirstName + ' '
               + c.LastName + CHAR(13) + CHAR(10)
        FROM HumanResources.Employee e
             INNER JOIN Person.Contact c ON c.ContactID = e.ContactID
        WHERE e.EmployeeID = @empid
    END
    -- Complete the message.
    SELECT @message = @message + CHAR(13) + CHAR(10)
           + 'Please submit them to HR by the end of next week.'
           + CHAR(13) + CHAR(10)
    SELECT @message = @message + 'Kind Regards,'
           + CHAR(13) + CHAR(10)
           + CHAR(13) + CHAR(10)
    SELECT @message = @message + 'HR Director.'
    -- Send the email
    /* Put your Database Mail profile name here */
    EXEC msdb.dbo.sp_send_dbmail @profile_name = 'testprofile',
        /* normally you would use @manager_email here */
        @recipients = N'[email protected]',
        @subject = N'Signed timesheets required from your staff',
        @body = @message
END
GO
If run with the default data, 48 “Mail queued” messages will be the result, and 48
emails turn up in your inbox.
You can run this stored procedure on a recurring basis from a SQL Server Agent job
step, but you should add functionality to ensure that it doesn't run over public holidays,
and you should offer staff the facility to opt out of receiving these email alerts.
Why use this technique rather than a more conventional report writer (such as SSRS)
with an email recipient? Many of the software companies that provide such tools
charge license fees by the number of report recipients; this method removes this
expense.
Enable a database mail profile for alert notification
In SSMS, right-click on SQL Server Agent , choose Properties , and select the
Alert System page. Configure the screen as appropriate; at a minimum, you must
enable the alert system, choose Database Mail as the mail system, and select the Mail
profile .
Figure 8: Enabling our testprofile database mail profile for SQL Server Agent.
Leave the rest of the screen as is. We can ignore the section relating to Pager
notification.
Why no Test…option?
The Test… button next to the Mail Profile box (at least up to SQL Server 2008) is disabled when Database
Mail is selected as the mail system. According to Books Online, this feature is only available if using SQL
Mail (the older, deprecated email functionality in SQL Server).
Configure an operator
The next step is to configure an operator (the recipient of the email notifications).
Navigate SQL Server Agent | Operators , right-click and choose New , then add a
description for the operator, their email address and tick the Enabled flag as per
Figure 9 . Using a distribution list tends to work better than a comma-separated list of
operators, if multiple recipients need to receive the message.
MSDN articles and Books Online indicate that the Net send and Pager notification
types are deprecated as of SQL Server 2008, so I would advise against using
them (reference: HTTP://MSDN.MICROSOFT.COM/EN-US/LIBRARY/MS179336.ASPX ).
We can ignore the rest of the screen, relating to specifics for pager notification setup.
In SSMS, expand SQL Server Agent | Alerts , select New Alert and configure alerts,
as recommended by Microsoft, for error severities 18, 19, 20, 21, 22, 23, 24 and 25, as
shown in Figure 11 .
Microsoft provides further guidance on Error Message Severity levels. The article
dates back to SQL Server 2000 but the principles apply equally well to newer versions.
(HTTP://TINYURL.COM/PYQGYMT )
To avoid recipients receiving an email every time the same error occurs until it is
resolved, go to the Options tab and set an appropriate delay between responses. You
can also include extra information in the message text. Figure 12 shows the settings to
send an email notification once an hour, rather than every time the alert fires.
It is just a matter of tweaking the minutes and seconds delays for each alert property to
suit your requirements.
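If you prefer to script the alerts, the calls look roughly like this (the alert and operator
names are illustrative):

EXEC msdb.dbo.sp_add_alert
    @name = N'Severity 19 errors',
    @severity = 19,
    @delay_between_responses = 3600,    -- at most one notification per hour
    @include_event_description_in = 1;  -- 1 = include the error text in the email
EXEC msdb.dbo.sp_add_notification
    @alert_name = N'Severity 19 errors',
    @operator_name = N'DBA Team',
    @notification_method = 1;           -- 1 = email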
Next, set up the operators, exactly as described in the previous section, Reporting on
success or failure of SQL Server Agent jobs .
Troubleshooting Database Mail
Sometimes Database Mail doesn't work as expected. In this section, I recommend
some simple checks that will often reveal the root cause. If you need to look deeper,
you can use the Database Mail-related system views.
Common problems
Unless you are aware of connectivity issues between the SQL and SMTP servers,
the first step should be to rerun the initial test from SSMS. If this test still works,
perform a simple test call to sp_send_dbmail and, if applicable, check permissions
for file attachments. If a simple call does not work, then check sending email via the
SMTP server using a normal email client and see if that works.
In my experience, the vast majority of problems with Database Mail resolve into one
of four reasons, as summarized in Table 1 .
Table 1: Troubleshooting steps for common Database Mail problems.
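If those quick checks don't identify the cause, interrogate the Database Mail system views in
msdb ; a query along these lines (a sketch) pairs each recent mail item with any logged error:

SELECT  ai.mailitem_id,
        ai.recipients,
        ai.subject,
        ai.sent_status,          -- sent, unsent, retrying or failed
        ai.send_request_date,
        el.description AS error_description
FROM    msdb.dbo.sysmail_allitems AS ai
        LEFT JOIN msdb.dbo.sysmail_event_log AS el
               ON el.mailitem_id = ai.mailitem_id
ORDER BY ai.send_request_date DESC;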
--Example 1: Email a report of messages that failed to send in the last day
--(the declarations and query below reconstruct the opening of this example)
DECLARE @tempdate DATETIME
DECLARE @tempsubj NVARCHAR(200)
DECLARE @tempsql NVARCHAR(MAX)
SELECT @tempdate = GETDATE()
SELECT @tempdate = DATEADD(dd, -1, @tempdate)
SELECT @tempsubj = 'Database Mail items that failed since ' + CONVERT(VARCHAR(20), @tempdate, 120)
SELECT @tempsql = 'SELECT mailitem_id, recipients, subject, send_request_date
                   FROM msdb.dbo.sysmail_faileditems
                   WHERE send_request_date >= ''' + CONVERT(VARCHAR(20), @tempdate, 120) + ''';'
-- send email
EXEC msdb.dbo.sp_send_dbmail @profile_name = 'testprofile',
    @recipients = '[email protected]',
    @subject = @tempsubj, @query = @tempsql,
    @attach_query_result_as_file = 0;
GO
--Example 2: Remove all sent messages over 6 months old regardless of status
DECLARE @tempdate DATETIME
SELECT @tempdate = DATEADD(m, -6, GETDATE())
EXEC msdb.dbo.sysmail_delete_mailitems_sp @sent_before = @tempdate,
    @sent_status = NULL;
Be aware that your organization may have legal or regulatory obligations to retain business
correspondence or communications. Appropriate advice from a qualified professional
should be sought if there are any questions about the legalities of keeping or removing
this data.
Summary
I hope that this chapter has provided everything you need to get started with Database
Mail and shown you how to use it to send a single email, to develop a multi-user custom
email notification system for applications, or to provide a notification system for monitoring
and troubleshooting error conditions on your SQL Servers.
I've used Database Mail for tasks ranging from email notification of overnight data
transfers between servers to sanity checking overnight jobs, to batch sending email
alerts for a commercial costing and accounting system. With careful use, taking care
not to deluge people with mail, you can build on the foundation I've provided here to
create a very effective means to receive swift notification of any issues with the SQL
Server instances in your care.
Taming Transactional Replication
Chuck Lathrope
SQL Server Replication is a set of technologies that allows us to copy our data and
database objects from one database to another, and then keep these databases in sync
by replicating any data changes from one to the other. We can replicate the changes in
near real-time, bi-directionally, if required, while always maintaining transactional
consistency in each database. In a typical case, we might offload a reporting workload
from our primary server (the Publisher, which produces the data) to one or more
dedicated reporting servers (Subscribers). After the initial synchronization, we can
replicate the data in near real-time; in the best cases, replication can offer a transit
latency of just a few seconds. In another case, we might want to replicate schema
changes from one database to one or more databases on another server, including
partially offline SQL instances.
SQL Server offers several flavors of replication to support these and many other tasks,
depending on your exact requirements. Fortunately, by far the most common,
transactional replication with read-only Subscribers , is also the easiest type of
replication to understand, and is the focus of this chapter; many of the definitions,
concepts, and performance tuning tips in this chapter apply to all types of replication.
SQL Server Replication has a reputation with some DBAs for being “difficult” to set
up and maintain, and prone to “issues.” However, having implemented and supported
replication environments in very large enterprise deployments, I've come to view it as
a remarkably fault-tolerant and versatile technology. My goal in this chapter is to offer
not only a solid overview of transactional replication and how it works, but also to
impart some tribal knowledge that only real-world experience provides, covering
topics such as:
• Good use cases for replication – and, equally important, when not to use it.
• Requirements – and a practical example of how to set up transactional replication.
• Monitoring replication – using ReplMon and custom email alert alternatives.
• Performance tuning – server and replication-specific tuning for optimal
performance.
• Troubleshooting replication – common issues I've encountered and how to
troubleshoot them.
Having read the chapter, I hope you'll be able to put the technology to good use in
your environment, while avoiding common issues related to poor configuration,
unrealistic requirements, or plain misuse.
Transactional Replication Architecture
The components in replication are defined using a publishing industry metaphor: a
Publisher , a SQL Server instance, produces one or more publications (from one
database). Each publication contains one or more articles , with each article
representing a database object such as a table, view or stored procedure. A Distributor
, typically a separate SQL Server Instance, distributes articles to any Subscriber (one
or more SQL Server instances) that has a Subscription to the publication articles.
Figure 1 highlights the main components of transactional replication, which we'll be
discussing in more detail over the coming sections.
Various Replication Agents , run by SQL Server Agent jobs, control each part of the
process of moving the articles from Publisher to Distributor to Subscriber.
Figure 1: Core transactional replication components.
A Snapshot Agent (not shown in Figure 1 ), running on the Distributor, copies the
initial data and schema objects from the publication database to a snapshot folder that
is accessible to all Subscribers. Once all Subscribers receive the snapshot, the
transactional replication process begins, with a Log Reader Agent reading every row
of the transaction log of the publication database to detect any INSERT , UPDATE , and
DELETE statements, or other modifications made to the data in transactions that have
been marked for replication.
The Log Reader Agent will transfer these modifications, in batches, to the distribution
database, marking the point it reached in the transaction log with a checkpoint entry.
The Log Reader Agent runs continuously by default, which means the Subscribers
should “lag” behind the Publisher by only a short period (referred to as the latency).
Alternatively, we can run it on a set schedule, such as hourly, so the Subscribers have
data from the time of the last execution of the log reader job.
Continuous replication performs better, as big batches are difficult for replication to
process. Running on a schedule instead requires careful monitoring of log file growth on
the publication database, as commands awaiting replication (not yet committed to the
distribution database) delay truncation of the log file, and can cause it to grow
rapidly.
A Distribution Agent applies the transactions stored in the distribution database to the
subscription databases and, consequently, to the destination tables.
Distribution Agent can run on the Distributor, pushing the changes to Subscribers
(called push subscriptions ) or on the Subscriber, pulling the changes from the
distribution database (called pull subscriptions ). The Distribution Agent runs
continuously, by default, or we can set the schedule to run on a periodic basis, to delay
data delivery to the Subscribers. As with the Log Reader Agent, continuous delivery to
Subscribers tends to perform better, because big batches are more difficult to push
through. It is fine to use a delay, but you will then need to monitor database file growth
on the distribution database, and possibly on the Subscriber (depending on the
Subscriber's database recovery model), closely for a period after implementation.
The typical replication method used is transactional read-only (though other methods
are available, which we'll cover briefly later). This means the Subscribers are read-
only with respect to replication. No changes will ever make it back to the Publisher,
even though nothing prevents users from updating the Subscribers. However, doing so
can destabilize the replication environment and this is a common mistake made by
those new to transactional replication. I'll cover how to fix such issues later, in the
Troubleshooting Replication section, but the way to avoid them is to secure object
permissions to the replicated tables on the destination Subscriber database such that
only replication has the permission to make changes.
We will discuss all of the replication components in more detail later, but I hope this
paints the high-level picture of how transactional replication works. Let's now move
on to consider some good use cases for the technology, take a brief look at other types
of replication, and then consider a few situations where replication may not be the best
fit.
Typical use cases for replication include:
• Improving scalability and availability (by offloading work onto Subscribers).
• Replicating data to data warehousing and reporting servers.
• Integrating data from multiple sites.
• Integrating heterogeneous data (such as Oracle and SQL Server data). Note :
Microsoft is deprecating heterogeneous replication, Oracle publishing, and updatable
transactional publications in SQL Server 2012, in favor of Change Data Capture and
SSIS. See HTTP://MSDN.MICROSOFT.COM/EN-US/LIBRARY/MS143323.ASPX for
more information.
The most common use case for transactional replication is to offload a read-only query
workload from the application's primary “source of record” server, which may be
overwhelmed, to another server, or to several remote servers. Scaling by distributing
workload to many Subscribers is a scale-out deployment, as opposed to scale up,
which is just purchasing a bigger and faster server (a.k.a. throwing money at the
problem to make it go away).
Typically, my goal when implementing replication has been to reduce the workload on
the Publisher, scaling out by offloading, to Subscribers, time-consuming or complex
processes. For example, in one case, I used transactional replication to offload
authentication responsibilities (encrypted passwords) to many Subscribers, so that our
publication server didn't have to deal with the workload of tens of thousands of queries
that utilize native SQL Server encryption. In another, I replicated data, using a
dedicated remote Distributor, from one publication to fifteen Subscribers, spread
across four datacenters, to provide data to a near real-time application that gets an
aggregate of one billion hits a day. Try doing that with some other technology as easily
as you can with replication!
Another common use case is consolidating data, or sending data to a data warehouse
or reporting server. In such cases, we can tune the Subscriber differently than the
Publisher. For example, on the Subscriber we can “over index,” change up the security,
create a dedicated batch processing system, and utilize many other optimization
techniques for fast processing of the Subscriber data, as appropriate for a data
warehouse or business intelligence scenario.
Figure 2: A scale-out replication topology for a medium-to-large deployment (simplified).
As depicted in Figure 2 , the Publisher accepts writes from webservers or other SQL
Servers, and publishes out through the Distributor to one or more Subscribers that are
optionally part of a load-balanced set. The advantage of using a load balancer, whether
Microsoft Network Load Balancing or a hardware load balancer such as BigIP or
Netscaler, is that it presents your application or users with a single, highly available
VIP name for the data repository, while offloading from the Publisher as much of the
read-only query workload as possible. It also makes maintenance of the Subscribers
much easier.
Snapshot replication
We discussed earlier how the process of transactional replication starts with capturing
a snapshot of the data and objects. Snapshot replication is simply a version of
replication that does this on a scheduled basis. We create a new, full snapshot of the
publication on a periodic basis, typically nightly, and apply it to a Subscriber
(typically, a data warehouse). This form of replication works well if the Publisher can
handle the workload and the maintenance window can accommodate the time required
to perform a nightly, full snapshot of the data, and replicate it to a data warehouse. It
must also be acceptable that the Subscriber data will be stale (for example, out of date
by one day).
If for some reason we need to modify the Subscriber data, after applying the snapshot,
we can do so without worrying about breaking replication.
Merge replication
This is a two-way change replication process, appropriate for use with offline
Subscribers, which synchronize when a network connection is available. It tracks
changes with triggers and only needs to synchronize the final state of the data, instead
of every change, as transactional replication does. We can also filter data so the
Subscribers receive different partitions of data.
The next sections describe the main replication components and their
requirements, and then provide a full walk-through for setting up a simple transactional
replication environment.
The Distributor
In many ways, the Distributor is the beating heart of transactional replication. Its role
is to store metadata and history data and, in the case of transactional replication, to
temporarily hold the transactions we wish to replicate to the Subscribers. The Distributor
will hold the data in a system database, by default called Distribution , and it will
run the Log Reader Agents (discussed in more detail later) that read the log files from
your publications.
We assign a Publisher (SQL Server instance) to one Distributor, so we can't have
different databases on a single Publisher using different Distributors. However, we can
have multiple distribution databases, although only the very largest replication
environments, the 0.1% minority, will need this.
Given that most of the replication agents responsible for the various data transfer
processes run on the Distributor and access the distribution database, the location and
configuration of the Distributor instance, and the distribution database, are key factors
in determining transactional replication performance.
If the Distributor is local to your publications, it is a Local Distributor; otherwise it is a
Remote Distributor. When implementing transactional replication on a busy Publisher,
and especially if you have many Publishers on different SQL Server instances, I
recommend using a dedicated remote Distributor (SQL Server instance on its own
hardware), rather than a local Distributor. If your replication is small-scale, and going
to stay small, it is fine to have the Publisher and Distributor on the same SQL instance.
You can even use SQL Server Standard edition on your Distributor, with the Publisher
using Enterprise, but with the proviso that the major version of SQL Server on the
Distributor must be at least as high as the version on the Publisher. For example, let's
say you have a Publisher running SQL Server 2008 R2 SP2; 10.50.4000 is the full
build number and 10 is the major version. In this case, your Distributor must run SQL
Server version 10 (SQL Server 2008) or higher.
Keeping track of SQL Server builds
HTTP://SQLSERVERBUILDS.BLOGSPOT.COM is my go-to site for build numbers.
The Subscribers
A Subscriber is any SQL Server instance that subscribes to, i.e. has a subscription to,
a publication, and so receives the articles from the publication. A SQL Server instance
can be a Subscriber, Publisher, and Distributor at the same time, although this is
typical only of development or small production environments. As replication
environments get larger, we would install each component on a separate SQL Server
instance.
Subscribers can subscribe to many publications (taking care that the published data sets
don't overlap), which is great for consolidating data onto a reporting server when there are
many sources of data. Unlike a Publisher, a Subscriber is not limited to one Distributor.
Some DBAs forget that we can tune the subscription database on the Subscriber
differently from the publication database, adding indexes and so on, as appropriate for
the workload for the Subscriber. We can also republish some of the data from the
Subscriber, for example using an intermediate server to denormalize the data and turn
complex queries into simple ones, without affecting the main replication process. We'll
discuss this in more detail later, in the Tuning Query Performance on the Subscriber
section.
The Replication Agents
SQL Server Agent controls the execution of separate external applications that
comprise the replication feature of SQL Server. Each agent controls a part of the
process of moving data between the various replication components. In other words,
SQL Server Agent jobs run replication, not the SQL Engine.
Snapshot Agent
In all forms of replication, the Snapshot Agent (snapshot.exe ), executed on the
Distributor, is responsible for producing the initial snapshot of the published schema
and data used to initialize the Subscribers. It produces bulk copy and schema files, stored in the Snapshot Folder
defined by the Distributor and consumed by the distribution agent (discussed later).
Typically, the snapshot folder will live on a Windows file share, so that all Subscribers
can access it and pull data. Alternatively, in cases where most, or all, of the tables are
included in the publication, we can use a backup and restore process to initialize
replication on the Subscriber, but we have to script this part as there is no wizard (for
more information, see HTTP://MSDN.MICROSOFT.COM/EN-US/LIBRARY/MS147834%28SQL.90%29.ASPX ).
SQL Agent cleanup jobs, on the Distributor, will purge the snapshot folder
periodically, based on a defined retention period (72 hours, by default), or when,
according to metadata stored in the distribution database, no Subscriber still requires
the snapshot.
Distribution Agent
The Distribution Agent (distrib.exe ) moves the snapshot (for transactional and
snapshot replication) and the transactions held in the distribution database (for
transactional replication) to the destination database at the Subscribers. By default, the
distribution agent polls the distribution database every 60 seconds for data to replicate.
The distribution agent will reside on the Subscriber, for pull subscriptions, and on the
Distributor, for push subscriptions.
Other Agents
Several other replication agents exist that we won't touch on further in this chapter. For
example:
• Merge Agent ( ReplMer.exe ) – used only in merge replication to send snapshot
data on initialization and for sending changes to Subscribers.
• Queue Reader Agent ( QrDrSvc.exe ) – used to read messages stored in a SQL
Server queue or Microsoft Message Queue and apply those messages to the
Publisher. It can be used with transactional and snapshot replication.
Table 1: SQL 2008 Edition capability map.
Figure 3: Configure Distribution context menu in SSMS.
If you did not install the replication components when you set up SQL Server, you will
see the error message in Figure 4 .
Figure 4: Warning error message (SQL 2008 and later) if you didn't install replication components.
To solve this error, you just need to install the replication components with the SQL
Server setup tool, as shown in Figure 5 .
Figure 5: Install replication components.
Click Next , and you'll reach the dialog where you can set up the Snapshot folder. By
default, this is located next to your SQL Server Logs folder, in a folder called
ReplData . However, if you have remote Subscribers, it's best to pre-create a Windows
share called Snapshot (for example, \\sqltestsvr1.prod.local\snapshot ).
For a multi-domain environment, you will want to fully qualify the server name.
The next step is to name the distribution database and specify a location for its data
and log files. By default, it's called Distribution , and I recommend you don't
change the name, as it can be confusing and make custom coding more difficult.
Furthermore, validate the file locations, and do remember that the Distribution
database, like any other user database, will adopt the size and growth characteristics of
the model database. You may need to alter them for optimum efficiency. A general
guideline for a medium-to-large replication environment is to size the distribution
database between 5–10 GB for the data file and 500 MB for the log file, with growth
rates of 2 GB and 800 MB respectively. Be sure to back it up and monitor the file
sizes.
Finally, we enable which Publishers (SQL Server instances) can use this Distributor.
The current SQL Server instance is all we need for this simple setup.
Enabling replication
We need to enable the replication feature in our database before we can create the
publication. In Figure 7 , we select Publisher Properties from the replication context
menu of the Publisher (notice, in the figure, our new distribution system
database).
Figure 7: Publisher Properties context menu in SSMS.
Now that it is enabled for replication, we can create a new publication using the New
Publication Wizard by selecting Replication | Local Publications – New Publication
. Over the next two screens, select AdventureWorksDW2012 as the database from
which we wish to publish, and choose Transactional publication .
Selecting articles
Next, we select the articles to publish, in this case the FactInternetSales table, as
depicted in Figure 9 . For optimal performance, don't select all the columns; be precise
about what data you want on the Subscribers and only publish the columns these
Subscribers really need.
As an aside, take care when modifying the schema of a published table on your
Publisher. When you add the column, use the ALTER TABLE…ADD COLUMN T-SQL
syntax, and if you use a tool to add the column (such as a schema comparison tool),
make sure it does the same thing, rather than creating a new table and dropping the
original. If you append a column to the end of a published table, replication will add the
column without intervention on your part, which is a nice feature.
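For example (the column is illustrative):

-- Appending a nullable column to the published table; replication picks this up
-- and applies it to the Subscribers automatically.
ALTER TABLE dbo.FactInternetSales ADD PromotionCode VARCHAR(20) NULL;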
Article properties
Next, we want to set the properties of our published articles, as shown in Figure 10 ,
which determines exactly what data we replicate to our Subscribers.
Figure 10: New Publication Wizard – set properties for all articles or individual articles.
This is a confusing interface, but we can use it to set the properties of all articles of a
given type (such as tables), or to set the properties of each individual article. Figure 11
shows all the configurable article properties in SQL 2008 R2 and SQL 2012.
Figure 11: New Publication Wizard – article properties.
By default, replication copies over the clustered index only, not the nonclustered
indexes. We can simply set the Copy nonclustered indexes property to True if our
Subscribers need all of the indexes. We can also use the post-snapshot script option
(available on the publication's properties dialog, post-creation) to create a custom
indexing strategy for the Subscriber, which we'll discuss in more detail later, in the
Tuning Query Performance on the Subscriber section.
Filters
Next, we can add filters to our publication tables, as depicted in Figure 12 . For
example, we might want to limit rows of data to certain status values such as WHERE
OrderStatus IN (5,6,7) . My advice is to keep filters simple, or don't filter at all,
as SQL Server evaluates every row in the table to find those that meet the filter
criteria.
Figure 13: New Publication Wizard – configure the Snapshot Agent.
Next, we select the security accounts for both the Snapshot Agent , and the Log
Reader Agent . I've never found a reason to use different accounts, so I have always
used the same account.
We need to specify the process account that the Snapshot Agent process runs under.
You can use a dedicated Windows account to run the Snapshot Agent process, or select
the SQL Server Agent service account, as depicted in Figure 14 . Since this is a
simple, stand-alone install, I chose to use the SQL Server Agent service account. The
account needs to be a member of the db_owner database role on the distribution
database, which SQL Server Agent will have, in our example, so I am not breaking
any best practices. However, in multi-server replication environments, running these
agents under a separate account from the SQL Server Agent will grant the least
privilege access and be the most secure configuration.
We also specify the account used to connect to the Publisher. Again, this Windows or
SQL Server account must be a member of the db_owner fixed database role in the
publication database and, again, I've chosen to make the connection under the SQL
Server Agent service account.
Figure 14: New Publication Wizard – Snapshot Agent Security.
In the final window, give your new publication a name; I chose Sales . That's it for
publication setup! Now we just need to set up some Subscribers to our new
Publication.
A Subscriber can pull articles from the Distributor (pull subscriptions ), in
which case the Distribution Agent runs on the Subscriber, or the Distributor can push
articles to the Subscribers (push subscriptions ), in which case the Distribution Agent
runs on the Distributor.
The default, as shown in Figure 15 , is to use push subscriptions, but in larger
replication environments, the use of pull subscriptions is more scalable, since it
spreads the work across the Subscribers rather than on a single Distributor. This is
especially relevant on WAN connections, as when using push subscriptions, the
Distributor will need to wait for a Subscriber to receive the data before continuing to
process its queue of commands (i.e. the log records, containing details of the
transactions) to replicate.
The general advice is to use push subscriptions if you have fewer than ten Subscribers,
all located in the same local datacenter, or pull subscriptions otherwise. You can mix and
match push and pull subscriptions to meet your needs.
Since we have fewer than ten Subscribers, all in the same local datacenter, we'll stick
with the default and create a push subscription.
Figure 16: New Subscription Wizard – Subscriber selection.
This brings up the Distribution Agent Security window, where we can establish the
accounts to use. For this example, on a single, stand-alone SQL instance, I chose to
use the SQL Server Agent service account for synchronizing the subscription. In larger
environments, you may want to review the TechNet article that describes in more
detail the account requirements for push or pull subscriptions:
HTTP://TECHNET.MICROSOFT.COM/EN-US/LIBRARY/MS189691.ASPX .
Figure 18: New Subscription Wizard – Distribution Agent Security.
Figure 19: New Subscription Wizard – initializing the Subscribers.
That's the last major step. Once you exit the Wizard, your replication node in SSMS
should look as shown in Figure 20 .
Replication Monitor (ReplMon) is a stand-alone
program, called sqlmonitor.exe , and you can run it from anywhere you have the
SQL client components installed. To open it from SSMS, expand any server, right-
click Replication and select Launch Replication Monitor .
ReplMon displays replication metadata and metrics for your entire replication
environment. We can also use it to initiate replication processes, such as starting the
Snapshot Agent (very handy). We can add and group servers just like in the SSMS
interface.
Before we start, note that Replication Monitor can be a performance hindrance to the
Distributor, although it does cache all the metadata to help mitigate this. If you have a
large number of publications or subscriptions, consider setting a less frequent
automatic refresh schedule for the user interface (under the Action menu). I have seen
cases where so many people were using ReplMon that it caused performance issues.
Figure 22: Replication Monitor Subscription history example.
We can script tracer tokens, but I typically just use ReplMon to create and view them.
A tracer token adds a marker entry to the publication database's log; replication then
tracks it through each point in the replication path and records the time it takes to get
there. You want to see latencies of just a few seconds in all three fields. If you have an
issue, typically one leg will be waiting on a response and show a much larger number
than the other, and the latency data will give you a starting point for
your troubleshooting.
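For the scripted route, a minimal sketch (using the Sales publication from this walk-through) looks like this:
-- At the Publisher, in the publication database: post a tracer token.
DECLARE @tokenID int;
EXEC sp_posttracertoken
     @publication     = N'Sales',
     @tracer_token_id = @tokenID OUTPUT;

-- Later, view the latency recorded for that token at each hop.
EXEC sp_helptracertokenhistory
     @publication = N'Sales',
     @tracer_id   = @tokenID;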
Alternatively, we can look at the Distributor-to-Subscriber latency in the
Undistributed Commands tab, as shown in Figure 24. I watch this tab after I have
noticed a latency problem, as the estimated time to apply the remaining replicated
transactions is quite accurate if you refresh the page (F5).
Figure 25: ReplMon limitation with Subscriber errors – in retry mode and all is green.
Figure 26 shows the error after the first set of retries and provides some useful
information. Firstly, the Action Message area reveals that the distribution agent found
a missing row at the Subscriber, and then tried to break the batch down into smaller
batches, but it still failed. Finally, the error details section shows the Transaction
sequence number , and we will discuss how to use this to find out which row it failed
on and what it was attempting to do, in the later section, Troubleshooting Replication .
Figure 26: Replication Monitor – what a Subscriber error looks like.
Then it retries, and we see green status icons again (a retry always shows green).
Figure 27 depicts the Distribution Agent history, as it goes through its retry process
(see the Agents tab in Figure 25 ). We can click on previous rows in the history to see
their details, or query the distribution database errors table, as we will discuss in the
Troubleshooting Replication section.
Configuring Alerts
To avoid having to watch ReplMon all day long, we need to configure some
replication alerts, to receive automatic warnings of any problems. Out of the box, SQL
Server offers many SQL Agent alerts, which we can configure with the SQL Agent
Alerts dialog in SSMS. Alternatively, we can configure replication alerts within
ReplMon, by selecting Configure Replication Alerts from the Action Menu.
Select the alert and click the Configure button, to bring up the standard SQL Agent
Alert dialog window. The alert will be disabled by default, so check the Enable check
box.
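The same configuration can be scripted against msdb. As a rough sketch (the operator name is hypothetical; the alert name is the predefined replication alert shown in Figure 29):
USE msdb;
-- Enable the predefined alert and attach an email notification to an operator.
EXEC dbo.sp_update_alert
     @name    = N'Replication: agent failure',
     @enabled = 1;
EXEC dbo.sp_add_notification
     @alert_name          = N'Replication: agent failure',
     @operator_name       = N'DBA Team',   -- hypothetical operator
     @notification_method = 1;             -- 1 = email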
Figure 29: SQL Agent Alert configuration for Replication: agent failure.
Configure the remaining pages of the alert with your preferred notification method,
noting that you may want to set the delay between responses higher than the typical
retry interval of three minutes, unless you like alert spam. Following is the alert text
generated when I deleted the row at the Subscriber.
AdventureWorksDW2012-Sales-SQLTESTSVR1-1 failed. The row was not found at the
Subscriber when applying the replicated command.
COMMENT: (None)
In my custom email, I provide a lot more information than the built-in alerts do. I
include a T-SQL query that shows the errors, along with the ID of each error, which is
very handy and which we will talk about more in the Troubleshooting Replication
section. After that comes a summary of the first hundred characters of each recent
error message since the last email, as I typically run it every 30 to 60 minutes.
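A minimal sketch of the kind of query I mean, run against the distribution database (the TOP value and one-hour window are arbitrary choices, not the exact script from my email):
-- Most recent replication errors, newest first.
SELECT TOP (20)
       id,           -- the error ID referenced in the custom email
       time,
       error_code,
       error_text,
       xact_seqno,
       command_id
FROM   distribution.dbo.MSrepl_errors WITH (NOLOCK)
WHERE  time > DATEADD(HOUR, -1, GETDATE())
ORDER BY time DESC;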
The replication latency custom email alert (Figure 31) shows the status of the
Distribution Agent job and the current latency.
Figure 31: Custom replication status email.
Performance Tuning
Since this is a chapter on replication, we're mostly concerned with specific tips for
tuning the performance of the replication process. However, certain broader server
configuration issues will also affect its performance, so let's start there.
• Pre-grow all your databases to the size you expect them to reach in the near
future, with optimized growth settings. The default growth increment of 1 MB will
hurt performance.
• Watch for high VLF counts (>200) on replicated databases – a fragmented log can
degrade the I/O performance of any process that needs to read the log file, such as
the Log Reader Agent. This can be a hidden performance killer – Kimberly Tripp has
the best advice on the subject at http://tinyurl.com/o49p5mv (a quick way to check the
count is sketched after this list).
• Keep the snapshot folder on a separate array (4 KB cluster format) from the
database files.
• Set a limit on how much RAM SQL Server can use, so that Windows doesn't page to
disk. A good starting point for this is to be found at http://tinyurl.com/ofhd9rf.
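As a minimal sketch of the VLF check mentioned above (not one of the chapter's listings), DBCC LOGINFO returns one row per virtual log file, so the number of rows returned is the VLF count:
-- The replicated database from this chapter's walk-through.
DBCC LOGINFO ('AdventureWorksDW2012');
-- A result set of more than roughly 200 rows suggests the log is fragmented.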
Tuning replication
If we set up replication with all its default settings, it will work fine for small
implementations that don't require high transactional volume. Microsoft designed the
defaults for typical low-volume use cases. However, once we start to scale up to many
Subscribers, some Subscribers residing miles away over the WAN, and to high
transactional volumes, we need to think about our overall replication architecture, and
about changing some of the default values. All the replication components in the full
end-to-end path have to be in peak shape. Here are a few tips (some of which we've
discussed along the way):
• If there are >10 Subscribers, use pull subscriptions, as discussed earlier.
• Minimize the use and complexity of publication filters.
• Run distribution agents continuously instead of infrequently, if data changes all day
long. It is better to spread the load out evenly than to have big pushes at the end of
the day.
• Consider using the SubscriptionStreams option on the distribution agent, which
uses multiple threads to write to the Subscriber. There are a few drawbacks when
using SubscriptionStreams, so read BOL carefully on the subject.
• Evaluate any triggers on the publication database, and decide whether they need to
run on Subscribers. Use the NOT FOR REPLICATION clause to exclude them (see the
sketch after this list).
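As a minimal sketch (the table and trigger names are hypothetical, not from this chapter), marking a trigger NOT FOR REPLICATION means it fires for ordinary user DML but not for changes applied by the replication agents:
-- Hypothetical audit trigger that should not fire at the Subscriber when the
-- Distribution Agent applies replicated inserts.
CREATE TRIGGER dbo.trg_Sales_Audit
ON dbo.Sales
AFTER INSERT
NOT FOR REPLICATION
AS
BEGIN
    SET NOCOUNT ON;
    INSERT INTO dbo.SalesAudit (SaleID, AuditDate)
    SELECT SaleID, GETDATE()
    FROM inserted;
END;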
Start with the Distribution Agent, since it has settings that affect both the Subscriber
and the Publisher, so this is where tailor-made settings will reap the most benefit.
Figure 32 shows the settings for the custom Distribution Agent profile that I use in a
WAN environment.
to push it through. It will break the transaction into parts with the size you specify.
This adds overhead, so use it only if you frequently make large data changes.
• CommitBatchSize (1000) – the number of transactions issued to the Subscriber
before a COMMIT is issued.
• CommitBatchThreshold (2000) – the approximate maximum total number of
commands for all batches.
• HistoryVerboseLevel (1) – BOL defines all four possible values; if no errors are
occurring, limit verbosity, possibly even choosing 0 (no history) for even greater gains
in performance.
• MaxBCPThreads (4) – used only when a snapshot is created or applied, so that the
work isn't single-threaded. Warning: if you set this number too high, it could degrade
the server's performance, so don't exceed the number of processors on the
Distributor, Subscriber or Publisher.
• PacketSize (8192) – this assumes a good network. Adjust it up or down in 4096-byte
increments until the SQL Agent job doesn't crash – pre-SQL Server 2005 SP3 there is a
bug with large packet sizes. The Windows 2008+ network stack is greatly improved, so
this really only helps older versions of Windows Server.
• QueryTimeout (4000) – ensures there is enough time for a large batch.
• TransactionsPerHistory (1000) – limits the frequency of the history updates you
see in ReplMon; adjust to your preference, with higher numbers providing better
performance.
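These profile values map directly onto Distribution Agent command-line parameters so, as a rough sketch (the server and subscription database names below are illustrative, not taken verbatim from the walk-through), an equivalent agent job step command looks something like this:
distrib.exe -Publisher SQLTESTSVR1 -PublisherDB AdventureWorksDW2012
    -Publication Sales -Distributor SQLTESTSVR1
    -Subscriber SQLSUBSVR1 -SubscriberDB SalesReporting
    -CommitBatchSize 1000 -CommitBatchThreshold 2000
    -HistoryVerboseLevel 1 -MaxBcpThreads 4 -PacketSize 8192
    -QueryTimeOut 4000 -TransactionsPerHistory 1000 -Continuous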
Figure 33: Dialog window for changing how the execution of stored procedures is replicated.
immediate_sync subscriptions
As discussed during the walk-through, when configuring the Snapshot Agent (Figure
13 ), we may want to enable the immediate_sync property of publications for the
convenience it offers when setting up future Subscribers or re-initializing existing
Subscribers without the need to generate another snapshot. However, it is a hidden
performance killer in bigger replication environments.
When enabled, instead of removing data as soon as all the Subscribers have it, SQL
Server preserves all the replicated data for the default retention period (72 hours). For
example, if we move ten million rows a day, we would store at least 30 million rows in
the distribution database, and this could cause issues with very large Distribution
database file sizes, and with frequent failures of the Distribution Agent Cleanup job,
since it has to look through so much data to find commands it can delete.
With immediate_sync enabled, we also can't simply add a table to a publication and
create a snapshot for the table; instead, we need to generate a full snapshot for the
Subscribers.
The ReplTalk MSDN blog has a post with a lot of good information on how
immediate_sync works: http://tinyurl.com/obgt7v2.
Listing 1 shows how to check for use of immediate_sync and disable it.
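A minimal sketch of such a check-and-disable script, assuming the Sales publication from this walk-through (allow_anonymous depends on immediate_sync, so it has to be turned off first):
-- In the publication database: check the current settings (1 = enabled).
SELECT name, immediate_sync, allow_anonymous
FROM dbo.syspublications;

-- Disable allow_anonymous first, then immediate_sync.
EXEC sp_changepublication @publication = N'Sales',
     @property = N'allow_anonymous', @value = 'false';
EXEC sp_changepublication @publication = N'Sales',
     @property = N'immediate_sync', @value = 'false';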
Troubleshooting Replication
In my experience, SQL Server Replication is quite a resilient feature. For example,
replication in SQL Server 2005 and later is resilient to SQL Server restarts; replication
will just come back online and restart where it left off.
As noted earlier, though, this resilience can make troubleshooting harder. If a
replication process fails to complete, SQL Server will just retry until the subscription
expires and in the meantime, as discussed earlier, when viewing ReplMon, it's easy to
miss the fact that anything is wrong (use alerting to combat this).
The coming sections describe some common issues I've encountered in my replication
environments, along with troubleshooting steps and resolutions. Remember, the
distribution database stores the replication metadata and SQL Agent jobs control the
replication process, so examining both is key to finding issues and error messages.
The error row will provide the Transaction Sequence Number and the Command ID
that caused the failure. Listing 3 shows you how to use the sp_browsereplcmds
replication stored procedure to browse all the commands pending distribution for that
transaction batch; find the row with a value of 1 for the command_id column, as
defined in our error example in Figure 26 .
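As a minimal sketch of that kind of call (the sequence number below is a hypothetical placeholder; use the value reported in the error), run in the distribution database:
-- Browse the pending commands for the failing transaction batch.
EXEC sp_browsereplcmds
     @xact_seqno_start = '0x00000027000000B50008',  -- hypothetical; copy from the error details
     @xact_seqno_end   = '0x00000027000000B50008';
-- The row whose command_id matches the Command ID from the error
-- (1, in the Figure 26 example) is the command that failed.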
These steps will reveal the primary key of the missing row that caused the replication
command to fail, and then we can manually insert the row into the Subscriber, or use a
third-party data sync tool, such as Red Gate's SQL Data Compare.
Another option, to get past an error, is to skip the latest transaction batch and restart
the Distribution Agent job. I like to use this technique in response to the error, "Row
already exists at Subscriber." I learned it from a prominent CSS support engineer's
blog post: http://tinyurl.com/n94dxm. The other option is to remove the row
from the Subscriber manually, but you could have issues with the entire replication
batch and spend a long time finding the rows to delete with sp_browsereplcmds.
The following query, run in the distribution database, totals the pending replication commands by article (table):
-- Total by table
SELECT  ma.article,
        s.article_id,
        COUNT(*) AS CommandCount
FROM    dbo.MSrepl_commands cm WITH (NOLOCK)
        JOIN dbo.MSsubscriptions s WITH (NOLOCK)
            ON  cm.publisher_database_id = s.publisher_database_id
            AND cm.article_id = s.article_id
        JOIN dbo.MSpublisher_databases d WITH (NOLOCK)
            ON  d.id = s.publisher_database_id
        JOIN dbo.MSarticles ma
            ON  ma.article_id = s.article_id
--WHERE  s.subscriber_db = 'analytics'
--  AND  d.publisher_db  = 'AdventureWorksDW2012'
GROUP BY ma.article,
         s.article_id
ORDER BY COUNT(*) DESC;
Reinitializing Subscribers
There are a few reasons why you might want to re-initialize a Subscriber, a process of
resetting the Subscriber with a new snapshot of data. Usually, it's in response to
latency issues. We discussed earlier the use of tracer tokens to help identify the cause
of latency issues, but sometimes the issue is so bad that it's easier to “give up” and
start over. Similarly, you might need to push out to your publication a massive update,
knowing it will cause latency issues the business can't handle. In such a case, it's often
better to perform the update and then re-initialize the Subscribers. Finally, you might
occasionally meet a situation, due perhaps to very high latency or failure on the
Distributor, where a snapshot is removed, because the retention period, 72 hours by
default, is exceeded before a Subscriber receives it. When this happens, the
Distribution Agent will mark the subscription as expired and you must reinitialize the
Subscribers.
To reinitialize a Subscriber through SSMS, right-click the Subscriber in the SSMS
Replication node and select Reinitialize. Alternatively, we can use the system
procedure sp_reinitsubscription, as shown in Listing 5.
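A minimal sketch of that kind of call, run at the Publisher in the publication database (the Subscriber name is hypothetical):
-- Mark the subscription for reinitialization; the next Snapshot Agent and
-- Distribution Agent runs deliver a fresh snapshot to the Subscriber.
EXEC sp_reinitsubscription
     @publication = N'Sales',
     @article     = N'all',
     @subscriber  = N'SQLSUBSVR1';  -- hypothetical Subscriber server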
Summary
You made it to the end of what I know was a long chapter, but I hope you learned
enough about replication to give it a try, or to improve your existing replication
infrastructure.
Once you understand how transactional replication works, you realize that tuning it is
just like tuning any new application running on SQL Server that uses SQL Agent jobs.
Small implementations will generally work fine out of the box, but sometimes we need
replication to scale very high, even to the billions of rows mark. If you run into issues
as you scale up, break open this chapter again, and use it alongside the same tools you
use to monitor for general SQL Server performance issues such as profiler, extended
events, and DMV queries. After all, transactional replication reads and writes data
with stored procedures, with SQL Agent jobs running a couple of executables (Log
Reader and Distribution Agent) to move around that data.
If you are stuck, search the web and ask questions in forums; there are usually many
people who like to help. Here are a few links I've found consistently useful when
trying to solve replication-related issues:
• The Microsoft SQL Server Replication Support Team's blog:
http://blogs.msdn.com/b/repltalk
Building Better Reports
Stephanie Locke
A great report is one that, firstly, presents accurate information, and secondly, strikes
the perfect balance between satisfying user requirements and implementing best
design practices. Without correct information in a readily understood layout, reports
are at best a hindrance and at worst a disaster. There are numerous examples of how
bad information and bad report design have had a huge impact on companies and
government organizations. If you don't believe me, read through a few of the horror
stories on the European Spreadsheet Risks Interest Group website
(http://bit.ly/QWRSZE).
In my experience, three key factors influence the overall success and quality of your
reports:
• Your overall skills as a report developer – not just report-building skills, but
communication skills, knowledge of the business and versatility.
• Report design – good design turns data into information that the business can use to
make decisions.
• Report platform and tools – understanding the right tool for the task, and a
willingness to fit the tool to the report user, rather than vice-versa.
In this chapter, I'll explore all three areas. You won't find technical tips here on how
to make specific styles and types of report look good; what you will find are the
skills I believe you need to make sound report development decisions, based on well-
understood user requirements, and then produce clean, easily understood reports.
The systems involved in producing and delivering a report include the source OLTP
(Online Transaction Processing) system, the ETL (Extract, Transform, Load) system,
the data warehouse, the network, and the reporting platform. In short, a reporting
system has one of the longest lists of dependencies of any piece of software in the
company. Many things can go wrong. Coupled with this, our reporting system is also
one of the most visible systems in a company; the proverbial "canary" that provides the
first warning of the presence of one or more of these many possible underlying problems.
Along with sys admins and DBAs, report developers are the often-underappreciated
masses keeping chaos from reigning supreme. To fight the good fight, our strongest
weapons, alongside our report-building skills, are our people and communication
skills, our knowledge of the business and our versatility.
Communication
Over the course of developing and maintaining reports, and dealing with the various
problems that can derail them, you will need to communicate with people right across
the company. You cannot sit in a cubicle, safely isolated from “the business” and
expect to do a good job. The following sections cover just some of the people with
whom a report developer should communicate regularly.
IT department
Strong working relationships within the IT department are a necessity. Each person in
IT contributes in one way or another to the quality of our reports. Project managers
and Business Analysts may gather requirements or make promises on our behalf;
developers will build and maintain the systems on which we need to report; DBAs
look after the databases and ensure they run well; system administrators maintain the
hardware and networks that ensure the availability and performance of the reports.
You may be one, many or all of these people within your organization, but there is still
value in keeping the roles divided, if only to provide an easier way of determining
impacts when making changes.
Developers
Every time a developer changes a system, the change will likely affect the data feeding
into the data warehouse. Ensuring reports are adapted to accommodate such changes is
the responsibility of the report writer, and developers have other priorities than
thinking about what happens to reports after they've delivered their changes. Keep a
close eye on what they're doing, to ensure that necessary report amendments get picked
up sooner rather than later.
Working closely with developers has the added benefit that the development team's
work will benefit from the report developer's knowledge of the business process that
needs to be automated or amended. In this respect, the report developer provides a
useful “halfway house” between technical and business knowledge.
Network/infrastructure admins
The performance and availability of reports rely on many different systems, but report
writers, unless we also happen to be the DBA or system administrator, are unlikely to
be on the email distribution lists that receive news of system problems, like running
out of disk space on the OLTP server.
By regularly popping down to chat with the network admins and DBAs, it's possible to
gain forewarning of issues that may affect report performance, or mean that users can't
reach the front-end via IP for a certain period, and so on.
In addition, the report writer might be the person on the receiving end of an angry
phone call because reports aren't loading, or due to some other issue, which is actually
an early warning that the whole network is about to crumble. Knowing who to go to,
and having a good relationship with them, could mean the difference between the issue
being fixed before it affects other systems, or the whole company network falling over.
End-users
Finally, of course, we need to work closely with the consumers of our reports. A report
is just a vehicle for transferring information, and the most important thing is ensuring
that it succeeds in helping the user do what they need to, as quickly as possible. In
order to do that, understanding exactly what they need, and when, comes in very
handy. I'll cover requirements gathering a little later, but here are a few general tips for
smooth relations with your users:
• Avoid detailing how something was achieved unless asked; success is what matters,
not how hard you worked (also, any difficulties or complexities that would delay
delivery should already have been raised).
• When explaining concepts and the work required, use the simplest terms and
examples possible.
• Understand a person's overall objectives and how their request fits into them.
• Frame conversations in terms of how their objectives will be met.
• Always be willing to help with computer woes as it builds a good relationship and it
is a great opportunity to educate people.
• Try to discuss topics in person at their desk or over the phone.
• Remain honest about deadline feasibility. It is much worse to be told at the deadline
that something won't be delivered than it is if you know a few days in advance.
• Be tactful and nice if they screw up – not only does it prevent you from burning
bridges but often a lot of value can be derived from preventing similar mistakes in
future.
• Keep things documented – send confirmation emails of discussions with actions and
key agreements.
One thing we report writers quickly come to dread is the phone call or support ticket
that starts with “Your report is wrong/broken ” (sometimes, that's practically all the
support ticket says). It can lead to frustration because we tend to hear the dreaded
phrase from users regardless of whether there really is a problem with the report, or the
data. More often, the problem will be with the underlying servers or source systems, or
email, or the report user's typing (GIGO, as developers say) but, still, at the moment
the report stops working, it transitions from being their report, to being your report,
and they will expect you to be able to help.
It is worth remembering in times of frustration that the user came to you for help.
They may be stressed, under pressure, or simply not know how best to help you help
them, but they do need your help. Take as many deep breaths as required to let the
stress go, and then talk to the user sensibly and calmly.
Our job is not just to write SQL to collect the required data, marshal it into some
form of report, and deliver that report to the users who need it. Our ability to write
intricate reporting queries may be unimpeachable, but this is only part of what it takes
to build good reports.
The job also requires us to be able to triage report requests, communicate with the
business, investigate a myriad of network, source data and desktop issues, design and
test reports, move data around environments and any number of other tasks.
As noted earlier, as soon as there is a problem with a report, the user will call the
person who made the report. As such, we frequently need to tackle an issue outside of
the “main scope” of our role. Being able to converse intelligently on a broad range of
topics and technologies really helps because it means the user gets a smooth, seamless
resolution, handled, or at least coordinated, by us. This can breed reliance but that is
the price of competence, and it builds significant respect amongst the team.
Table 1 lists the topics on which I would expect a report developer to be able to
converse intelligently, and the minimum level of understanding I feel they should work
to attain. It isn't exhaustive, and I'm sure many of you would debate its contents (see
the end of the chapter for my contact details!).
Table 1: Range of technical knowledge for report developers.
Ultimately, mastering all of these technologies is a means to an end, and that end is to
excel at conveying information . Having a broad skill-base and an open mind makes
it much easier to acquire any extra technical and practical knowledge needed, in order
to consistently deliver reports that convey information, in a form the company can
understand and on which it can reliably base business decisions.
Requirements gathering
Requirements are rather like Prometheus' liver. It's painful to expose them to the light
of day and they are always growing. We cannot expect to capture all requirements in a
single conversation with our users. We can do our best to get a good set of starting
requirements, but more often than not reports are iterative and the first draft is usually
a prototype that we will throw away entirely.
Requirements gathering meetings, done in the typical “big design up front” style, can
be interminable, stodgy affairs. Avoiding them is reason enough for some teams to
adopt a more iterative approach, often termed "Agile." While I'm not entirely convinced
by Agile (see Dilbert on Agile,
http://search.dilbert.com/comic/Agile%20Development), I do think prototyping,
constant communication with users, and rapid turnaround, are good ideas. Certainly,
while requirement gathering, I avoid formal meetings in favor of informal chats that
don't take up too much time and don't result in reams of paper. Try to meet at the user's
desk or a set of sofas nearby, to ensure that it's quick and easy for them to show an
example or scribble a diagram.
Some users shun all meetings, formal or otherwise. We can either say, “so be it,” or we
can try to counter the behavior by “just popping down” or “just calling” to ask a quick
question. Not every request needs a meeting, particularly if a user is more conversant
with reporting and it's a simple change, but a quick phone call can give the personal
touch that keeps the relationship strong and ensures reports are more likely to be
correct first time.
There's no hard and fast set of rules for gathering report requirements, but we
need to ask intelligent questions and, most importantly, listen to the user. Distil their
responses and repeat them, asking the user to confirm that you've understood what he
or she really needs.
The five Ws are a very good foundation for solid requirements gathering:
• Who – the report audience.
• What – the criteria and content of the report.
• Where – the format in which the report needs to look its best.
• When – data update frequency, availability, delivery.
• Why – what questions does the report need to help answer? Which business
decisions will it inform?
The most important question is "Why." Questions that tease out of the user the
reasons for a request give them a chance to explain what it is they really need. It
also gives us the chance to listen to the actual business requirement and evaluate the
best solution.
Most users find it hard to articulate their requirements and, often, they won't have
thought them through very thoroughly. These informal conversations are your best
chance to get them thinking about what they really want, and provide an opportunity
to discuss any potential changes to systems or processes it might engender.
Initial visualization
Once I understand the requirements fully, it's time to start prototyping. I'll write some
quick and dirty SQL to gather the sort of data I think I'll need and mock up a basic
report in Excel. Once I've finished prototyping I typically throw away my report query
and start from scratch, to ensure that it runs as quickly as possible.
I use Excel for all prototyping. Firstly, it's usually a very quick process to grab the data
and apply the necessary pivot tables; secondly, it's a familiar environment
for most users, and they can often make their own amendments, or add notes.
While we're not overly concerned with table and graph design formalities at this stage,
it's easy to use Table styles (at least in Excel 2007 and later) to produce good-looking
reports in Excel and apply a few basic formatting principles (see, for example, 15
Spreadsheet Formatting Tips , http://www.powerpivotpro.com/2012/01/guest-post-15-
spreadsheet-formatting-tips/ ). If even your prototypes look good, it will help ensure
you get early, high-quality feedback from the users.
My design process starts with an initial raw data table, and from there I spawn further
graphs and summary tables based on the requirements, both the originally received
ones, and new ones arising from the user being able to see the data, until I've captured
everything needed for the user to achieve their aim.
I also take the time at this early stage to check for any trends or issues, knowledge of
which might add value, even if it extends a little beyond the stated scope of the report,
or that might cause unexpected political repercussions. For example, if your report
exposes the fact that call center agents are "dodging calls," then we need to be
prepared for the impact of that revelation.
Stephen Few's Show Me the Numbers, for example, offers rare insight into the concepts
that underpin great report design, and provides many examples. In the following
sections, I'll offer
some tips on how to present your graphs and tables, but the most important thing is to
experiment. You probably know of some particularly bad spreadsheets in your
company. Take a copy and give it a facelift, consider showing it to key users and
discuss whether making the changes to the real report would be of benefit. Since all
change has a cost, if only just the time needed to learn a new layout, it may be that the
shiny new report doesn't add enough benefit to be put live, but that makes it a perfect
opportunity to practice those requirements-gathering skills.
Tables
[A perfect table] is achieved, not when there is nothing more to add, but when there is nothing left to
take away – Antoine de Saint-Exupéry
The only good way to demonstrate good design principles is with examples, so I'll
show two examples of bad table design, explain why they are bad (though some of the
reasons are obvious), and then show how to rehabilitate them.
I'm sure you can tell that it's a bad design, and spot most of the flaws, even without
having read any design theory. Table 3 lists the problems I see with this table and how
to resolve them.
Don't Repeat Yourself (DRY) – content repeated in every header.
Labels – 1) Row labels aren't consistently named: rename the rows. 2) Incorrect
vertical umbrella header: amend or remove the header. 3) Confusing last column label:
clarify the label.
Having corrected all these issues, we have a much simpler and more legible table.
Imagine that you pick up the "Help me!" ticket raised by the confused user.
You need to find out why the user is struggling with the report, so you arrange a time
to chat and go through the requirements. It turns out that the user needs the report to
show the last month's Key Performance Indicators (volume and profit) in order to
make operational decisions on what flavors to produce next month and in what
volumes. The user also needs to decide which recipes or processes require
improvement.
The current report does provide the data the user requested (and not a jot more), but
doesn't present it in a way that gives the user the information needed to make these
operational decisions.
You decide to retain all the data from the previous report, but to structure it in a
different way. The key figures will be visible, with the extra data structured in a format
that makes any initial analysis easier.
Most of the work here is a kind of pseudo-data-modeling. You're normalizing the data
by moving repeating elements into their own table and you're looking for the most
useful grouping of the residual data. A perfect example of how knowledge in another
area can really help! Knowing that the report might be useful to overseas divisions of
the company, you even try to ensure that the currency units are clear.
Apply these simple design principles to all your new report tables, and you'll start to
avoid the need for the sort of refactoring necessary to correct confusion and
misunderstanding.
Table 7: An improved version of the ice cream sales report.
The basic chart types cover most reporting needs:
• Column
• Stacked column
• Line
• Bar
• Pie
Many people advise against the use of pie charts, for the very good reason that humans
find it difficult to gauge the real value of pie segments and perform accurate
comparisons. Pragmatically, it's hard to avoid them entirely, though I will try the other
basic chart types first.
Beware the siren call of every “must have” new chart type that appears in the latest
release of your reporting tool. Two recent examples are treemaps and bubble charts.
Treemaps turn comparative volumes into areas within a square or rectangle. These
are easier to compare than circular segments, but they still make the reader work hard to
understand the comparisons. Bubble charts are quite good at mapping a third series
onto a chart, but comparing the sizes of the bubbles is visually difficult. In many cases,
the humble bar chart will do a better job than the treemap or bubble chart.
Most users will benefit from simplicity and familiarity. If the requirements really do
justify elaborate visualization techniques, then let your mind run unfettered by the
humdrum report and pick a chart that is perfect for the dataset.
In his book, Advanced Presentations by Design , Dr. Andrew Abela condenses into a
single diagram the various chart types and when to use each type (see
http://bit.ly/Hro4Uu, reproduced here with his permission). I also recommend the
Chart Chooser supplied by Juice Analytics (http://bit.ly/MfReXD ), which helps you
pick the relevant chart and provides you with Excel and PowerPoint downloads of the
charts.
Figure 1: How to choose chart types.
Again, the best way to demonstrate good design principles for charts is by way of an
example.
Figure 2: A bad radar chart.
It looks like the developer, perhaps aware that the results weren't exactly jumping off
the page, simply kept adding in extra bits to make it look more important.
Changing from a radar to a simple bar chart, reducing graphical complexity, and
making the title more informative, allow the user to extract much more information
from the chart.
Figure 3: A simple bar chart succeeds where a complex radar chart failed.
No single tool is versatile enough for every reporting job. Most organizations should
have one centralized tool, plus a few other tools that are good for other business needs,
such as ad hoc analysis, exporting to different file formats, rendering on different
devices, and self-service reporting.
Determining which tools to have and to use is tough, given the investment in time and
money that can be required to learn and implement a new tool. The centralized tool
should handle reports of any reasonable complexity, or reports that need to be
distributed company-wide, over SharePoint. However, for simpler,
single-table reports, most companies can allow a little flexibility.
Ultimately, your report is simply a vehicle for getting information to the user and, as
far as you can, you want to match the reporting tool to the requirements, not the other
way round. Users always have a set of “must-haves.” If a user has his or her heart set
on a great big pie chart filled with 20 shades of green then you should probably try to
dissuade them, by demonstrating a better approach during your requirements gathering
and prototyping. However, if they really need the data in Excel then you should
probably try, if it's a relatively simple report, to put it in Excel, rather than force them
to use a reporting mechanism that is more convenient to you.
Most organizations will also need one or two other tools to cover non-standard
requirements. New reporting tools appear regularly. If one catches your eye, I
recommend trialing it to see how much it helps you, as the report developer, and how
much it improves your ability to convey information to the user. It's hard to migrate
away from a reporting solution once it is in place, so you will probably need to
demonstrate significant benefits over your existing systems. However, analysis of the
business benefits encourages you to think about reports as a function, not an end in
themselves.
Follow good development practice in Excel to minimize the likelihood of errors. The
European Spreadsheet Risks Interest Group (Eusprig) offers a great set of tips for
developing in Excel at http://bit.ly/ImponQ.
People who move into BI from a business, as opposed to a technical, background tend
to do so because they became Excel-addicted. Excel is the software equivalent of a
gateway drug, and I can tell you from experience that it creates a powerful hankering
for data!
On top of all this, new bits of kit for elaborate and powerful self-service reporting are
gaining prominence. Two such tools are PowerPivot and Power View, both features
of Excel, and both tightly integrated with SharePoint. (If Excel is where all
reports originate, SharePoint is where many reports end up.)
Picking up these tools involves significant learning, and the risk of relying on software
from smaller (or even no-longer-existent) companies is another consideration when
deciding which tools to use.
A personal story
When I started with my current employer, reporting tools had a very heavy open-
source emphasis, plus a strong reliance on Excel. This meant that reports either
conformed to a single table layout on a custom PHP front-end, or were coded
individually, with VBA used to refresh the data regularly. As a result, it took a long
time to turn around reports.
Many people wanted information that we could not deliver with the existing platform,
either because our report front-end could not connect to the source, or because they
needed the information with a rapid turnaround and we simply could not code it
quickly enough. Where we couldn't deliver, some people would get angry but, worse
still, others would stop asking for reports and simply make do with extensive
workarounds.
Working with different departments over the past few years, we've implemented a
number of solutions:
• A streamlined automated Excel and simple PDF system.
• Excel 2010 installation with training and support for producing their own reports.
• Statistical reports using R.
• Real-time dashboards on big screens using SSRS.
• Complex formatting and calculations in reports using SSRS.
Over time we trialed a number of solutions with different departments that didn't quite
work out, like Report Builder 3.0 or refreshable PowerPoint slides.
Not everything has worked, and we're still learning, but crucially we're still making
progress and improving, thanks not just to the new reporting tools, but also to a vastly
improved approach to education, both for our report writers and our report users.
We produced a decision tree as guidance to our report writers so we consistently
produced reports in the “right” tool. It was also incredibly helpful for us when it came
to framing discussions amongst ourselves and with others in the business. Internally,
we are able to quantify the burden of each system in relation to the proportion and
value of the reports utilizing it, and we can discuss replacements for focused scenarios
instead of having opinionated discussions about a very large area. Within the business,
it also helps us to explain how we are using resources and why we are delivering
improvements in certain areas, as well as to identify new scenarios that need to be
taken into account and home in on the tool that suits each one.
More importantly still, we started educating users on what they can do with existing
tools, teaching them good visualization principles, and slowly introducing them to the
way the various IT systems interact. In short, we embarked on an education process
like the one espoused in this chapter. As a result, there has been an increase in
well-thought-out tickets, improved relationships, and decisions becoming more fact-based.
Figure 4: Flow diagram of report implementation decisions.
Summary
In this chapter, I've tried to provide a whistle-stop tour of what I believe you need to
know, and how you need to work, to be a good report developer. Report-writing skills
are only one part of your skill set. You need excellent communication skills to find out
what the business really needs from a given report. You need broad-ranging technical
knowledge to help ensure the performance and availability of your reports, and to help
end-users as issues arise. You need a solid understanding of visualization theory and
the ability to apply it to produce clear and concise reports that help the user understand
key trends and make the right decisions.
I may have scared you, or I may be preaching to the converted, but all components
take time to implement fully in any business, and the job is never done. Even if you're
not a report developer, think about how well you and your team perform in the three
key areas covered in this chapter (communication skills, visualization, and
tools/technology). How do you, as a DBA, communicate with relevant parties to make
sure they understand why their applications run slowly? How do you, as a developer,
evaluate your website designs? Upon what information do you, as an IT manager, base
technology decisions? How strong is the link back to clear business benefits?
What gets measured, gets done, so evaluate and make a plan with some metrics and
targets on how to improve in these areas. Track your metrics in a report and email it to
me!
I'd be happy to receive feedback about this chapter, or to discuss/clarify any of the
contents with you, so feel free to email me at stephanie.g.locke@gmail.com or get
in touch via Twitter (@SteffLocke).
Communication Isn't Soft
Matt Velic
It may seem odd, but reading is the prime activity that can improve your writing
ability. What you choose to read doesn't matter, though I would suggest a mix of books
you enjoy, perhaps fiction or history, and books that challenge you, such as technical
volumes and professional books.
The logic behind this tip is that by reading more, by consuming more information,
you're opening yourself to new ideas. On a more molecular level, you are studying
words and sentence structure, and learning how other writers share their ideas.
Reading can help ease that “I-don't-know-how-to-start” feeling when you face the
dreaded blank page.
Twitter by example
The following are three tweets that help show the differences between good and poor
communication techniques while using the service.
Say, can anyone help me with this problem I'm facing with my log files? I cannot
remember for the life of me if I should only have one log file or if SQL Server can
utilize multiple log files. Thanks! #SQLHelp
While this first example is friendly, it's too long at 210 characters.
any 1 no if sql use more than 1 log? #sqlhelp
This second tweet is short enough, using only 45 characters, but the question is
difficult to understand because of the “txt spk.”
Should SQL Server use only one log file? Or can it utilize multiple logs for better
performance? #SQLHelp
Finally, a tweet that is both concise (at 105 characters) and clear. This is the balance
one should strive for when using Twitter.
Each of these writers has wonderful balance and style in their articles. Learning to spot
unbalanced sections in your own writing is a part of growing as a writer and editor.
One of the challenges to practicing public speaking is finding an audience. Writing
requires us to find the appropriate way share articles with our readership, but the
timing between publishing and reading is asynchronous. Speaking requires an
audience in the literal sense, or does it?
Thanks to ever-cheaper technology and the popularity of YouTube and Vimeo, video
cameras have become ubiquitous. You can buy a basic camcorder for about $100.
Phones can shoot video, as can most tablets, iPads, and laptops. All this
technology creates a new kind of opportunity to practice speaking without an
immediate audience. Simply record yourself, play it back, and take critical notes, not
only about how you speak but also about your body language.
You may find you have verbal tics of which you were unaware. The ubiquitous one is
use of the word "Um" between every thought, but other common tics include saying
"and," "ah" and "like" too much. I once had a professor who had an unconscious
tendency to say "Dontcha know" between sentences. In one class, she deployed her
famous catch-phrase over a hundred and thirty times.
Equally, you may learn that you need to improve your visual communication skills.
For example, you may notice that you have poor posture, that you move around too
much and seem fidgety, or that you don't move around enough and appear lifeless.
Similar to editing, the purpose of the video is to turn a critical eye towards your
performance, attempt to identify and correct poor habits, and to bolster positive ones.
It creates a way to edit how you present yourself.
Overall, videos are a great way to get started for anyone wishing to improve their
verbal and visual communication skills. You can practice the script, shoot as many
takes as necessary, edit out the rough parts, and add a fancy music track. You can
publish your best videos on YouTube, let people know about them through Twitter and
start to interact with your audience through comments, user requests, and follow-up
videos.
However, if you have ambitions beyond this, for example to speak at conferences, you
need to start to learn how to handle the anxiety that often accompanies a live audience.
Lunch-and-learn
Even seasoned speakers get nervous in front of an audience but, with practice, they
learn to turn that nervousness into an energy that fuels a presentation rather than
bogging it down.
Lunch-and-learn events are one of the best ways to start practicing. All it takes to get
started is one coworker asking you, "How does that work?" and all you need to run one
is a conference room at your office, a projector and some food.
Why are lunch-and-learn events so valuable? First, they are not meetings. Most people
hate meetings, because the majority of them are a waste of time. Second, there's food.
Lunch is a great way to kick back, forget about that production issue for half an hour,
and bond with your coworkers. Third, because of all this good will, it's disarming. It's
a relaxed environment to present ideas. When the projector won't start, you can shrug
it off with a smile; when Management Studio goes boom you can all laugh at it
together.
Even better, if you have ambitions to take your presentations beyond the office, to a
conference, you'll start to gain an understanding of what works and what doesn't.
Your coworkers can help you identify the topics that are most interesting, anticipate
the questions an audience is likely to ask, and trim areas that are too slow, complicated
or awkward.
You can even bring along your camcorder to video the session for later review, which
can be an easier solution than trying to scribble down notes. Of course, you could also
turn it into a bloopers reel for the upcoming holiday party.
Finally, there are international events such as the PASS Summit and SQLBits in the
UK, where competition is fiercer still. However, if you can attain this level, you can be
certain that your practice has paid off. You've become a known entity, a thought
leader; but keep practicing!
The ability to communicate well is always in demand and, once you acquire it, it will
help you in many and varied ways.
Self-belief
When I first received the crown of “Accidental DBA,” I had no confidence in my
skills. In fact, I had no skills. I could barely write a query, let alone protect production
data. Instead of being overwhelmed and doing nothing, I read, studied, and practiced.
When critical needs arose, I knew how to handle them and had the confidence to do so
because of the practice.
It's the same with communication. The experience that allows you to act calmly and
decisively when the production environment goes down is the same that keeps you
rooted when your laptop dies mid-presentation.
Communicating expertise
The point of practice is gaining confidence in your abilities. Having gained this
confidence and belief, you will find that you are much more effective at
communicating your expertise.
Once you have mastered the art of precise communication through Twitter, those same
skills will help you improve your résumé. Just like Twitter, a résumé is a word-restrictive
communication medium; you have only a page or two at most to present yourself in
the best manner possible. That means choosing the words with the most impact, words
that have the greatest meaning. Treat each bullet point item, each nugget of
information about your skills and experience, as a finely honed tweet.
Likewise, having found your inner editor and voice through writing blogs and articles,
you can use these same skills to perfect your cover letters. Set out why the company
ought to hire you, build support for your argument, and close out strongly. Article
writing can also help you write cleaner documentation for your products or support,
and more concise technical emails to your business users.
As for speaking, confidence in your ability goes a long way in any venue. It will help
you handle interviews confidently, whether with your manager during annual reviews,
or with prospective new employers. You'll also communicate more effectively with
non-technical users, finding yourself better able to explain issues simply, and through
the use of analogies.
should strive to overcome. I've provided a number of fun ways to practice
communication; activities that can help the practice feel natural, all while working
towards the goal of bettering your skills. You don't necessarily need to take the route
of becoming a public speaker, or community blogger, and there are many more
activities that can help you practice that might make more sense for your situation. For
example, if you enjoy trying new restaurants, you could strive to write reviews of your
experiences on review websites, such as Yelp. If you are involved in your local church,
you could practice your public presentation skills by taking up readings. Always feel
free to experiment and test different waters, because while planning and practice make
perfect, the confidence you gain will help you succeed when life places an unexpected
obstacle in your path.
Finally, always remember that communication is two-way. Whether you're writing,
speaking, or taking photographs, always encourage interaction and feedback. Ask if
you are communicating your message, or if there's anything you can change to make
that message clearer. Always be open to constructive criticism and you will continue
to grow as a professional and as a professional communicator.
Guerrilla Project Management for
DBAs
David Tate
This chapter won't contain any hot T-SQL code or cool execution plans. In fact, the
project management tools and advice it contains will not improve your coolness factor
at all, although perhaps your choice to become a DBA sealed your fate, in that regard.
Deep in your heart, though, you know that your job as a Database Administrator
comes with a lot more responsibility than just speeding up queries and fixing broken
servers. Managers expect database administrators to be shape-shifters. We need crazy-
deep knowledge of the technology (did you know that SQL Server 7.0's favorite color
is black?), communication skills like one of those people on TV with the talking and
the podium and stuff (I didn't say I had these skills), and the organizational skills of a
retired drill sergeant with OCD.
For DBAs, project management skills, of which organization and communication form
a core part, are both our weapon and our shield. With them, we can fight off the causes
of long hours and stress, defend against the subjective politics of our employers, and
protect our right to do high quality work. Learn them and you need never feel
overworked and undervalued again.
• Consulting on development projects
“Hey, can you help me debug this cursor inside a trigger that I wrote to update my
Twitter feed?”
• Maintenance projects
“Why does your status report always include the item Make Peace Offering to the
Replication God ?”
• Long-term build-outs
“Remember when I should have asked you to build a new production server a month
ago? Is that done yet?”
Every manager is issued a blame gun along with his or her salary. There is no waiting
period for its use and typically the sniper fires from long range.
In my experience, these specific situations, in particular, draw a DBA into the laser
sights of the blame gun:
• A project timeline changes and your tasks suddenly overlap with another
project
“Well, our project shifted back two weeks and so your part of this project now
overlaps with your part of the other project and I miss the part where this is my
problem.”
• Unplanned work overwhelms your working time and leaves you unable to
execute on any planned projects
“He didn't have the development database server ready in time, which is funny,
because the word database is in his title.”
• You perform maintenance work, even planned maintenance work
“I couldn't care less about Windows updates that you applied last night; I want to
know about this security hole I heard about this morning.”
• People “misunderstand” your day-to-day responsibilities
“Dave? Oh, he's in charge of making sure we don't use the first column in our public
Excel documents.”
The more DBAs are exposed to the rays of the blame gun, over time, the more jaded
they become, until the unending trail of half-truths and perceived failures turns their
beards into Brillo pads of fear and hatred.
OK, I'm being both flippant and melodramatic. The serious, underlying point is that
many DBAs are very bad at managing their time, managing expectations, and
“advertising” to the organization exactly what it is they do. However, you can handle
your numerous and diverse projects and you can avoid the blame gun, just by applying
some simple project management techniques.
• Who is doing what? (resource assignment)
• How could it go sideways and fail? (risk plan)
• What is the plan? (project plan baseline)
• How is it going? (project plan)
For every task that takes more than a few hours, you should be prepared to
communicate the plan to your manager in the appropriate form. For smaller projects,
this may take the form of simple answers to the Who/What/When/Why questions, as
illustrated here for an index maintenance task:
• Why? Maintaining indexes makes everything faster and brings great honor to our
company.
• When? We perform the maintenance outside peak hours of usage.
• Who is doing it? An automated job, and the Database Team is responsible for
making sure it runs.
• How could it fail? It could fail to run, or run for too long. It could run and fail and
not be noticed.
• What is the plan? Every 3rd Tuesday.
• How is it going? It's been running cleanly for one week. We made one change to
speed it up.
• What changes to the original plan have occurred? We changed it so that we have a
process to re-evaluate what it does as we add databases and new systems to each
machine. The notification system works like this…
As DBAs, we need to learn how to communicate our plans in the correct terms,
depending on to whom we're reporting. A manager does not want to hear a winding
tale of how there was no 64-bit driver, and the ensuing search for suitable third-party
components. They just want to hear about how it affects the plan (“We burned through
40 hours due to Risk #3, which we couldn't mitigate. ”).
For larger projects, we need to create proper documentation. The following sections
describe what sorts of documentation are justified in these cases.
Project charter
The project charter is a document that explains why you're doing the project. If you
have to reread this after every meeting to understand what is going on, then you are
officially on a massive corporate project. For the project of getting my family out the
door to go to the zoo on a Saturday, our simple example, my project charter might look
something like this:
We will visit the local zoo and strive to have a safe, fun day in the eyes of our children. We will teach them
about animals. We hope not to spend more than $500 on food, drink, and gas.
milestone, until complete, fully occupies both parents. Everyone else might be ready
to go before this happens, but we can't leave until it does.
Resource allocation
This is project management speak for “who is doing what.” As the project grows
larger, so the terms employed increase in number and complexity. On a two-person
project, we assign tasks; on a sixteen-person project, we allocate resources; on a
seventy-six-person project, we allocate resources according to the Resource Allocation
Matrix Committee Procedure Strategy Giraffe Hat Check-list (OK, I made up the bit
about giraffe hats).
Burndown rate
How quickly you spend the money allocated to a project. An hour into our Zoo project
we are all holding large drinks, for which budget was allocated. The large stuffed
panda that the 4-year-old is holding was an unforeseen expense.
Checking-in
A term used for the act of a project manager creeping up behind you, during the 30
seconds in which you check your personal email, to see why your task on the critical
path is not yet complete. Also called “touching base” and “sync-ing up.”
At risk
If a task or project is “at risk,” it doesn't mean it grew up in a bad neighborhood: it
means it will probably be late.
Outsource prioritization for items assigned to you
Department A and Department B don't like each other and both have work for you to
do. According to the manager of each department, their project is as important to
society as the smallpox vaccine, and he or she demands that you work on it until the
project is finished and/or you collapse from exhaustion.
Each time you receive a work request, note down a brief description, the name of the
stakeholder, the request date, and the priority in terms of expected completion date (at
least according to the stakeholder).
Department A's task might look like this:
• Title: Create a database server for Department A for the tracking of Department
B's birthday cake consumption.
• Stakeholder: Department A's assistant manager (a.k.a. Sudden Def).
• Priority: Should be complete before the next time that you blink or you will be
fired in a humiliating way.
• Date reported: 3/20/2012 3:02 PM EST.
When Department B's manager swings by, calmly write down the next task:
• Title: Create a database trigger on Department A's vendor tracking database that
removes every odd invoice record from its database.
• Stakeholder: Department B Assistant Manager (a.k.a. Your Median Nightmare).
• Priority: Should be complete before I get back to my desk or I'll send you an
angry email, and then request status updates every 15 minutes until I fill your Inbox
quota.
• Date reported: 3/20/2012 3:12 PM EST.
What do you work on next? What side do you take? The fact that you have friends in
Department A shouldn't really affect which project you work on first thing the next
morning.
In fact, you cannot make this call. You are here to work on the technical execution, not
on sorting tasks based on business priority. Without a clearly established priority, we
tend to sort by degree of interest, level of difficulty, or what we can describe best as
“DBA order.”
None of these is easy to explain to a non-technical person in a heated political
situation. You need to stay away from these potential political fights. In all cases, you
should outsource the prioritization of your work to your boss, and establish this as an
understood process. If you work at a small company, where there is no one who can
break these ties in your organization chart, then you simply put the two stakeholders in
contact and ask them to arm-wrestle to decide which is the most important. You cannot
make an arbitrary call so do not begin work until someone makes a decision.
Never say “No,” just communicate that everything has a
cost
One tip you pick up from consulting (more useful than, but equally as important as, being
able to hold an intelligent conversation after three drinks) is the idea that you “Never say
No.” Does this mean you wind up abused and over-committed, based on the priorities
of others? Absolutely, yes…wait, I mean, no! It means that you learn to communicate
that every task has an associated cost. This forces the project owner to take more
responsibility and think more deeply about hard prioritization tasks.
Let's walk through an example:
Dale : Hey, Dave. I don't think I ever got a chance to thank you for standardizing us on
SQL Server after years of a mix of Access, complex systems in Excel, and various
versions of custom vendor databases. However, times change and it feels like we need
to start moving with the NOSQL flow. I've read several white papers on MongoDB,
and talked to a guy with a double nose-ring at Borders, and I'm pretty excited. It's free,
fast and elegant. I think it will revolutionize our inventory systems. I need you to
sunset SQL Server on these systems, and move to MongoDB immediately across all
projects.
Dave : It's a terrible idea on so many levels. Let's fight to the death.
Bad answer ; try again.
Dave : OK, cool, that sounds like an exciting move, but let's do some quick pre-
planning to decide the best first step.
Moving to MongoDB is a big shift; that's OK but, by my estimate, eight members of
staff will require additional training. Let's estimate each hour that a member of the
team spends in training, rather than working, as costing us $150/hour. If we
assume that two of them will need in-depth training, and the rest need an overview of
about a week, then the first two can provide additional internal training for the others.
That's four weeks for the pair, plus an additional month tapered out over the next few
months, which gives us a total of (Dave types in Excel like a twitching hamster )
$32,000 for the primary training and $24,000 for the secondary training. Of course, we
ought to include a cost estimate for delays to existing scheduled projects during the
ramp-up period, let's say $12,000. That gives us a total cost for training and ramp-up,
which we will call Milestone 1, of $68,000.
Next, we'd need to enumerate the in-progress projects and the existing systems and
decide which ones will be the first to move from SQL Server to MongoDB. We have
four projects in progress, one of which is quite large and 80% complete, and 16
existing systems…
Good answer.
Dale : You know what? This has given me a lot of new good information. Let me go
back and think about this.
In order to avoid a blast from the blame gun, it's vital that a DBA avoids scheduling
conflicts that involve two projects entering “crunch time” simultaneously. Given that
project timelines tend to be about as stable as Jell-O on a treadmill on an airplane, this
is easier to say than to do.
However, DBAs must always track potentially conflicting timelines, and throw up a
flare as soon as they smell any trouble, as project plans shift.
work more accurately.
Figure 2 shows a typical check-list for an “Apply Patches” maintenance task. It
exposes the preparation and clean-up tasks that accompany the actual execution, and
will give your boss or other stakeholders a proper appreciation of the work involved
and how long it takes.
A good example of technical debt is the patched-up network infrastructure that really
doesn't do what you want it to, slows down your backups, and causes occasional
failures, which lead you to wake up and fix them about once a month. So how do you
convince your boss to spring for that new router and a second NIC on your database
server? Many DBAs resort to emotional complaints, such as “Man, that is a pain!” or
“I missed watching the American Idol finale last night because the backups failed!”
Unfortunately, your boss doesn't care. However, every time it fails, that flaky network
infrastructure is costing your organization money. You need to calculate that amount
and communicate it, repeatedly. When you tell your boss, repeatedly, that the network
issues are costing the company $13K a year, or $4K every time they cause a failure,
then eventually the part of their brain that does mathematics will stage a non-violent
sit-in until the hand starts writing checks.
Calculating the cost of an ongoing problem is complex, but even a rough estimate can
clarify thinking. For a simple down-time event, consider the following costs (a rough
worked example follows the list):
• Resource time – how long it takes you and others to deal with the issue. Estimate
resource time based on simple aggregates such as $100/hour. Remember that while
some people are fixing the problem, others are communicating the problem to
clients; include all their time in your estimates.
• Shifts in schedule – every time you work to resolve down-time you delay planned
work, in a domino effect. Even spending 10–20% of your time on unplanned work
can have a big impact on the confidence level you and the organization can place on
any future project planning.
• Business loss – down-time of internal or external systems means money lost,
directly or indirectly, by the business. For external systems, you can estimate the loss
based on transactional volumes; for internal systems, estimate based on the time
wasted by those that depend on them. For any type of system, keep in mind the
reputational cost of a system that your customers or employees can't trust to be
available.
• Cultural costs – if your organization spends all its time fighting fires then,
eventually, only firefighters will be successful and not those that can resolve or
prevent the types of problems you are experiencing.
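By way of a rough worked example, the following T-SQL snippet pulls those components together for a hypothetical two-hour outage; every figure in it is a placeholder for your own estimates.
-- All numbers below are hypothetical placeholders; swap in your own estimates.
DECLARE @HoursDown      decimal(5, 2) = 2.0,
        @PeopleInvolved int           = 4,     -- fixers plus the people communicating to clients
        @HourlyRate     money         = 100,   -- simple aggregate rate per person
        @RevenuePerHour money         = 1500;  -- estimated direct business loss per hour of down-time

SELECT @HoursDown * @PeopleInvolved * @HourlyRate AS ResourceTimeCost,
       @HoursDown * @RevenuePerHour AS BusinessLossEstimate,
       (@HoursDown * @PeopleInvolved * @HourlyRate)
     + (@HoursDown * @RevenuePerHour) AS TotalPerIncident;
Multiply the per-incident total by the number of incidents per year and you have the annual figure to repeat to your boss.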
Justify hiring
Are you overworked? How do you prove this? Go to sleep during meetings and send 3
a.m. emails? Dye your hair red and insist people call you “Cranberry Dan the No
Sleep Man”? Show your boss your DVR list, with 87 hours of unwatched television?
Again, these responses are too emotional and not likely to have the desired effect,
which is to convince your boss to hire someone to help. Put another way, you have to
help your boss justify hiring someone to help you, and the correct way to do this is
with data.
When you decide you need a hand, you have to build a proper business case to support
your cause. Following are the sort of metrics you need to track and report:
• The amount of time you spend on unplanned work.
• The long-term fixes you can't do because you don't have enough time.
• Projects started but now “shelved” due to inadequate resources.
• Projects never started due to lack of resources.
• Projects hamstrung with technical debt and riddled with Band Aids.
• Tasks that are falling through the cracks because nobody is there to catch them
(proceed with caution here). Are you planning for future growth, up to date on new
security best practices, or fully familiar with new technology that could help your
company solve its challenges?
Onwards
It is very easy for highly skilled technical people, with years of experience in a
competitive market, to rest on their technical skills to prove their value. However,
equally important is the ability to speak the language of the business, track work and
costs, and stay out of trouble.
In this chapter, I presented, with a degree of levity, my project management
techniques. However, they really can protect you from overwork and under-
appreciation, and they can help further your career and build a better future for you.
At some point in your DBA career, you will face the same decision as many other
technical roles: should I move into Management or stay Technical? Most organizations
will try to support you with either choice. However, only if you have mastered project
management will you have any chance of moving into a role where you are making
things happen with your team rather than your fingers. Using the techniques discussed
in this chapter, tracking the cost, risk, and politics of each task that you do, you will be
able to explore either option.
Agile Database Development
Dev Nambi
More and more software development teams use “Agile” methods. Done well, Agile
improves software quality and makes development and releases more predictable.
Unfortunately, these are not the typical results for “early stage” Agile adopters.
Instead, we see:
• an ill-considered rush towards lots of features, with inadequate testing to avoid bugs
• daily, automated deployments, without getting them right
• emergent (read, chaotic) system design, and stream-of-consciousness programming
leading to “spaghetti code”
• exponentially increasing technical debt and developer frustration.
Teams stop trusting each other. Human sacrifice! Dogs and cats, living together! Mass
hysteria! Teams adopting Agile must take care to improve designs continually, as well as
their testing and deployment practices. Agile leaves little room for error; it requires
good judgment.
Agile database development is particularly hard because databases contain state and
must ensure data integrity. They are harder to upgrade or roll back than the front end
of a website and so are more amenable to up-front design than continual refinement.
Lastly, database developers and DBAs tend to have less experience in Agile practices,
leading to additional struggle in the early stages. This chapter will explore the history
and principles of Agile development with an emphasis on how we can apply Agile
practices successfully to databases.
more releases. The shift from plan-driven to feedback-driven work can also reduce
business risk, since the team builds fewer unneeded features, potentially reducing
technical complexity. Continuous feedback also makes it easier to abandon bad ideas
and bad code practices.
Agile development also makes compromises, and adds new risks. The main
compromise is trading away time spent up front on design, for more frequent feature
delivery. Agile teams spread design time over numerous iterations, but the lack of an
initial “unifying” design means that development teams must work hard, adopting best
practices and deploying good judgment, to avoid creating incomprehensible system
architectures. Maintenance time often suffers too, in the push to iterate and progress
continuously. Without careful management, this is what leads to the buildup of the
technical debt, i.e. the cut corners, inelegant architecture, and incomplete tests that,
unchecked, gradually take a toll on system performance and developer productivity.
This is particularly bad for databases because they often support multiple applications
and, unless we take the time to design proper interfaces, database-refactoring can
happen only when all coupled applications can change, meaning it happens at the pace
of the slowest application team. The number of deployments increases, requiring
investment in streamlined deployment practices. Ultimately, of course, this is a good
thing, but the amount and pace of change is a shock for some professionals, and
keeping up requires good communications skills. Agile works best with smart, curious,
and experienced engineers.
It's all fun and games.
Automation
Repetitive tasks haunt the lives of most IT professionals: checking email, filling out
reports, responding to bugs and tickets, and so on. However, we can automate or
streamline many of these tasks. Most DBAs use SQL Server Agent jobs, scripting
languages like PowerShell or Python, SSIS packages, and vendor applications to
automate many aspects of their monitoring and maintenance, especially for tasks that
happen late at night. Many developers use C#, scripting, and tools such as MSBuild,
MSTest and source control command-line applications, to automate their build and test
processes.
However, most DBAs don't automate documentation, deployments, SQL installs,
patching, and so on. Developers don't make the leap from automated builds and testing
to continuous integration or tool-aided refactoring work. Even if we can't automate a
whole process, we could probably automate most of it.
Automating your repetitive tasks
For DBAs, my favorite automation project is to tie together monitoring scripts with the ticketing system.
That gets rid of 20% of my day-to-day work right there, if I script it carefully enough. For development
work, I like to have a single-command build-and-test script. This prevents me from checking in a broken
build or failed unit test at least once per new feature.
Here is my recommendation to help you extend your automation. Start with the
automation tools you already know, and get more creative. Then, every four months,
pick one additional tool, something that's useful and easy to learn, and add it to your
arsenal. Most importantly, spend a little time every week automating away your most
common headaches.
A few common tasks take up a lot of time that you can save if you automate them,
even partially:
• Automate your monitoring alerts to update your ticketing system (and vice versa).
• Automate your monitoring alerts to trigger your maintenance processes (and vice
versa).
• Automate your email inbox to coordinate with your to-do list.
• Automate your build process with your test process and with a developer check-in.
Every time a developer checks in a change, build the code and run a set of unit tests
against it. This improves code quality almost immediately.
• Build a script that can deploy a single set of changes, either one database or several,
to a single environment. Targeting development or test environments is usually safe.
Make sure it has a rollback mode.
• Create T-SQL scripts that check for issues such as non-indexed foreign keys,
GUIDs in primary keys, tables without interfaces, and permissions without roles.
Turn them into database design-validation tests, and make sure they run every time
you check in a change (a sketch of one such check follows this list).
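As a minimal sketch of the last item in the list above, the following query (which, for simplicity, assumes single-column foreign keys) flags foreign key columns that are not the leading column of any index; your team's own rules will vary.
-- Find foreign key columns with no supporting index (leading-column check only).
SELECT fk.name AS foreign_key_name,
       OBJECT_NAME(fkc.parent_object_id) AS table_name,
       COL_NAME(fkc.parent_object_id, fkc.parent_column_id) AS column_name
FROM   sys.foreign_keys AS fk
JOIN   sys.foreign_key_columns AS fkc
       ON fkc.constraint_object_id = fk.object_id
WHERE  NOT EXISTS (SELECT 1
                   FROM   sys.index_columns AS ic
                   WHERE  ic.object_id  = fkc.parent_object_id
                     AND  ic.column_id  = fkc.parent_column_id
                     AND  ic.key_ordinal = 1);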
You will never run out of tasks to automate and streamline. I know three brilliant
DBAs who work two hours a day. How? Early on, they spent a month of weekends
automating most of their jobs. If something fails three times, the same way, they
automate it. They never let on how much of their job they automate. This gives them
the time to work with developers, Project Managers and testers to make the databases
more stable overall. The key is to make this “automation work” a part of your daily
routine.
Balance
Switching to Agile development creates a lot of change, perhaps the most dramatic
one being the shift from development work that is planned and deliberate, to intuitive
and rapid. This can be perilous. The worst results are:
• Rampant technical debt – code bugs and design flaws pile up rapidly and go
uncorrected because they compete with user features for developer time.
• Deployment time and code quality for each release don't improve despite a
dramatic increase in the number of production releases. Every release causes a lot of
pain.
• Business managers , delighted by the sudden potential for lots more user features,
come to expect a feature delivery rate that is often unsustainable.
• Teams swap current processes and practices for new ones, with reckless
abandon , and without considering their relative merits. Consequently, they
encounter problems they previously avoided.
Blessed with logical engineering leaders, we can institute some important balance
measures. For example, it's a wise practice to set aside 10–20% of development time
to pay the “maintenance tax,” in other words, reduce technical debt and fix operational
bugs.
In addition, allow a fixed, reasonable amount of deployment time per week.
Developers can deploy more frequently, in the longer term, if they are allowed time to
streamline and clean up each individual deployment.
The key goal is to aim for a steady, sustainable pace. As soon as a Project Manager for
an Agile team detects the build-up of technical debt, a large backlog of deployments,
or a long list of security patches starting to pile up, then it is time to pay some
maintenance tax, and push for a better balance. Ironically, creating a successful Agile
team is a marathon, not a sprint.
One great idea that encourages balance is to reward good behavior, and introduce a
mild penalty for bad behavior. I worked in a team where developers who broke the
build had to bring in donuts for everyone. Conversely, the IT staff rewarded with beer
any developers who made the build or deployment process significantly easier. It
worked beautifully.
Communicate, constantly
One big change with a switch to Agile is a dramatic decrease in documentation. This
can be a benefit; thorough documentation is always out of date and rarely used.
Without documentation, the code, including the tests, is the documentation. This,
coupled with the rapid rate of change of the system design, means that people, rather
than documents, become the primary source of domain knowledge. In order for all of
the team to keep up with this pace, and still understand the system and where it is
heading, they must communicate all…of…the…time.
There are many different ways to communicate. Unless you are telepathic, the fastest
way is speaking in person. Face-to-face communication is high bandwidth. The
second most effective way to communicate is via phone. Spoken language is very
efficient; we can convey about 150–200 words per minute, which is 2–4 times faster
than we can type. Also, speaking lends itself to back-and-forth communication, which
helps people ask questions at just the right time. I'd estimate speaking in person or
over the phone is easily 5–10 times more efficient than instant messaging or email,
meaning that we should be able to communicate in 10 minutes what would take close
to 2 hours of email time.
One of the most common and effective spoken-word communication techniques is the
daily scrum or stand-up. Having DBAs and developers at the same daily stand-up
works wonders; developers learn about how the system is working, and DBAs learn
about imminent code changes.
With this change comes opportunity. For DBAs, constant communication makes it
easy to keep developers in the loop about production issues. It also makes it easy to
ask for fixes to the most annoying issues of the day, and to get these fixes quickly.
DBAs also get the opportunity to provide input into design choices that might affect
database performance. I have worked in teams where DBAs contributed to all code
reviews; the result was very stable code.
For developers, constant communication provides feedback about how well our code
is doing, both good and bad, and allows us to fix production inefficiencies and bugs.
This gradually helps us hone our craft; we learn very quickly what ideas and
approaches work, and which ones don't. That means faster, cleaner, better code over
time.
Design
Good design saves lives, usually your own. I use the SIIP approach to database design:
Simplicity, Integrity, Interfaces, and Patterns.
Simplicity
Keep your design as simple as possible. There's a famous acronym for this: KISS
(Keep It Simple, Stupid). Features being equal, a simpler design is easier to fix and
extend than a complex design. You need to be twice as smart to troubleshoot a system
as you do to build one. If you build the most complex system you can, you will not be
able to fix it.
The most important objects in a database are its tables. They store the data and enable
data integrity. The best way to ensure accurate data is to use a clean and simple data
model. For an OLTP database, that means as much normalization as you can handle.
Designing to anything lower than third normal form (3NF) usually causes painful
headaches. Key-value tables are notoriously painful because they force the application
to handle data integrity.
Some database objects are pure code: stored procedures, views, functions, triggers,
and metadata. They are similar to code objects in other programming
languages, and should be held to the same standards of quality (unit tests, security
considerations, versioning, and so on).
Integrity
Data integrity matters in a database. Unlike C# or Java apps, databases store data, so
data integrity is not optional.
The data integrity features of your database design are of prime importance. Start with
data types, primary keys, foreign keys, and check constraints. Beyond that, there are
data integrity implications in our choice of nullability, defaults, and the use of triggers.
Triggers in particular are a double-edged sword; their hidden nature and support of
complexity can lead to unexpected problems.
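As a minimal sketch, most of these integrity features are declared right in the table definition; the table and constraint names below are hypothetical, and assume a dbo.Orders table already exists.
CREATE TABLE dbo.OrderLine
(
    OrderId    int            NOT NULL
        CONSTRAINT FK_OrderLine_Orders REFERENCES dbo.Orders (OrderId),
    LineNumber int            NOT NULL,
    Quantity   int            NOT NULL
        CONSTRAINT CK_OrderLine_Quantity CHECK (Quantity > 0),
    UnitPrice  decimal(10, 2) NOT NULL
        CONSTRAINT DF_OrderLine_UnitPrice DEFAULT (0),
    CONSTRAINT PK_OrderLine PRIMARY KEY (OrderId, LineNumber)
);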
Interfaces
An interface is the (hopefully) simple contract between your system's guts and any
application that calls it. Interfaces enable decoupling: the ability to separate what an
application expects from its implementation. I have found that we need interfaces
whenever:
• a database is used by multiple applications
• the database and application code change at different speeds.
Inside a database, interfaces are stored procedures, functions, and views. Other types
of interfaces are Object Relational Mapping (ORM) and client-side tools. The most
important, and most obvious, use of interfaces is to decouple a table's physical schema
from the application using it. That way you can change the two independently. Having
well-defined interfaces will save you a lot of pain.
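A minimal sketch of such an interface, with hypothetical names, is a view that applications query instead of the table itself:
-- Applications query the view; the physical table behind it can be reshaped or renamed
-- without breaking callers.
CREATE VIEW dbo.CustomerContact
AS
SELECT CustomerId,
       FirstName,
       LastName,
       EmailAddress
FROM   dbo.Customer_V2;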
Patterns
Use patterns. Similar functionality should look the same. Database developers often
use patterns to ensure a set of consistent naming conventions and data type standards.
The goal is consistency: stored procedures, tables, column names, security practices,
should be similar. Two stored procedures that do almost the same thing, for example,
writing one record to a table, should look similar. For example, they should all use the
same conventions and standards for parameter names, logging, join patterns,
documentation, and so on.
Patterns are effective when they are widely used. Agile is a team sport, so the best way
to adopt a set of patterns is through democratic practices like discussion and voting.
The same goes for eliminating patterns that the team don't find useful.
Make sure you can enforce patterns automatically. Writing them down in a Word
document does nothing at all.
Deployments
Having a set of deployment best practices and tools can make it easy to test and
release code, especially if you have multiple environments (development, test,
production, and so on).
The following simple equation can help the team estimate what time they need to
allocate to manual database deployments:
Total Available DBA Time for Deployments =
[Manual Time per Deployment] * [Average Risk per Deployment] * [Number of
Deployments]
To increase the number of deployments, the team needs to automate and streamline the
deployment process to reduce time and risk, or they need more DBAs! The only
alternative, fewer manual deployments, is what happens in many “Waterfall” teams.
Database deployments are different from other application deployments in one critical
way: databases contain data. It is relatively easy to upgrade a website because the
ASP.NET or PHP pages don't have data in them, so we simply overwrite the entire site.
With a database that is impossible, because dropping and re-creating a database drops
all of the information, negating its raison d'être.
There are three tenets to database deployments: keep them robust, fast, and specific.
Robust
The first tenet of database deployments: make them robust. You should be able to
rerun a set of scripts, regardless of whether someone ran them previously, and achieve
the same result. As a simple example, instead of a script that simply runs CREATE TABLE
dbo.foo (and so fails when rerun), we could have a script that runs:
IF OBJECT_ID('dbo.foo') IS NULL
BEGIN
    CREATE TABLE dbo.foo…
    INSERT INTO DeploymentLog (Description) SELECT ('Created table foo')
END
Alternatively, we could code this such that, if the table does exist, the script fails
immediately.
You should also record the actions of all previous deployments in a deployment log,
ideally populated via DDL triggers. Having rerunnable and logged scripts is very
useful. If your deployment fails at Step 4, you can restart it from Steps 1, 2, or 3
without worry.
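A minimal sketch of that kind of logging, using a database-scoped DDL trigger and a hypothetical dbo.DeploymentLog table, might look like this:
CREATE TABLE dbo.DeploymentLog
(
    LogId       int IDENTITY(1, 1) PRIMARY KEY,
    EventTime   datetime2     NOT NULL DEFAULT SYSDATETIME(),
    LoginName   sysname       NOT NULL,
    EventType   nvarchar(128) NOT NULL,
    CommandText nvarchar(max) NULL
);
GO
CREATE TRIGGER trg_LogDeploymentDDL
ON DATABASE
FOR DDL_DATABASE_LEVEL_EVENTS
AS
BEGIN
    -- Capture who ran what, and when, for every schema change.
    DECLARE @e xml = EVENTDATA();
    INSERT INTO dbo.DeploymentLog (LoginName, EventType, CommandText)
    VALUES (ORIGINAL_LOGIN(),
            @e.value('(/EVENT_INSTANCE/EventType)[1]', 'nvarchar(128)'),
            @e.value('(/EVENT_INSTANCE/TSQLCommand/CommandText)[1]', 'nvarchar(max)'));
END;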
Another pillar of robust deployments: always have a rollback mechanism to undo the
changes made to your database during a deployment, and return the database schema
and objects to their predeployment state.
To roll back a stored procedure change, redeploy the previous version. To roll back a
new table, drop it. Instead of dropping a table during an upgrade, rename it out so that,
if necessary, the rollback can rename it and put it back into place.
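A minimal sketch of the rename-out approach, with a hypothetical table name:
-- Upgrade: move the table out of the way instead of dropping it.
EXEC sp_rename 'dbo.OrderArchive', 'OrderArchive_Deprecated';
-- Rollback: simply put it back.
-- EXEC sp_rename 'dbo.OrderArchive_Deprecated', 'OrderArchive';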
Fast
The second tenet of database deployments: make them fast. Agile teams deploy
anywhere from once a week to ten times a day, a pace unthinkable in a Waterfall
environment. When dealing with a higher frequency of database deployments, it's
important to make sure they don't disrupt users continuously with long (or, ideally,
any) service outages.
Code objects
For objects that contain no data, the key goal is to avoid user disruptions during
deployment. A common way to change a stored procedure is to drop it, if it exists, then
recreate it, as follows:
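-- A representative sketch of this drop-and-re-create pattern, using the dbo.DoSomething
-- procedure referenced later in this section:
IF OBJECT_ID('dbo.DoSomething') IS NOT NULL
    DROP PROCEDURE dbo.DoSomething;
GO
CREATE PROCEDURE dbo.DoSomething
AS
    …<Code>
GO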
The problem here is that if a user session calls the procedure in the period between
dropping and creating it, then the user's operation will fail with an “object-not-found”
error. This technique also removes any permissions associated with the stored
procedure, so we need to reassign them to allow users to access it, once it's re-created.
A better approach is one where we don't drop the object: if the object doesn't exist, we
CREATE it; if it does, we ALTER it. This technique also keeps the object permissions in
place, and we can use it for views, functions, triggers, i.e. any object that is procedural
or does not contain data.
IF OBJECT_ID('dbo.DoSomething') IS NULL
    EXEC ('CREATE PROCEDURE dbo.DoSomething AS BEGIN SELECT ''foo'' END');
GO
ALTER PROCEDURE dbo.DoSomething AS…<Code>
The dynamic SQL approach can get messy for anything beyond a very simple stored
procedure; the previously referenced article by Alexander Karmanov describes an
elegant alternative approach using SET NOEXEC ON/OFF:
IF OBJECT_ID('dbo.DoSomething') IS NOT NULL
    SET NOEXEC ON
GO
-- skipped, if object already exists
CREATE PROCEDURE DoSomething
AS
    PRINT 'DoSomething: Not implemented yet.'
GO
-- always executed
SET NOEXEC OFF
GO
ALTER PROCEDURE DoSomething AS…<Code>
Indexes
For indexes, the choice is different. It is impossible to change an index definition using
ALTER INDEX. Instead, we must drop and re-create it. Another option for non-
clustered indexes is to create a duplicate index, drop the old one, and rename the new
index with the old index's name.
Dropping an index is a quick operation, but creating or altering (rebuilding) an index
often requires significant processing time. If your SQL Server edition doesn't support
online index operations, or if the specific operation must be offline (such as those on
clustered indexes that contain LOB data types), then applications and users will be
unable to access the table during the entire index operation. That's often unacceptable.
With online index creation, the indexes and underlying tables remain accessible but
you can still expect to see an impact on performance during the operation, and users
may experience blocking or deadlocking.
I'd recommend creating and modifying indexes when your application is idle.
Commonly, teams perform these operations in the middle of the night. The exception
is if a table isn't yet in use by applications, in which case there is minimal user impact.
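A minimal sketch of the duplicate-and-rename approach for a non-clustered index (all names here are hypothetical):
-- Build the replacement first, so the table is never left without the index.
CREATE NONCLUSTERED INDEX IX_Orders_CustomerId_New
    ON dbo.Orders (CustomerId)
    INCLUDE (OrderDate)
    WITH (ONLINE = ON);   -- only on editions that support online index operations
DROP INDEX IX_Orders_CustomerId ON dbo.Orders;
EXEC sp_rename N'dbo.Orders.IX_Orders_CustomerId_New', N'IX_Orders_CustomerId', N'INDEX';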
Tables
The most difficult object to deploy is a table. This is especially true if applications
query the table directly, without interfaces. In that situation, the table schema is tightly
coupled to the applications themselves, and cannot be changed without coordinating
the change among application teams. I would recommend implementing an interface
first, to decouple the database design from the application design.
There are only a few ways to change a table: we can add or remove a column, or
change its definition (modifying a data type, adding a constraint, and so on).
Commonly, we might want to change a column name or data type. The former is a
metadata-only operation and should have no performance impact on concurrent user
operations. Changing a data type, however, can result in SQL Server checking every
row, to make sure existing data does not violate the new data type.
Generally, making a column bigger is a metadata-only, quick change. Making a
column smaller requires SQL Server to check the entire table for invalid data, resulting
in a table scan, which can take considerable time on large tables.
A quick check to see if a change is metadata-only
Turn on STATISTICS IO before running the ALTER . If you see no output, it's a metadata-only change.
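For example, with a hypothetical dbo.Orders table:
SET STATISTICS IO ON;
-- Widening a varchar column is typically metadata-only, so expect no output here.
ALTER TABLE dbo.Orders ALTER COLUMN Notes varchar(500) NULL;
SET STATISTICS IO OFF;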
Another common task is to remove an unneeded column. The simplest way to do this
is to drop the column (assuming no foreign keys reference that column), but this does
not allow for a rollback mechanism.
The key decision is whether to save off the data in the column we wish to drop. To roll
back a column drop, first you need to save off all of the data.
Then if we need to undo the change, we simply re-create the column with NULL values
and then reload the column with the saved values.
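A minimal sketch, with hypothetical table and column names:
-- Preserve the column's data before dropping it.
SELECT OrderId, LegacyNotes
INTO   dbo.Orders_LegacyNotes_Backup
FROM   dbo.Orders;

ALTER TABLE dbo.Orders DROP COLUMN LegacyNotes;

-- Rollback: re-create the column and reload the saved values.
-- ALTER TABLE dbo.Orders ADD LegacyNotes nvarchar(400) NULL;
-- UPDATE o
-- SET    o.LegacyNotes = b.LegacyNotes
-- FROM   dbo.Orders AS o
-- JOIN   dbo.Orders_LegacyNotes_Backup AS b ON b.OrderId = o.OrderId;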
The last common requirement is to add a column to a table and load it with data. If the
table is small, we can add it in any way we like, since any blocking from populating
the column is likely to be short-lived. For a large table, the way to add a column with
minimal user impact is to add it as a NULLable column. If you need to add values into
the new column, for existing rows, do so in batches and then set the column to be NOT
NULL, if that's required. This minimizes blocking, although it does require several
steps.
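A minimal sketch of the batched approach, with hypothetical names and batch size:
ALTER TABLE dbo.Orders ADD Region char(2) NULL;   -- metadata-only, so quick even on a large table
GO
DECLARE @rows int = 1;
WHILE @rows > 0
BEGIN
    UPDATE TOP (5000) dbo.Orders
    SET    Region = 'XX'          -- whatever the correct backfill value is
    WHERE  Region IS NULL;
    SET @rows = @@ROWCOUNT;
END;
-- Once every row has a value, and only if required:
-- ALTER TABLE dbo.Orders ALTER COLUMN Region char(2) NOT NULL;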
Similar to indexes, it is best to change tables when your application is idle. However,
by following practices such as these you can minimize downtime and disruption. For
further reading on how to refactor tables without downtime, I recommend Alex
Kuznetsov's article at HTTP://TINYURL.COM/KL8A3DD .
Specific
The third tenet of deployments: keep them specific. You should deploy all of the code
you have changed, and only the code you have changed. Deployment verification
queries are a great help here. For example, if you're deploying a new view, a
verification query might look like this:
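-- A representative sketch, for a hypothetical dbo.vw_ActiveCustomers view.
SELECT OBJECT_ID('dbo.vw_ActiveCustomers')                           AS view_object_id,
       OBJECTPROPERTY(OBJECT_ID('dbo.vw_ActiveCustomers'), 'IsView') AS is_view;
-- A non-NULL object_id (and is_view = 1) confirms the view arrived with this deployment.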
It should be easy for anyone to identify exactly what is, and what isn't, part of a
deployment.
Keep releases decoupled, so that you can deploy them independently of other
deployments, such as application deployments. For example, deploy a CREATE TABLE
script before your application needs to use it. Just in case, make sure your application
fails gracefully if the table isn't there. That way, we can deploy either application or
database on their own, without any dependencies. That is also an example of forward
compatibility.
We should also attach a version number to each database. For example, if we know
that a test database is running version 4.1.120, and the production database is running
version 4.1.118, we can find the codebase definitions for each version. In addition, we
can identify quickly which changes we need to deploy in order to advance the
production database to the current version.
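One lightweight way to do this (the property name and version number below are just examples) is to stamp the database with an extended property:
EXEC sys.sp_addextendedproperty @name = N'DatabaseVersion', @value = N'4.1.120';

-- Read it back during deployments or troubleshooting.
SELECT value AS DatabaseVersion
FROM   sys.extended_properties
WHERE  class = 0
  AND  name  = N'DatabaseVersion';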
Having a good folder-diff tool is a huge benefit. Database code is text, after all.
Comparing text files is simple, with the right tool. This is an area in which we can
benefit from application developers' solutions; they have had decades to solve the
same challenges. Of course, keeping your database code in a source control system is
even better, giving access to features such as change history, comments, branching and
merging, code review tools and so on.
Tests
Database testing is the third component of DDT. Having a solid set of tests and testing
tools increases code quality. Tests ensure that our code works, and that we aren't
breaking existing functionality. They give us confidence, and reduce risk.
Focus the majority of your testing efforts on the most critical system components. For
databases, I've found that the following factors predict quite accurately how critical a
particular piece of database code is:
• it has lots of business logic
• it impacts one of the 20 biggest tables in a database
• it impacts one of the 20 most-commonly-queried tables in a database
• it impacts tables/databases that are used by several different applications
• it changes permissions, or requires odd permissions
• it deletes data from a database via DDL (for example, dropping a column or a table)
• it uses an uncommon piece of the database engine ( xp_cmdshell , linked servers,
service broker, SQL CLR, log shipping).
If we have 100 tests but we only run 20 of them, those 20 are the valuable ones. Tests
that aren't run effectively don't exist. Getting developers and testers to use the same set
of tests and testing tools is the most important step. Do that, and then incrementally
add to your tests.
Test-driven development (TDD) is a very common Agile technique that helps you to
make sure you write the tests you need by writing them before you write or change
code. You know your code works when all of the tests pass. Writing tests first forces
you to think about different ways your code can fail.
There are three important categories of tests: unit, integration, and performance.
Unit tests
Put simply, unit tests test a single piece of code in isolation, and make sure it works.
For example, having a unit test for a stored procedure makes a lot of sense, especially
if the stored procedure has complicated logic. In our team, we have a build-and-test
machine that automatically runs unit tests whenever a developer checks in code. It
checks our source control system for code changes, rebuilds the entire database schema
from the code base and runs a folder full of unit tests against the new database. We
catch any test failures instantly, and can notify the developer(s) responsible. This is a
continuous build system.
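Dedicated database testing frameworks exist, but even a hand-rolled test works; here is a minimal sketch for a hypothetical dbo.GetOrderTotal procedure that returns an order total as a single-row result set.
DECLARE @expected money = 150.00,
        @actual   money;

-- Capture the procedure's result set so we can compare it to the expected value.
CREATE TABLE #result (OrderTotal money);
INSERT INTO #result (OrderTotal)
EXEC dbo.GetOrderTotal @OrderId = 42;

SELECT @actual = OrderTotal FROM #result;

IF @actual = @expected
    PRINT 'PASS: dbo.GetOrderTotal';
ELSE
    RAISERROR('FAIL: dbo.GetOrderTotal did not return the expected total.', 16, 1);

DROP TABLE #result;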
Integration tests
While unit tests will check just a particular database object, integration tests will make
sure that a change doesn't “break” some dependent database object. In other words,
integration tests check code changes to make sure they don't break other parts of a
system. For example, if we make a change to the database, we should run integration
tests to ensure we didn't break any applications or services that use the database. I have
found them to be particularly important when working on a database without enough
interfaces, or when changing an interface.
The key point with integration tests is to have a good set of cross-application tests
running, to make sure that you're simulating system behavior. The emphasis of the test
is on integration points; each test creates/modifies data in every application in the
system architecture. For an OLTP system, that can mean calling the UI, the business
middle-tier, and the database tier.
In our team, we have integration tests for each business feature and we usually try to
run them every 1–3 days. When an integration test fails, we email developers on all
teams who have made check-ins since the last successful test. Of course, the ideal
might be to run them every time we make any change, with every team member
getting immediate feedback, before checking in the change.
Performance tests
Performance tests verify that the system meets the identified performance criteria, in
terms of query response times under various workloads, and so on. To put it another
way, you run a piece of code to see if it is fast enough, and if it scales well under load.
In my experience, development teams don't run database performance tests as often as
they should, mainly because running them can require a near-copy of production
hardware and they don't have the IT resources or budget to do that.
I have found that the best systems to run performance tests on are restored copies of
production databases. It's extremely rare to see test databases with data volume
and variety comparable to a production environment. A simple PowerShell script can
run a set of scripts against a test database and capture the runtime of each query. The
best queries to run are copies of the most common ones on the system and the most
critical intermittent query, like a monthly billing query.
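Even without PowerShell, a plain T-SQL harness can capture a runtime; the procedure and parameter below are hypothetical.
-- Time one critical query during a performance test run.
DECLARE @start datetime2(3) = SYSDATETIME();

EXEC dbo.MonthlyBillingReport @BillingMonth = '2013-03-01';

SELECT DATEDIFF(MILLISECOND, @start, SYSDATETIME()) AS elapsed_ms;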
Our team usually tries to run performance tests at least once per release, especially for
changes to large or critical systems. When doing performance testing, define “fast
enough” and “scales enough” before you test. Be realistic about your goals.
Conclusion
Agile works very well if you do the right things. It also breaks down quickly if you do
the wrong things too often. The key is to know when something is working, and to
improve on it. Good judgment, smart people, lots of communication, and a level head
are what you really need. With that, you can make Agile work for you, in your own
way.
Keep learning, and keep improving. Good luck!
Nine Habits to Secure a Stellar
Performance Review
Wil Sisney
I know something about you. I know that you are a motivated database professional,
and I know that you're one of the best. How do I know that? Mediocre IT workers
don't pick up technical books like this one. They have no interest in attending technical
conferences or studying on their own time.
No, you are one of the chosen few, the true engines of any company. You are good at
what you do, and more importantly, you want to get better at what you do. I don't tell
these secrets to just anyone, but thanks to your dedication, I'm going to share with you
the nine habits that together form the recipe which will get you the recognition you
deserve and secure you a stellar performance review.
I have refined these habits over many years, working with supervisors from both
technical and non-technical backgrounds. In fact, when I set out to crystalize this
knowledge into a chapter in the book you're holding, I surveyed some of my past
supervisors to discover what makes a fantastic employee stand out. I'll also cover how
to use your performance review to obtain the best reward for all of your hard work,
and how to continue your momentum until next year's equally stellar review.
you manage work more efficiently. One excellent system is the Pomodoro Technique
(HTTP://WWW.POMODOROTECHNIQUE.COM/ ), based on the idea that you should work in
short sprints of 25 minutes at a time, taking 5-minute breaks between each sprint. The
technique provides artifacts to help you organize your work, most notably the Activity
Inventory Sheet.
Another system I've used in the past is the Time Management Matrix. This system
divides tasks into one of four quadrants based on the combination of two factors:
importance and immediacy. Immediate tasks demand our time even if they aren't
important. The most productive work is done in Quadrant 2, where tasks are important
but not immediate. Tasks in the fourth quadrant are neither important nor immediate,
and so usually represent time-wasters. It's important to eliminate or at least minimize
Quadrant 4 tasks in order to focus on the most important jobs.
A third system, which seems to fit the work of database administrators well, is the
Getting Things Done system (abbreviated to GTD), developed by David Allen and
described in his book of the same name. This system focuses on tracking tasks
externally (like in an application or a notebook) so that the mind is free from
remembering tasks that need to be completed. A “next action” for each task is tracked
as well, which means that the next time you visit that task you'll have a plan to work to
before you even begin. GTD also encourages a daily and weekly session where tasks
are reviewed, statuses are updated, and next actions for new tasks can be assigned.
GTD is a solid system, easy to implement, with many resources to help track work,
such as smartphone and web applications.
Other useful work management systems
If you work in a development team, you're doubtless familiar with Agile development methodologies such
as Scrum, which build in a plan for your daily deliverables and help you stay on track (there are some
similarities with GTD). For DBAs, I'd also recommend the Six Sigma process called DMAIC, which
Thomas LaRock showed how to apply to database troubleshooting.
In my experience, it doesn't matter which system you use so long as it works for you
and you stick to it. Finding a system that fits your work style will help you manage
and prioritize your tasks. Manage your time well and you'll quickly attain rock star
status in the eyes of your boss.
make you successful, since your boss will be making a decision during your review.
So how do you determine what your boss will base your performance rating on? Ask!
This might seem self-explanatory, but have you ever asked your boss what it takes for
you to succeed? Simply set up a meeting and ask her how she will judge your success
when the year is over. Bear in mind that your boss may be inclined to point you
towards the one-size-fits-all performance objectives; if that happens, simply explain
that you understand those objectives to be the baseline for your performance, and ask
her to expand on what she expects from an exceptional employee. Take notes, ask
pointed questions, and discuss plans for how you can meet expectations. Once the
meeting is complete, compile a summary of your plan for success and send it to your
boss. This will give her a chance to refine her expectations. Once that refinement
process is complete, you've got a good-faith contract which, if you meet the terms, will
deliver the results you expect.
Once these expectations are established, your next job is to create a plan to meet these
expectations. Write your plan as you would write effective goals; make sure the plan is
specific and measurable . Establishing your boss's expectations and creating a plan to
meet them will position you for success better than most of your peers.
Habit 4: Train to Gain an Edge
In 2008, IBM conducted a study to determine the return on investment from employee
training. The results showed that employees who received 5 hours of training or less
generated only 65% of the revenue per employee of those employees who
received more than 5 hours of training.1 IBM's study focused on many industries, and I
would argue that 5 hours of training is not nearly enough for an IT employee to keep
up with changing technology, much less advance their own knowledge.
Should your company pay for training? Absolutely; the fatal flaw of most IT
organizations is that they don't budget properly for training. Use every resource your
company offers, and don't be afraid to ask for more. Unfortunately, my experience is
that you'll have to fight and claw your way to company-paid training. Fortunately,
there is a wealth of free and paid resources available for training on SQL Server. I
group training into five categories:
1. Blogs.
2. Webcasts.
3. User Group meetings (including webcasts for virtual chapters).
4. Books and white papers.
5. Events (such as the Professional Association for SQL Server's yearly PASS
Summit, or a local SQL Saturday event).
You can read more about each of these types of training, including recommendations
on which are worth your time, at HTTP://WWW.WILSISNEY.COM/FINDTRAINING.
If finding high-quality training material to study is not the problem, finding time to
study it can be more of a challenge. I have a confession to make. When I became a
database administrator, I had no idea what I was doing. Thanks to my well-intentioned
but non-technical boss at the time, my job title was suddenly Senior Database
Administrator and it propelled me into a job for which I had no formal training, and
with no one to teach me. I felt like the emperor with “invisible clothes” from the
children's fable; sooner or later, someone would spot my ignorance. I hated that
feeling, and so I decided to change the situation.
I began studying for at least an hour a day, every day. When I started the job, I had no
clue what I was doing, so I had to study on my own time. At first, it was challenging to
find a free hour every day, but after just a few weeks of making this habit a priority, I
had no trouble finding time. I'd love to say that after a few weeks I knew everything
there was to know about database administration, but we'd both know that was a lie.
However, it wasn't long before I was able to hold up my end of a conversation about
databases.
I've kept up that pace of at least 1 hour of study every day for the last 6 years. Not
everyone can study an hour a day, but it's important to make a commitment to your
training and establish a regular schedule.
Now when it comes to your performance review, it is critical to track your training.
There are four things you need to log:
1. The date/time of training.
2. The source and topic of training.
3. The length of training.
4. Brief notes about what you learned.
You can track these four items in many ways; I use a Google Docs spreadsheet so my
study log is accessible from any Internet-connected device.
You're going to use this information in two ways. First, you're going to review your
notes the day following your training session. According to the Cornell Note Taking
System, during a learning scenario, such as a lecture, most people will forget 80% of
what they've learned unless they take notes and review those notes within 24 hours.2
Spend just a few minutes of each study session reviewing the last session's notes and
you'll find that your retention of what you've learned will dramatically improve.
The other way you're going to use your study tracking comes when it is time to write
your performance review. We'll discuss that in more detail later. Rest assured that your
study commitment will make you more valuable and knowledgeable than most of your
peers.
I've noticed something about the SQL Server community: it is populated with
geniuses. Some of the most intelligent and, more importantly, most dedicated people
I've met work as database professionals. Strike up a conversation at a technical
conference or a user group meeting and chances are that you'll find a lively intellectual
with something interesting to say. These are brilliant people and they are doing
brilliant work, some of which they've decided to share.
We're going to use and adapt the solutions created by others. Now, I know what some
of you are thinking: don't the best database professionals write all of their own
solutions from scratch? I used to think that, too. As I monitored resources like Twitter,
however, I found just the opposite to be true. The best and brightest among our
community were raving about solutions that others had created. They use these
solutions. They've learned that by using an off-the-shelf solution they can spend more
time working on the projects they are passionate about.
There are some rules about using solutions developed by others, rules I have learned
through trial and error. First, research the solution to see what kinds of results it has
provided for others, and how it addresses your specific need. Go into full research
mode. Read blog comments to get a feel for what other people have experienced with
this solution and see what alternatives are out there.
The second rule is that you need to read scripts in enough detail that you understand
how they work. Make sure you read all code comments. Scan the code for anything
that looks suspicious, such as adding new logins with fixed passwords, or dangerous
techniques like creating procedures that build and execute dynamic SQL statements.
Make informed decisions before releasing potentially dangerous code into your
environments. The world of SQL Server is not risk free, so it is up to you to decide
what risks are acceptable.
The third rule is that you test the code in your least important environment first, and
migrate it through environments in order of increasing importance. I'll give you a
simple example from my personal experience. I recently went hunting for a trigger-
based solution to log user logins on a development server. I found a simple and elegant
solution developed by Nicholas Cain, a friend and a recipient of the MCM
certification. He's one of those guys you can't help liking, and he's absolutely brilliant.
His code used a server-level trigger to track logins, and it provided a daily summary of
the data in a rollup table.
His code is flawless; however, I decided to change the name of a security principal for
my implementation, but I missed an occurrence of that security principal later in the
script. That mistake is forgivable, and is something that would have come out in
testing. My mistake, however, was running the script on a busy development server
first. Development is less important than production, right? Of course, but I live by the
motto that the development environment is production for my developers. My slight
mistake in code was drastically magnified when suddenly everyone was locked out
from the entire server because of my flawed logon trigger. I realized my mistake
instantly, but it took me a few minutes to figure out how to fix the problem.
Dealing with flawed logon triggers
To get around a logon trigger that locks everyone out of the server, use the DAC to log in and disable and
delete the trigger.
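For example, once connected through the DAC (the trigger name here is hypothetical):
-- Stop the trigger from firing, then remove it entirely.
DISABLE TRIGGER TrackLogins ON ALL SERVER;
DROP TRIGGER TrackLogins ON ALL SERVER;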
Once I fixed my mistake, it worked so well that I now use the solution on many of my
production servers. The moral of this story is that you need to test solutions in the least
impactful environment first. For many of us, that's our local machine's SQL Server
Developer Edition instance.
Don't let my cautionary tale dissuade you from using solutions developed by other
people. You'll be glad you found something that worked so well for such little effort,
and your boss will appreciate that you're using proven solutions that permit you to
focus on more important tasks.
Here's a shortlist of my six favorite community solutions on SQL Server, just to help
you get started:
• Ola Hallengren's Backup, Integrity Check and Index Maintenance solution
• What it does: Makes you look like a genius. Ola has developed some exceptional
scripts that do everything SQL Server Maintenance Plans do, but so much better.
His backup scripts integrate with third-party backup tools and allow for simple
database selection. The index scripts are absolutely amazing. The scripts only do
work on objects that require it, and they include robust logging and excellent
statistics maintenance. Suddenly your index management on your data warehouse
goes from hours to minutes.
• Where to get it: HTTP://OLA.HALLENGREN.COM/ .
• sp_WhoIsActive from Adam Machanic
• What it does: Puts important details about currently executing SQL statements at
your fingertips. There's so much goodness packed into sp_WhoIsActive that I
could have devoted this whole chapter to describing the benefits of this gem.
Consider sp_WhoIsActive as the replacement for sp_who2 or Activity Monitor.
When you need to troubleshoot a problem occurring on your SQL Server,
sp_WhoIsActive is nearly a one-stop shop. Adam has put literally years of work
into sp_WhoIsActive and has excellent documentation covering the procedure's
capabilities.
• Where to get it: HTTP://SQLBLOG.COM/BLOGS/ADAM_MACHANIC/ .
• SSMS Tools Pack from Mladen Prajdić
• What it does: SSMS Tools Pack is an add-in for SQL Server Management Studio
that transforms it into a much more robust application. SSMS Tools Pack contains
several amazing tools, such as a detailed execution plan analyzer, a handy SQL
statement formatter, and a SQL Snippets tool that allows you to store predefined
blocks of SQL statements that you type often, which you can then trigger by typing
a small shortcut. One of my favorite features comes in the Query Execution
History feature, which stores history on each query you run. Since developing on
SQL Server is often an iterative process where you build and refine a query one
component at a time, this history window is a dream come true. I'm always
surprised when I find a new feature in SSMS Tools Pack, and I think you'll find it
to be one of the most valuable arrows in your SQL Server quiver.
• Where to get it: HTTP://WWW.SSMSTOOLSPACK.COM/ .
• Sean Elliot's Full Server DDL Auditing Solution
• What it does: Logs Data Definition Language (DDL) changes across an entire
server. Tracking changes to the structure of your databases can be challenging,
especially on busy development servers. Sean's solution provides a robust trigger-
based logging solution that basically maintains itself. Whenever an object is
created, dropped or altered, this solution logs it. Even better, it logs changes in two
places – at the database level and at the server level. This feature allows you to
reconstruct changes when databases are dropped and restored, and allows you to
secure logs in a central location. Sean's solution allows you to know who changed
what, and when. I've had situations where two developers have both changed the
same stored procedure right after each other and without this auditing solution
they'd have had hours of rework to reconstruct what was lost. The brilliant thing
about this solution is that it is self-maintaining. When a database is created or
restored, the solution automatically creates the objects in that database to enable
the logging solution. It is almost a set-and-forget solution. This DDL Auditing
Solution isn't source control for databases, but it is darn near close.
• Where to get it: HTTP://WWW.SQLSERVERCENTRAL.COM/ARTICLES/DDL/70657/
(Note: this website requires a free registration.)
• Konesans Custom SSIS Tasks
• What it is: Custom components for SSIS. Many seasoned SQL Server Integration
Services developers don't know that SSIS functionality can be extended using
custom data flow tasks. Konesans provides a set of free custom SSIS tasks that can
solve common SSIS development challenges. For example, application developers
who work with SSIS often wonder how they can use regular expressions – what
they call “RegEx” – to validate data. SSIS doesn't natively include that
functionality; however, Konesans has not one, but two components that solve this
problem (one is a data cleansing transformation, and the other uses regular
expressions like a conditional split transformation). There's a transformation that
watches for files to appear in a directory, and there's another that generates row
numbers. The same functionality could be built using a Script Task component, but
that requires the developer to know C# or Visual Basic. These tasks save time and
effort, and they work on any server running SSIS (even if the component isn't
installed on that server). Test them out and you'll likely find them just as useful as I
do.
• Where to get it: HTTP://WWW.KONESANS.COM/PRODUCTS.ASPX .
• Pro tip: If you like these custom components, check out the CodePlex
Community Tasks and Components page for additional custom tools. The
CodePlex page is located at HTTP://SSISCTC.CODEPLEX.COM/ .
• Management Data Warehouse
• What it does: The Management Data Warehouse (MDW) collects SQL Server
performance information and provides a series of impressive reports to analyze
results. I'll admit that the MDW is challenging to install, but once you get past that
hurdle you've got a very impressive suite of reports to drill into any performance
challenge. The reports start with an overview report, and by clicking on any
element within that overview you can drill into very detailed information – all the
way down to the execution plan level. The MDW reports are pretty enough that
you can show them to management, and they work so well that you can drill into a
problem with your boss standing over your shoulder.
• Where to get it: Start with Bill Ramos's article series, where he explains
everything you need to know about the MDW and how to install it:
HTTP://TINYURL.COM/2EDLDMF .
These solutions are just the tip of the community-contributed iceberg. As a word of
advice, don't let your boss or team believe that you developed any of these solutions.
Give credit where credit is due, and you'll find that your boss appreciates that you're
using proven solutions and saving time.
the tape backup system running too slowly to keep up? As the database
administrator, you often see problems emerging before they cause serious
challenges. This is where you tell your boss what is coming up, and where you
provide suggested solutions.
• Sections highlighting your specialties: We are all good at working on SQL Server
in certain ways. If your passion is performance tuning, add a section that summarizes
your tuning efforts. Tell your boss how many procedures you tuned, and highlight
the statistics that matter (such as how much time your tuning efforts shaved off of
execution times). If it is your job to deploy new databases and instances, track here
how many were deployed and how long it took. Did you handle the release this
weekend? Tell your boss how much time it took, what was deployed, and suggest
ways to make the release even smoother next time.
Those are just a few options. This is your report, so customize it to cover the things
that are most important to your shop. As a word of caution, don't try to show
everything as always running smoothly. Your boss needs to know what problems
exist and that you're on top of them. If you demonstrate that metrics are improving
over time, you're controlling the headlines like a champ.
It's just as important to control the headlines when things go badly as when everything
is going well. If you run into a crisis, control the headlines by focusing on what you've
done, or are doing, to fix the problem. Never try to hide a problem; your boss needs to
know what you're up against. Communicating statuses and solutions underway during
a crisis makes the best of a bad situation. Don't forget to follow up with a detailed
analysis of the root cause of the problem and suggestions on how to prevent it in the
future.
select the top achievements from each, summarizing achievements where possible.
Prioritize the list of achievements, giving weight to those with numeric facts to
demonstrate results, and select the top 10–12 achievements. If you can tie the
achievements in with cost savings or profit for the company, do so. Some of us don't
have that ability, but so long as you keep your achievements based on numbers, your
boss will have the material to evaluate your performance properly.
Now that you've got a shortlist of the best work you did this year, it is time to set some
goals for next year. You'll add these to the performance review. Approach this in the
same way you'd tackle the inevitable “What are your weaknesses?” question in an
interview. Choose three areas to improve and, at this point, keep your choices broad.
For example, someone who wants to learn more about SQL Server can choose to
improve their technical skills.
For each area of improvement that you've identified, write two goals. Make them
specific and measurable. Include these goals in your review, and then work towards
them the next year. Don't be afraid to write goals to which your employer will need to
contribute as well.
Last, take a peek at your training logs. Sum up the hours you've spent training; you'll
add this to your review later. Showing how many hours you've trained will set you
apart from your peers. You've probably learned a great deal over the year. If you can,
pick common themes from your training and summarize them into a bulleted list.
Most employers have a self-review template to use, so it will be up to you to
determine how best to fit your achievements into it. On the off-chance that you don't work
for a company that requires self-appraisals as part of the review process, you're still
going to write a self-review. Your boss needs to read this. Even though self-review
templates vary, you should be able to incorporate the list below into that review.
Your review should include the following:
• A two- to three-sentence introductory paragraph that discusses your role and
contributions to the company.
• A bulleted list of your top 10–12 achievements, with numbers-based measurements.
• A one-sentence statement explaining your work management system and how well
it has helped you meet your goals. Don't be afraid to name the system you selected.
• A statement explaining your passion for studying SQL Server and the total hours
you've spent studying. If you feel it is appropriate, include a one-sentence statement
discussing the focus of your training.
• A forward-looking statement about your plans to build upon this year's successes
by working to improve three areas next year.
• A list of each area of improvement with your two goals for it. If you wish, you can
include a statement about the value each of these goals will bring to the company.
It is also a good idea to include a statement asking for your company's help to meet
these goals.
• A final paragraph summarizing your successes this year and your eagerness to
continue these successes to help the company succeed in the coming year.
I use this pattern every year when writing my review, and I've proven it repeatedly
with many managers of different styles. When describing your achievements, you'll
have a tendency to be verbose. Fight it; your boss wants as little reading as possible, so
keep everything you write relevant, brief and factual.
Most people loathe writing self-appraisals and put them off until the last minute. One
last tip is to get your review submitted well before the deadline to give your boss
plenty of time to read it.
If you follow these seven steps, you'll have done everything you can to secure a stellar
review. Next, we'll talk about what to do with it.
base salary is from your market value. Except in situations where your salary is far
under market value, it is rare to see even exceptional employees getting a raise over
8%.
There are a few things to keep in mind when talking money with your boss. The time
for salary negotiations comes after you receive your performance review. Let your
boss present the proposed merit increase to you. This does two things: first, it lets you
know how far apart you are in your expectations; second, it gives your boss a chance
to discuss factors that might be affecting the proposed increase. Your boss almost
certainly has been given a limited budget for merit raises, so your raise needs to be
balanced against the other members of your team.
Another critical thing to keep in mind when negotiating your raise is that you need to
keep your personal life out of the conversation. You can't guilt your boss into a raise
because your kid needs braces. Guess what? Her kid needs braces, too. Even if you
can guilt her into a raise based on your personal situation, don't. This amounts to
emotional warfare and it will come back to bite you in the end. Instead, focus on the
results you get. Your review is now proof that the company agrees that your
contributions were a major factor in your department's success. Focus your
negotiations on the value you've brought to the company. Results are what matters,
and they form the entire platform to build your case on.
If you're satisfied with the proposed raise, leave well enough alone. You've proven
your worth to your boss, so if you get a solid raise offer, she's probably done her best
for you. If your performance review was great and the proposed amount doesn't fit
your expectations, begin negotiating.
On rare occasions, no matter how fiercely you negotiate, your boss is simply unable to
provide you with the raise you want. Factors such as the economy and the company's
stock price can make it impossible for your boss to give you the raise you deserve.
Fortunately, there are still options for you to get non-traditional rewards as
alternatives. Ask your boss to send you to that training event or technical conference
you've always wanted to attend. Negotiate a better shift, or ask for that office you've
had your eye on. Your boss wants to reward you for your hard work, and creative
solutions can work well for both of you.
You have the tools you need to succeed, and by following these nine habits, you'll
stand out clearly as an exceptional employee that the company will strive to retain.
1 The ROI of Employee Training and Development: Why a Hearty Investment in Employee Training and
Development Is So Important – Rachele Williams and Lawson Arnett, APQC. See
HTTP://TINYURL.COM/NTYB5KV (requires membership for access).
2 Source: HTTP://WWW.WWU.EDU/TUTORING/CORNELL%20SYSTEM.SHTML .
Index
A
Access Methods 51
Adam Machanic 444
Agile database development 413 – 433
background 414 – 415
deployments 423 – 430
fast 425 – 429
code objects 425 – 426
indexes 427
tables 427 – 429
robust 424 – 425
specific 429 – 430
design 420 – 423
integrity 421
interfaces 421
patterns 422
simplicity 420 – 421
implications 415
requirements 415 – 419
automation 415 – 417
balance 417 – 418
communication 418 – 420
tests 430 – 432
integration tests 431 – 432
performance tests 432
unit tests 431
Alerts
configuring in transactional replication 330 – 333
Apache JMeter 107
Articles 292
properties 313 – 314
selecting 311 – 312
Auditing in SQL Server 197 – 234
C2 auditing 199
Change Data Capture (CDC) 199
Change Tracking 199
develop your own audit 220 – 232
DDL/logon triggers
creation of 230 – 232
operation 229
pros and cons 232
Event Notifications
creating 221 – 232
operations 220 – 221
pros and cons 232
events and event classes 200 – 201
Policy Based Management (PBM) 199
SQL Audit 198 , 208 – 220
creating the audit 210 – 213
creating the audit specification 214 – 217
operation 209
pros and cons 219 – 220
terminology 209 – 210
viewing audit output 217 – 218
SQL Trace 198 . See also SQL Trace
third-party solutions 233
Auto-created logins 176
Automation
backup restores 79 – 93
backup testing 77
B
Backups
test restores 84 – 86
confidence level 87 – 94
verifying with statistical sampling 75 – 94
automating 77 – 78
margin of error 88 – 94
planning 78 – 79
population 89 – 94
reasons for 76 – 77
response distribution 88 – 94
sample size 87 – 93
determining 90 – 93
Bad parameter sniffing 118
Bill Ramos 447
Binary searches . See SQL Server storage mechanisms: indexes
b-tree . See SQL Server storage mechanisms: indexes
C
C2 auditing 198
Change Data Capture (CDC) 199
Change Tracking 199
ClearTrace analysis tool 111 – 112
CodePlex Community Tasks and Components 446
Communication for DBAs 379 – 390
benefits 388 – 390
in speech 384 – 387
lunch-and-learn 386
user groups 387
using video 385 – 386
in writing 380 – 384
blogging 382 – 383
NaNoWriMo 383 – 384
Twitter 381 – 382
Configuring notifications for Agent jobs 281
CUME_DIST 147
D
Database corruption 76
Database Engine Tuning Advisor (DTA) 97
Database Mail 261 – 289
configuring 264 – 269
enabling 263 – 264
real-time monitoring 281 – 283
reporting on Agent jobs 278 – 281
configuring an operator 280
configuring notifications 281
designate Fail-safe operator 279 – 280
enabling a profile 278 – 279
security requirements 270 – 271
testing 269 – 270
troubleshooting 283 – 288
common problems 284 – 285
interrogating system views 285 – 287
maintaining the log table 287 – 288
SSMS Database Mail log 288
use of 271 – 278
customized email alerts 275 – 278
sending to multiple recipients 275
sending with file attachment 272 – 273
with multiple files 272
sending with results of T-SQL statement 273 – 274
Database roles 185 – 186
Database users 180 – 185
creating 181 – 182
default 181
for contained databases 184 – 185
orphaned 183 – 185
Data compression 49 – 74
and backup compression 73 – 74
and transparent data encryption 73 – 74
basics of, 49 – 53
benefits 50 – 51
costs 51
types of setting 52
compression ratio vs. CPU overhead 63 – 65
data compression syntax, brief 67 – 72
data usage patterns 66 – 67
page compression 57 – 62
additional overhead of, 61 – 62
column prefix compression 58 – 60
page dictionary compression 60 – 61
performance monitoring 65
row compression 53 – 56
when to use, 62 – 67
Data records . See SQL Server storage mechanisms
DBCC IND 35 – 36
DBCC PAGE 32
dbo user 181
DDL/logon triggers 229
Default trace in SQL Server 207 – 208
DENY keyword 163
design-first project methodologies 414
Distribution Agent 294 , 304
security 320 – 321
Distributor 292 , 301 – 302
configuring 306 – 309
DMAIC (Six Sigma process) 437
Dynamic Management Views (DMVs) . See Performance tuning
E
Enabling replication 310
Event Notifications 220
Extended Events . See Performance tuning
advantages of 120 – 121
converting traces to 127 – 129
preparing for migration to 119 – 129
F
Fail-safe operator 279
Filters 315
FIRST_VALUE function 144
fixedvar 27
Full Server DDL Auditing Solution (S. Elliot) 445
G
Getting Things Done (GTD) system 437
GRANT keyword 163
guest user 181
H
Hacking . See SQL injection attacks
HTTP Parameter Pollution 245
I
immediate_sync subscriptions 339
Index allocation map (IAM) 37
INFORMATION_SCHEMA user 181
K
Konesans Custom SSIS Tasks 446
L
LAG function 142
LAST_VALUE function 146
LEAD function 143
LoadRunner 107
Logins . See Principle of Least Privilege: applying
from Windows 165 – 169
SQL Server login 164 – 165
Log Reader Agent 293 , 303
M
Management Data Warehouse (MDW) 447
Merge Agent 304
Metasploit 259
Missing indexes 117
Mladen Prajdić 445
O
Ola Hallengren 444
Orphaned users 183
Out-of-date statistics 118
P
PERCENTILE_CONT 148
PERCENTILE_DISC 148
PERCENT_RANK 147
Performance Monitor (PerfMon) . See Performance tuning
Performance review 410
secure a stellar review 435 – 454
negotiate your merit increase 452 – 454
off-the-shelf solutions 442 – 447
overall vision 439
parameters for success 438
positive status reports 447 – 449
training 440 – 442
work/life balance 436 – 437
writing your self-appraisal 449 – 452
Performance tuning 95 – 130 , 333 – 339
Dynamic Management Views (DMVs) 95
Extended Events 95
Performance Monitor 95
Permissions . See Principle of Least Privilege
dbcreator 172
granting in SSMS 188 – 190
granting in T-SQL 191
securityadmin 172
sysadmin 172
testing 191 – 192
Policy Based Management (PBM) 199
Pomodoro Technique 436
Principle of Least Privilege 155
applying 165 – 192
create SQL Server logins 164 – 172
fixed server-level roles 172 – 177
granting in the database: schema: object hierarchy 186 – 192
server-level permissions 177 – 179
Profiler . See SQL Server Profiler
Project charter 397
Project management for DBAs 391 – 411
common project management terms 397 – 400
at risk 400
burndown rate 399
checking-in 400
critical path / long pole 398
project charter 397 – 398
resource allocation 399
the constraint triangle 399
Work Breakdown Structure (WBS) 397
DBA's place in the organisation 393
defensive moves 400 – 407
awareness of conflicting schedules 404 – 405
communicate risk, positively 403 – 404
never say “No” 402 – 403
outsource prioritization 400 – 402
recurring “invisible” tasks 405 – 407
offensive moves 407 – 410
justify hiring 409 – 410
performance reviews 410
remove technical debt 408 – 409
risks for DBAs 394 – 395
the DBA's workload 392
what is project management? 395 – 411
your next move 411
Publisher 292 , 300 – 301
configuring 309 – 318
Pull subscriptions 294
Push subscriptions 294
Q
Queue Reader Agent 304
QUOTENAME 257
R
REPLACE 257
Replicating stored procedure calls 338
Replication 297 – 298 . See also Transactional replication
merge 298
monitoring with ReplMon 324 – 330
peer-to-peer transactional 298
snapshot 297
transactional with updating subscriptions 298
troubleshooting 340 – 344
accidentally dropped objects 343
errors in more detail 340 – 341
find large batches of data changes 342 – 343
re-initialize Subscribers 343 – 344
verbose error output 341
tuning 335 – 339
Replication Agents 292 , 303 – 306
custom profiles 336 – 338
Replication latency 326 – 327
Replication Monitor (ReplMon) 324 – 330
Report writing 347 – 378
choosing tools 371 – 377
in the Microsoft stack 372 – 373
outside the Microsoft stack 373 – 374
strategy 374 – 375
design fundamentals 356 – 370
designing graphs and charts 366 – 370
designing tables 360 – 365
spreadsheet formatting tips 358
key skills 348 – 356
communication 349 – 352
technical skills required 353 – 355
report implementation decisions - flow diagram 377
Response distribution
in backup verification 88
Restoring backups . See Backups: verifying with statistical sampling
test restores 84 – 86
REVOKE keyword 163
Row identifier (RID) 38
S
Sean Elliot 445
Security (SQL Server) 155 – 196 . See also Principle of Least Privilege
architecture 157 – 163
permissions 157 . See also Permissions
in the hierarchy 162 – 163
keywords 163
principals 157
granting permission through 190
hierarchy of 159 – 161
securables 157
granting permission through 188 – 189
hierarchy of 161 – 162
considerations 156
Server and database file configuration and tuning 334 – 335
Server-level role 159
Server-side trace 101 – 116
for DDL auditing 202 – 207
Snapshot Agent 293 , 303
configuring 315 – 318
sp_trace_create 103 – 105
sp_trace_setevent 105 – 106
sp_trace_setfilter 105 – 106
sp_WhoIsActive 444
SQL-authenticated logins 159 , 169 – 172
SQLCAT 64
SQL injection attacks 235 – 260
additional resources 260
attacking websites 239 – 243
defending websites 243 – 250
blacklists 244
HTTP Parameter Pollution 245 – 246
queries as binary strings 246 – 247
parameterized queries 248 – 250
whitelists 247
how attacks work 236 – 239
other defenses 256 – 259
appropriate permissions 258
automated tools 258 – 259
QUOTENAME and REPLACE 257
web application firewalls (WAFS) 259
protecting stored procedures 250 – 256
sqlmap 258
SQL Server 2012
analytic functions 142 – 152
FIRST_VALUE and LAST_VALUE 144 – 146
LAG and LEAD 142 – 143
performance of 150 – 152
statistical analysis functions 146 – 149
CUME_DIST and PERCENT_RANK 147 – 148
PERCENTILE_CONT and PERCENTILE_DISC 148 – 149
sliding windows 132 – 141
defining frame extent 135 – 136
RANGE vs. ROWS - behavioral 137 – 139
RANGE vs. ROWS - performance 139 – 141
Windows functions 131 – 153
SQL Server and Database Mail . See Database Mail
SQL Server logins 159 . See also Principle of Least Privilege: applying
SQL Server Management Studio (SSMS)
creating event sessions in 121 – 126
SQL Server Profiler 95
Distributed Replay feature 107
pros, cons, best practices 97 – 100
Replay Traces feature 107
SQL Server security . See Security (SQL Server)
SQL Server service accounts 173
SQL Server storage mechanisms 25 – 47
DBCC PAGE 32 – 35
heaps 37 – 38
indexes 39 – 44
b-trees
maintenance 43 – 44
searches 41 – 45
structure 40 – 41
design 45 – 46
storage requirements 46 – 47
pages 30 – 32
investigating with DBCC commands 32 – 36
records 27 – 30
SQL Trace 202 – 208 . See also Performance tuning
actions of 96 – 119
analyzing the trace 110 – 116
importing the trace 109 – 110
resolution 117 – 119
running the trace 106 – 108
default trace 207 – 208
pros and cons 208 – 209
relevance 96
server-side trace for DDL auditing 202 – 207
SSMS Tools Pack (Mladen Prajdić) 445
Statistical sampling
and backups 77
Stored procedures
protecting 250 – 256
Subscriber 292 , 302
configuring 318 – 322
tuning query performance 323
Subscriptions 292
initialisation 321 – 322
local versus remote 319 – 320
pull versus push 318 – 319
synchronization 321 – 322
Subscription Watch List 324 – 325
sysadmin
and auto-created logins 176
and SQL Server service accounts 173
and Windows Administrators group 173
sys user 181
T
Test email addresses 265
Thomas LaRock 437
Time management matrix 436
Tracer tokens 326 – 327
Transactional replication 291 – 345
architecture/components 292 – 295 , 299 – 304
configuring alerts 330 – 333
error details 327 – 330
requirements 305
use cases 295 – 297
walk-through 306 – 322
when not to use 299
Transparent data encryption (TDE) 73
T-SQL_Tuning template 97
Tuning 335 – 339
Twitter 381
V
Validating/verifying backups . See Backups: verifying with statistical sampling
W
Waterfall-style software development 414
Web application firewalls (WAFs) 259
Windows Administrators group 173
Windows login 159
Windows OS-level securables 161
Window Spool operator 139
Table of Contents
Cover Page 2
Title Page 3
Copyright Page 4
Table of Contents 5
About this Book 11
The Tribal Authors 11
Computers 4 Africa 13
The Tribal Reviewers and Editors 14
The Tribal Sponsors 15
Introduction 17
SQL Server Storage Internals 101 19
The Power of Deductive Reasoning 19
Records 20
Pages 22
Investigating Page Contents Using DBCC Commands 24
DBCC Page 24
DBCC Ind 26
Heaps and Indexes 27
Heaps 27
Indexes 28
Crunching the Numbers 32
Index design 32
Storage requirements 33
Summary 33
SQL Server Data Compression 35
Compression Basics 35
Overview of SQL Server compression 35
Version and edition requirements 37
Row Compression 38
Page Compression 40
Stage 1: Column prefix compression 40
Stage 2: Page dictionary compression 42
Additional overhead of page compression 43
Where and when to Deploy Compression 43
Data compression ratio vs. CPU overhead 44
Performance monitoring 45
Data usage patterns 46
A Brief Review of Data Compression Syntax 47
Combining Data Compression with Backup Compression and Transparent Data Encryption 50
Summary 51
Verifying Backups Using Statistical Sampling 52
Why Backup Validation? 52
Why Automate Backup Validation? 53
Planning Considerations 54
Automating Restores with Statistical Sampling: the Parts 54
Retrieving a list of databases and backup files from each server 55
Performing test restores 58
The magic happens here 60
Summary 64
Performance Tuning with SQL Trace and Extended Events 65
Is SQL Trace Still Relevant? 65
How SQL Trace Works 66
Profiler: pros, cons, and best practices 66
Server-side trace 69
Resolution 80
Preparing for Migration to Extended Events 81
Conclusion 88
Windows Functions in SQL Server 2012 89
Sliding Windows 89
Defining the window frame extent 91
Behavioral differences between Range and Rows 93
Performance differences between Range and Rows 94
Analytic Functions 96
Lag and Lead 96
FIRST_VALUE and LAST_VALUE 97
Statistical analysis functions 99
Performance of analytic functions 101
Summary 103
SQL Server Security 101 104
Securing SQL Server, the Bigger Picture 104
SQL Server Security Architecture, a Brief Overview 105
The hierarchy of principals 107
The hierarchy of securables 108
Granting permissions in the hierarchy 109
The three permission keywords 109
Creating SQL Server logins 110
Fixed server-level roles 115
Server-level permissions 119
Database users 120
Database roles 124
Granting Least Privileges in the database: schema: object hierarchy 125
Taking the “Least Privileges” Challenge 129
Example 1 129
Example 2 130
Answer 1 130
Answer 2 130
Summary and Next Steps 131
Further reading 131
What Changed? Auditing Solutions in SQL Server 133
Auditing Options Not Covered 133
C2 auditing 134
Policy Based Management 134
Change Data Capture and Change Tracking 134
The Basis of Auditing: Events and Event Classes 134
SQL Trace 136
Server-side trace for DDL auditing 136
Using the default trace 139
SQL Trace: pros and cons 140
SQL Audit 140
SQL Audit: how it works 141
SQL Audit: terminology 141
SQL Audit: creating the audit 141
SQL Audit: creating the audit specification 144
SQL Audit: viewing audit output 146
SQL Audit: pros and cons 147
Develop your own Audit: Event Notifications or Triggers 148
Event notifications: how it works 148
Event notifications: creating an event notification 148
DDL and Logon triggers: how they work 153
DDL and Logon triggers: creating triggers 154
Event notifications and triggers: pros and cons 155
Third-party Solutions 156
Conclusion 156
SQL Injection: How it Works and how to Thwart it 157
My First SQL Injection Attack 157
Attacking Websites 160
Defending Websites 162
Blacklists 163
Why blacklists don't work 163
Whitelists 165
Parameterized queries 165
Protecting Stored Procedures 167
Other Forms of Defense 171
QUOTENAME and REPLACE instead of sp_executesql 171
Appropriate permissions 172
Automated tools 172
Web application firewalls 173
Additional Resources 173
Using Database Mail to Email Enable your SQL Server 174
Getting Started with Database Mail 174
Enabling Database Mail 175
Configuring Database Mail 176
Testing Database Mail 180
Security requirements 180
Using Database Mail in your own Applications 181
Sending an email with a file attachment 181
Sending an email with the results of a T-SQL statement 182
Sending an email to multiple recipients 183
Producing customized email alerts 184
Reporting On Success or Failure of SQL Server Agent Jobs 185
Enable a database mail profile for alert notification 186
Designate a Fail-safe operator 186
Configure an operator 186
Configuring notifications for Agent jobs 187
Provide a Real-time Notification System for SQL Server Alerts 187
Troubleshooting Database Mail 189
Common problems 189
Interrogating the Database Mail system views 190
Maintaining the Database Mail log table 191
The SSMS Database Mail log 192
Summary 192
Taming Transactional Replication 193
Transactional Replication Architecture 194
Transactional Replication Use Cases 196
Other Types of Replication 198
Snapshot replication 198
Merge replication 198
Transactional replication with updating subscriptions 199
Peer-to-peer transactional replication 199
When not to Use Replication 199
Deeper into Transactional Replication 199
Publishers, publications and articles 200
The Subscribers 201
The Replication Agents 202
Transactional Replication Requirements 203
Transactional Replication Walk-through 204
Configuring the Distributor 204
Configuring the Publisher and publication 207
Configuring the Subscriber 214
Tuning Query Performance on the Subscriber 218
Monitoring Replication with ReplMon 218
Subscription Watch List 219
Replication latency and tracer tokens 220
Getting details of errors 221
Configuring Alerts 223
Performance Tuning 226
Server and database file configuration and tuning 226
Tuning replication 227
Troubleshooting Replication 231
Finding more details on an error 231
Using verbose error output 232
Finding large batches 232
Restoring accidentally dropped replication objects 233
Reinitializing Subscribers 233
Summary 234
Building Better Reports 235
Key Report Development Skills 235
Communication 236
Technical skills and versatility 238
Report writer, Business Intelligence (BI) specialist or data scientist? 240
The Fundamentals of Good Report Design 240
Requirements gathering 241
Initial visualization 242
Designing tables and graphs 242
Choosing the Right Reporting Tools 251
Reporting in the Microsoft stack 252
Predicting the future 253
Outside the Microsoft stack 253
Who decides and, most importantly, who pays? 254
A personal story 254
Summary 256
Communication isn't Soft 258
The Written Word 258
Read more to write better 258
Write right now 259
The Spoken Word 261
A different kind of audience 261
Lunch-and-learn 262
User groups, SQL Saturday and beyond 263
Practice Pays Off 263
Self-belief 264
Communicating expertise 264
Practice Makes Perfect 264
Guerrilla Project Management for DBAs 266
A DBA's Crazy Workload 266
A DBA's Place in the Organization 267
The Dark Side of Being a DBA 267
The Shield of Project Management 268
Common project management terms cheat sheet 269
Specific defensive moves 271
Offensive project management 276
Onwards 278
Agile Database Development 280
Agile 101: A History 280
Agile 201: Implications 281
ABC – Automate, Balance, and Communicate 281
Automation 281
Balance 283
Communicate, constantly 283
DDT – Design, Deployments and Tests 284
Design 284
Deployments 287
Tests 291
Conclusion 293
Nine Habits to Secure a Stellar Performance Review 294
Habit 1: Work your Tail Off 294
Habit 2: Establish the Parameters for your Success 295
Habit 3: Work with Vision 296
Habit 4: Train to Gain an Edge 297
Habit 5: Stand on the Shoulders of Giants 298
Habit 6: Control the Headlines 302
Habit 7: Write a Self-Appraisal that Sparks Memory of Success 303
Habit 8: Use your Review to Negotiate Rewards 305
Habit 9: Don't Rest on Past Success 306
Index 308