
ITPro™

SERIES

HA
Solutions
for Windows, SQL and Exchange Servers

by Sameer Dandage
Daragh Morrissey
Jeremy Moskowitz
Paul Robichaux
Mel Shum
Ben Smith
Bill Stewart

Contents
Chapter 1: Surviving the Worst . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
Ben Smith
A 6-Step Plan . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
Step 1: Identify Critical Business Activities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
Step 2: Map IT Systems to Critical Business Activities . . . . . . . . . . . . . . . . . . . . . . . . . . 2
Step 3: Model Threats Posed by Predictable and Plausible Events . . . . . . . . . . . . . . . . . . 2
Step 4: Develop Plans and Procedures for Preserving Business Continuity . . . . . . . . . . . . 2
Step 5: Develop Plans and Procedures for Recovering from Disaster . . . . . . . . . . . . . . . . 3
Step 6: Test Business Continuity Plans and Practice Disaster Recovery . . . . . . . . . . . . . . 3
6 Steps Away from Disaster . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4

Chapter 2: Don’t Wait—Back Up Those GPOs Now! . . . . . . . . . . . . . . . . 5


Jeremy Moskowitz
A Little History . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
Using GPMC to Back Up GPOs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
Restore GPOs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
Copy or Migrate GPOs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
The Sooner, the Better . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

Chapter 3: Want a Flexible Automated Backup Solution? . . . . . . . . . . . . . 12


Bill Stewart
Step 1. Understand the Supporting Scripts’ Functions . . . . . . . . . . . . . . . . . . . . . . . . . . 13
Step 2. Understand How the Main Script Works . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
Step 3. Prepare the Environment and Scripts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
Ready, Test, Go . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21

Chapter 4: After the Crash . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22


Sameer Dandage
Publisher Database Crash . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
Distributor Database Crash . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
Subscriber Database Crash . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
Queue Complications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
Prepare, Rehearse, and Shine . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26

Chapter 5: SQL Server on a SAN . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27


Mel Shum
SAN Fundamentals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
SAN Security . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
SAN Benefits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
When Using DAS Makes Sense . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
Sidebar: Selecting a Storage Array for a SAN . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
I’m Ready for a SAN. Now What? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
Step Up to a SAN . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33

Chapter 6: SQL Server HA Short Takes . . . . . . . . . . . . . . . . . . . . . . . . . . . 34


Running SQL Server on RAID . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
Michelle A. Poolet
High Availability in Analysis Services . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
Carl Rabeler
The High Availability Puzzle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
Michael Otey
High Availability Options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
Michael Otey
Log Backup Checklist . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
Kalen Delaney
Are Your Backups Useless? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
Brian Moran

Chapter 7: Exchange and SANs: No Magic Bullet . . . . . . . . . . . . . . . . . . . 40


Paul Robichaux

Chapter 8: Build an Exchange 2003 Cluster . . . . . . . . . . . . . . . . . . . . . . . 42


Daragh Morrissey
Exchange Virtual Servers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
Exchange Cluster Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
Preparing Your Cluster for the Exchange Installation . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
Ready for the Next Step . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
Before You Install . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
Creating an Exchange Virtual Server . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
Installing Exchange 2003 SP2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
Post-Installation Tasks and Best Practices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
Sidebar: Planning Your Exchange Cluster Deployment . . . . . . . . . . . . . . . . . . . . . . . 56

Chapter 1:

Surviving the Worst


Ben Smith

It’s 2:00 A.M. on a Monday morning, and your cell phone rings. The water fountain on the floor
directly over your server room has malfunctioned, and your organization’s servers and routers are
standing in water, as are most of your employees’ workstations. The office opens at 8:00 A.M. What
do you do in the meantime?
Situations like this one separate IT departments that have planned for disaster from those that
haven’t. For the latter group, the situation I’ve described is more than a disaster—it’s an absolute
disaster. When total data loss is possible, the absence of a disaster recovery program can put a
business at risk, particularly small-to-midsized businesses (SMBs), which often don’t have the financial
wherewithal to survive unexpected catastrophic events. Although disasters are inevitable and, to a
degree, unavoidable, being prepared for them is completely within your control. Increasingly, IT has
become the focal point of many companies’ disaster planning. Creating a program to preserve
business continuity and recover from disaster is one of the central value propositions that an IT
department can contribute to an organization.

A 6-Step Plan
In the terminology of disaster planning, two phrases are common: Business Continuity Planning (BCP)
and Disaster Recovery Planning (DRP). Although many people use these phrases interchangeably,
they represent different concepts. BCP traditionally defines planning that ensures an organization can
continue operating when faced with adverse events. DRP is actually a subset of BCP and traditionally
focuses on recovering information and systems in the event of a disaster. As an example, the failure
of a hard disk in a database server is an event that potentially affects business continuity but doesn’t
result from a disaster. However, a water-pipe break that floods a server room and submerges the
database server is a threat to business continuity and within the scope of disaster recovery planning.
BCP and DRP can be complex; in fact, large organizations dedicate groups of people to them.
But without getting into detailed risk analyses and other complexities that usually accompany BCP
and DRP in large companies, all organizations can benefit by following six steps to create a program
that will preserve business continuity and facilitate recovery in the event of disaster.

Step 1: Identify Critical Business Activities


The first step in BCP and DRP is to identify your organization’s critical business activities—those
things that must occur on a daily basis in order for your business to stay in business. For example, a
customer service call center must be able to receive calls, look up customer records, and create new
incident records for customers calling in. A law firm will need to be able to access client information
and electronic schedules, send and receive email, research online law libraries, and make and receive
telephone calls. As you work through this step, you’ll need to partner with your organization’s key
business decision makers to identify the activities that are essential to your organization’s continued
functioning. Your organization’s BCP will center on preserving continuity of operations by recovering
these services.

Step 2: Map IT Systems to Critical Business Activities


With the identification of your organization’s key business activities, you can determine which IT
systems these activities depend on. For example, to enable the customer service call center to look
up customer records and create new records for incoming calls, the database servers that store the
records and the line-of-business applications that access them must be available. In turn, some degree
of core network infrastructure will also need to be operable for this critical business activity to take
place. These are the IT systems that you must be able to keep operating by quickly recovering them
after a disaster.

Step 3: Model Threats Posed by Predictable and Plausible Events


Nearly all disasters and failures in business continuity are predictable to a certain degree of precision
and plausible within a certain degree of reason. Such events can be natural, such as an earthquake or
flooding; human-caused, such as an accidental fire or deliberate sabotage; or mechanical, such as a
hard disk failure or a water pipe bursting. For example, if a customer service call center is located in
Wakita, Oklahoma, it is plausible that the center’s IT systems could be in the direct path of a tornado.
Likewise, for any company that relies on technology, it is predictable that computer hardware will
eventually fail.
After you identify your critical IT systems, you can begin modeling the threats posed to these
systems by predictable and plausible events. Threat modeling lets you apply a structured approach to
identifying threats with the greatest potential impact to your business continuity and their mitigation.
List all the ways that critical IT systems might be disrupted and which events must happen for each
threat to be realized. For example, something that would disrupt the call center’s business continuity
might be the customer record database’s inaccessibility. Events that could cause such inaccessibility
include computer hardware failure, a power failure, or something more severe, such as destruction of
the data center by a tornado.

Step 4: Develop Plans and Procedures for Preserving Business Continuity

Now that you’ve listed your critical business activities, identified the IT systems your business
depends on for carrying out those activities, and brainstormed the possible and plausible events that
could disrupt IT services, you can use your threat model to determine countermeasures to preserve
business continuity. Four primary BCP countermeasures exist: fault tolerance and failover, backup,
cold spares and sites, and hot spares and sites.

Fault tolerance and failover. This countermeasure relies on the use of redundant hardware to
enable a system to operate when individual components fail. In IT, the most common fault tolerance
and failover solutions for preserving IT operations are hard disk arrays, clustering technologies, and
battery or generator power supplies.

Backup. On- and offsite backup programs are a core countermeasure in DRP. Backup gives you the
ability to restore or rebuild recent data to a known good state in the event of data loss.

Cold spares and sites. Cold spares are offline pieces of equipment that you can easily prepare to
take over operations. For example, you might maintain a set of servers that aren’t connected to your
network and that have your company’s standard OS installed and configured. In the event of an
emergency, you can complete the configuration and restore or copy necessary data to resume
operation. Similarly, a cold site is a separate facility that you can use to resume operation if a disaster
befalls your primary facility. Often, a cold site is nothing more than a large room that can
accommodate desks and chairs. For most SMBs, cold sites aren’t cost-effective.

Hot spares and sites. Hot spares are pieces of equipment that are ready for immediate use after a
disaster. For example, you might continuously replicate a critical database’s data to remote facilities so
that client applications can be redirected to the data replicas if necessary. Hot sites are facilities that
let you resume operations in a very short amount of time—typically, a hot site is
operational within the time it takes for employees to arrive at the facility. Hot sites have real-time or
near real-time replicas of data and are always operational. Because hot spares and sites are expensive
to maintain, only organizations that must be operational in a disaster, such as a public safety
organization, use them.

Step 5: Develop Plans and Procedures for Recovering from Disaster


Not all events are predictable or plausible. There is perhaps no better example of this kind of event
than the September 11, 2001, attack on the World Trade Center. For these types of disastrous
circumstances, as well as for other severe disasters in which total data or service loss from primary
systems is possible, you must create plans and procedures for recovering systems. Because recovering
from a disaster is stressful, having well-documented, tested, and practiced procedures in place
beforehand is essential. Similarly, rehearsing recovery procedures can help you verify that the data on
backup media is usable and restorable. Be sure to store copies of your DRP procedures offsite with
your verified backups. For most organizations, bank safe deposit boxes are the most effective,
affordable, and secure remote storage solution for verified backups and DRP plans.

Step 6: Test Business Continuity Plans and Practice Disaster Recovery
Test, test, test. When it comes to BCP and DRP, the very nature of the circumstances that necessitate
their existence dictates that the plans, procedures, and technologies you use to preserve business
continuity must work when they are required. Conduct planned and spontaneous drills to test your
BCP and DRP. These drills might include failing over cluster nodes on a monthly basis, restoring cold
spare servers periodically, or even conducting full cold- or hot-site disaster simulations. At an absolute
minimum, perform DRP restoration of critical data from offsite backup media periodically. Off-site
backup media is your last line of defense against total data loss.

6 Steps Away from Disaster


By following these steps, you can help your organization create a BCP and DRP program that will
shield it from the risk of natural, human-caused, and mechanical disasters. When the cell phone rings
at two in the morning, the last thing you want to be doing is brainstorming ways to recover data
from a server and backup tapes that have been under water for 30 hours or, even worse, recovering
from the physical destruction of your data center after disastrous circumstances.

Chapter 2:

Don’t Wait—Back Up Those GPOs Now!


Jeremy Moskowitz

When you formulate a backup and recovery strategy for your Windows systems, you need to make
sure to include Group Policy Objects (GPOs) in that strategy. Microsoft provides a means to back up
and restore GPOs in the form of Group Policy Management Console (GPMC), a Microsoft
Management Console (MMC) snap-in that you can use to manage GPOs on Windows Server 2003
and Windows 2000 Server systems. In June, Microsoft released GPMC with Service Pack 1 (SP1)—an
updated version of GPMC—that lets you back up, restore, and copy or migrate GPOs on Windows
2003, Windows XP, and Win2K Server systems without requiring you to have Windows 2003
installed. (Previously, you needed a Windows 2003 license to use GPMC.) This change means that
you can use GPMC to back up, restore, and copy GPOs in domains with any combination of
Windows 2003, XP, and Win2K Server systems. You can download the free GPMC with SP1 at
http://tinyurl.com/ysx4u.

A Little History
In the early days of Active Directory (AD) and Group Policy, the only way to back up and recover
GPOs was to use the NTBackup utility, then perform an AD authoritative restore—a procedure not
for the faint of heart. One irritating characteristic of NTBackup is that it backs up the entire system
state as well as the GPOs themselves and thus requires a hefty chunk of free disk space to house
each instance of the system state.
Performing an authoritative restoration of a GPO that had been accidentally deleted, changed, or
corrupted was even more complicated. First, you had to take offline the domain controller (DC) on
which you ran NTBackup and reboot the DC in Directory Services Restore Mode. Then, you had to
restore the backup to prepare the server with the data you wanted to restore. Finally, you performed
an authoritative restore, which required you to know the complete distinguished name (DN) of the
deleted or modified GPO. Don’t confuse the DN with the GPO’s more familiar friendly name—for
example, “Security Settings for the Sales OU.” The DN is a complicated string that includes the GPO
path in DN format along with the GPO’s globally unique identifier (GUID)—for example,
cn={01710048-5F93-4F48-9DD2-A71C7486C431}, cn=policies,cn=system,DC=corp,DC=com, where the
GUID is the component preceding the first comma. If you didn’t know the GPO’s GUID before the
disaster, you had little hope of recovering it (and thus, little hope of restoring the GPO). At this point
in the GPO restoration process, people often just gave up. Third-party products made the GPO
backup-and-restore process bearable. Indeed, third-party tools are available today that include a GPO
backup-and-restore feature.

Using GPMC to Back Up GPOs


IT pros were more than ready for a better Windows tool to back up and recover GPOs. Soon after
Microsoft released Windows 2003, the company fulfilled customers’ wishes when it delivered GPMC
and, more recently, GPMC with SP1.
To start using GPMC, your first task is to install GPMC by loading it on a Windows 2003 or an
XP machine. After you’ve installed GPMC, start the program, then navigate to Forest, Domains,
domain name, Group Policy Objects. You’ll see a list of all GPOs in the domain. At this point, you
can perform one of two actions: Back up all GPOs, or back up individual GPOs. To back up all
GPOs, right-click the Group Policy Objects node and select Back Up All from the context menu, as
Figure 1 shows. (Alternatively, you can right-click a single GPO and select Backup.)
Next, you’re prompted to enter the directory in which to store the backups and the name of the
backup set. Although you can store the backup files anywhere, I recommend that you store them in
a secure location. The GPMC then backs up each GPO in the domain and stores the backed-up
GPOs as files in subdirectories of the directory you specified. At this point, you’re ready to burn the
resulting directories and files to a CD-ROM, copy them to tape or to another secure server, or
otherwise ensure that they remain safe.

Figure 1
Backing Up All GPOs
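If you prefer to automate this step, GPMC also installs a set of sample scripts (by default in %ProgramFiles%\GPMC\Scripts) that expose the same backup operation from the command line. Here’s a minimal sketch; the script name and switches are as I recall them from the GPMC sample-script documentation, so verify them against your installation before scheduling anything:

:: Back up every GPO in the domain to a secured folder,
:: tagging the backup set with a comment.
cscript "%ProgramFiles%\GPMC\Scripts\BackupAllGPOs.wsf" C:\GPO-Backups /comment:"Weekly GPO backup"

You can then copy the resulting folder to CD-ROM or tape exactly as you would a backup created through the GPMC UI.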

If you examine the automatically generated subdirectories that the system creates during backup,
you’ll notice that the names of these directories resemble the GUIDs that I described earlier.
However, what isn’t immediately obvious is that these directory and GUID combinations don’t
correspond to the GUID of the underlying GPO and, in fact, are unique and unrelated to the GPO’s
GUID. This distinction lets you back up a GPO without fearing collision with an existing subdirectory.
You can store all the backups in the same subdirectory or in different ones.

Restore GPOs
If a GPO is deleted, corrupted, or becomes otherwise invalid and you want to restore the backed-up
GPO, you can do so at any time by right-clicking the Group Policy Objects node and selecting
Manage Backups. In the Manage Backups dialog box, which Figure 2 shows, you choose which
GPOs you want to restore.

Figure 2
Restoring a Backed-up GPO

Select the location of the GPO backup you want to restore from the Backup location drop-down
list, or click Browse. If you’ve created multiple backups of one or more GPOs in the same directory,
simply select the Show only the latest version of each GPO check box to view the most recent set that
you backed up. Otherwise, all GPOs you wrote to this directory will be displayed along with the time
they were backed up. If you need a reminder about what settings were preserved within a GPO,
simply click the GPO, then click View Settings. Finally, when you’re ready, select the GPO and click
Restore. If you want to remove a GPO from a particular backup set, you can do so by clicking
Delete.

Copy or Migrate GPOs


Depending on how your organization is constructed, you might choose to first create your GPOs
somewhere other than the eventual target location. For instance, you might generate all your GPOs in
a test domain—perhaps in a domain that’s online and trusts the production network or in a lab,
completely offline and isolated. Then, after the GPOs have been fully tested and are ready, you can
migrate them to your production domain. GPMC can help you transition your GPOs from test to
production in both situations.
You can use GPMC to copy a GPO within a domain or from one domain to another. You’ll need
to tell GPMC to change the view to display the available domains; to do so, right-click the Domains
node and select Show Domains. Next, to copy a GPO, right-click that GPO and select Copy. Then,
simply right-click the domain’s Group Policy Objects container, and select Paste to create a copy.
Migrating GPOs to your production domain from an offline lab is a bit more difficult. To do this,
you use GPMC’s Import function. The migration steps that you perform when using the Import
function are related to the backup-and-restore procedure. To migrate a GPO between domains that
are in different forests, perform these steps:
1. Make a backup copy of the source domain’s GPOs.
2. Create a new GPO in the target domain (or choose to overwrite an existing GPO).
3. Right-click the target GPO and select Import Settings. Doing so starts the Import Settings Wizard
that Figure 3 shows. The wizard lets you select a backup set and choose a GPO from which to
import.

Figure 3
Import Settings Wizard

The Import Settings Wizard lets you back up the target GPO before you import the source GPO
to it. You need to back up the target GPO only when it isn’t newly created.
The steps I just walked through make up the basic procedure for copying or migrating a GPO by
using the Import command. However, if the GPO from which you want to import settings contains
Universal Naming Convention (UNC) paths or security groups, you’ll probably need to use the GPMC
migration table feature. For instance, the Group Policy Software Installation and Folder Redirection
settings functions use UNC pathnames. To appropriately specify software to distribute, the GPO
typically launches a Windows Installer (.msi) file that’s located in the UNC path—for example,
\\Server1\share. However, a server named Server1 might not exist in the target domain. Or, worse,
Server1 does exist, but you don’t want your users to use that server. To ensure that you import the
correct GPO with the correct UNC and security group references, you need to use a migration table
with the Copy or Import operation.
The migration table lets you convert any UNC references from the source domain into valid
references in the target domain. The Import Settings Wizard automatically alerts you of UNC paths in
the source domain and gives you two options for handling UNC pathnames, as Figure 4 shows. The
first option, Copying them identically from the source, typically isn’t a wise choice because, as I
mentioned earlier, the UNC or security group references in the source domain might not exist in the
target domain; thus, the GPO probably won’t work after you copy it to the target domain. Therefore,
the better choice is to select the other option, Using this migration table to map them in the
destination GPO. To create your first migration table, at the Migrating References window that
Figure 4 shows, click New. You’ll see the Migration Table Editor, the spreadsheet-like dialog box that
Figure 5 shows. You can start by filling in the table with the information you know. Because you’re
importing from a backup, select Tools, Populate from Backup. Next, select the GPO that you’ll be
migrating. Doing so automatically populates the Source Name column with all the UNC references in
the GPO you’ve specified. Then, simply type a new UNC path (or security group reference) in the
Destination Name field for each UNC path (or security group) you need to migrate. In Figure 5, you
can see that the selected GPO includes the UNC pathname \\OLDServer\Software. However, in the
target domain, this server doesn’t exist. Therefore, you need to enter the appropriate pathname for
the GPO, such as \\NEWServer\OurStuff, to ensure that the GPO has the correct references in the
target domain.

Figure 4
Migrating References window

Figure 5
Migration Table Editor

After you’ve entered the new destination names, select Tools, Validate in the Migration Table
Editor to ensure that all the destination names are valid. After you’ve verified their validity, select File,
Save as to save the migration table you just created, then close it. The Migrating References window
is again displayed. Again, select the Using this migration table to map them in the destination GPO
option, and from the drop-down list select the migration table that you want to use. Typically, you’ll
select the migration table you just created, although you can select a previously created migration
table instead. Using an existing migration table comes in handy when you’re repeating the same
actions—for instance, when you want to transfer the same GPO from one domain to several other
domains. I also recommend that you select the Use migration table exclusively... check box, to ensure
that you always migrate GPOs with valid destination references.
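For reference, the migration table you save is just a small XML file (a .migtable file). The following sketch shows roughly what the mapping from this chapter’s example would look like on disk; I’m reconstructing the element names and namespace from memory of the GPMC file format, so let the Migration Table Editor generate the real file and treat this only as an illustration:

<?xml version="1.0" encoding="utf-8"?>
<MigrationTable xmlns="http://www.microsoft.com/GroupPolicy/GPOOperations/MigrationTable">
  <!-- Map the source domain's UNC path to its equivalent in the target domain. -->
  <Mapping>
    <Type>UNCPath</Type>
    <Source>\\OLDServer\Software</Source>
    <Destination>\\NEWServer\OurStuff</Destination>
  </Mapping>
</MigrationTable>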

The Sooner, the Better


Domain administrators can use GPMC to back up, recover, and transfer GPOs safely. Use this chapter
as a guide for using Microsoft’s useful GPMC with SP1 to perform GPO backup, recovery, and copy
and migration operations. Don’t wait to back up those GPOs—do it today! The domain you save
might be your own.

Chapter 3:

Want a Flexible Automated Backup Solution?

Bill Stewart

With the release of Windows 2000, Microsoft overhauled the built-in backup utility NTBackup and
added media-management and scheduling capabilities. Although these updates are welcome, comments
I have read in Web support forums and newsgroups suggest that many administrators and
other users have been frustrated in their attempts to get NTBackup to work in their environments.
At my company, we previously used third-party backup software on Windows NT 4.0 because
NT 4.0’s NTBackup tool wasn’t robust enough for our needs. When we upgraded to Windows Server
2003, I took a second look at NTBackup to determine whether the overhauled version would be
robust enough and whether it could be easily automated. During this process, I discovered several
shortcomings in NTBackup:
• There’s no simple way to write a backup to an arbitrary tape unless you use the /um (unmanaged)
option on the command line. This option lets you overwrite an arbitrary tape. However,
when NTBackup overwrites a tape in a backup job, it always applies a new media label—either
a label you specify or a generic label based on the current date and time. There’s no built-in way
to tell NTBackup to keep the current label but overwrite the tape.
• There’s no simple way to append backup information to an inserted tape because you must use
either the /t (tape name) or /g (globally unique identifier—GUID) options on the command line.
Unmanaged mode won’t work because you can only append to a specifically designated tape.
• NTBackup can’t eject a tape after a backup.
• NTBackup can’t email or print completed job logs.

To overcome these shortcomings, I created a set of scripts, which Table 1 lists. The set includes a
main script and 13 supporting scripts. The main script, Backup.cmd, uses NTBackup to perform a
backup that overwrites the current tape but doesn’t change the tape’s media label. The 13 supporting
scripts perform various functions for Backup.cmd. Together, these scripts provide a backup solution
that you can easily customize and use—no script-writing experience is necessary to run them. You
simply follow these steps:
1. Understand what each supporting script does so you can customize a solution.
2. Understand how the main script works so that you can customize it.
3. Prepare your environment and the scripts.

Table 1: The Scripts and Their Functions

Script Name        Function                                   Dependencies*
Backup.cmd         Performs the backup with the help of       All except TapePrep.cmd
                   the other scripts.
Eject.cmd          Ejects the tape.                           PhysicalMedia.cmd, Refresh.cmd, rsm.exe,
                                                              and SetupVars.cmd
Library.cmd        Outputs the media library’s name           Findstr.exe, RSMView.cmd, and SetupVars.cmd
                   and/or GUID.
MailLog.cmd        Emails the most recent NTBackup log file.  Cmd.exe, blat.exe, SetupVars.cmd, and ShowLog.cmd
MediaGUID.cmd      Outputs the tape’s logical-media GUID.     Partition.cmd, PhysicalMedia.cmd, RSMView.cmd,
                                                              and SetupVars.cmd
MediaName.cmd      Outputs the media label for the tape.      PhysicalMedia.cmd, RSMView.cmd, and SetupVars.cmd
Partition.cmd      Outputs the partition’s name and/or GUID.  PhysicalMedia.cmd, RSMView.cmd, and SetupVars.cmd
PhysicalMedia.cmd  Outputs the tape’s name and/or GUID.       Library.cmd, RSMView.cmd, and SetupVars.cmd
PrintLog.cmd       Prints the most recent NTBackup log file.  Notepad.exe, SetupVars.cmd, and ShowLog.cmd
Refresh.cmd        Refreshes the media library.               SetupVars.cmd, Library.cmd, rsm.exe, and sleep.exe
RSMView.cmd        Handles the Rsm View command in            Findstr.exe, rsm.exe, and SetupVars.cmd
                   the scripts.
SetupVars.cmd      Configures environment variables for       None
                   the other scripts.
ShowLog.cmd        Outputs the most recent NTBackup log       SetupVars.cmd
                   or its filename.
TapePrep.cmd       Labels and allocates a tape for use by     Cmd.exe, ntbackup.exe, PhysicalMedia.cmd,
                   NTBackup.                                  Refresh.cmd, and SetupVars.cmd

* Blat.exe is available from http://www.blat.net. Cmd.exe, findstr.exe, notepad.exe, ntbackup.exe, and rsm.exe are part of the Windows
OS (%SystemRoot%\system32). Sleep.exe is available in sleep.zip, which is in 44990.zip. It’s also in various Windows resource kits.

Step 1. Understand the Supporting Scripts’ Functions


Backup.cmd is the skeleton of this backup solution, whereas the 13 supporting scripts are its muscles.
Here’s how the 13 scripts flex their muscles and contribute to the end result:

RSMView.cmd. NTBackup manages tapes and other media by using the Removable Storage
service, which has a database in the %SystemRoot%\system32\NtmsData directory. You can use a
command-line tool called RSM to manage the Removable Storage service. One of RSM’s most useful
commands is Rsm View, which lets you view the objects in the Removable Storage service’s database.
The object types of interest here are the LIBRARY, PHYSICAL_MEDIA, PARTITION, and
LOGICAL_MEDIA objects. (The object model for the Removable Storage database contains many
other types of objects.) A LIBRARY object represents a physical device that uses removable media.
The device we’re interested in is the tape drive. One type of object the LIBRARY object can contain is
the PHYSICAL_MEDIA object, which represents the physical medium—in this case, a tape. The
PHYSICAL_MEDIA object, in turn, can contain a PARTITION object, which represents a partition.
(The GUI uses the term side rather than partition, but the two terms refer to the same thing.) One
object the PARTITION object can contain is the LOGICAL_MEDIA object, which represents the
allocated partition of a tape.
When you use the Rsm View command to display the objects in a container object (i.e., an object
that contains other objects), you must specify the container object by its GUID. Determining the
GUID is simply a matter of using one of RSM’s options. When you use Rsm View with the
/guiddisplay option, RSM outputs a list of objects and their associated GUIDs. With the GUID in
hand, you can cut and paste the GUID into the Rsm View command. Cutting and pasting a couple
of GUIDs isn’t a problem, but having to cut and paste many of them can become quite tedious.
Scripting is the perfect solution to eliminate this tedious task. So, the RSMView.cmd script handles the
Rsm View command in the rest of the scripts. You don’t need to cut and paste a single GUID.
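To see what RSMView.cmd is automating, you can run RSM by hand. The following sketch shows the drill-down; I’m citing the option spellings from the rsm.exe documentation as I remember them, so run rsm view /? to confirm the exact syntax on your system:

:: List all libraries (tape drives) with their GUIDs.
rsm view /tlibrary /guiddisplay

:: List the physical media in one library, using the library GUID
:: returned by the previous command as the container GUID.
rsm view /tphysical_media /cg<LibraryGUID> /guiddisplay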

Library.cmd. Library.cmd obtains the tape drive’s friendly name and GUID or just the tape drive’s
GUID. If you run Library.cmd by itself, it will display the tape drive’s name and GUID. This
information is helpful for troubleshooting if you have problems with the scripts. When Backup.cmd
runs Library.cmd, Library.cmd retrieves only the GUID because Backup.cmd doesn’t need the tape
drive’s name.

PhysicalMedia.cmd. PhysicalMedia.cmd obtains the inserted tape’s friendly name and physical-
media GUID (i.e., the GUID for the physical tape) or just the tape’s physical-media GUID. If you run
PhysicalMedia.cmd by itself, it will display the tape’s name and GUID. When Backup.cmd runs
PhysicalMedia.cmd, PhysicalMedia.cmd retrieves only the tape’s GUID.

Partition.cmd. Partition.cmd obtains the partition’s friendly name and GUID or just the partition’s
GUID. If you run Partition.cmd by itself, it will display the partition’s name and GUID. When
Backup.cmd runs Partition.cmd, Partition.cmd returns only the partition’s GUID.

MediaGUID.cmd. MediaGUID.cmd outputs the inserted tape’s logical-media GUID (i.e., the GUID
for the allocated partition of the tape), which is used with NTBackup’s /g option. MediaGUID.cmd
outputs the GUID in the format that NTBackup needs.

MediaName.cmd. MediaName.cmd obtains the inserted tape’s media label as it appears in the Name
field on the Side tab of the tape’s Properties dialog box in the Removable Storage console. Figure 1
shows a sample Properties dialog box. Backup.cmd uses the media label obtained by
MediaName.cmd with NTBackup’s /n option to reapply the existing media label when overwriting a tape.

Figure 1
Sample Properties dialog box

Note that NTBackup uses the Info field when referring to tapes by name. Windows 2003 and
Windows XP automatically set the Name field to the Info field. However, when I was testing Win2K,
I discovered the Name field was left blank, and I had to copy the label in the Info field to the Name
field. In any case, the Name field must match the Info field on the Side tab; otherwise,
MediaName.cmd will fail or return incorrect results.

Refresh.cmd. Depending on your hardware, RSM isn’t always able to detect when tapes are inserted
and removed. Thus, before running a backup job, it’s important to perform a refresh operation to
ensure that the database contains the tape drive’s actual state. Refresh.cmd refreshes the tape drive. If
the refresh operation is successful, Refresh.cmd uses sleep.exe to pause the script so that the database
will return up-to-date information.

Eject.cmd. Eject.cmd ejects the tape after the backup is complete. The script then calls Refresh.cmd
to refresh the tape drive to ensure the Removable Storage service’s database will return up-to-date
information.

ShowLog.cmd. ShowLog.cmd outputs the most recent NTBackup log file (i.e., the log you’d see in
the NTBackup GUI) to the screen. If you run ShowLog.cmd with the /f option, it will output the log’s
full path and filename instead.

PrintLog.cmd. PrintLog.cmd uses Notepad to print the most recent NTBackup log file. You need to
configure a default printer before PrintLog.cmd will work. Summary logs (/L:s) are recommended if you
use PrintLog.cmd.

MailLog.cmd. MailLog.cmd uses blat.exe to email the most recent NTBackup log file to the specified
person (e.g., the administrator). Blat.exe is a free utility that lets you send mail messages from a
command script.
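Under the hood, MailLog.cmd issues a command along these lines (a sketch rather than the script’s exact command line; the server, sender, and recipient values come from SetupVars.cmd, and the log filename shown here is only an example of the backupNN.log files in NTBackup’s Data directory):

:: Send the most recent NTBackup log file as the message body.
blat "%NTBACKUP_DATA%\backup01.log" -subject "NTBackup job log" -to %RECIPIENTS% -f %SENDER% -server %SMTPSERVER%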

SetupVars.cmd. SetupVars.cmd defines the settings of key environment variables used in the other
scripts. This setup makes the scripts easily portable to different computers. You just need to modify
the settings in one script (i.e., in SetupVars.cmd) rather than 13 scripts. (In Step 3, I discuss the
settings you need to modify.)

TapePrep.cmd. When you have one or more new tapes you want to label and allocate to
NTBackup, you can use TapePrep.cmd. This script runs NTBackup in unmanaged mode so that
NTBackup can use whatever tape happens to be inserted. The script labels the tape with the text
you specify.
You run TapePrep.cmd at the command line and not in a script such as Backup.cmd because
you need to use the script only once per tape. The command to launch TapePrep.cmd is
TapePrep media_label

where media_label is the text you want to use as the label. If the text contains spaces, you must
enclose it in quotes.

Step 2. Understand How the Main Script Works


As I mentioned previously, Backup.cmd runs a backup that overwrites the inserted tape, without
changing the tape’s media label. This script either directly or indirectly uses all the other scripts,
except for TapePrep.cmd.
Listing 1 contains Backup.cmd. Callout A in Listing 1 shows the environment variables that direct
its operation. To use this script, you need to customize these variables, which you’ll do in Step 3.

Listing 1: Backup.cmd
@Echo Off
Setlocal EnableExtensions EnableDelayedExpansion

:: Set up the environment variables.
Call SetupVars

:: Callout A: Set up the variables for the current script.
Set BACKUP=C:\NTBackup
Set JOBNAME=NTBackup tools
Set SETDESC=NTBackup tools [%DTSTAMP%]
Set JOBINFO=NTBackup job '%JOBNAME%' on %COMPUTERNAME%
Set OPTIONS=/hc:on /l:s /m normal /r:yes /v:no
Set LOGFILE=%~dpn0.log
Set RC=0

:: Refresh the media library.
Call Refresh > "%LOGFILE%"

:: Make sure the tape is inserted.
Call PhysicalMedia > nul
If %ERRORLEVEL% NEQ 0 (Set RC=%ERR_PHYSICAL_MEDIA%
Call :DIE "%JOBINFO% aborted: Media not detected (!RC!)"
Goto :END)

:: Determine the tape's media label.
Set MEDIANAME=
For /f "delims=" %%n in ('MediaName /b') Do Set MEDIANAME=%%n
If Not Defined MEDIANAME (Set RC=%ERR_PARTITION%
Call :DIE "%JOBINFO% aborted: Unable to determine media name (!RC!)"
Goto :END)
Echo Media name: %MEDIANAME% >> "%LOGFILE%"

:: Determine the GUID for the current tape.
Set MEDIAGUID=
For /f %%g in ('MediaGUID /b') Do Set MEDIAGUID=%%g
If Not Defined MEDIAGUID (Set RC=%ERR_LOGICAL_MEDIA%
Call :DIE "%JOBINFO% aborted: Unable to determine media GUID (!RC!)"
Goto :END)
Echo Media GUID: %MEDIAGUID% >> "%LOGFILE%"

:: Callout B: Build and run the NTBackup command.
Set COMMAND=ntbackup backup %BACKUP% /j "%JOBNAME%" /d "%SETDESC%" /g %MEDIAGUID% /n "%MEDIANAME%" %OPTIONS%
Echo NTBackup command line: >> "%LOGFILE%"
Echo %COMMAND% >> "%LOGFILE%"
%COMMAND%
Set RC=%ERRORLEVEL%
Echo NTBackup exit code: %RC% >> "%LOGFILE%"

:: Callout C: Perform the post-backup actions.
If /i "%EJECT%"=="YES" Call Eject >> "%LOGFILE%"
If /i "%MAILLOG%"=="YES" Call MailLog "%JOBINFO% [%DTSTAMP%] (%RC%)"
If /i "%PRINTLOG%"=="YES" Call PrintLog
Goto :END

:: Callout D: Error-handling subroutine.
:DIE
Setlocal
Set ERRORMSG=%~1
Echo %ERRORMSG% > "%NTBACKUP_DATA%\error.log"
Echo %ERRORMSG% >> "%LOGFILE%"
If /i "%MAILLOG%"=="YES" call MailLog "%ERRORMSG%"
If /i "%PRINTLOG%"=="YES" call PrintLog
Endlocal & Goto :EOF

:END
Endlocal & Exit /b %RC%

After defining the variables, Backup.cmd uses Refresh.cmd to refresh the tape drive. The script
then performs three important operations. First, Backup.cmd uses PhysicalMedia.cmd to see whether
a tape is inserted in the tape drive. Then, Backup.cmd uses MediaName.cmd to determine the tape’s
media label. Finally, Backup.cmd uses MediaGUID.cmd to determine the tape’s logical-media GUID.
If all three operations succeed, Backup.cmd defines the COMMAND variable, which contains the
NTBackup command, as callout B in Listing 1 shows. The script records the NTBackup command in
the script’s log file named backup.log, then executes the command. After NTBackup completes the
backup, Backup.cmd writes NTBackup’s exit code to backup.log. Finally, Backup.cmd checks the
EJECT, MAILLOG, and PRINTLOG variables, as the code at callout C in Listing 1 shows. If any of
these variables are set to YES, Backup.cmd calls the appropriate scripts.
If any of the three operations fail (i.e., the script can’t detect the tape’s physical-media GUID,
media label, or logical-media GUID), Backup.cmd calls the :DIE subroutine, which callout D in Listing
1 shows. This code displays an error message, creates the error.log file in the NTBackup data
directory, and writes the same error message to error.log and backup.log. Depending on the values
set for the MAILLOG and PRINTLOG variables, the code might also email and/or print the most
recent NTBackup log file, which contains the error message.

Step 3. Prepare the Environment and Scripts


You can find the 14 scripts on the Windows IT Pro Web site. Go to http://www.windowsitpro.com,
enter InstantDoc ID 44990 in the InstantDoc ID text box, then click the 44990.zip hotlink. After you
create a directory to hold the scripts (e.g., C:\NTBackup), extract them into that directory. When you
schedule a job that uses the scripts, make sure you specify this directory as the Start in directory for
the scheduled task.
The 44990.zip file includes the sleep.zip file, which contains sleep.exe. Unzip this file and extract
its contents into the directory you just created. If you want to email the most recent NTBackup log
file to your administrator after the backup is complete, you need to obtain blat.exe. You can
download blat.exe from http://www.blat.net. This utility requires no installation, registry entries, or
support files. Just place it in the same directory as the scripts.
After the scripts and utilities are in place, you need to customize SetupVars.cmd and Backup.cmd:

Customizing SetupVars.cmd. Open SetupVars.cmd, which Listing 2 shows, in Notepad or another


text editor. In the code at callout A in Listing 2, you need to define at least the RSM_LIBRARY and
RSM_REFRESH variables.

Listing 2: SetupVars.cmd
:: Callout A: Define the computer-specific values. Set RSM_LIBRARY to the tape
:: drive's name. Set RSM_REFRESH to the number of seconds to wait for the refresh.
:: If you plan to use TapePrep.cmd, set RSM_POOL to the tape drive's media pool.
Set RSM_LIBRARY=
Set RSM_REFRESH=
Set RSM_POOL=

:: Callout B: Define the post-backup behavior. To enable the behavior, change NO to YES.
Set EJECT=NO
Set MAILLOG=NO
Set PRINTLOG=NO

:: Callout C: Define the email settings if you set MAILLOG to YES. Set SMTPSERVER
:: to the SMTP server's DNS name or IP address. Set SENDER to the sender's
:: email address. Set RECIPIENTS to the recipients' email addresses.
Set SMTPSERVER=
Set SENDER=
Set RECIPIENTS=

:: Creates a date and time stamp in the format: dow yyyy-mm-dd hh:mm.
Set DTSTAMP=%DATE:~0,3% %DATE:~10,4%-%DATE:~4,2%-%DATE:~7,2% %TIME:~0,5%

:: Do not modify: Error constants.
Set ERR_NOT_FOUND=2
Set ERR_LIBRARY=6
Set ERR_PHYSICAL_MEDIA=7
Set ERR_PARTITION=8
Set ERR_LOGICAL_MEDIA=9

:: Do not modify: Path to NTBackup's Data directory.
Set NTBACKUP_DATA=%USERPROFILE%\Local Settings\Application Data\Microsoft\Windows NT\NTBackup\Data

If you plan to use TapePrep.cmd, you must also define the RSM_POOL variable.
RSM_LIBRARY defines the friendly name for your tape drive. If you’re unsure of this name, you
can open a command-shell window and run the command
RSMView library

to obtain the list of libraries on your system. Then simply copy the friendly name (but not the GUID)
into SetupVars.cmd. You don’t need to enclose the library’s friendly name in quotes if it contains
spaces. To check whether you have specified the library name correctly, type the following command
at a command prompt
Library

You should see the library’s friendly name followed by its GUID. You must make sure that the
library’s friendly name doesn’t contain any of the following reserved shell characters:
( ) < > ^ & |

If your library’s friendly name contains any of these characters, you need to remove them from
the library’s name. To do so, first open the Removable Storage console by selecting Start, Run,
ntmsmgr.msc, OK. In the left pane, expand Libraries (in Windows 2003 and XP) or Physical Locations
(in Win2K). Right-click the tape drive in the left pane and choose Properties. Remove any of the
offending characters from the Name text box, and click OK.
You need to set RSM_REFRESH to the number of seconds that Refresh.cmd should pause after
performing a device refresh. This value depends on your hardware; set it to the number of seconds it
takes to insert and remove a tape. A good starting point is 30 seconds.
If you plan to use TapePrep.cmd, set RSM_POOL to the media pool that matches the type of
media used by your tape drive. For example, if your tape drive uses Travan tapes, specify the Travan
media pool:
Set RSM_POOL=Travan

If the value contains spaces (e.g., 4mm DDS), you don’t need to enclose it in quotes.
The code at callout B in Listing 2 defines the post-backup behavior. Setting the EJECT variable to
YES prompts Eject.cmd to eject the tape. Setting MAILLOG to YES causes MailLog.cmd to email the
NTBackup log file to the specified recipient. Setting PRINTLOG to YES prompts PrintLog.cmd to print
the NTBackup log file on the default printer.
If you set MAILLOG to YES, you need to set the variables in the code at callout C in Listing 2.
(If you set MAILLOG to NO, you don’t need to do anything with this code.) Set the SMTPSERVER
variable to the DNS name or IP address of an SMTP server on your network. (In this example, no
authentication is used. However, blat.exe supports SMTP authentication. For information about how
to incorporate authentication, see the blat.exe documentation.) Set SENDER to the email address you
want to appear in the message’s From field. Set RECIPIENTS to the email address of the person you
want to receive the message. You can specify more than one recipient. Simply separate the addresses
with a comma; no spaces are allowed.
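Pulling callouts A through C together, a filled-in SetupVars.cmd might begin like this (all of the values are hypothetical; substitute your own tape drive name, media pool, and addresses):

Set RSM_LIBRARY=HP C5683A SCSI Sequential Device
Set RSM_REFRESH=30
Set RSM_POOL=4mm DDS

Set EJECT=YES
Set MAILLOG=YES
Set PRINTLOG=NO

Set SMTPSERVER=mail.example.com
Set SENDER=backup@example.com
Set RECIPIENTS=admin@example.com,ops@example.com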
You don’t need to do anything with the remaining code, which defines the DTSTAMP variable,
ERR constants, and NTBACKUP_DATA variable. Backup.cmd uses DTSTAMP to create a date
and time stamp. The ERR constants define the exit codes for the scripts. And the NTBACKUP_DATA
variable points to NTBackup’s data folder, which stores the backup selection (.bks) files
and log files.

Customizing Backup.cmd. Open Backup.cmd in Notepad or another text editor. In the code at
callout A in Listing 1, set the BACKUP variable to the directory or directories you want to back up.
Separate directory names with spaces. If a directory contains spaces, enclose it in quotes. Alternatively,
you can specify a .bks file. Enter the full pathname to the file, prefix it with the @ symbol, and
enclose it in quotes if it contains spaces. You can use the NTBACKUP_DATA variable as well.
For example, if your filename is Weekly full.bks, you’d set the BACKUP variable to
"@%NTBACKUP_DATA%\Weekly full.bks"

Set the JOBNAME variable, which is used with NTBackup’s /j option, to the backup job’s name.
This name will appear in NTBackup’s Backup Reports window. Set SETDESC, which is used with
NTBackup’s /d option, to the backup job’s description. This description will appear in the log file and
in the Description column on the Restore tab in NTBackup’s UI. The optional DTSTAMP variable that
appears at the end of the description will let you see at a glance when the backup job ran.
You can customize the JOBINFO variable, but it’s not required. This variable specifies the text
used in error messages and in the email subject line. Note that it references the JOBNAME and
COMPUTERNAME variables so you’ll know at a glance the job and computer that generated the
message.
You use the OPTIONS variable to define NTBackup’s command-line options. For example,
suppose you want NTBackup to append the backup to whatever tape happens to be inserted. You
simply add the /a option to the OPTIONS variable. (You also need to remove /n “%MEDIANAME%”
in the Set COMMAND= line because you can’t use /n with /a.)
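In other words, after those two changes the relevant lines in Backup.cmd would look like this:

Set OPTIONS=/hc:on /l:s /m normal /r:yes /v:no /a
Set COMMAND=ntbackup backup %BACKUP% /j "%JOBNAME%" /d "%SETDESC%" /g %MEDIAGUID% %OPTIONS%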
You can specify any of NTBackup’s command-line options, except /j, /d, /n, /g, /f, /t, /p, and
/um. As I mentioned previously, /j and /d are already defined in JOBNAME and SETDESC,
respectively. The script automatically determines the tape’s media label (/n) and GUID (/g). The
other options (/f, /t, /p, and /um) shouldn’t be used because they create conflicts.
The LOGFILE variable exists for troubleshooting purposes. Each script’s output is redirected to
this file, along with the tape’s name and logical-media GUID, the NTBackup command line, and that
command’s exit code. If you leave the LOGFILE variable set to %~dpn0.log, the file will be created in
the same directory as Backup.cmd and will have the filename Backup.log.

Ready, Test, Go
After you’ve completed these three steps, the backup scripts are ready to test in a nonproduction
environment. Even though I tested them with Windows 2003, XP, and Win2K, you should always test
any new scripts in a controlled lab environment before deploying them. After you’ve successfully
tested them, you’ll have a flexible backup solution to use.

Chapter 4:

After the Crash


Sameer Dandage

If your system must be highly synchronized but can tolerate a certain amount of latency as well as
some data loss when conflicts arise, SQL Server 2000’s transactional replication with queued updates
(TRQU) can be a useful way to replicate data to multiple, always-on database servers. In this chapter,
I examine the intricacies of backing up and restoring TRQU setups. The following steps, which
Figure 1 shows, summarize the TRQU process.

Figure 1
The TRQU process

For the transactions that follow a forward path from Publisher to Subscriber:
1. SQL Server applies transactions at the Publisher server to the Publisher database log.
2. SQL Server specially marks for replication the transactions that involve the replicated articles
(tables).
3. The Log Reader agent running at the Distributor server periodically reads the Publisher database
log, picks up the transactions marked for replication, then applies them to a table in the
Distributor database on the Distributor server.
4. The Distribution Agent then runs on the Distributor server (for a push subscription) and applies
those transactions to the Subscriber database on the Subscriber server.

For the transactions that follow a reverse path from Subscriber to Publisher:
1. SQL Server records external transactions made by end users or end-user applications at the
Subscriber database in the queue table in the Subscriber database on the Subscriber server.
2. The Queue Reader agent runs periodically on the Distributor server to pick up the transactions
from the queue table on the Subscriber server, checks for and reconciles data conflicts, then
applies the transactions to the Publisher database on the Publisher server.

In a basic transactional-replication scenario that uses SQL Server’s default settings, the Log Reader
agent reads a transaction that SQL Server has marked for replication in the Publisher database log.
Then, after moving the transaction to the Distributor database, the Log Reader agent unmarks the
transaction. Unmarking the transaction is a signal for SQL Server to remove the transaction from the
log after the log backup’s next run. This backup process is consistent and highly effective until
disaster strikes. Then, the quirks inherent in TRQU implementations cause anomalies that require
creative problem-solving.

Publisher Database Crash


In most setups, the Log Reader agent runs in continuous mode and reads marked transactions
from the log more frequently than SQL Server backs up the log. For example, let’s say that two
transactions that need replicating take place at the Publisher. During the next Log Reader agent run,
the agent picks up these transactions quickly from the log, moves them to the Distributor, then
unmarks them.
If the Log Reader agent run takes place at time A, these transactions typically remain in the
Publisher’s active database log until the next log backup takes place (time B). At time B, SQL Server
backs up and truncates the transactions from the active Publisher log.

Problem. What happens when the Publisher database crashes between times A and B, and SQL
Server can’t back up the log? In this situation, SQL Server has copied the transactions to the
Distributor and perhaps to the Subscriber, but not to the Publisher database’s log backup. Therefore,
if you restore the Publisher to the last good log backup, the Publisher database doesn’t contain the
transactions, even though SQL Server has copied these transactions to the Distributor and perhaps to
the Subscriber (or will copy them to the Subscriber during the next Distribution agent run). This type
of failure compromises the system’s data consistency, and, depending on the nature of the production
data, serious problems can arise.


Solution. SQL Server 7.0 and earlier releases offer no easy way around this problem. DBAs who
work with these releases have to manage the problem more through access and timing control than
through an integrated SQL Server mechanism. The good news is that SQL Server 2000 provides a
feature for resolving this inconsistency easily. You merely set an option called sync with backup by
running the following T-SQL command in the master database on the Publisher server:
EXEC master..sp_replicationdboption @PubDBName, 'sync with backup', 'true'

This setting tells the Log Reader agent to read only those marked transactions in the log that SQL
Server has backed up. After the log backup is complete, SQL Server updates the log record by noting
that the transactions have been backed up. Then, during its next run, the Log Reader agent reads the
transactions and unmarks those that were marked to denote they needed replication. The
next log-truncation process removes those transactions from the Publisher database’s active log. This
small, clever feature maintains data consistency in the Publisher backup.

Distributor Database Crash


Now, let’s say that the Log Reader agent has read the transactions and SQL Server has backed them
up on the Publisher and deleted them from the Publisher log. At this stage, the transactions are on
the Distributor in the Distributor database, and the Distribution agent will apply them to the
Subscriber during the next Distribution agent run.

Problem. What happens when the Distributor database crashes before the Distribution agent can run
and SQL Server hasn’t backed up those transactions on the Distributor? Data inconsistency again
exists: the Publisher has copies of the transactions, but the Distribution agent hasn’t copied these
transactions to the Subscriber. Restoring the Distribution database to the last good log backup won’t
retrieve those transactions because SQL Server never backed them up.

Solution. Using sync with backup again is the solution. This time, however, you need to set the
option for the database on the Distributor server. When you set this option, SQL Server can’t delete
the transactions from the Publisher database log until it backs them up at the Distributor server as
part of the Distributor database’s log backup.
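For example, assuming the distribution database uses its default name, the command is analogous
to the one you ran at the Publisher:

EXEC master..sp_replicationdboption 'distribution', 'sync with backup', 'true'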
At this stage, the Log Reader agent has read the transactions, the Publisher log backup has
backed up the transactions on the Publisher server, but SQL Server hasn’t backed up the Distributor
database log. Under these circumstances, SQL Server doesn’t delete the transactions in the active
Publisher database log until it has backed up the transactions on the Distributor. After the Distributor
database’s log backup is complete, SQL Server marks the transactions in the Publisher’s active log as backed up.
Then, the transactions are deleted from the Publisher database’s active log during the next truncation
process. Therefore, if the Distribution database crashes before SQL Server has backed up the
transactions on the Distributor, data-consistency problems don’t occur. After the Distributor database
restore, the Log Reader agent picks up the transactions from the active Publisher log based on the
transactionID.

Subscriber Database Crash


Now you know how you can prevent disaster when the Publisher and Distributor databases crash.
However, the possibility of disaster doesn’t end with those two scenarios.


Problem. Sometimes in a TRQU project, updates occur at the Subscriber as well as at the Publisher.
What do you do when the Subscriber database crashes?

Solution. As long as the replicated transactions are stored in the Distributor database, you haven’t
lost anything. You can use the @min_distretention option of the sp_adddistributiondb stored
procedure, which sets the minimum transaction-retention period for the Distributor database. The
distribution-cleanup process won’t delete transactions from the msrepl_commands table in the
Distributor database that fall within that minimum retention period. So, if
the Subscriber database crashes, you can restore it to the last good backup. And, assuming that your
minimum transaction-retention period is more than the time elapsed since the last good backup of
the Subscriber database, the replicated transactions are safe. The Distribution agent determines the
state of the Subscriber database and starts reapplying transactions from the point to which the
database has been restored. Therefore, you should consider using options such as sync with backup
and minimum history retention when you estimate the log sizes and size of the database (e.g., for the
msrepl_commands table at the Distributor, the Queue table at the Subscriber, or conflict tables at the
Publisher and the Subscriber).
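As a hedged illustration, you set the retention when you create the distribution database; the values
below are arbitrary (retention is expressed in hours), and sp_adddistributiondb accepts many other
parameters that are omitted here:

-- Run at the Distributor when configuring distribution
EXEC sp_adddistributiondb
    @database = 'distribution',
    @min_distretention = 24,
    @max_distretention = 72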

Queue Complications
The sync with backup solutions to the Publisher and Distributor database crashes work when the
updates take place only at the Publisher. However, as I noted earlier, TRQU updates can take place
simultaneously in more than one database. What happens in a TRQU scenario in which updates are
applied simultaneously at the Subscriber and at the Publisher when the sync with backup option is
on? Visualize the following scenario: An update takes place at the Subscriber and this transaction is
recorded in the queue table on the Subscriber. The Queue Reader agent reads this transaction and
applies it to the Publisher (at which stage SQL Server makes a log entry in the Publisher log). The
Queue Reader agent then deletes the transaction from the queue table on the Subscriber, but before
you can back up the Publisher log, the Publisher database crashes, and you can’t back up the
transaction log.

Problem. Unlike with the forward path (from Publisher to Subscriber), in which the transactions are
preserved in the distribution database, no transactions are preserved in the reverse direction. This loss
of data occurs because the Queue Reader deletes the transactions from the queue table after applying
them to the Publisher.

Solution. Typically, if you want to minimize the loss of transactions in such a disaster scenario, you
undo replication, get the failed SQL Server database up and running, then set up replication again
with the roles reversed. The original Subscriber then becomes the Publisher, and the original
Publisher becomes the new Subscriber.
However, even when the Publisher database has crashed, you can salvage your data if you can
back up the Publisher log. To accomplish this backup, you need to set up the TRQU to allow the
Log Reader to read transactions from the available Publisher database log (after the log backup), stop
all the replication agents, then restore the Publisher database to the last backed-up log. After you
restart all the agents, you can resume replication.


Prepare, Rehearse, and Shine


The scenarios for backing up and restoring replication setups that I’ve described in this chapter are
only a few of the situations you might encounter. Remember that your work isn’t done after you set
up TRQU and get it working. You must also prepare thoroughly for disasters and rehearse the
backup and restore plans so that if a production disaster does strike, your data is safe and quickly
recoverable.


Chapter 5:

SQL Server on a SAN


Mel Shum

As a DBA, one of your many tasks is to manage your SQL Server databases’ ever-expanding storage
requirements. How often do you find yourself adding more disk, trying to accurately size a database,
or wishing you could more efficiently use your existing disk capacity? Storing database data on a SAN
can make such tasks much easier and can also improve disk performance and availability and shorten
backup and restore times. Start your search for a SAN here, as you learn the basics of SAN
technology and the benefits of using a SAN to store SQL Server databases. And the sidebar “Selecting
a Storage Array for a SAN” covers several features you’ll want to consider when selecting a storage
array for your SAN.

SAN Fundamentals
A SAN is basically a network of switches that connect servers with storage arrays. SAN topology is
similar to how Ethernet switches are interconnected, as Figure 1 shows. A SAN’s physical layer
comprises a network of either Fibre Channel or Ethernet switches. Fibre Channel switches connect to
host bus adapter (HBA) cards in the server and storage array. Ethernet switches connect to Ethernet
NICs in the servers and storage array.

Figure 1
SAN Topology


A storage array is an external disk subsystem that provides storage for one or more
servers. Storage arrays are available in a range of prices and capabilities. On the low end, an array
consists simply of a group of disks in an enclosure connected by either a physical SCSI cable or Fibre
Channel Arbitrated Loop (FC-AL). This type of plain-vanilla array is also commonly called Just a
Bunch of Disks (JBOD). In high-end arrays, storage vendors provide features such as improved
availability and performance, data snapshots, data mirroring within the storage array and across
storage arrays, and the ability to allocate storage to a server outside the physical disk boundaries that
support the storage.
Two types of SANs exist: Fibre Channel and iSCSI. Fibre Channel SANs require an HBA in the
server to connect it to the Fibre Channel switch. The HBA is analogous to a SCSI adapter, which lets
the server connect to a chain of disks externally and lets the server access those disks via the SCSI
protocol. The HBA lets a server access a single SCSI chain of disks as well as any disk on any storage
array connected to the SAN via SCSI.
iSCSI SANs use Ethernet switches and adapters to communicate between servers and storage
arrays via the iSCSI protocol on a TCP/IP network. Typically, you’d use a Gigabit Ethernet switch and
adapter, although 10Gb Ethernet switches and adapters are becoming more popular in Windows
server environments.
On a SAN, a server is a storage client to a storage array, aka the storage server. The server that
acts as the primary consumer of disk space is called the initiator, and the storage server, which
provides the disk space, is called the target.
The disks that the storage arrays provide on the SAN are called LUNs and appear to a Windows
server on the network as local hard drives. Storage-array vendors use a variety of methods to make
multiple hard drives appear local to the storage array and to represent a LUN to a Windows server by
using parts of multiple hard drives. Vendors also use different RAID schemes to improve performance
and availability for data on the LUN. Whether the SAN uses Fibre Channel or Ethernet switches,
ultimately what appears from the Windows server through the Microsoft Management Console (MMC)
Disk Management snap-in are direct-attached disks, no different from those physically located within
the server itself. In addition, most arrays have some type of RAID protection, so that the storage that
represents a given LUN is distributed across multiple hard drives that are internal to the storage array.

SAN Security
SAN architecture provides two measures for securing access to LUNs on a SAN. The first is a
switch-based security measure, called a zone. A zone, which is analogous to a Virtual LAN (VLAN),
restricts access by granting only a limited number of ports on several hosts an access path to several,
but not all, storage arrays on the SAN.
The second security measure is storage-array-based; a storage array can use LUN masking to
restrict access. Depending on the vendor, this security feature comes free of charge with the storage
array or is priced separately as a licensed product. LUN masking can be configured either by the
administrator or by the storage-array vendor for a fee. When masking is configured, the array grants
only explicitly named ports of named hosts an access path to the specified LUNs. LUN masking func-
tions similarly to ACLs on Common Internet File System (CIFS) shares in a Windows environment.


SAN Benefits
Now that you have a grasp of what a SAN is, you’re probably wondering how a SAN could benefit
your SQL Server environment. To address this question, we’ll first examine problems inherent in local
DAS, then explore how using a SAN avoids these problems.

Performance and availability. As part of the typical process of designing a database that will
reside on a local disk, or DAS, you’d determine how the disks on which the database will be stored
are attached (i.e., which disks are attached to which SCSI adapter). You want to carefully organize the
database files to minimize contention for disk access—for example, between a table and indexes on
the table, two tables that are frequently joined together, or data and log files. To minimize contention
(i.e., disk I/O operations), you’d try to ensure that the two contending objects are separated not only
on different disks but also across SCSI adapters.
Another disk-related issue that you must consider in designing a database is availability. You need
to use some type of disk redundancy to guard against disk failures. Typically, you’d use either RAID
1 (mirroring) or RAID 5 to provide redundancy and thus improved availability.
After you create the RAID devices by using Windows’ Disk Management, you might lay out the
database across these multiple RAID storage structures. When allocating such structures, you have to
decide how to size them. Determining the amount of storage each server needs is like estimating
your taxes: If you overestimate or underestimate taxes or storage needs, you’ll be penalized either
way. If you overestimate your storage and buy too much, you’ll have overspent on storage. If you
underestimate your storage needs, you’ll soon be scrambling to find ways to alleviate your shortages.
A SAN addresses the issues of contention, availability, and capacity. On a SAN, the storage array
typically pools together multiple disks and creates LUNs that reside across all disks in the pool.
Different disks in the pool can come from different adapters on the storage array, so that traffic to
and from the pool is automatically distributed. Because the storage array spreads the LUNs across
multiple disks and adapters, the Windows server that’s attached to the SAN sees only a single disk in
Disk Management. You can use just that one disk and not have to worry about performance and
availability related to the disk, assuming that your storage or network administrator has properly
configured the SAN.
How complex or simple a storage array is to configure depends on the vendor’s implementation.
I recommend that you meet with the IT person responsible for configuring your storage and ask
him or her to explain your storage array’s structure. Also, determine your storage requirements ahead
of time and give them to this person. In addition to storage size, note your requirements for
performance (e.g., peak throughput—40Mbps); availability (e.g., 99.999 percent availability); backup
and recovery (e.g., hourly snapshot backups take 1 minute; restores take 10 minutes); and disaster
recovery, based on metrics for recovery time objective (RTO)—the time it takes to restore your
database to an operational state after a disaster has occurred—and recovery point objective (RPO)—
how recent the data is that’s used for a restore. Using these metrics to define your requirements will
help your storage administrator better understand your database-storage needs.
Some vendors’ storage arrays let you dynamically expand a LUN that you created within the disk
pool without incurring any downtime to the SQL Server database whose files reside on that LUN. This
feature lets DBAs estimate their disk-space requirements more conservatively and add storage capacity
without downtime.


Backup control. As a database grows, so does the amount of time needed to perform database
backups. In turn, a longer backup requires a longer backup window. Partial backups, such as
database log backups, take less time but require more time to restore. Increasingly, upper
management is mandating smaller backup windows and shorter restore times for essential
applications, many of which access SQL Server databases. SANs can help decrease backup windows
and restore times. Some storage arrays can continuously capture database snapshots (i.e., point-in-
time copies of data), which are faster to back up and restore than traditional database-backup
methods. The snapshot doesn’t contain any actual copied data; instead, it contains duplicate pointers
to the original data as it existed at the moment the snapshot was created.
To back up SQL Server database data by using snapshots, you’d typically want to put your
database in a “ready” state, more commonly called a hot-backup state, for a few moments to perform
the snapshot. If you didn’t put your database in a hot-backup state, the snapshot could take a
point-in-time copy of your database before SQL Server has finished making a consistent database
write. Storage-array vendors often use Microsoft’s SQL Server Virtual Backup Device Interface (VDI)
API to enable their software to put the database in a hot-backup state. This lets the system copy the
point-in-time snapshot image to separate backup media without causing a database outage.
Snapshots are minimally intrusive, so you can use them frequently without affecting database
performance. Restoring data from a snapshot takes only a few seconds. By using a SAN-connected
storage array along with a snapshot capability, DBAs can minimize backup windows and restore
times, in part because snapshot images are maintained on distributed disks in the array, instead of on
one local disk.

Reduced risks for database updates. Changes to a database, such as SQL Server or application
upgrades or patches, can be risky, especially if the changes might cause database outages or worse,
database corruption. To test changes without putting the production database at risk, you’d need to
set aside an amount of storage equivalent to the size of the production database. On this free storage,
you’d restore the most recent backup of that database (typically 1 week old). You’d spend a few hours
(maybe even days) restoring the database from tape to disk, applying the changes, then testing to see
whether the changes were successfully applied and whether they adversely affected the database.
After you verified that the changes were successfully implemented, you’d apply them to the
production database.
Some vendors’ SAN storage arrays let you quickly clone your database data for testing purposes.
Cloning the data takes only a few seconds versus hours to restore it from tape. The added benefit of
cloning is reduced disk utilization. Some cloning technology lets you take a read-only database
snapshot and turn it into a writeable clone. For testing purposes, the clone consumes far less disk
storage than a full backup of a database because only modified blocks of data are copied to the
clone database.

When Using DAS Makes Sense


Storing database data in a SAN gives you features not available with DAS, such as local and remote
mirroring, data cloning, the ability to share data across multiple hosts, and the ability to capture data
snapshots. However, if you don’t need these features, storing your SQL Server databases on DAS
might make more sense. A SAN environment consists of multiple SAN clients with multiple HBAs on
SAN switches connected to storage arrays. If the SAN wasn’t properly designed and configured (i.e.,
to provide redundancy), the storage array or a component on the SAN could fail, so that servers on
the SAN couldn’t access data on the storage array.
To enable you to troubleshoot storage problems, you’ll need to make sure that SQL Server bina-
ries and message-log files stay on the local disk. Storing the message log and binaries on a disk other
than the local disk puts the database in a Catch-22 situation, in which a database-access failure
caused by a storage-connection failure can’t be logged because logging occurs only for the device on
which the logs and binaries are stored.

Selecting a Storage Array for a SAN


Storage arrays are available in a wide spectrum of capacities and capabilities, and sorting
through the options can be confusing. These guidelines can help you narrow down the type of
storage array you need to house your SQL Server databases.
Snapshot methodologies. Snapshots work about the same in all storage arrays. The idea is to
freeze all the blocks of data in a database and the structure of the data being captured at a point in
time. Vendors use one of two basic methodologies for handling snapshots after data has been
modified. The first methodology, which Figure A shows, is to leave the snapshot block alone and use
a free block to write the modified block information. Of the two approaches, this is the more
efficient because it requires only one block I/O operation to write the new block and one update to
a pointer.
Figure A
First snapshot-updating methodology

The second methodology is to copy the snapshot block to a free block, then overwrite the block
that was just copied. This approach, which Figure B shows, is often called copy-on-write. Copy-on-
write requires more data movement and overhead on the storage array’s part than the first approach.
In Figure B, block D is moved from the current block to a new block so that the new contents of D
can be written to D’s old location. Doing so requires three block I/Os and an update to a link,
whereas the first approach requires only one block I/O. This difference becomes significant for disk
performance as large numbers of blocks are updated.
Figure B
Second snapshot-updating methodology

Support for Fibre Channel and iSCSI on the same array. Consider buying a storage array that
supports both Fibre Channel and iSCSI, so that you have the flexibility to switch from one to the
other or implement both. (For example, you might want to use an iSCSI SAN for testing and
development and use a Fibre Channel SAN for production.)
Ability to create, grow, and delete LUNs dynamically. Being able to create, grow, and delete
LUNs without bringing a database down is a major benefit of putting the database on a SAN. If you
need this capability, consider storage arrays that provide it.
Integration of snapshot backups with SQL Server. The process of taking a snapshot copy of your
SQL Server database needs to be coordinated with your database and NTFS. Storage-array vendors
can use Microsoft’s SQL Server Virtual Backup Device Interface (VDI) API to accomplish this
coordination. If the snapshot process isn’t synchronized with NTFS and the database, the created
snapshot might not be in a consistent state because either NTFS or the database might not have
completely flushed pending writes from memory to the LUN.
A uniform storage OS as you scale up. You’d most likely want to start with a small storage array
to test and validate the SAN’s benefits before deploying it enterprise-wide. Look for a storage array
that lets you grow without having to do a “forklift” upgrade or having to learn a new storage OS.
Maintaining a consistent OS lets you upgrade your storage array as your needs grow, with a
minimum of database downtime.
A transport mechanism to mirror data over the WAN to a recovery site. The storage array
should provide a uniform transport method for sending mirrored data across the WAN to another
storage array for disaster recovery purposes.
Ability to instantaneously create a writeable copy of your database. Look for storage arrays that
let you instantaneously create a writeable copy (i.e., clone) of your database for testing upgrades and
large data loads without affecting the production database. This feature could reduce outages and
corruption of the production database, giving DBAs a tool to test major changes without endangering
data.


I’m Ready for a SAN. Now What?


If your organization doesn’t already have a Fibre Channel SAN switching network in place, iSCSI will
most likely give you a greater ROI and minimize your equipment investment. For a Fibre Channel
SAN, you need to buy a storage array, Fibre Channel SAN switches, and HBAs. For an iSCSI SAN, you
need to buy a storage array, but you can use your existing Ethernet switches and Gigabit Ethernet
adapters. To include your Windows servers in the iSCSI SAN, you need only download and install an
iSCSI driver for your particular OS. (You can download the latest version of Microsoft’s iSCSI driver,
Microsoft iSCSI Software Initiator, at
http://www.microsoft.com/downloads/details.aspx?familyid=12cb3c1a-15d6-4585-b385-
befd1319f825&displaylang=en.) Carving up the storage array and presenting it to your Windows
server could get complicated, depending on the storage vendor. As I mentioned earlier, you should
discuss your storage requirements with your storage administrator.
Most modern storage arrays let you access LUNs on the same storage array via either Fibre
Channel or iSCSI. I’ve found that many IT environments don’t take full advantage of their SAN’s fea-
tures. If your organization already uses a Fibre Channel SAN switching network, you can try out
some storage-array features such as cloning and snapshots in a development or test environment. If
your organization doesn’t have a SAN yet, you can still try some of these features relatively inexpen-
sively by setting up an iSCSI SAN.

Step Up to a SAN
As you can see, housing databases on a SAN can benefit DBAs in various ways. SANs can reduce the
pain of sizing storage requirements for databases, enhance overall storage throughput, simplify
storage performance tuning, and improve availability. Using a SAN can also decrease backup and
restore windows and enable quicker and easier testing cycles while reducing test-storage overhead.
The availability of iSCSI removes the cost barriers that have until now inhibited some users from
investigating SANs. Now’s the time to check out SAN technology and see whether it can improve
your database-storage environment.


Chapter 6:

SQL Server HA Short Takes


Running SQL Server on RAID
Michelle A. Poolet

I’m often asked about using a redundant array of inexpensive disks (RAID) for fault tolerance. Should
SQL Server be installed on a RAID device? The answer is yes, if you can afford it. RAID is the easiest
and best way to implement fault tolerance, and you definitely want your production databases
housed in a fault-tolerant environment. RAID server systems are more expensive than ordinary
single-disk servers because of the additional hardware and software you use to implement the RAID
configurations. Several types of RAID are commonly available, and each has its own specific use.
RAID 0 uses striping, a method that creates a disk partition that spans multiple hard drives to
take advantage of the many operational read/write heads associated with multiple spindles (similar in
concept to Windows NT striping). RAID 0 is the fastest type of RAID, but unlike most RAID
implementations, it doesn’t provide fault tolerance. If one of the drives in a RAID 0 configuration fails,
all the data is lost. Don’t use RAID 0 if you value your data.
RAID 5 is the most common way to implement fault tolerance. RAID 5’s read data transaction
rate is high, and its write data transaction rate is also good compared to other configurations. RAID 5
also offers a good aggregate transfer rate. A typical RAID 5 configuration contains three or more hard
drives. RAID 5 divides up the data and writes it in chunks spread across all the disks in the array.
Redundancy is provided by the addition of data parity information, which the RAID controller
calculates from the data, and which is written across all the disks in the array, interleaved with the
data. The parity information enables the RAID controller to reconstruct data if one of the disks and
the data stored on it is lost or corrupted. RAID 5 is the most cost-effective way to implement
fault tolerance. Store SQL Server system and user data files on a RAID 5 device.
RAID 1 uses mirroring, a method in which each drive has a mirror copy on another drive.
Mirroring is the most fault-tolerant RAID scheme, but it’s also the most expensive because of the
additional hardware and software you need to support mirroring. SQL Server stores data sequentially
on the transaction logs and in TempDB, which makes these essential parts of your database well
suited for RAID 1 protection. Put the transaction log and TempDB on a RAID 1 device at least, even
if you can’t afford RAID for any other parts of your database.
RAID 10 is a combination of RAID 1 and RAID 0 that uses mirroring and striping. It’s expensive,
but it’s fast and provides the best redundancy and performance. RAID 10 stripes data across two or
more mirrored drive pairs. If you can afford it, put the transaction log and TempDB on a RAID 10 device
rather than a RAID 1 device for the extra protection that it offers. For a good comparison of RAID
types, see Advanced Computer and Network Corporation’s RAID tutorial at
http://www.acnc.com/raid.html.
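For example, here’s a hedged T-SQL sketch of creating a database laid out this way; the database
name, file names, and drive letters are hypothetical (E is a RAID 5 set for data, F is a RAID 1 or
RAID 10 set for the log):

-- Data files on the RAID 5 device, transaction log on the mirrored device
CREATE DATABASE Sales
ON PRIMARY (NAME = Sales_data, FILENAME = 'E:\SQLData\Sales_data.mdf')
LOG ON (NAME = Sales_log, FILENAME = 'F:\SQLLogs\Sales_log.ldf')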


High Availability in Analysis Services


Carl Rabeler

A typical approach to high availability in Analysis Services 2000 is to use Windows Network Load
Balancing (NLB) to distribute user queries across multiple Analysis Services instances on disparate
machines while also increasing availability. You keep the databases on these machines in sync with
file-based backup and restore (required for large databases due to the 2GB .cab file size limitation)
from a secondary server on which cube and dimension processing is performed. For more
information, read the Microsoft white paper “Creating Large-Scale, Highly Available OLAP Sites” at
http://www.microsoft.com/sql/evaluation/bi/creatingolapsites.asp.
Even though Analysis Services 2000 isn’t cluster-aware, you can cluster an Analysis Services 2000
database. SQL Server 2005 Analysis Services is cluster-aware and fully supports active-active clustering,
which means you can create a failover cluster to ensure high availability. In addition, Analysis
Services 2005 has a server-synchronization feature, which lets you incrementally synchronize metadata
and data changes between a source database and a destination (production) database while users
continue to query the destination database. Unlike Analysis Services 2000, with Analysis Services 2005
you can’t simply copy the data files from the data folder of an Analysis Services 2005 instance on
one machine to the data folder on another machine. These files are encrypted by default (using the
machine name as part of the key) unless you modify the .config file to change the
RequiredProtectionLevel setting from 1 to 0. Finally, in Analysis Services 2005, you can back up any
size database (the 2GB .cab file limit has been removed), which means you can use backup and
restore to move any size database between servers.

The High Availability Puzzle


Michael Otey

Of all a DBA’s missions, none is more important than ensuring that vital business services are
available to end users. All of your high-end scalability hardware and modern .NET coding techniques
will make little difference if users can’t access data. Unplanned downtime for an application or the
database server can cost an organization dearly in money and reputation. Outages for large online
retailers or financial institutions can cost millions of dollars per hour, and when users can’t access a
site or its vital applications, the organization loses face and customer goodwill.
Microsoft and other enterprise database vendors have devised several high-availability
technologies. For example, Microsoft Clustering Services lets one or more cluster nodes assume the
work of any failed nodes. Log shipping and replication help organizations protect against both server
and site failure by duplicating a database on a remote server. And traditional backup-and-restore
technology protects against server and site failure as well as application-data corruption by
periodically saving a database’s data and log files so you can rebuild the database to a specified date
and time. Although these technologies can help you create a highly available environment, by
themselves they can go only so far. Technology alone can’t address two critical pieces of the complex
high-availability puzzle: the people and processes that touch your system.


Server and site failure can produce downtime, but they’re relatively rare compared to human
error. The mean time between failures (MTBF) for servers is high, and today’s hardware, although
not perfect, is usually reliable, making server failures uncommon. In contrast, users, operators,
programmers, and administrators interact with your systems virtually all the time, and the high volume
gives more chances for problems to arise. Thus, the ability to quickly and efficiently recover from
human errors is essential for a highly available system. An operator error can take down a database
or server in a few seconds, but recovery could take hours. However, with proper planning, you can
reduce downtime due to human error by creating adequate application documentation and by
ensuring that personnel receive proper training.
Processes are also critical for a highly available environment. Standardized operating procedures
can help reduce unnecessary downtime and enable quicker recovery from planned and unplanned
downtime. You need written procedures for performing routine operational tasks as well as
documentation that covers the steps necessary to recover from various types of disasters. In addition,
the DBA and operations staff should practice these recovery plans to verify their accuracy and
effectiveness. Another process-related factor that can contribute to high availability is standardizing
hardware and software configurations. Standardized hardware components simplify implementing
system repairs and acquiring replacement components after a hardware failure. Standardized software
configurations make routine operations simpler, reducing the possibility of operator error.
Creating a highly available environment requires more than just technology. Technology provides
the foundation for a highly available environment. But true high availability combines platform capa-
bilities, effective operating procedures, and appropriate training of everyone involved with the system.

High Availability Options


Michael Otey
High availability is probably a DBA’s highest priority. Nothing gets a DBA involved faster than the
database server going down. SQL Server provides several features that you can use to create a highly
available server environment. Here I discuss SQL Server’s high-availability options and show you what
types of failures each solution handles best.

Cluster Service
Microsoft Cluster service provides a high degree of database protection as well as automatic failover
by letting you set up two or more servers in a cluster. If one server fails, its workload is automatically
transferred to one of the remaining servers in the cluster. SQL Server 2000 Enterprise Edition supports
Cluster service, but it can be expensive to implement because it requires multiple servers that must
come from the Microsoft Hardware Compatibility List (HCL).

Log Shipping
Log shipping protects against server and database failure by creating a backup of the original
database on the primary server, then restoring the backup to the standby server. The standby server
is in a state of continuous recovery so that transaction logs captured on the primary server are
automatically periodically forwarded and applied to the standby server. If you use TCP/IP as the data
transport, you can operate the primary and standby servers in different locations. You have to initiate
the failover process manually. Log shipping is included in the SQL Server 2000 Enterprise Edition, but
it’s a less expensive option than Cluster service because the servers don’t have to come from the
HCL and you can manually implement it on any server that runs SQL Server.
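Here’s a minimal, hedged sketch of the mechanics behind log shipping; the database name, share,
and paths are hypothetical, and a real implementation schedules these steps as jobs:

-- On the primary server:
BACKUP LOG Sales TO DISK = '\\StandbyServer\LogShip\Sales_log.trn'
-- On the standby server, leaving the database able to accept further log restores:
RESTORE LOG Sales FROM DISK = '\\StandbyServer\LogShip\Sales_log.trn'
    WITH STANDBY = 'E:\Undo\Sales_undo.dat'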

Replication
Typically, transactional replication is a feature you use for distributed data, but it also functions as a
high-availability solution that protects against server and site failure by duplicating data between
geographically separated servers. When you use replication, both your primary server and your
backup servers can actively provide database services. Switching from the primary server to the
backup server containing the replicated data is a manual process. All SQL Server editions support
transactional replication.

Database Mirroring
SQL Server 2005 will introduce database mirroring, which uses a primary server, a mirrored server,
and a witness server (that monitors the database mirror’s status) with Transparent Client Redirection
(in Microsoft Data Access Components—MDAC) to provide a database-level high-availability solution.
Database mirroring, basically built-in real-time log shipping, begins by restoring a database backup on
the mirrored server, then forwarding transaction logs in real time from the primary server to the
mirrored server. The failover process is automatic; if the MDAC layer on the client fails to connect to
the primary server, it can automatically connect to the mirrored server.
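As a hedged preview of the syntax (this reflects the feature as it eventually shipped in SQL Server
2005; the server names, database name, and port numbers are hypothetical):

-- Run first on the mirror server, pointing at the principal:
ALTER DATABASE Sales SET PARTNER = 'TCP://principal.example.com:5022'
-- Then on the principal server, pointing at the mirror:
ALTER DATABASE Sales SET PARTNER = 'TCP://mirror.example.com:5022'
-- Optionally, name a witness so failover can be automatic:
ALTER DATABASE Sales SET WITNESS = 'TCP://witness.example.com:5022'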

Log Backup Checklist


Kalen Delaney

Backing up your transaction log lets you maintain a record of all the changes to a SQL Server
database so that you can restore it later if you need to. The following list will help you remember the
key features of log backups so that you can use them to your best advantage.

Use the full or bulk-logged recovery model. If your database is in the simple recovery model, you
can’t make log backups because SQL Server will truncate your log periodically.
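For example, to switch a database to the full recovery model (the database name is hypothetical):

ALTER DATABASE Sales SET RECOVERY FULL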

Store your transaction log on a mirrored drive. Even if your data files are damaged and the
database is unusable, you can back up the transaction log if the log files and the primary data file are
available. Use a RAID level that guarantees redundancy, such as 1 or 10, and you’ll be able to back
up all the transactions to the point of failure, then restore them to the newly restored database.

Monitor log size. Although carefully planning how large your log should be is vital, don’t assume it
will never grow bigger than it did during testing. Use SQL Agent Alerts to watch the Performance
Monitor counters that track file size, and when the log crosses a threshold that you define, SQL Server
Agent can take predetermined actions such as running a script to increase the log size, sending you
email, or shrinking the file.
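One quick, hedged way to check current log sizes and usage across all databases from a query
window:

DBCC SQLPERF(LOGSPACE)  -- reports log size (MB) and percent used per database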


Remember that log backups are non-overlapping. In SQL Server 2000 and 7.0, each log backup
contains all transactions since the previous log backup, so a long-running transaction can span
multiple log backups. So when you’re restoring log backups, don’t use the WITH RECOVERY option
until you’ve applied the last log—later log backups might contain the continuation of the transactions
in the current log backup.

Understand the difference between truncating and shrinking. Backing up the log performs a
truncate operation, which makes parts of the log available for overwriting with new log records. This
doesn’t affect the physical size of the log file—only shrinking can do that.
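For example, a hedged sketch; the database and logical log-file names are hypothetical, and the
DBCC SHRINKFILE target size is in megabytes:

BACKUP LOG Sales TO DISK = 'E:\Backups\Sales_log.trn'  -- truncates inactive log records
DBCC SHRINKFILE (Sales_log, 500)  -- shrinks the physical log file toward 500MB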

Carefully plan how often to make log backups. There’s no one-size-fits-all answer, and you’ll
always have trade-offs. The more often you make log backups, the more backups you’ll have to
manage and restore, but the less likely you’ll be to lose transactions in case of a system failure.

The log size doesn’t always reflect the log-backup size. If your database is using the bulk-logged
recovery model, the log backups will include all data that the bulk operations affected, so the
backups can be many times as large as the log file.

Maintain log backups for at least two previous database backups. Usually, when you restore a
database, you apply all log backups you made after the database backup. But if a database backup is
damaged, you can restore an earlier database backup and apply all the logs made after that backup.
For full recovery, you just need to start your restore with a full database backup, then apply an
unbroken chain of log backups to that database.

You need log backups to restore from file or filegroup backups. If you’re planning to restore
from individual files or filegroups, you need log backups from the time the file or filegroup backup
was made until the time you restore the backup. The log backups let SQL Server bring the restored
file or filegroup into sync with the rest of the database.

To restore to a specific point in time, you need a log backup made in full recovery model.
Restoring a log backup to a specific point in time requires that the log contain a sequential record of
all changes to the database. If you’re using the bulk-logged model and you’ve performed any
bulk-logged operations, the log won’t be a complete sequential record of the work, so you can’t do
point-in-time recovery.
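Here’s a hedged sketch of a point-in-time restore under the full recovery model; the names, paths,
and timestamp are illustrative:

RESTORE DATABASE Sales FROM DISK = 'E:\Backups\Sales_full.bak' WITH NORECOVERY
RESTORE LOG Sales FROM DISK = 'E:\Backups\Sales_log1.trn'
    WITH STOPAT = '2005-06-01 14:30:00', RECOVERY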

You might need to mix log backups with differential backups. If certain data changes
repeatedly, a differential backup will capture only the last version of the data, whereas a log backup
will capture every change. Because SQL Server’s Database Maintenance Plan Wizard doesn’t give
options for differential backups, you need to define your own jobs for making differential backups.

Practice recovering a database. Plan a recovery-test day to make sure your team knows exactly
what to do in case of a database failure. You may have the best backup strategy in the world, but if
you can’t use your backups, they’re worthless.


Are Your Backups Useless?


Brian Moran

SQL Server backups are useless if you can’t recover them. Backups are simply big disk files unless
you have a recovery mechanism that puts those bits back into SQL Server when you need them. So
when was the last time you tested your restore strategy? I’m not asking whether you’ve performed a
backup and run the RESTORE command manually to see whether the media is valid. I’m asking
whether you’ve tested your restore methodology to make sure it works the way you think it does.
Can you get your production database back up and running after a disaster? You must have a
planned and tested restore methodology to be sure.
At a SQL Server Magazine Connections conference in Orlando, Florida, I sat in on Kimberly Tripp’s talk
about SQL Server backup and restore. Kimberly presented a number of interesting tips, but her
fundamental backup tenet is that the backup is useless without the ability to restore it. And you
don’t know that you can restore your backup unless you’ve fully tested your plan.
I suspect that many of you don’t have a well-tested backup and recovery plan. Testing backup
and recovery plans can be difficult, especially if you don’t have the hardware resources to do a
complete dry run of a failure and recovery. For example, properly testing your restore methodology
is hard to do if your production system is a one-terabyte warehouse and you don’t have a test server of
equal capacity. But budgeting for adequate testing and quality assurance equipment should be a
non-negotiable part of an efficient data center. If you haven’t planned how to recover your data and
tested that plan, when a true disaster happens, you’re asking for trouble.
If you haven’t tested your backup and recovery plan, your backups might not be as valuable as
you think they are. Backing up is easy; getting the data back can be the hard part.


Chapter 7:

Exchange and SANs: No Magic Bullet


Paul Robichaux

Recently, a company that’s thinking about deploying a SAN for its Exchange servers contacted me.
The company wanted to know whether a SAN made sense for its organization as well as how best to
configure and tune a SAN. Read on to learn the answers to these questions.
Figuring out whether a SAN makes sense for a given organization can be tricky because the term
spans a wide range of technology and complexity. For example, you could claim that the old Dell
650F storage enclosure I owned several years ago was a SAN. It had a Fibre Channel interconnect,
and I used it as shared storage for a three-node cluster. It didn’t, however, have replication, dynamic
load balancing, or much expandability. Since that time, the SAN category has broadened so that it
includes two primary classes of devices.
Fibre Channel SANs use optical fiber (or, in rare cases, copper cables) to interconnect SAN
devices. Each node on the SAN requires a Fibre Channel host bus adapter (HBA), and most Fibre
Channel SANs use a fibre switch to provide mesh-like connectivity between nodes. Fibre Channel
speeds range from 1Gbps to 4Gbps and, with the right implementation, can span distances up to
100 kilometers.
iSCSI SANs are a relatively new, lower-cost way to implement SANs. Instead of using optical fiber
or copper, iSCSI SAN uses TCP/IP over ordinary network cabling. Its advantages are pretty obvious:
lower costs and more flexibility. Instead of spending big bucks on Fibre Channel HBAs and switches,
you can deploy lower-cost Gigabit Ethernet HBAs (which are, more or less, ordinary network adapter
cards) and switches, and it’s much easier to extend the distance between SAN devices without
resorting to a backhoe.
In either case, the primary advantages of SANs are their flexibility, performance capabilities, and
support for high availability and business continuance. Let’s consider each of these advantages
separately.
SAN gets its flexibility from the fact that it’s a big collection of physical disks that you can
assemble in various logical configurations. For example, if you have an enclosure with 21 disks, you
can make a single 18-disk RAID-5 array with three hot spares, a pair of 9-disk RAID-5 arrays with
three hot spares, or one whopping RAID-1+0 array (although I would be loath to give up those
spares). In theory, these configurations let you build the precise mix of logical volumes you need and
tailor the spindle count and RAID type of each volume for its intended application. (In practice,
sometimes this doesn’t happen.)
SAN’s performance capabilities are the result of two primary factors: lots of physical disks and a
big cache. Which of these is the dominant factor? It depends on the mix of applications you use on
the SAN, how many disks you have, and how they’re arranged. When you look at a SAN’s raw
performance potential, remember that the SAN configuration will have a great effect on whether you
actually realize that degree of performance.


When it comes to high availability and business continuance, even if you use your SAN only as a
big RAID array, you’ll still get the benefit of being able to move data between hosts on the SAN.
SANs also make it much easier to take point-in-time copies, either using Microsoft Volume Shadow
Copy Service (VSS) or vendor-specific mechanisms. Add replication between SAN enclosures, and you
get improved redundancy and resiliency (albeit at a potentially high cost).
SANs are often deployed in conjunction with clusters, but they don’t have to be. A SAN shared
between multiple unclustered mailbox servers still offers the benefits I describe above—without the
complexity of clustering. SANs themselves are fairly complex beasts, which is one common (and
sensible) reason why organizations that could use SAN’s performance and flexibility sometimes shy
away from SAN deployments. If you aren’t comfortable setting up, provisioning, and managing a
SAN, being dependent on it can actually leave you worse off than you would have been without it.
Cost is also a factor to consider. Obviously, the actual cost of a given solution varies according to
its specifics, but all this capability doesn’t come cheap. Purchase and maintenance cost is the other
big reason why SANs aren’t more prevalent; many organizations find that they get more business
value from spending their infrastructure dollars in other ways.


Chapter 8:

Build an Exchange 2003 Cluster


Daragh Morrissey

Clustering Microsoft Exchange Server 2003 servers can potentially improve service levels by reducing
downtime—especially planned downtime, when you have to reboot servers after applying monthly
Microsoft patches. Windows Server 2003 includes enhancements that make setting up and deploying
a cluster much easier than under Windows 2000 Server. If you believe clustering can benefit your
Exchange organization and are ready to get started, this chapter can help guide you through the
cluster-setup process. I’ll explain Exchange clustering basics and the preparatory steps you must take
before building a new two-node Exchange 2003 Service Pack 1 (SP1) cluster running on Windows
2003 SP1. And I’ll explain how to install Exchange 2003 on a Windows 2003 SP1 cluster and
post-installation best practices.

Exchange Virtual Servers


Clusters are groups of servers configured to work together to provide the image of a single server.
Microsoft Outlook clients access Exchange running on a cluster via the Exchange Virtual Server (EVS)
service. To a user, an EVS looks like a typical standalone server. An EVS contains these Exchange
cluster resources:
• Exchange System Attendant
• Exchange HTTP Virtual Server
• Exchange Information Store (IS)
• Exchange Message Transfer Agent (MTA)
• Exchange MS Search
• Exchange Routing Service
• SMTP Virtual Server

Before you can install an EVS, you must manually create the following cluster resources by using
the Cluster Administrator program:
• a TCP/IP address resource for the EVS
• a network name for the EVS
• disk resources that the EVS uses

Later I’ll outline the steps for creating an Exchange cluster group and the resources that the EVS
requires. When you configure an Exchange 2003 cluster, the Exchange Setup program places all these
resources in a resource group (called an Exchange cluster group). An EVS can’t be split across
separate resource groups, which ensures that the resources and virtual servers all fail over as a single
unit and that resource-group integrity is maintained.
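If you prefer the command line to Cluster Administrator, the same resources can be sketched with
the cluster.exe tool that ships with Windows 2003; every name and address below is hypothetical,
and the disk resources (which depend on your storage) are omitted:

Rem Create the resource group, then the TCP/IP address and network name resources for the EVS
cluster group "Exchange Group" /create
cluster resource "EVS IP Address" /create /group:"Exchange Group" /type:"IP Address"
cluster resource "EVS IP Address" /priv Address=192.168.1.50 SubnetMask=255.255.255.0 Network="Public"
cluster resource "EVS Network Name" /create /group:"Exchange Group" /type:"Network Name"
cluster resource "EVS Network Name" /priv Name=EVS1
cluster resource "EVS Network Name" /adddep:"EVS IP Address"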

Exchange Cluster Models


Exchange 2003 supports two cluster models: active/active (two-node clusters only) and active/passive
(two- to eight-node clusters). A node is considered active if it hosts an EVS and passive if it doesn’t
host an EVS. Active/active clusters were first introduced in Exchange 2000 Server, and for
backward-compatibility reasons, Microsoft continues to support them in Exchange 2003. Because of
scalability limitations of active/active clusters, Microsoft’s recommended cluster model for Exchange
2003 and Exchange 2000 is active/passive.
Although active/active clusters appear to be an attractive proposition (no servers are sitting idle
waiting for a failover to occur), they have limitations. Early deployments of active/active clustering on
Exchange 2000 had virtual memory problems, which led Microsoft to state that the maximum number
of concurrent Outlook client connections that a node could support under Exchange 2000 release to
manufacturing (RTM) was 1000. Microsoft made some improvements to Exchange 2000 SP1 and SP2
that allowed an active/active cluster to support more connections (1500 for SP1; 1900 for SP2). The
1900 limit in Exchange 2000 SP2 still applies to clusters running Exchange 2000 SP3, Exchange 2003
RTM, or Exchange 2003 SP1. Virtual memory fragmentation is less of an issue on active/passive
clusters because an EVS can always be started on a passive cluster node (there’s no active EVS on the
passive node, so virtual memory fragmentation isn’t an issue).
Additionally, active/passive clusters don’t have the same constraints on the numbers of supported
connections as active/active clusters. As I mentioned, Exchange 2003 SP1 supports a maximum of
1900 concurrent connections per node in an active/active configuration. A connection in this context
means an active Outlook, Microsoft Outlook Web Access (OWA), or Outlook Express client
connection and shouldn’t be confused with the number of mailboxes residing on an Exchange
server. With an active/passive cluster, the 1900-connections limit doesn’t apply. In Exchange 2003,
Microsoft has built functionality into
Exchange System Manager (ESM) that enforces active/passive clustering guidelines on clusters
that have more than two nodes. (You can find more information about these guidelines at
http://support.microsoft.com/?kbid=329208.) You can create n - 1 EVSs, where n represents the
number of nodes in the cluster. ESM prevents you from creating EVSs that equal or exceed the
number of nodes in the cluster.

Preparing Your Cluster for the Exchange Installation


Planning is an essential ingredient in a successful Exchange 2003 cluster deployment. It involves
considerations such as training, choosing a cluster model, choosing hardware and storage, and setting
permissions. In the sidebar “Planning Your Exchange Cluster Deployment,” I discuss Exchange
cluster-planning considerations in detail.
After you’ve planned the Exchange cluster and created the Windows 2003 cluster, you’re almost
ready to begin installing Exchange 2003 on it. But before you install Exchange, you need to perform
some additional tasks and checks to avoid problems during the installation.

Install layered products that Exchange requires. Exchange requires several Windows
components, which you install via the Add or Remove Programs applet. You need to install these
components on both cluster nodes. To install the components, in Add or Remove Programs, click
Add/Remove Windows Components, select Application Server, click Details, and check ASP.NET and
Internet Information Services (IIS). Select IIS, click Details, then check Common Files, NNTP Service,
SMTP Service, and World Wide Web Service. Click OK to close each dialog box and install the components.
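If you'd rather script this step, the sysocmgr.exe utility can install the same components unattended.
The sketch below assumes a hypothetical answer file named C:\exchcomp.inf; the component identifiers
shown are the standard Windows 2003 optional-component names, but verify them against the
[Components] entries your sysoc.inf supports before relying on them:

   ; C:\exchcomp.inf -- answer file listing the Windows components Exchange needs
   [Components]
   aspnet = on
   iis_common = on
   iis_www = on
   iis_smtp = on
   iis_nntp = on

   rem Run on each cluster node to install the components unattended
   sysocmgr /i:%windir%\inf\sysoc.inf /u:C:\exchcomp.inf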

Install Windows 2003 SP1 and the latest security patches. This is a good time to apply the latest
patches to your cluster because no mailboxes or users are connected to it yet.

Verify the network connection configurations on each cluster node. Each node has two
network connections: a public connection to the LAN, which Outlook clients use to access the EVS
and administrators use to remotely manage the cluster, and a heartbeat connection, which the cluster
service uses to detect whether a cluster node is online or offline. (The public network can also be
used for heartbeat communication.) Occasionally, Windows clusters are built with the cluster
heartbeat connection set at a higher priority in the binding order (i.e., the order in which network
services access the connections) than the public-facing LAN connection.
Modify the binding order so that the public-facing connection is highest, followed by the
heartbeat, as Figure 1 shows. To check the binding order, in Control Panel, go to Network
Connections, Advanced, Advanced Settings, and select the Adapters and Bindings tab. Standardize the
binding order on each node. Follow the steps described at http://support.microsoft.com/?kbid=258750
to remove NetBIOS from the heartbeat connections, set the appropriate cluster communication
priority, and define the correct NIC speed and duplex mode on the heartbeat connection.

Figure 1
Modifying the binding order
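If you want to verify these settings from the command line, cluster.exe (installed with the Cluster
service) can display and set the role of each cluster network. This is a sketch only; the network name
Heartbeat is an assumption, so substitute the name your heartbeat network was given when the
cluster was formed:

   rem Run on a cluster node: list the cluster networks and their properties
   cluster network /prop

   rem Role=1 restricts a network to internal (node-to-node) cluster communication
   cluster network "Heartbeat" /prop Role=1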

Create cluster groups and the Microsoft Distributed Transaction Coordinator (MS DTC)
resource. Before installing Exchange, you need to create the required cluster groups and resources.
By default, the cluster installation program creates an initial configuration that includes a cluster
resource group containing the cluster IP address resource, a cluster network name, and a quorum
disk resource (the quorum drive is usually assigned drive letter Q). This group is commonly known
as the cluster group. The cluster installation program also creates a cluster resource group for each
disk resource (called Group 0, Group 1, and so on). Figure 2 shows my initial cluster configuration
for DARA-CL1.
Figure 2
An initial cluster configuration

Exchange 2003 requires MS DTC to be configured as a cluster resource. MS DTC is a service based
on the OLE transactions-interface protocol, which provides an object-oriented (OO) interface for
initiating and controlling transactions. The Microsoft article at http://www.support.microsoft.com/?kbid=301600
describes the procedure for configuring MS DTC in a cluster environment and recommends placing
MS DTC in a separate cluster resource group with its own disk, IP address, and network name
resources. In my opinion, the advice in this article is appropriate for applications such as Microsoft
SQL Server that make heavy use of MS DTC. Exchange, on the other hand, makes very light use of
MS DTC and unless you’ve deployed some workflow applications that use MS DTC, you can actually
take MS DTC offline without affecting Exchange. (See also some sound advice about this matter
posted by Evan Dodds at http://blogs.technet.com/exchange/archive/category/3896.aspx.) Therefore,
I recommend placing the MS DTC resource in the cluster group (along with the cluster network
name, IP address, and quorum disk resources).
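If you prefer to script the resource creation, the following cluster.exe sketch creates the MS DTC
resource in the cluster group. The resource names Disk Q: and Cluster Name are the typical defaults
that the cluster installation program creates; adjust them to match your configuration:

   rem Create the MS DTC resource in the default cluster group
   cluster res "MSDTC" /create /group:"Cluster Group" /type:"Distributed Transaction Coordinator"

   rem Make MS DTC depend on the quorum disk and the cluster network name
   cluster res "MSDTC" /adddep:"Disk Q:"
   cluster res "MSDTC" /adddep:"Cluster Name"

   rem Bring the new resource online
   cluster res "MSDTC" /online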
You should also create each EVS in a separate cluster resource group (i.e., an Exchange cluster
group). To create a group, in Cluster Administrator click New, then click Resource Group. For my
EVS, DARA-EVS1, I called my resource group DARA-EVS1-GRP to reflect the EVS's name. The New
Resource wizard then prompts you for the possible owners (nodes) of this resource group. For a
two-node cluster, add the two nodes as possible owners, as Figure 3 shows.
Next, move disk resources created by the cluster installation program to the new Exchange
resource group. To move a resource, right-click the disk resource and select Move Group. Choose the
Exchange resource group (here, DARA-EVS1-GRP). Repeat this procedure for each resource group
created by the cluster installation program.

Figure 3
Adding nodes as possible owners

You can delete the disk resource groups you changed in the last step (e.g., Group 0, Group 1,
Group 2) because they’re now empty. To delete a resource group, in Cluster Administrator, right-click
the resource group and click Delete. You should now have a cluster group configuration (two
resource groups: a cluster group and a resource group for Exchange) similar to the one that Figure 4
shows.

Figure 4
Cluster group configuration
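The same housekeeping can be scripted with cluster.exe; a sketch, assuming a disk resource named
Disk E: and an empty setup-created group named Group 0:

   rem Create the Exchange resource group
   cluster group "DARA-EVS1-GRP" /create

   rem Move the shared disk resource into it, then delete the now-empty group
   cluster res "Disk E:" /moveto:"DARA-EVS1-GRP"
   cluster group "Group 0" /delete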

At this point, you need to create the resources in the Exchange cluster group that each EVS
requires, such as an IP address, a network name, and disk resources. Create the IP address resource
in your Exchange cluster group. To create the IP address resource for the EVS, in Cluster
Administrator, click File, New Resource, which displays the dialog box that Figure 5 shows. Select IP Address
for the resource type and make sure you select the Exchange resource group (not the cluster group)
as the group for the new IP address resource.
Figure 5
New Resource dialog box

After you click Next, the wizard prompts you to select nodes that can own this resource and to
accept that both nodes can be owners. Next, the wizard asks you to select resource dependencies for
the IP address resource. You don’t need to set any resource dependencies, so click Next. The wizard
then asks you to supply an IP address and subnet mask and choose a network for the IP address.
Make sure you choose Public Network because clients will use this network connection to connect to
the EVS. The heartbeat network is for internal cluster communications only. Click Next, and you
should see the message Cluster Resource created successfully. Bring the IP address resource online by
right-clicking it and clicking Bring Online.
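For reference, here's an equivalent cluster.exe sketch. The resource name and address values are
examples only, and Public Network must match the network name shown in Cluster Administrator:

   rem Create the EVS IP address resource in the Exchange resource group
   cluster res "DARA-EVS1 IP Address" /create /group:"DARA-EVS1-GRP" /type:"IP Address"

   rem Set the address, mask, and network, then bring the resource online
   cluster res "DARA-EVS1 IP Address" /priv Address=192.168.1.50 SubnetMask=255.255.255.0 Network="Public Network"
   cluster res "DARA-EVS1 IP Address" /online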
The procedure for creating the EVS network name resource is similar to that for creating the EVS
IP address resource. As before, in Cluster Administrator, click File, New Resource, and select Network
Name from the resource list. Make sure you select the Exchange resource group as the group. Select
the Network Name resource, enter its name, and click Next. The wizard prompts you to accept that
both nodes should be owners of this resource. Click Next. Next, you’re asked to specify dependen-
cies for the Network Name resource. This resource has a dependency on the IP address resource
because the IP address resource must be online before the Network Name can come online. Choose
the IP address resource you created in the previous step and click Next. The wizard now prompts
you to specify the EVS name. At this point, you can also specify whether the EVS name should be
authenticated against AD by using Kerberos authentication and whether it should be registered with a
DNS server. Enable these settings, as Figure 6 shows, and click Next. You should see a message
stating that the Network Name resource was created successfully. At creation time, the Network Name
will be offline. Bring the resource online by right-clicking it and clicking Bring Online.

Figure 6
Network Name Parameters dialog box
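A cluster.exe equivalent of this step looks like the following sketch; the resource names continue the
example, and RequireKerberos=1 corresponds to the Kerberos check box in the wizard:

   rem Create the network name resource and make it depend on the IP address
   cluster res "DARA-EVS1 Network Name" /create /group:"DARA-EVS1-GRP" /type:"Network Name"
   cluster res "DARA-EVS1 Network Name" /adddep:"DARA-EVS1 IP Address"

   rem Set the EVS name, require Kerberos authentication, and bring it online
   cluster res "DARA-EVS1 Network Name" /priv Name=DARA-EVS1 RequireKerberos=1
   cluster res "DARA-EVS1 Network Name" /online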

Ready for the Next Step


The cluster is now ready for Exchange 2003 to be installed. However, before you reach for the
Exchange CD-ROM, I recommend that you perform some testing on the cluster to verify that
everything is configured correctly. Microsoft provides a free tool for testing and verifying a cluster
configuration, called the Cluster Diagnostics and Verification Tool (ClusDiag.exe). You can download
the tool at http://www.microsoft.com/downloads/details.aspx?familyid=b898f587-88c3-4602-
84de-b9bc63f02825&displaylang=en.

Before You Install


Before building your new Exchange 2003 cluster, check that you’ve met all the requirements for
deploying Exchange 2003. Read the guidelines at http://www.microsoft.com/exchange/evaluation/
sysreqs/2003.mspx, and make sure that you’ve completed all the steps described earlier.

Creating an Exchange Virtual Server


By now you’ve configured the cluster group with the cluster name and IP address resource and a
Microsoft Distributed Transaction Coordinator (MS DTC) resource. You’ve also configured an
Exchange resource group that contains disk resources, a cluster network name, and an IP address
resource for your Exchange Virtual Server (EVS). Your next task is to install Exchange 2003 on the
cluster nodes, by following the procedure I outline below.

Document the installation. Document every step of the installation. The documentation shows
other administrators how you built the server and can also be useful for disaster recovery purposes
(i.e., if you have to rebuild the cluster). The easiest way to create documentation is to open a new
WordPad or Microsoft Word document, then for every step of the installation take a screen shot of
the active window by pressing Alt+Print Screen, which copies an image of the window into the
Clipboard. From WordPad, select Edit, Paste Special, and choose Device Independent Bitmap.
Choosing to paste the image as a device-independent bitmap (instead of doing a standard paste)
reduces the document’s size. Store the documentation log somewhere safe (and off the cluster).

Install Exchange 2003 on the first cluster node. Log on to Node 1 by using an account that has
Exchange Full Administrator permissions. Run the Exchange 2003 setup program (located in the
\setup\i386 folder of the Exchange 2003 CD-ROM). You’ll get the error message Exchange Server
2003 has a known compatibility issue with this version of Windows. Ignore this message for now; later
we’ll apply Exchange 2003 SP2 to correct the problem. At the Welcome screen, click Next.
Click I Agree to accept the End User License Agreement (EULA). Under the Action menu, select
Typical. At the next screen, accept the terms of a Per Seat Licensing Agreement. The installation
begins; a completion message is displayed when the installation is done.

Install Exchange 2003 on the second cluster node. Install Exchange 2003 on the second node
(Node 2), repeating the steps you followed for Node 1. You can’t update the binaries to Exchange
2003 SP2 yet; you must create the EVS before you can apply SP2.

Create the EVS. After you’ve installed the Exchange binaries on the cluster nodes, you can create an
EVS. You must create the EVS on the active node (the node that currently owns the Exchange resource group) because the
setup program places Exchange databases, transaction logs, and other components on the shared
storage so that each node can access them.
To create the EVS, log on to the active node by using an account that has Exchange Full
Administrator privileges. Open Cluster Administrator; click File, New, Resource; and select Microsoft
Exchange System Attendant. Create this resource in the Exchange cluster group (not the default
cluster group), which in our example is DARA-EVS1-GRP, as Figure 7 shows. Click Next.

Figure 7
Creating the EVS

You’re prompted to specify nodes as possible owners for this resource. Both cluster nodes
should be listed as owners by default. Click Next to continue. Next, you’re prompted to supply
dependencies for the Exchange System Attendant resource. Select all the resources (IP address,
network name, and disk resources) in the left pane and click Add to add them as required resources
for the Exchange System Attendant resource. Click Next.
Now you’ll choose an administrative group for your EVS. Select a group and click Next, then
select a routing group for your EVS and click Next. At the next prompt, select a data directory folder
in the shared storage to contain Exchange databases, transaction logs, the SMTP folders, the
full-text–indexing database, and the message-tracking log files. The default location for this folder is
the \exchsrvr folder on a physical disk resource in the Exchange resource group that you created in
earlier. A limitation of the Exchange setup program is that it places all the components (such as the
databases) in the same folder at installation time. You need to move them manually after the
installation. To help me identify which Exchange components the setup program placed at installation
time, I usually name the folder by using the convention \exchsrvr_staging_EVSname. (In this cluster
build, I called it \exchsrvr_staging_DARA-EVS1 to indicate that this folder was created at installation
time for that EVS.) Later I'll explain how you can move these components to other drives or folders.
The setup program displays a summary screen that lists all the settings, as Figure 8 shows. Click
Finish to accept the installation summary. The Exchange setup program automatically creates
resources for Exchange components such as the Information Store and protocol resources for IMAP
and POP. Upon completion, the setup program displays a message stating that the Exchange System
Attendant cluster resource was created successfully.

Figure 8
A settings summary screen

At this point, you can see in Cluster Administrator that all Exchange resources are offline. Bring
each resource online by right-clicking the Exchange resource and selecting Bring Online. The list of
resources displayed in Cluster Administrator should look similar to the screen that Figure 9 shows.

Figure 9
A list of resources

To verify that the EVS is online, open Exchange System Manager (ESM) and check your
administrative group. DARA-EVS1 appears as a clustered server running Exchange 2003 RTM version 6944.4.
I recommend you now perform a failover test to Node 2 to verify that the EVS can run on both
nodes. To perform a failover by using Cluster Administrator, right-click the Exchange resource group
associated with the EVS (here, DARA-EVS1) and select Move Group. Doing so triggers a failover of
your newly created EVS from Node 1 to Node 2.
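You can trigger the same failover from the command line, which is handy when you script your test
plan; a sketch using the example names (NODE2 is a placeholder for your second node's name):

   rem Move the Exchange resource group to Node 2, then check its status
   cluster group "DARA-EVS1-GRP" /moveto:NODE2
   cluster group "DARA-EVS1-GRP" /status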

Installing Exchange 2003 SP2


After you’ve successfully created an EVS that uses Exchange 2003 RTM, I advise you to upgrade the
cluster to Exchange 2003 SP2. Before you upgrade the cluster, be sure to upgrade
any front-end servers to Exchange 2003 SP2. You must apply service packs to front-end servers
before you can upgrade back-end servers. For Exchange 2003, Microsoft introduced a new procedure
for upgrading Exchange service packs. To learn more about this procedure, see “Exchange 2003
Clusters: Rolling Upgrades,” July 2005, InstantDoc ID 46335, on the Windows IT Pro Web site.
To install Exchange 2003 SP2, log on to Node 1 (it should be passive now because you’ve just
performed a failover to Node 2) and apply Exchange 2003 SP2 by running update.exe (located in the
\setup\i386 folder of the Exchange 2003 SP2 CD-ROM). At the Licensing Agreement screen, select
I Agree to accept the License Agreement and click Next. Under the Action column, select Update, as
Figure 10 shows. Select the default action (which is to update all components, such as ESM). Be aware
that when the update procedure is done, you might be prompted to reboot. I was prompted to
reboot because I performed the installation over a Remote Desktop connection.
To complete the upgrade, you need to take the EVS offline while leaving online the network
name, IP address, and storage resources associated with the EVS. To do so, right-click the System
Attendant resource, then select Take Offline. Doing this takes offline Exchange resources such as
Store and Protocol resources for IMAP and POP because they’re all dependent on the Exchange
System Attendant (i.e., EVS) being online.
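Scripted, this step takes only the System Attendant offline rather than the whole group. The resource
name below is hypothetical; use the exact System Attendant resource name displayed in Cluster
Administrator:

   rem Takes the System Attendant offline; its dependent Exchange resources follow,
   rem while the IP address, network name, and disk resources stay online
   cluster res "Exchange System Attendant - (DARA-EVS1)" /offline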

Figure 10
Installing Exchange 2003 SP2

Next you need to move the Exchange resource group from Node 2 to Node 1 by performing a
failover (right-click the Exchange resource group and select Move Group). Be aware that you can’t
perform this process from a Cluster Administrator session running on Node 2 because the files
required for the upgrade procedure aren’t yet installed on Node 2. The requirement to run an
additional upgrade procedure for Exchange service packs from Cluster Administrator is new in
Exchange 2003. It was first introduced as part of the cluster upgrade procedure from Exchange 2000
Server to Exchange 2003. The procedure is also required to upgrade a cluster from Exchange 2003
RTM to Exchange 2003 SP1.
The Exchange program files are now upgraded to Exchange 2003 SP2 on Node 1. To finish the
upgrade, log on to Node 1. Right-click the System Attendant resource for the EVS and select Upgrade
Exchange Virtual Server. When the upgrade is done, you should see the message The Exchange
Virtual Server has been upgraded successfully.
Node 2 is still running the Exchange 2003 RTM version. Install Exchange 2003 SP2 on Node 2 by
running update.exe. At the Licensing Agreement screen, select I Agree to accept the License
Agreement and click Next. Select Update from the Action column. When the SP2 upgrade is finished
on Node 2, reboot if prompted to do so, and when Node 2 has finished restarting, verify that SP2 has
been installed correctly by moving the Exchange resource group from Node 1 to Node 2 as you did
earlier. As a final test, I recommend you reboot each node in turn, starting with Node 1. The EVS
should fail over from Node 1 to Node 2. When Node 1 is back online and has rejoined the cluster,
reboot Node 2 to test the failover from Node 2 to Node 1. These tests will verify that the Exchange
cluster is configured correctly. After each failover finishes, check the Application log for errors.
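To check the Application log from the command line after each failover, you can use the
eventquery.vbs script included with Windows 2003; a sketch:

   rem List the 20 most recent Application log errors on this node
   cscript //nologo %windir%\system32\eventquery.vbs /l Application /fi "Type eq ERROR" /r 20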

Post-Installation Tasks and Best Practices


Give yourself a pat on the back: You now have a working Exchange cluster! But before you place
mailboxes on the EVS, you need to perform some key tasks.

Redistribute Exchange components across disk resources. The Exchange cluster Setup
program places the Exchange components in the data directory folder you selected during setup. (I placed
all the components in the E:\exchsrvr_staging_DARA-EVS1 folder when the sample EVS was created.) The following folders
contain the Exchange cluster components for our sample installation:
• E:\exchsrvr_staging_DARA-EVS1\mdbdata contains Exchange .edb files, streaming database (.stm)
files, the checkpoint file, and the transaction logs.
• E:\exchsrvr_staging_DARA-EVS1\mtadata contains the Message Transfer Agent (MTA) folder.
• E:\exchsrvr_staging_DARA-EVS1\mailroot contains the folder structures that the SMTP Virtual
Server uses.
• E:\exchsrvr_staging_DARA-EVS1\exchangeserver_servername contains the full-text–indexing
database associated with the EVS.
• E:\exchsrvr_staging_DARA-EVS1\servername.log contains the message-tracking log files.

Place transaction logs and Exchange databases on separate drives. Placing these entities on
separate drives is a long-established Microsoft best practice, which I strongly advise you to adhere to.
Doing so will help your Exchange server perform better. (The Information Store process writes each
transaction to the database and transaction logs; splitting them across different physical drives
distributes the load on the storage.) More important, though, placing the transaction logs on a drive
physically separate from the database lets you recover data from a backup if you lose the drive that
holds your databases. For more information about placing databases and transaction logs on separate
drives, follow the instructions in the Microsoft article “How to move Exchange databases and logs in
Exchange Server 2003” (http://support.microsoft.com/?kbid=821915).

Move SMTP folders. To move SMTP folders, follow the instructions in the Microsoft article “How to
change the Exchange 2003 SMTP Mailroot folder location” (http://support.microsoft.com/?kbid=822933).

Move the indexing files. To move the full-text–indexing property store and property store logs,
follow the instructions at http://www.microsoft.com/technet/prodtechnol/exchange/guides/
workinge2k3store/e1ea3634-a2c0-40e6-ad50-e9e988ae4728.mspx and in the Microsoft article “XADM:
Recommendations for Using Content Indexing Utilities (Pstoreutl or Catutil) in a Cluster Environment”
(http://support.microsoft.com/?kbid=294821). You use two utilities to move the indexing files:
pstoreutl.exe, which moves the property store to another drive location, and catutil.exe, which moves
the catalog (index) to another drive location.

Back up the cluster. After you’ve moved the necessary components, perform a full backup of your
cluster by using NTBackup. Back up the drives in your local storage and the system state and also
perform a full Exchange database backup.
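NTBackup can also be driven from the command line, which makes it easier to capture an identical
backup on both nodes. A sketch with a hypothetical backup share; the Exchange database backup
itself is easiest to configure in the NTBackup GUI by selecting the Microsoft Exchange Server tree:

   rem Back up this node's system state to a file stored off the cluster
   ntbackup backup systemstate /j "Node1 system state" /f "\\backupsrv\backups\node1-sysstate.bkf"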

Install third-party products. Install third-party layered products such as file-based antivirus
software, Exchange-aware antivirus software, and monitoring software. Take care to exclude folders
that contain Exchange databases and transaction logs from a file-based virus scanner because such
scanners can corrupt Exchange databases. (For more information, see the Microsoft article “Overview
of Exchange Server 2003 and antivirus software” at http://support.microsoft.com/?kbid=823166.)

Perform tests. Create a test mailbox on your cluster and create a Microsoft Outlook profile that has
Cached Exchange Mode enabled. Perform some failover tests and record the time it took for the EVS
to go offline and online between cluster nodes. You’ll find this information useful for planning future
maintenance. As you add mailboxes to the cluster, failover times might increase because more
connections will be open to the EVS. On production clusters that have several hundred active client
connections, I’ve seen failovers take 2 to 10 minutes. Failover times depend on many different factors,
such as the hardware specification of cluster nodes, performance of the Exchange storage subsystem,
and number of active connections.

Run Exchange Server Best Practices Analyzer (ExBPA). Run ExBPA against your newly
installed cluster to verify your installation. ExBPA is cluster aware and will analyze the configuration
of the cluster and EVS and generate a report like the one that Figure 11 shows. You can download
ExBPA at http://www.microsoft.com/downloads/details.aspx?familyid=dbab201f-4bee-4943-ac22-
e2ddbd258df3&displaylang=en.

Figure 11
Sample ExBPA report

Planning Your Exchange Cluster Deployment


Careful planning is the key to a successful Exchange Server 2003 cluster deployment. When developing your cluster-
implementation plan, you should consider areas such as training; choosing the right cluster model, hardware, and
storage; preparing the Windows and Exchange infrastructures; and permissions.

Training for Cluster Administrators


Cluster hardware configurations are more complex than single-server deployments, and cluster deployments often
fail because IT employees aren’t adequately prepared. Training should familiarize administrators with cluster
concepts such as the quorum, failover and failback operations, and the Cluster Administrator tool. Microsoft
Virtual Server 2005, Enterprise Edition, which lets you create virtual clusters, provides an ideal way for cluster
novices to play around with clusters without having to deploy additional cluster hardware for training purposes.

Choosing the Right Cluster Model


Active/passive is the preferred cluster model because it lets you deploy Exchange clusters that can support many
users. However, you should understand some of the limitations that govern failovers on active/passive clusters that
have more than two nodes. On such clusters, only one Exchange Virtual Server (EVS) can be hosted on a node at a
time. Any attempt to move a second EVS to a node currently hosting an EVS will fail. (This failover constraint is
described at http://support.microsoft.com/?kbid=329208.) If you attempt to move an EVS to a node that already hosts
an EVS, you’ll get the error message An error occurred attempting to bring the System Attendant resource online: The
cluster resource could not be brought online by the resource monitor. Error ID: 5018 (0000139a). These limitations
exist to enforce active/passive failovers on clusters with more than two nodes.
Active/passive clusters can be deployed in many different combinations (e.g., an eight-node cluster with four
active and four passive nodes; a seven-node cluster with four active and three passive nodes). If this is your first
cluster deployment, I recommend keeping it simple and using a two-node active/passive cluster. If you deploy it
successfully, you can look at deploying larger clusters.

Choosing the Right Hardware for Cluster Nodes


The goal of your cluster deployment is to provide high availability. In the event of a hardware failure, a failover
operation moves resources from the failed node to another cluster node. During failovers, users might experience
timeouts for a short time as resources are taken offline on the failed node and brought online on the other node. For
each cluster node, you should implement redundant hardware components—such as redundant NICs, power
supplies, fans, and connections to shared storage (host bus adapters—HBAs—and array controllers)—to
reduce the impact of a hardware failure and thus avoid a failover. Connect redundant power supplies to separate
power distribution units to ensure power if a unit fails. Many hardware vendors offer “packaged” clustering solutions
that take the guesswork out of building a cluster hardware specification. A packaged cluster typically includes two
nodes, a built-in heartbeat network between the nodes, and a SAN. An example of a packaged cluster is the HP
ProLiant DL380 Packaged Cluster with MSA1000. Whether you choose a packaged cluster solution or choose to
build your own cluster configuration, you must use hardware that’s listed on the Windows Server Catalog site
(http://www.microsoft.com/windows/catalog/server/default.aspx?subid=22&xslt=hardwarehome). Microsoft doesn’t
support configurations that include hardware not in the catalog.

Choosing the Right Storage


Choosing the right shared-storage infrastructure is a critical part of the planning process. Exchange databases expand
over time, and you might run out of disk space if you haven't provisioned spare capacity. Having spare capacity is also
useful for performing database maintenance and disaster recovery exercises. Adding disk space to a cluster might
require future downtime if your storage infrastructure doesn’t support dynamic volume expansion. Estimating your
storage requirements depends on many factors, such as the number of mailboxes you plan to host on each EVS and
the standard mailbox size limit you’ll implement. HP provides a downloadable storage-sizing calculator at
http://h71019.www7.hp.com/activeanswers/cache/71108-0-0-0-121.html.

Preparing the Windows Infrastructure


You should verify that your configuration meets the requirements for deploying an Exchange 2003 cluster. The first
requirement is to have the necessary IP addresses for the cluster. For a two-node cluster that has one EVS, you'll need
one IP address for each node, one IP address for the cluster, and one IP address for the EVS—four in total on the public network.
The next requirement is to make sure that all Global Catalog (GC) servers or domain controllers (DCs) in the
same Windows site as the EVS run Windows 2003 or Windows 2000 SP3 or later. Exchange 2003 stores
configuration information in the Active Directory (AD) DCs and GC. DCs hold a complete copy of all objects
belonging to a domain plus a copy of objects replicated in the forestwide configuration naming context (NC). GCs
hold a complete copy of all objects in their own domain plus partial attributes of objects from all other domains
within the forest. I recommend you deploy at least two GCs in the same AD site as your EVS for redundancy. If a GC
goes offline, Exchange can bind to the other GC in the site.
The third requirement is to create a cluster service account, which the Microsoft Cluster service uses to log on at
boot time and form or join a cluster. The cluster service account must have the Log on as a service right on each node and
requires membership in the local Administrators group on each node. I recommend that you don’t use the default
Administrator account as the cluster service account. If the administrator account is disabled, it might cause the
cluster service to fail. I also recommend creating one service account per cluster and linking the username to the
cluster name. For example, for the DARA-CL1 cluster, the service account could be SVC-DARACL1. Creating one
account per cluster reduces the impact of the account being locked out or accidentally disabled. If you have 10
clusters all using the same cluster service account, that service account becomes a potentially large
single point of failure (a lockout event on that account will cause all clusters to fail). Finally, to reduce the risk of the
service account being disabled, don’t use the service account to log on to a cluster. Instead, create a separate
security group for cluster administrators. Add the users who manage clusters to this security group, and add this
security group to the local Administrators group on each cluster.
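A sketch of how these account steps can be scripted on each node; EXAMPLE is a hypothetical
domain, and ntrights.exe ships with the Windows Server 2003 Resource Kit Tools (the right can also
be granted through Group Policy):

   rem Add the cluster service account to the node's local Administrators group
   net localgroup Administrators EXAMPLE\SVC-DARACL1 /add

   rem Grant the "Log on as a service" right by using the Resource Kit ntrights tool
   ntrights -u EXAMPLE\SVC-DARACL1 +r SeServiceLogonRight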
Finally, you’ll need to create a Computer account in AD using the Microsoft Management Console (MMC)
Active Directory Users and Computers snap-in for each EVS. I suggest that you create and properly secure this
account in advance. If you’re running your cluster in a security-hardened environment, consult “How to Run
Exchange Server 2003 Clusters in a Security-Hardened Environment” at http://www.microsoft.com/technet/
prodtechnol/exchange/guides/howtorune2k3clusters/c6e8057e-b4f8-4d3f-b488-b27ab3a0255d.mspx, which
describes how you can create a special organizational unit (OU) in AD to hold the computer accounts belonging to
EVS, how to grant the cluster service account permissions on the OU, and how to apply the Exchange Security
templates provided by Microsoft as part of the “Exchange Server 2003 Security Hardening Guide.” (You can
download the guide at http://www.microsoft.com/downloads/details.aspx?familyid=6a80711f-e5c9-4aef-9a44-
504db09b9065&displaylang=en.)
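If you prefer the command line, dsadd (included with Windows 2003) can pre-create the EVS
computer account in a dedicated OU; the OU and domain names below are hypothetical:

   rem Pre-create the computer account for the EVS in a dedicated OU
   dsadd computer "CN=DARA-EVS1,OU=Exchange Clusters,DC=example,DC=com" -desc "Exchange Virtual Server DARA-EVS1"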

Preparing the Exchange Infrastructure


A number of Exchange components aren’t supported or can’t be installed on an Exchange cluster, so you might have
to deploy additional Exchange servers before deploying the cluster. For example, Site Replication Service (SRS),
which replicates configuration between AD and the Exchange Server 5.5 Directory Service (DS), doesn’t run on an
Exchange cluster. Thus, if you plan to implement your cluster in an existing Exchange 5.5 site, you must deploy a
standalone Exchange 2003 server in the site to meet the requirements for your migration. You can find a list of
Exchange components and whether or not they’re supported in a cluster at
http://support.microsoft.com/?kbid=259197.

Setting Permissions
Microsoft built many security enhancements into Exchange 2003, and a number of these improvements are cluster-
specific. In Exchange 2000 Server, the cluster service account required Exchange Full Administrator rights at the
Administrative Group level to create an EVS. With Exchange 2003, the cluster service account doesn’t require any
Exchange-specific permissions delegated to the Administrative Group. To install an EVS, the account you use to
perform the installation must be delegated Exchange Full Administrator rights at the Administrative Group level (or be
a member of a security group that’s been delegated Exchange Full Administrator rights). If the EVS is the first EVS to
be installed in your Exchange organization, the account you use to perform the installation also requires Exchange
Full Administrator rights at the Organization level. (For more information and best practices for setting Exchange
permissions, see the document Working with Active Directory Permissions in Microsoft Exchange 2003 at
http://www.microsoft.com/downloads/details.aspx?familyid=0954b157-5add-48b8-9657-
b95ac5bfe0a2&displaylang=en.)
