TFG - DRM Intro V2
By Sean O Sperry
Version 2.00
Copyright Notice
Copyright IBM Corporation 2005. All rights reserved. May only be used pursuant
to a Tivoli Systems Software License Agreement, an IBM Software License
Agreement, or Addendum for Tivoli Products to IBM Customer or License
Agreement. No part of this publication may be reproduced, transmitted,
transcribed, stored in a retrieval system, or translated into any computer
language, in any form or by any means, electronic, mechanical, magnetic,
optical, chemical, manual, or otherwise, without prior written permission of
IBM Corporation. IBM Corporation grants you limited permission to make hardcopy
or other reproductions of any machine-readable documentation for your own use,
provided that each such reproduction shall carry the IBM Corporation copyright
notice. No other rights under copyright are granted without prior written
permission of IBM Corporation. The document is not intended for production and
is furnished as is without warranty of any kind. All warranties on this
document are hereby disclaimed, including the warranties of merchantability and
fitness for a particular purpose.
U.S. Government Users Restricted Rights -- Use, duplication or disclosure
restricted by GSA ADP Schedule Contract with IBM Corporation.
Trademarks
IBM, the IBM logo, Tivoli, the Tivoli logo, AIX, Cross-Site, NetView, OS/2,
Planet Tivoli, RS/6000, Tivoli Certified, Tivoli Enterprise, Tivoli Enterprise
Console, Tivoli Ready, and TME are trademarks or registered trademarks of
International Business Machines Corporation or Tivoli Systems Inc. in the
United States, other countries, or both.
Lotus is a registered trademark of Lotus Development Corporation.
Microsoft, Windows, Windows NT, and the Windows logo are trademarks of
Microsoft Corporation in the United States, other countries, or both.
UNIX is a registered trademark of The Open Group in the United States and other
countries.
C-bus is a trademark of Corollary, Inc. in the United States, other countries,
or both.
PC Direct is a trademark of Ziff Communications Company in the United States,
other countries, or both and is used by IBM Corporation under license.
ActionMedia, LANDesk, MMX, Pentium, and ProShare are trademarks of Intel
Corporation in the United States, other countries, or both. For a complete list
of Intel trademarks, see http://www.intel.com/sites/corporate/trademarx.htm.
SET and the SET Logo are trademarks owned by SET Secure Electronic Transaction
LLC. For further information, see http://www.setco.org/aboutmark.html.
Java and all Java-based trademarks and logos are trademarks or registered
trademarks of Sun Microsystems, Inc. in the United States and other countries.
Other company, product, and service names may be trademarks or service marks of
others.
Notices
References in this publication to Tivoli Systems or IBM products, programs, or
services do not imply that they will be available in all countries in which
Tivoli Systems or IBM operates. Any reference to these products, programs, or
services is not intended to imply that only Tivoli Systems or IBM products,
programs, or services can be used. Subject to valid intellectual property or
other legally protectable right of Tivoli Systems or IBM, any functionally
equivalent product, program, or service can be used instead of the referenced
product, program, or service. The evaluation and verification of operation in
conjunction with other products, except those expressly designated by Tivoli
Systems or IBM, are the responsibility of the user. Tivoli Systems or IBM may
have patents or pending patent applications covering subject matter in this
document. The furnishing of this document does not give you any license to
these patents. You can send license inquiries, in writing, to the IBM Director
of Licensing, IBM Corporation, North Castle Drive, Armonk, New York 10504-1785,
U.S.A.
About the Tivoli Field Guides
Sponsor
Tivoli Customer Support sponsors the Tivoli Field Guide program.
Authors
Those who write field guides belong to one of these three groups:
• Tivoli Support and Services Engineers who work directly with customers
• Tivoli Customers and Business Partners who have experience using Tivoli
software in a production environment
• Tivoli developers, testers, and architects
Audience
The field guides are written for all customers, both new and existing. They are applicable
to external audiences including executives, project leads, technical leads, team
members, and to internal audiences as well.
• Field Guides for technical issues are designed to address specific technical
scenarios or concepts that are often complex to implement or difficult to
understand, for example: endpoint mobility, migration, and heartbeat monitoring.
• Field Guides for business issues are designed to address specific business
practices that have a high impact on the success or failure of an ESM project, for
example: change management, asset management, and deployment phases.
Purposes
The Field Guide program has two major purposes:
Availability
All completed field guides are available free to registered customers and internal IBM
employees at the following Web site:
http://www.ibm.com/software/sysmgmt/products/support/Field_Guides.html
Table of Contents
INTRODUCTION
TSM DISASTER RECOVERY CONCEPTS, CONSTRUCTS, AND METHODOLOGIES
  The TSM Database
  Device Configuration and Volume History
  Copy Storage Pools
  Expiration for Off-Site Volumes in Copy Storage Pools
  Reclamation for Off-Site Volumes in Copy Storage Pools
  The Disaster Recovery Manager (DRM)
  Site Specific Instructions for DRM
  The Life-Cycle of a Tape in DRM
  A Summary of What You Need to Recover TSM
  The TSM Daily Schedule Including Disaster Recovery Manager Tasks
BASIC CONFIGURATION OF THE DRM MODULE
  Preparing the Copy Storage Pool
  Setting Up the Directories and the Recovery Instructions Files
  Setting Up the DRM Parameters
  Database Snapshot Backup
  Moving DR Volumes Off-site
  Creating the DR plan, volhist, devconfig, and q sys
  Moving Empty DR Volumes On-Site
BEST PRACTICE RECOMMENDATIONS AND ADDITIONAL RESOURCES
  Conclusions and Best Practices
  Web Sites
  Training Overview
  Redbook Overview
Introduction
IBM Tivoli Storage Manager (TSM) is a very flexible product that has a great depth of
functionality. It supports many sophisticated means of disaster recovery preparedness
including electronic vaulting of off-site DR data. Its core functionalities, however, are to
support disaster recovery by producing off-site tape copies of backed up and archived
data and to define the processes for recovering the TSM server in the event of its total
loss.
The concepts and methodologies TSM uses to achieve this functionality are quite
different from those of virtually all other backup applications. They were designed to
provide maximum flexibility to the TSM Administrator, which, in turn, allows the product
to be configured in many different ways. The result is a backup solution that can be
tailored to meet many requirements in terms of amount of data moved, ways to back up
and restore files, cost of the equipment needed to perform the backup, and disaster
recovery.
To the uninitiated, however, the terminology and constructs of a simple TSM Disaster
Recovery implementation can be daunting. The purpose of this Field Guide is to provide
a short, simple explanation of the concepts and methodologies that are necessary to
implement basic TSM Disaster Recovery functionality using the TSM Disaster Recovery
Manager. I will start with a few concepts necessary for TSM disaster recovery and
continue with a basic configuration of the TSM Disaster Recovery Manager (DRM). I’ll
use the new TSM Administration Center version 5.3.2.0 to work with DRM.
This field guide is not intended to be a comprehensive tutorial on the TSM Disaster
Recovery Manager or any kind of “best practices” guide. Its intention is simply to provide
a starting point -- an answer to the question “How do I get the TSM DRM module going
as quickly as possible?” At the end of the Field Guide, I will provide further information
on the best resources for delving further into TSM functionality, planning, architecture,
features, best practices, and disaster recovery.
TSM Disaster Recovery Concepts, Constructs, and
Methodologies
Before an administrator can start implementing disaster recovery plans for TSM
managed data, there are several concepts and constructs that must be understood.
The TSM Database
There are two types of database backups that are typically used for recovery: full
database backups and snapshot backups. Full database backups are typically done on a
daily basis after all copy storage pools have been updated, and they are stored on-site.
Snapshot database backups are full database backups, but they do not reset the
recovery log. Snapshot database backups are typically used for off-site disaster recovery
purposes. They are made on a daily basis following production of a copy storage pool
and are sent off-site with the rest of the daily tapes.
Device Configuration and Volume History
There is, of course, a problem here. If an administrator loses the database, how can
she look in the database to find the correct tape to restore it? The answer to this
question is that TSM allows a flat text file called a volume history file to be created. This
file contains information on all the volumes used by the TSM server, including the
volumes used for database backup. The file is typically created on a daily basis after the
database backup is made, and when restoring the database, the file can be used to find
the tape that contains the appropriate database backup to restore.
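To make the lookup concrete, here is a minimal sketch of picking the most recent database backup volume out of volume history entries. The record layout, the volume type labels (BACKUPFULL, DBSNAPSHOT, STGNEW), and the tape names are simplified assumptions for illustration, not the actual layout of the volume history file.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Optional


@dataclass(frozen=True)
class VolumeRecord:
    """One simplified volume history entry (hypothetical layout)."""
    written: datetime   # when the volume was written
    volume_type: str    # assumed labels, e.g. "BACKUPFULL", "DBSNAPSHOT", "STGNEW"
    volume_name: str    # tape label


# Volume types this sketch treats as database backups (assumed labels).
DB_BACKUP_TYPES = frozenset({"BACKUPFULL", "DBSNAPSHOT"})


def latest_db_backup(records: Iterable[VolumeRecord]) -> Optional[VolumeRecord]:
    """Return the newest database backup record, or None if there is none."""
    backups = [r for r in records if r.volume_type in DB_BACKUP_TYPES]
    return max(backups, key=lambda r: r.written, default=None)


if __name__ == "__main__":
    history = [
        VolumeRecord(datetime(2005, 10, 1, 8, 0), "DBSNAPSHOT", "DR0001"),
        VolumeRecord(datetime(2005, 10, 2, 8, 0), "DBSNAPSHOT", "DR0002"),
        VolumeRecord(datetime(2005, 10, 3, 9, 0), "STGNEW", "SUN001"),
    ]
    print(latest_db_backup(history))
```

The storage pool volume is newer than either snapshot, but it is ignored because only database backup types are considered.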
As with volume information, TSM stores information on connected storage devices in its
database, and an administrator will need to use those devices to restore the database
in the event of its loss. The TSM server, thus, allows device information to be written
to a text file called a device configuration file. This file is usually created daily
and is read by the server when restoring the database from backup.
Copy Storage Pools
TSM backed up and archived data is stored on storage pools which are, by definition,
collections of like media. TSM allows a storage pool to be duplicated to protect against
loss or failure of media. This storage pool backup is called a copy storage pool.
It’s important to understand how data is typically backed up to a TSM server in order to
understand how critical copy storage pools are to TSM. Administrators typically use an
“incremental forever” file backup methodology for file system backups with TSM, and
when this methodology is used, a TSM storage pool will hold only one copy of any given
file. If, for some reason (like a bad tape), the media on which the one copy is stored
were to be lost, the only way to recover the data would be to use a copy storage pool.
I strongly recommend that there be at least one copy storage pool for all data that is
stored on tape. The copy storage pool can be stored at a physical location which is
separate from the primary storage pool (i.e. outside the tape library or off-site) for
disaster recovery purposes.
File                              Intention
RECOVERY.INSTRUCTIONS.GENERAL     Includes information such as administrator names,
                                  telephone numbers, and location of passwords.
RECOVERY.INSTRUCTIONS.OFFSITE     Includes information such as the offsite vault
                                  location, courier’s name, and telephone numbers.
RECOVERY.INSTRUCTIONS.INSTALL     Includes information about server installation and
                                  the location of installation volumes.
RECOVERY.INSTRUCTIONS.DATABASE    Includes information about how to recover the
                                  database and about hardware space requirements.
RECOVERY.INSTRUCTIONS.STGPOOL     Includes information on primary storage pool
                                  recovery instructions.
Figure: The Life-Cycle of a Tape in DRM (diagram). Labels: ONSITE and OFFSITE; TSM
Server, TSM Database (DB), TSM Storage Pools, backup db; volume states NOTMOUNTABLE,
COURIER, VAULT, VAULTRETRIEVE, COURIERRETRIEVE, ONSITERETRIEVE.
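The volume states named in the life-cycle diagram can be read as a simple ordered cycle. The sketch below is an illustration under assumptions, not TSM code: it adds MOUNTABLE as the starting state (the state used in the move drmedia command later in this guide), and it treats the cycle as strictly linear, which glosses over the rules TSM applies to individual moves.

```python
from typing import List, Optional

# Assumed forward order of DRM volume states. MOUNTABLE is not among the
# diagram labels; it is added here as the state of a tape still in the library.
DRM_STATES: List[str] = [
    "MOUNTABLE",
    "NOTMOUNTABLE",
    "COURIER",
    "VAULT",
    "VAULTRETRIEVE",
    "COURIERRETRIEVE",
    "ONSITERETRIEVE",
]


def next_state(state: str) -> Optional[str]:
    """Return the state that follows `state`, or None at the end of the cycle."""
    i = DRM_STATES.index(state.upper())
    return DRM_STATES[i + 1] if i + 1 < len(DRM_STATES) else None


def states_skipped(start: str, target: str) -> List[str]:
    """List the intermediate states bypassed when moving straight to `target`."""
    i = DRM_STATES.index(start.upper())
    j = DRM_STATES.index(target.upper())
    if j <= i:
        raise ValueError(f"{target} does not come after {start}")
    return DRM_STATES[i + 1:j]


if __name__ == "__main__":
    print(next_state("vault"))
    print(states_skipped("MOUNTABLE", "VAULT"))
```

Sending tapes directly to the vault, as this guide does, amounts to skipping the intermediate NOTMOUNTABLE and COURIER states.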
The following table summarizes information that is necessary to recover the TSM server and its
data in the event of total loss.
Item                          Comment
TSM Server Operating System   Before a TSM server can be set up, the operating system
                              and patches must be installed.
TSM Server Code and Patches   The TSM server code must be installed on the recovery
                              system. Note that the exact same version and patch level
                              should be used on the recovery system as was used to back
                              up the database.
Device Drivers                Some libraries (usually IBM libraries) use device drivers
                              that come with the library and are necessary with TSM.
TSM Server Database           The TSM database provides all the information necessary
                              to find data on tape and restore it. Without the database,
                              data in TSM storage pools is unusable.
Volume History                The volume history is a flat file used to find the
                              database backup.
Device Configuration          The device configuration is a flat file used to access
                              TSM devices when the server is not running.
dsmserv.opt                   The TSM server configuration file sets all the options for
                              the TSM server. It’s useful to have this file so it does
                              not have to be recreated.
“q sys” output                A query system command provides an overview of the TSM
                              system. It’s useful to have this output to see how TSM was
                              configured before the system went down.
Copy Storage Pools            Copy storage pools contain the information from the
                              primary pools that have been sent off-site.
DR Plan from DRM              The Disaster Recovery Plan from the DRM manager includes
                              free form text for disaster recovery. It also contains
                              information and scripts used to rebuild the TSM server.
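The summary table can double as a checklist. As a minimal sketch, the item names below are copied from the table, while the function and the example inventory are hypothetical helpers for illustration only.

```python
from typing import Iterable, List

# Items from the summary table, in table order.
RECOVERY_ITEMS: List[str] = [
    "TSM Server Operating System",
    "TSM Server Code and Patches",
    "Device Drivers",
    "TSM Server Database",
    "Volume History",
    "Device Configuration",
    "dsmserv.opt",
    "q sys output",
    "Copy Storage Pools",
    "DR Plan from DRM",
]


def missing_items(available: Iterable[str]) -> List[str]:
    """Return the checklist items that are not in `available`, in table order."""
    have = set(available)
    return [item for item in RECOVERY_ITEMS if item not in have]


if __name__ == "__main__":
    # Hypothetical inventory of what is on hand at the recovery site.
    on_hand = [item for item in RECOVERY_ITEMS if item != "Device Drivers"]
    print(missing_items(on_hand))
```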
I choose to perform all the maintenance in the plan, so I leave all the boxes checked in
the select tasks dialog and click Next (Figure 1). For the purposes of using a
maintenance script, I’ll choose to do only a database snapshot backup. So I choose this
option and pick SUNCLASS in the backup server databases dialog (Figure 2).
I wish to back up my disk backup storage pool and my on-site tape pool to my copy pool
DRPOOL, which I will take off-site every day. So on the backup storage pool dialog, I
choose Add a relationship from the actions menu. I choose Backup Pool as the source
and DRPOOL as the destination. I do the same for SUNPOOL (Figure 3).
Each day, I wish to take the new DRPOOL volumes and my snapshot volumes off-site. I
wish to send them directly to the vault. I choose this option and specify UNTILFULL
for the advanced library options (Figure 4).
I’m going to prepare a recovery plan on the existing TSM server, so I choose this option
(Figure 5).
I then wish to migrate nightly backed up data from disk to tape. So I choose the
BACKUP Pool to migrate (Figure 6).
I take the defaults for expiration and run reclamation for 300 minutes or down to 50% full
for the sequential pool (Figures 7 and 8).
I want the maintenance schedule to run daily at 8:00 AM. So I make this selection and
review the summary on the final wizard screen (Figures 9 and 10).
Figure 1
Figure 2
Figure 3
Figure 4
Figure 5
Figure 6
Figure 7
Figure 8
Figure 9
Figure 10
As an alternative to the Maintenance script, you can create your own schedule for
maintenance on a TSM server. Many people want to implement more than the basic
schedule provided by the Maintenance Wizard. I, for example, typically keep a full DB
backup on-site and a DB snapshot backup off-site. Below, I describe my typical schedule
including DR activities and include the commands for setting these activities up in the
step-by-step section of this Field Guide. Depending on the speed of your storage
devices, there can be more efficient ways to set up the schedule such as migrating disk
storage pool data directly to copy pool tape or using tape-to-tape copy.
Task                  Time       Comments
and Volhist                      the database.
Prepare the DR Plan   Morning    The DR plan file contains information about how the
                                 TSM server is configured.
Off-Site Movement     Morning    The DR Plan, devconfig, volhist, and db backup are
                                 all sent off-site.
On-Site Movement      Afternoon  Off-site volumes that are empty are returned and
                                 checked into the library.
Expiration            Afternoon  Expire old objects in the server database based on
                                 policy.
Reclamation           Afternoon  Reclaim unused space that has become available on
                                 sequential access media (tapes).
For the off-site movement, the TSM database backup and copy storage pools will be in
the form of tapes. The DR Plan, volhist, and devconfig need to be in a form that is easily
readable and transferable; they can be stored on a floppy disk or sent in an e-mail. These
files should be sent off-site on a daily basis as well.
Basic Configuration of the DRM Module
Now that I have discussed some of the basic DRM concepts, I will go through setting up
the DRM module and also do some tape movement exercises. Note that for these
exercises, I’ll be using version 5.3.2.0 of the TSM Administration Center. The Disaster
Recovery Manager features are not available in the TSM Admin Center until this version.
Figure 11
Figure 12
Figure 13
Next, I select SUNPOOL and go to the Storage Pool actions menu. I choose Backup Storage
Pool and I then choose to back up my primary storage pool to the DRPOOL (see Figure 14 below).
Figure 14
I can do a query process and a query actlog to determine when the copy storage
pool backup is complete (note that this may take quite a while depending on how many
tapes are in the primary storage pool). After the copy pool backup is complete, I add the
backup stg SUNPOOL DRPOOL command to the daily schedule or the maintenance plan.
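The same commands can be issued outside the Administration Center through the TSM administrative command-line client, dsmadmc. The sketch below only builds the argument lists and does not contact a server. The administrator ID and password are placeholders, and the option spellings should be checked against the client documentation for your version.

```python
from typing import List


def dsmadmc_argv(admin_id: str, password: str, command: str) -> List[str]:
    """Build an argument list that runs one administrative command."""
    return ["dsmadmc", f"-id={admin_id}", f"-password={password}", command]


# The two progress checks and the daily command named in the text.
CHECK_COMMANDS = ["query process", "query actlog"]
DAILY_COMMAND = "backup stg SUNPOOL DRPOOL"

if __name__ == "__main__":
    for cmd in CHECK_COMMANDS + [DAILY_COMMAND]:
        # Printed only; pass the list to subprocess.run() to execute it.
        print(dsmadmc_argv("admin", "secret", cmd))
```

Keeping the command as a single list element avoids shell quoting problems when the list is later handed to subprocess.run().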
Figure 15
Figure 16
Figure 17
I can do a query process and a query actlog to determine when the database backup is
complete. I add the backup db DEV=SUNCLASS TYPE=DBS command to the daily schedule.
Figure 18
I wish to move the volumes to the vault. So I choose the Select All button and, from the
Actions menu, I choose Move Selected Volumes. In the wizard, I choose to send them directly
to the VAULT state and check the checkbox to specify how the volumes are ejected (see
Figure 19 below). On the advanced options dialog, I set the remove media value to UNTILFULL
(see Figure 20 below). The DR media is ejected from the library. For my library, I get one
process for each volume. When completed, I can see the tapes stored in the vault by issuing
the q drmedia wherestate=vault command or using the View Disaster Recovery Media action
item and choosing Vault as the state and Snapshot Backup as the backup type (see Figure 21).
I add the
move drmedia * wherestate=mountable source=dbsnapshot remove=untilfull
tostate=vault command to the daily schedule.
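For scripting, the off-site movement command can be assembled from its parts. This is a hedged sketch: it uses the command name move drmedia and the value untilfull, the defaults mirror the values chosen in this guide, and the set of allowed destination states is an assumption that should be verified against the MOVE DRMEDIA entry in the Administrator's Reference.

```python
# Destination states this sketch accepts (assumed list; verify before use).
VALID_TOSTATES = {
    "NOTMOUNTABLE",
    "COURIER",
    "VAULT",
    "COURIERRETRIEVE",
    "ONSITERETRIEVE",
}


def move_drmedia_command(
    tostate: str,
    wherestate: str = "mountable",
    source: str = "dbsnapshot",
    remove: str = "untilfull",
    volume: str = "*",
) -> str:
    """Build a move drmedia command string from the values used in this guide."""
    if tostate.upper() not in VALID_TOSTATES:
        raise ValueError(f"unsupported destination state: {tostate}")
    return (
        f"move drmedia {volume} wherestate={wherestate} "
        f"source={source} remove={remove} tostate={tostate}"
    )


if __name__ == "__main__":
    print(move_drmedia_command("vault"))
```

Rejecting an unknown destination state before the command is sent catches a misspelled state name in the script rather than on the server.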
Figure 19
Figure 20
Figure 21
Figure 22
As you can see from the screenshot, in my case, I have one tape that needs to be
returned on-site (this happens to be a database backup). I will use the web GUI to move
the tape from its Vault Retrieve state to Onsite Retrieve and then check it back into the
library as a scratch tape.
I select the volume and choose Move Volumes from the Actions menu. I choose
ONSITERETRIEVE for the destination state of the volumes.
Once they are in the library, I also want to create a command file to check them in. So I
choose View Disaster Recovery Media in the Onsite Retrieve state and select Create a
Command File to create a macro to check in the returned tape. I add a command to the
wizard for each volume by setting the command to checkin libv &vol
status=scratch and setting the file to put the commands in to
/opt/tivoli/tsm/macros/checkin (see Figure 23 below). After running
the command, I can check the scratch tape into the library using the command file I
created.
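The command file amounts to the command template repeated once per returned volume, with &vol replaced by the volume name. Here is a minimal sketch of that substitution. The volume names are hypothetical, and the template is taken as given in the text; check whether your library setup requires additional parameters on the checkin command.

```python
from typing import Iterable, List


def build_checkin_macro(template: str, volumes: Iterable[str]) -> List[str]:
    """Expand the &vol placeholder once for each returned volume."""
    return [template.replace("&vol", volume) for volume in volumes]


def macro_text(lines: Iterable[str]) -> str:
    """Join the macro lines into the text of a command file."""
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    template = "checkin libv &vol status=scratch"
    returned = ["DR0001", "DR0002"]  # hypothetical volume names
    print(macro_text(build_checkin_macro(template, returned)), end="")
```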
Figure 23
In terms of scheduling, a list of tapes to be returned needs to be generated on a daily
basis. Typically, these tapes do not actually get returned until the next day and are
checked in at that time. So I add the query drmedia command, which generates the list,
and then the move drmedia command to my daily schedule. I also add the checkin command
to the daily schedule for the previous day’s tapes.
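One way to put such commands on a daily schedule is an administrative schedule defined on the TSM server. The sketch below builds a define schedule command string. The schedule name, the start time, and the wrapped query are illustrative assumptions, and the parameter spellings should be confirmed in the Administrator's Reference before use.

```python
def define_admin_schedule(name: str, command: str, start_time: str = "08:00") -> str:
    """Build a define schedule command that runs `command` once a day."""
    return (
        f'define schedule {name} type=administrative cmd="{command}" '
        f"active=yes starttime={start_time} period=1 perunits=days"
    )


if __name__ == "__main__":
    # Hypothetical schedule that lists tapes ready to come back from the vault.
    print(define_admin_schedule(
        "DR_RETURN_LIST",
        "query drmedia * wherestate=vaultretrieve",
    ))
```

This simple builder assumes the wrapped command contains no double quotes of its own.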
Best Practice Recommendations and Additional
Resources
Conclusions and Best Practices
Although setting up the DRM module in TSM is quite easy, setting up the processes and
procedures used for TSM server recovery and tape movement can be quite difficult and
time-consuming.
First, it is highly recommended that step-by-step instructions be written for setting up the
DR hardware, installing the host OS, installing TSM, restoring the TSM database, and
updating the TSM storage pools. During a disaster, even an experienced TSM
administrator will be under extreme pressure to restore the TSM server as soon as
possible, and having a step-by-step guide for your particular environment can greatly
improve TSM server recovery time.
Second, it’s very important to test the DR plan and TSM server recoverability often and
in an environment that is as realistic as possible. At almost every test, you will
undoubtedly encounter hurdles that you didn’t expect. Testing allows an administrator to
develop confidence and experience. It also allows detailed plans to be verified and
modified so that they are more likely to be accurate in the event of a real disaster.
Finally, it should be understood that the success or failure of a DR plan hinges on the
availability and accurate movement of the off-site tapes. It is very typical for TSM
administrators to assign tape movement activities to operations staff who may not be
experienced or who may not have a high interest in ensuring the success of the DR
operation. Tape handling problems and errors are the largest cause of DR preparedness
issues.
A TSM administrator should take extra precautions to ensure that tapes are not lost in
the rotation. Tape movement processes usually include scripting to ensure that
electronic updates are accurate and errors are dealt with correctly. Sometimes, tapes
are scanned and tracked externally to make sure they are not lost, and these reports are
reconciled with the TSM database daily.
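The daily reconciliation comes down to comparing two sets of volume names: the ones TSM believes are in the vault and the ones actually scanned there. A minimal sketch follows, with hypothetical volume names and with both lists assumed to have been exported already.

```python
from typing import Dict, Iterable, List


def reconcile(tsm_vault: Iterable[str], scanned: Iterable[str]) -> Dict[str, List[str]]:
    """Compare TSM's view of the vault with an external scan report."""
    expected, seen = set(tsm_vault), set(scanned)
    return {
        # TSM says the tape is in the vault, but the scan did not find it.
        "missing_from_vault": sorted(expected - seen),
        # The scan found a tape that TSM does not list as being in the vault.
        "unexpected_in_vault": sorted(seen - expected),
    }


if __name__ == "__main__":
    report = reconcile(
        tsm_vault=["DR0001", "DR0002", "DR0003"],
        scanned=["DR0002", "DR0003", "DR0009"],
    )
    print(report)
```

Either list being non-empty is a reason to investigate before the next rotation.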
Additional Resources
Web Sites
Manuals
http://publib.boulder.ibm.com/tividd/td/tdprodlist.html
This is the home page for all Tivoli manuals. Choose “S” for Storage Manager. The TSM
Administrator Guide and Administrator Reference for the platform of your TSM server contain
information on the DRM module of TSM and the recovery steps for the TSM server.
Training Overview
IBM Tivoli Disaster Recovery Manager 5.3
http://www-306.ibm.com/software/tivoli/education/A766698B86554B97.html
This course covers using the TSM DRM module to automate the process of doing off-site media
storage and recovering from the loss of a TSM Server. In addition to off-site media rotation,
electronic vaulting is also discussed.
Redbook Overview
IBM Redbooks are a great source of information (and one of the few) on IBM software and
hardware. Because TSM is a mature product, there are numerous good Redbooks covering it.
Below I have provided the ones I found most useful for TSM DR.