Version 4.0 SP01
User Guide
July, 2012
Double-Take RecoverNow Version 4.0 SP01 User Guide
• DB2, IBM, i5/OS, iSeries, System i, System i5, Informix, AIX 5L, System p, System x, System z, and WebSphere—
International Business Machines Corporation.
• HP-UX—Hewlett-Packard Company.
• Teradata—Teradata Corporation.
• Intel—Intel Corporation.
• Linux—Linus Torvalds.
• Oracle—Oracle Corporation.
• Sybase—Sybase, Inc.
All other brands and product names are trademarks or registered trademarks of their respective owners.
If you need assistance, please contact Vision Solutions’ SCP Certified CustomerCare team at:
CustomerCare
Vision Solutions, Inc.
Telephone: 1.800.337.8214 or 1.949.724.5465
Email: [email protected]
Web Site: www.visionsolutions.com/Support/Contact-CustomerCare.aspx
Contents
Usage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 273
See Also . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 273
scconfig . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 274
Usage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 274
Syntax . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 274
See Also . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 275
scsetup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 276
Syntax . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 276
Usage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 277
scrt_ra . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 278
Syntax . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 278
Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 278
Usage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 279
See Also . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 279
scrt_rc . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 279
Syntax . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 279
Usage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 279
Session restore targets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 280
Session Termination . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 280
Process Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 280
General Procedure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 281
Procedure Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 281
scrt_vfb . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 281
Syntax . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 281
Usage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 281
Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 282
sccfgd_cron_schedule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 282
Syntax . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 282
Usage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 283
Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 283
sccfgd_putcfg. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 283
Syntax . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 283
Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 283
Usage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 283
sccfgchk . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 283
Syntax . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 283
Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 283
Usage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 284
sztool . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 284
Syntax . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 284
Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 284
Usage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 285
Introduction
The Double-Take RecoverNow User Guide describes how to install,
configure, maintain and administer Double-Take RecoverNow (hereafter
referred to as RecoverNow), data replication software. The table below
shows the chapters in the RecoverNow User Guide.
Chapter 1, “Overview of Data Replication Concepts” on page 15
    Use this chapter to learn about the main concepts related to data replication.
Chapter 3, “Using the Sizing Tool to Calculate LFC Size” on page 43
    Use this chapter to learn how to use the Sizing tool to calculate the LFC size.
Chapter 7, “Starting and Stopping” on page 177
    Use the procedures in this chapter to start up and shut down both the production and the recovery servers.
Chapter 10, “Working with RecoverNow Applications” on page 211
    Use the procedures in this chapter to roll back application data on the production server and restore the data to an earlier point in time.
Chapter 11, “Working with Archived Data” on page 227
    Use the procedures in this chapter to create complete copies of the data on archive media such as tape.
Chapter 12, “Introduction to Disaster Recovery” on page 231
    Use this chapter to become familiar with disaster recovery concepts.
Chapter 14, “CLI Commands” on page 261
    Use the Command Line Interface (CLI) commands in this chapter to work with RecoverNow.
Overview
This chapter describes the organization and architecture of RecoverNow, a
continuous data protection system designed for immediate data recovery.
RecoverNow is a software-only product for IBM servers running the AIX 5L, AIX 6.1, and AIX 7.1 operating systems.
You also have the option to configure one or more replicated servers; this provides another data replica, maintained in a cascaded fashion from the recovery server. The hardware requirements of a replicated server are similar to those of the recovery server. Within the RecoverNow system architecture, the replicated server role appears exactly like a recovery server. Functionally the two are the same, except that the production server’s data is never restored from a replicated server, only from the recovery server.
without impact to the production server. It also enables the trade off of
storage capacity for time, which is at the core of the RecoverNow system
and the source of its advanced functionality.
• The applications and associated files and volumes that you want to
protect.
• Archiving systems.
RecoverNow Datatap
The AIX Logical Volume Manager (LVM) maintains the hierarchy of
logical structures that manage disk storage. RecoverNow kernel extensions,
or datataps, reside logically above the LVM layer inside the operating
system kernel. Furthermore, these datataps are logically below the file
system level and handle block level transfers. The datatap receives a buf
structure from the file system layer in the case of a file system write
operation or from the application in the case of a raw Logical Volume write.
Data is then processed and sent on to the LVM layer. For read operations from storage, data passes through untouched.
The datatap is loaded on both the production server and the recovery server.
On the production server, the datatap is responsible for splitting data write
operations. Each write results in a write to the intended protected volume as
well as to a redo log.
RecoverNow Journal
Two specific structures are used to contain the redo and undo logs in the
RecoverNow architecture. The After Image Log File Container (AILFC)
and Before Image Log File Container (BILFC) are used to hold these logs.
The entire set of logs is known as a journal to RecoverNow, and on the
recovery server, associated redo logs together with undo logs form the
journal. The journal is often illustrated as a single pool, and these logs are
block storage devices that do not interact with resident file systems or their
cache buffers.
RecoverNow Agents
RecoverNow uses the following agents:
• Restore Agent (RA)—A primary agent that runs on the recovery server
LCA Agent
Shipping logs from the production server to the recovery server is the
responsibility of the LCA. The LCA reads from the journal any redo log
information that has been closed, or sealed, and this information is then
shipped over one or more IP networks to an agent that runs on the recovery
server. Both agents bind and communicate over the same socket. Socket
port addresses can either be default addresses or they can be
programmatically selected.
ABA Agent
On the other side of the socket and running on the recovery server, the ABA
is collecting log information. The ABA receives the redo log information in
the time order it was created on the production server, and then stores this
information in recovery logs. Remember, these are block storage devices
that do not interact with resident file systems. As the ABA receives the data,
it dynamically creates optimized State Map Transactions (SMTX). The
blocks identified are then sorted in ascending device/block order. Block
ordering is a more efficient organization for applying modifications to the
replicated data, or replica, on the recovery server.
Before the modifications are applied to the replica, yet another block
storage device is written to with information that would allow the replica to
step backward in time. This storage device is called the undo log and
appears to be nothing more than a logical volume to the volume manager.
Once the undo log information is saved on disk, the redo log can be applied
to the replica to bring it up to date with the data on the production server.
AA Agent
The AA, or Archive Agent, also runs on the recovery server. It is used to
extend RecoverNow’s rollback capabilities by recording redo and undo logs
to media. The AA currently works with Tivoli Storage Manager (TSM). The
AA uses the TSM API to send archive requests to TSM. When the logs are archived, they are always spooled in pairs; depending on the TSM configuration, the data is stored on media. A redo log and an undo log are always kept together when the AA stores them on media, which gives RecoverNow the ability to restore the data to any point in time: the data is first unwound with the coarse-grained undo log, and fine-grained redo log information is then applied until the replicated data reaches the desired point in time.
RA Agent
Restoration is handled by the RA, which runs on the recovery server. Unlike the other agents, it does not run continuously; it can be executed programmatically, from the command line, or through the GUI. The RA handles the following types of restore operations:
Replication
RecoverNow runs automatically on a production server, creating a mirrored
copy of protected data on the recovery server. For increased availability, it is
recommended that the recovery server be a remote machine. The following
illustration shows storage data flow during RecoverNow replication:
NOTE
Data does not pass through the datatap on the recovery server.
The ABA sweeps through the log files in time order, reads metadata from the replica to calculate the changes required to apply the working log file, and stores this information in the undo log. The ABA then reads from the redo log and applies the modifications in block order to the replica.
Journal Configuration
RecoverNow uses the following journals:
Production Journal
The production journal holds redo log buffers until the logs are transferred
to the recovery server. Then the logs are available to receive new
application write data. Sizing the journal properly prevents the recovery
server from falling so far behind the production server that dynamic
recovery must occur for the recovery server to catch up. If the journal is too
small, then transfers between the production server and the recovery server
are performed more frequently than is efficient. If the journal is too big,
then the recovery server may fall so far behind the production server that
dynamic recovery must occur.
Recovery Journal
The recovery journal is on the recovery server, and holds redo and undo
logs, that act as RecoverNow’s internal rollback window. If you are using
external archive media such as tape, then the size of the journal on the
recovery server is not critical to the ability to restore data. The larger the
recovery journal, the larger the internal rollback window, which implies
faster access to redo and undo logs during production restores in that
window.
The size of the recovery journal is proportional to write throughput and the
required internal rollback window.
The journal on the recovery server should be at least 256MB. Note that this
is twice the space recommended for the minimum on the production server,
because the recovery server contains both redo logs and undo logs.
Increasing log size reduces processing overhead and makes the elimination of common blocks in the undo logs more efficient. Decreasing log size results in a more up-to-date replica on the recovery server, because log transfers occur more frequently.
When you determine the best log size, keep these conditions in mind:
• The journal should contain at least eight logs
• Maximum log size is one-half of the available RAM but not greater
than 512 MB
To calculate log size, you need an estimate of average write throughput, and
the required processing rate. For the required processing rate, if
RecoverNow processes one log every 60 seconds, the replica will be one
minute behind the production system.
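As a rough sketch of that calculation (the throughput and interval figures below are assumed examples, not recommendations):

```shell
# Hedged sketch: derive a log size from average write throughput.
# Assumed example figures: 4 MB/s average write throughput and one log
# processed every 60 seconds, keeping the replica about one minute behind.
avg_write_mbps=4          # MB per second, e.g. estimated with iostat
processing_interval=60    # seconds per log

log_size_mb=$((avg_write_mbps * processing_interval))

# Apply the documented cap: half of available RAM, never more than 512 MB.
max_log_mb=512
if [ "$log_size_mb" -gt "$max_log_mb" ]; then
    log_size_mb=$max_log_mb
fi
echo "Suggested log size: ${log_size_mb} MB"
```

With these example numbers the suggested log size is 240 MB, which is still under the 512 MB cap.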
Number of Logs
number of logs = (journal size) / (log size)
Even though the calculation for the number of log files appears trivial, keep
in mind that the number of log files can affect performance. If enough log
files are available on the production server, RecoverNow does not have to
rely on state maps during an outage, because it has not run out of log files to
take in data. A state map contains information about data changes for each
storage device protected by RecoverNow. It can be used to reconstruct data
changes if the underlying data is corrupted or lost. During peak usage, when
an application is writing data faster than the network can transmit, extra log
files enable the system to buffer during these peak periods without having to
rely on state maps, eliminating the risk of a restore blackout window. On the
recovery server, a sufficient number of log files allows activity to be
buffered in the event that the tape drive or library is taken offline.
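The number-of-logs formula above can be checked with simple shell arithmetic; the journal and log sizes below are assumed example values:

```shell
# Hedged sketch of the number-of-logs formula from this section.
journal_size_mb=2048   # example production journal size
log_size_mb=128        # example log (LFC) size

num_logs=$((journal_size_mb / log_size_mb))
echo "Number of logs: ${num_logs}"

# The journal should contain at least eight logs.
if [ "$num_logs" -lt 8 ]; then
    echo "Warning: increase the journal size or decrease the log size"
fi
```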
RecoverNow Snapshots
Snapshots use significantly less space and are more efficient than data
mirrors. A mirror is an up-to-date copy of data for a logical volume. Two or
more complete copies can exist at the same time, although only one copy is
seen or used by an application, so mirrors require double or more the
amount of disk space than the original data.
Notice from the above figure that data is passing through the datatap on the
recovery server in the case of reads and writes to snapshot data.
RecoverNow uses a different set of device minor numbers when dealing
with snapshots, so that the datatap knows which log files to access in a
specific order. For example, when a write operation is directed at the
snapshot it is actually written to the copy-on-write (COW) log instead. If
the data has not been modified, then a read operation would come from the
snapshot. If the data has been modified, then the read would come from the
copy-on-write log. Keep in mind that the snapshot is the representation of
the application data at a specific point in time.
Recovery
Generally, there are two types of recovery restorations. A production restore
is a rollback in time which takes place in the protected volumes on the
production server. The other type of restore, a virtual restore, is a rollback
in time which is executed over a read-writable virtual image of the protected
volumes which reside on the recovery server.
Production restores are useful for a database “crash” where the database
will not come up. By recovering an image of the actual production database
to some point in the past directly on the production disk itself, RecoverNow
can roll back a crashed database in minutes rather than hours or days for the
most disastrous operational situation a database can encounter.
a single, unified view. VSP also provides services and portlets for
performing activities common to products.
For detailed information on working with VSP, refer to Getting Started with
Vision Solutions Portal (2.0.05.00_VSP_Getting_Started.pdf) packaged
with RecoverNow for AIX.
Use this chapter to prepare your RecoverNow system for its initial
configuration.
The production journal is the storage that contains all of the logs. A single
log is transferred to the recovery server when that log is filled. For example,
if each LFC is 64MB and there are 100 production LFCs, then the
production journal is 6400MB. When the current LFC is filled with
approximately 64MB of write I/O data (there is some additional metadata),
it will be transferred to the recovery server.
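The worked example above, expressed as shell arithmetic:

```shell
# 100 production LFCs of 64 MB each give a 6400 MB production journal.
lfc_size_mb=64
num_production_lfcs=100

production_journal_mb=$((lfc_size_mb * num_production_lfcs))
echo "Production journal: ${production_journal_mb} MB"
```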
All logs in the production journal are redo logs. They contain information
that moves the disk image of the application forward through time when the
information is applied. This is called rolling forward.
Half of the logs in the recovery journal are redo logs, and half are undo logs.
Undo logs contain information that moves a disk image of the application
back through time when they are applied. This is called rolling back.
The recovery server also contains the snapshot journal. The snapshot
journal is the space on the recovery server where RecoverNow stores
copy-on-write information and write-cache data for snapshots.
The following table shows the variables that are used for estimating journal
sizes and log sizes. You need these estimates in order to configure
RecoverNow:
Peak throughput
    Maximum write rate of the application during a business cycle.
NOTE
Use a tool such as iostat to estimate throughput. You can also use
the Sizing tool to estimate throughput. For more information, refer
to “Using the Sizing Tool to Calculate LFC Size” on page 43.
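One way to turn iostat samples into a throughput estimate is sketched below. The Kb-written figures are assumed sample values, and the exact iostat flags and columns vary by platform, so adjust the parsing to your output:

```shell
# Hedged sketch: estimate average write throughput from iostat samples.
# Live command on AIX (60-second interval, two reports):
#   iostat -d 60 2
# Below, assumed Kb-written figures for two protected disks over one
# 60-second interval stand in for the parsed iostat output.
kb_wrtn_hdisk0=120000
kb_wrtn_hdisk1=240000
interval_sec=60

total_kb=$((kb_wrtn_hdisk0 + kb_wrtn_hdisk1))
write_mbps=$((total_kb / 1024 / interval_sec))
echo "Estimated average write throughput: ${write_mbps} MB/s"
```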
The goal of sizing the production journal properly is to prevent the recovery
server from falling so far behind the production server that dynamic
recovery must occur for the recovery server to catch up. If the production
journal is too small, then transfers between the production server and the
recovery server are performed more frequently than is efficient.
• RecoverNow must not fall into dynamic recovery when write spikes
exceed bandwidth.
If you are not using external archive media, then the size of the recovery
journal is critical for data protection. Rollbacks cannot extend beyond the
logs that exist on the recovery server. You must estimate average throughput
and calculate recovery journal area based on the length of the desired
average restore window.
• Throughput
• Required internal rollback window
IMPORTANT
You can also use the Sizing tool to calculate write journal pool
size. For more information, refer to “Using the Sizing Tool to
Calculate LFC Size” on page 43.
When you determine the best log size, keep these conditions in mind:
NOTE
Use the following equation to ensure that the space you allocate
for LFCs coincides with the physical partition size of the VG
where the LFC LVs are allocated. This enables you to utilize all
the space in a LV. This is not a requirement, you can elect to not
utilize all the available space in a LV.
3. Using this space, calculate the LFC size and the number of production
and recovery LFCs.
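The alignment described above can be sketched as rounding the LFC size up to a whole number of physical partitions, so that no space in the LFC LV is wasted. The PP size and desired LFC size below are assumed example values; on a real system the PP size comes from the lsvg output for the VG that holds the LFC LVs:

```shell
# Hedged sketch: align the LFC size with the VG's physical partition size.
pp_size_mb=64        # assumed; on AIX, lsvg <vgname> reports "PP SIZE"
desired_lfc_mb=100   # assumed initial estimate from throughput sizing

# Round up to the next whole multiple of the PP size.
pps_per_lfc=$(( (desired_lfc_mb + pp_size_mb - 1) / pp_size_mb ))
lfc_size_mb=$((pps_per_lfc * pp_size_mb))
echo "Aligned LFC size: ${lfc_size_mb} MB (${pps_per_lfc} PPs)"
```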
If event markers are used in multiple Context IDs, each Context ID must have a unique event marker file and a script that calls rtmark.
The following examples show the parameters you need to use in your application, an event marker file, and a script that calls rtmark:
# example call in your application
<Context ID> <name> <description>

# example event-mark script that calls rtmark and writes the emf_1 file
rm -f /tmp/emf_1
printf "name = ${2}\n" >/tmp/emf_1
printf "description = ${3}\n" >>/tmp/emf_1
sync
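A hypothetical end-to-end run of such a script is sketched below. The argument order ($1 = Context ID, $2 = name, $3 = description) is assumed from the ${2}/${3} references in the example script, so verify it against your rtmark setup; the script body is written to a temporary file here only to keep the example self-contained.

```shell
# Write the example event-mark script body to a file (illustration only).
cat > /tmp/event_mark.sh <<'EOF'
rm -f /tmp/emf_1
printf "name = ${2}\n" >/tmp/emf_1
printf "description = ${3}\n" >>/tmp/emf_1
sync
EOF

# Hypothetical invocation: Context ID 1, an event name, and a description.
sh /tmp/event_mark.sh 1 nightly_backup "Backup completed"
cat /tmp/emf_1
# name = nightly_backup
# description = Backup completed
```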
NOTE
Event markers are lost on failover and failback because the
rollback window is reset. The production server’s event_marker
file is cleared during a failback to the configured production
server. The failover server’s event_marker file is cleared during a
failover to the configured failover server.
If rtstop fails for a Context ID, that Context ID is marked failed and processing continues with rtstop on any remaining Context IDs. After all Context IDs are processed, the exit status is set either to 0 or to a value equal to the number of Context IDs that were marked as failed. The reason for each failure is recorded in the /usr/scrt/log/rn_shutdown.out file. A non-zero exit aborts the AIX shutdown.
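That exit-status convention could be consumed by a wrapper along these lines. The shutdown command path is an assumption (only the /usr/scrt/log/rn_shutdown.out log path appears above), so the live call is left commented out and an example status is used instead:

```shell
# Hedged sketch: interpret the aggregate rtstop exit status at shutdown.
# A live run would capture status=$? after a call such as (path assumed):
#   /usr/scrt/bin/rn_shutdown
status=3   # example: three Context IDs failed rtstop

if [ "$status" -ne 0 ]; then
    echo "${status} Context ID(s) failed rtstop; see /usr/scrt/log/rn_shutdown.out"
    # A non-zero exit at this point aborts the AIX shutdown.
    # exit "$status"
fi
```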
IP name dependency?
Is DNS enabled?
Is NIS exported?
Release:
• Policy domain
• Policy set
• Management class
• Backup copy
• Archive copy
Storage pools:
• Disk
• Tape
Policy domain:
• Type
• Number
• Shared?
You can use the Sizing Tool to calculate configuration values before
RecoverNow is installed. It is also useful to run the tool after
RecoverNow is installed to determine if the number of LFCs or WJ
percentage needs to be adjusted.
• “Running the Sizing Tool from the RecoverNow Sizing Tool GUI” on
page 44
NOTE
The RecoverNow Installation Wizard and the smit installation program provide the sztool command for the command-line Sizing Tool and the sztool_gui command for the RecoverNow Sizing Tool GUI.
The RecoverNow Sizing Tool GUI window displays. The first tab, Introduction, displays by default.
Introduction Tab
The Introduction page describes how you use the sizing tool. For detailed
information, click Help. This button displays the URL to access the Vision
Solutions Support web site. From this site you can download documentation
that describes how you use the sizing tool. In addition, you are provided
with CustomerCare support email and phone numbers. Click Exit to exit the
RecoverNow Sizing Tool GUI.
• To select the individual LVs that RecoverNow will protect, use the check box next to each LV.
• Click the Run Disk Discovery Again button to re-discover the LVs.
The table below describes the parameters that you can modify:
Collection Interval Count
    Specifies how many times you want to collect data. The default value is 24.
Collection Interval Minutes
    Specifies how many minutes to wait between each data collection interval. The default value is 60 minutes.
Lfc Size (MB)
    Specifies the size for the RecoverNow LFC. The default value is 16 MB.
Replication Outage Hours
    Specifies the number of hours during which the production server cannot send LFCs to the replicated server. When this occurs, LFCs begin to back up on the production server until no more LFCs are available. Once RecoverNow runs out of LFCs, it marks the regions that require synchronization as dirty in the state map. These dirty regions are synchronized automatically when LFCs become available, and CDP functionality resumes as soon as the resynchronization completes. More LFCs are required as outage time increases. The default value is 8 hours.
CDP Window Hours
    Specifies how many hours back in time data can be restored from the recovery server to the production server. The window size determines the number of LFCs on the recovery server. The default value is 8 hours.
Snapshot Duration Hours
    Specifies the number of hours you want to keep a snapshot valid. As the snapshot duration hours increase, you need to increase the Write Journals disk space. The default value is 8 hours.
• The Run button becomes active when you select one or more LVs and specify values for the LV parameters.
NOTE
Before you click the RUN button, start your application on the
selected LVs, and ensure that your application has a heavy load, so
the tool collects enough data to reflect the activity for a worst case
scenario.
NOTE
You can view /tmp/sztool/sztool.log on the production server once the back-end sztool job (scripts) is running.
• Detailed logs from latest run—This section shows a scrollable text area
containing detailed statistics for the sztool script sizing log file. The log
file name is /tmp/sztool/sztool.log. Click the Display Log button to display the results derived from the original log file. The columns show:
– Logical Volume
– IO Count
– Kb read
– Kb written
– Kbps (kilobits per second)
• Try different parameters to get results from the already collected data.
You can edit the parameters shown below to see different log file
results.
– Lfc Size (MB) Low. Refer to “Lfc Size (MB)” on page 47.
– CDP Window Hours. Refer to “CDP Window Hours” on page 47.
– Replication Outage Hours. Refer to “Replication Outage Hours” on
page 47.
– Snapshot Duration Hours. Refer to “Snapshot Duration Hours” on
page 47.
NOTE
This output will not overwrite the /tmp/sztool/sztool.log file
contents.
3. Click the Display Log button if you wish to re-display the results
derived from the original log file.
NOTE
A PDF version of the chart is automatically saved in /tmp/sztool/DiskWriteChart.pdf.
NOTE
Before you run the Sizing tool you must have performed the
installation steps described in “Installing the Sizing Tool” on
page 44.
The configuration file is where you specify LV names and other run-time parameters.
2. Review the diskinfo file and determine which LVs RecoverNow should
protect.
The table below describes the configuration file parameters that you can
modify:
Parameter Description
4. Start your business application on the selected LVs. The load of the business application should be as close to the worst-case scenario as possible to ensure a meaningful result.
The sztool script provides the following commands and options:

sztool
    If issued for the very first time, generates the working directory, the diskinfo file, and the sztool.cfg file. You should review the diskinfo file and then modify sztool.cfg accordingly. You can then re-run sztool.
sztool -l
    Once the log file has been created, prints the calculation results for different LFC sizes based on the existing log file. For example, sztool -l32 prints the results when the LFC size is 32 MB; sztool -l16 -l512 prints all the calculation results from 16 MB to 512 MB. You cannot have spaces between -l and the LFC size number. This command produces screen output only; there is no delay or sleep.
sztool -x
    Executes sztool and prints the file name and line number of each statement for debugging purposes. For debugging, use sztool_main -x to view screen output.
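A typical command-line session following the table above might look like this. The commands and flags are as documented; the command -v guard is only there so the sketch is safe to run on a system without sztool installed:

```shell
# Sketch of the documented sztool workflow.
if command -v sztool >/dev/null 2>&1; then
    sztool              # first run: generates working directory, diskinfo, sztool.cfg
    # ...review diskinfo, edit sztool.cfg, then re-run sztool to collect data...
    sztool -l32         # once sztool.log exists: results for a 32 MB LFC
    sztool -l16 -l512   # results for every LFC size from 16 MB to 512 MB
    sztool_available=yes
else
    sztool_available=no
fi
echo "sztool available: ${sztool_available}"
```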
[Figures: supported configurations. A LAN configuration with a production server and a recovery server, each running a datatap; a LAN/WAN configuration adding a replicated server behind the recovery server; and a WAN configuration with a production server and a remote recovery server.]
NOTE
The recovery server becomes the failover production server in the
bi-directional configurations. For more information about RecoverNow
failover operations, refer to Chapter 12 “Introduction to Disaster
Recovery” on page 231.
[Figure: bi-directional LAN configuration with a production server and a recovery server.]
The LCA/ABA agents are used in a bi-directional manner. These agents need to
be configured before the failover/failback operations are used. For more
information on RecoverNow failover/failback operations, refer to Chapter 12
“Introduction to Disaster Recovery” on page 231.
[Figure: production server with a local recovery server over the LAN and a replicated server over the WAN.]
Overview
This chapter describes RecoverNow, RecoverNow Portal Application and
Vision Solutions Portal (VSP) installation procedures.
Before you begin, ensure that you review support information, system
requirements, and decide on your preferred configuration. Once you have installed the RecoverNow components, you can work with VSP. Refer to
“Logging in to Vision Solutions Portal” on page 127.
NOTE
RecoverNow supports Internet Protocol version 6 (IPv6).
NOTE
AIX 5L version 5.3 TL5 64-bit kernel requires APAR IY92292. It
is currently available at the IBM Web site.
IMPORTANT
The /home/usr/scrt/ directory is created when the scrt user is
created in AIX. Do not copy any files to this directory because
RecoverNow deletes this directory during the de-installation
process.
• The required disk space to install VSP and the RecoverNow portal
application is 280 MB in /opt.
NOTE
VSP supports Internet Protocol version 6 (IPv6).
• Base RecoverNow
You must install the Base RecoverNow software on each AIX cluster
node directly.
• Sizing Tool
You can use the Sizing Tool to calculate configuration values before
RecoverNow is installed. However, it is also useful to run the tool after
RecoverNow is installed to determine if the number of LFCs or WJ
percentage needs to be adjusted. Refer to “Using the Sizing Tool to
Calculate LFC Size” on page 43.
• Documentation
• Either ssh and scp or rexec and rcp must be allowed. If ssh fails, then
rexec and rcp are used.
• To use rcp, the ~root/.rhosts file must contain the local host and user name.
• Check /etc/services to find the ports used by exec and shell and check
that those ports are not blocked.
• This requires that the “echo” port is not blocked. This is usually defined
as port 7 in /etc/services.
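As a quick check, you can confirm that these service entries exist; this is a sketch, and the exact lines in /etc/services vary by system:

```shell
# Show the port assignments for exec, shell, and echo.
# On most systems these are exec 512/tcp, shell 514/tcp, and echo 7/tcp.
grep -E '^(exec|shell|echo)[[:space:]]' /etc/services
```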
NOTE
If you exit the installation wizard, you can view the detailed errors
in the RecoverNow_4.n.n.n_Install.log file.
NOTE
If you are installing GeoCluster and RecoverNow, refer to the
Double-Take Availability Integration Guide.
1. Download the RecoverNow and GeoCluster License Key Wizard from the
Vision Extranet (Click Customer/Partner Login at www.visionsolutions.com)
under the License Keys menu option.
4. On the Specify Node screen, specify a node where you want to apply license
keys. The node can be the name or IP address of the local node or a node in
the network.
5. Click Next.
6. On the Node Login screen, to obtain new license keys for RecoverNow, log in
to the node as root (AIX) or as an administrator (Windows).
7. Click Next.
9. On the Confirm Node Selection screen, you can add or remove nodes that will
be used by the wizard to install or upgrade. To add a node:
10. On the License Key Locations screen, specify or browse to the location
of the license key file for RecoverNow obtained from Vision Solutions
for each node.
12. On the License Key Check screen, click Next to display the Summary
screen.
AIX Installation
This section describes:
3. Click Next.
The Terms And Conditions screen displays. Read and accept the terms
of the License Agreement.
4. Click Next.
5. Select either:
7. Click Next.
IMPORTANT
The following steps detail how to create a new installation
9. Click Next.
10. Enter the node name—Points the installer to the target node where you
want to install RecoverNow. Alternatively, you may enter an IP address.
11. Click Test Connection—A dialog displays indicating whether or not the
connection to the node was successful.
NOTE
You must log into the node as root.
18. Enter the node name or IP address, and click Add to have RecoverNow
software installed on this node.
23. Select the node for which the documentation will be installed.
a. Click Next, if you have license key files from Vision Solutions.
Proceed to step 26 on page 74.
b. Click Continue Without License Keys.
The Continue Without License Keys screen displays. You can install
RecoverNow without license keys, but you cannot use it until valid
license keys are applied. Proceed to step 26 on page 74.
• Select Contact Vision Solutions to get new license keys and
click Next; the Contact Vision Solutions screen displays. Use one
of the following methods to procure license keys.
– On the Internet—Log in to your account at:
VisionSolutions.com/SupportCentral
– Email—Copy and paste the information on the Contact Vision
Solutions screen into your email message. When you contact
Customer Accounting to request a license, you will be asked
to provide the machine ID (uname -m) of your servers along
with the hostname, and your OS. Email your information to
Customer Accounting at [email protected] and
request a license key. A license file will be generated and
emailed to you.
– Telephone—(800) 337-8214
NOTE
Once you procure the license keys from Vision Solutions, click
Next on the License Key Locations screen to continue the installation.
Proceed to step 26 on page 74.
NOTE
If you are reinstalling RecoverNow, the Product Shutdown
Required screen displays. You must shut down RecoverNow
before you reinstall it.
The License Key Check screen briefly displays (not shown) to validate
your license keys, then the Installing Double-Take RecoverNow screen
displays showing the results of the installation process for the
RecoverNow software.
3. Click Next.
4. Enter the node name—Points the installer to the target node where you
want to install RecoverNow. Alternatively, you may enter an IP address.
6. Click Next.
NOTE
You must log into the node as root.
9. Click Next.
11. Enter the node name or IP address, and click Add to have RecoverNow
software installed on this node.
16. After you have installed the RecoverNow Portal Application and the
Vision Solutions Portal (VSP), you can log into VSP. Select one of the
highlighted nodes to launch VSP. See “Logging in to Vision Solutions
Portal” on page 127.
User Roles
The installation process creates the scrt group in /etc/group, identifying the
category of users allowed to access the portal application.
IMPORTANT
The root user is always allowed access to the portal application.
Reinstall RecoverNow
Before you reinstall RecoverNow:
• You must stop your application and RecoverNow on the node(s) where
RecoverNow is being reinstalled.
3. Click Next.
4. Enter the node name—Points the installer to the target node where you
want to install RecoverNow. Alternatively, you may enter an IP address.
6. Click Next.
NOTE
You must log into the node as root.
9. Click Next.
13. Specify the password and click Log In, then click Next.
The Retrieving Installation Information screen displays. After the
installation information is successfully retrieved, the RecoverNow
Options screen displays.
16. After you specify license keys and they are validated, click Next.
17. You must manually ensure that RecoverNow is shut down; the wizard
does not perform this task. Click Next.
Once the installation wizard has verified that RecoverNow has been
shut down, the Shutdown Verification Complete screen displays.
20. Click Next to reinstall the portal application and the Vision Solutions
portal.
21. Select Yes, to shut down the Vision Solutions Portal and install the
Vision Solutions Portal and the portal application.
NOTE
You can also decide to skip the Vision Solutions Portal and the
portal application reinstall.
When the Vision Solutions Portal and the portal application reinstall
completes, a screen briefly appears stating that the reinstall was a success.
Then the Installation Complete screen displays.
22. After you have reinstalled RecoverNow, you can log into VSP. Select
one of the highlighted nodes to launch VSP. See “Logging in to Vision
Solutions Portal” on page 127.
Upgrade RecoverNow
Before upgrading RecoverNow to the current version:
• You must stop your application and RecoverNow on the node(s) being
upgraded.
Use one of the following methods to list all installed fixes (epkg files).
a. From the command line: emgr -l
b. From SMIT: use fast path by entering emgr
Use one of the following methods to remove installed RecoverNow fixes.
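These emgr steps can be sketched as follows (AIX only; the efix label is a placeholder, not a value from this guide):

```shell
# List all installed interim fixes (efixes) and note their labels
emgr -l

# Remove an efix by label; "RN401" is a placeholder for a label
# reported by emgr -l
emgr -r -L RN401
```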
3. Click Next.
4. Enter the node name—Points the installer to the target node where you
want to install RecoverNow. Alternatively, you may enter an IP address.
NOTE
You must log into the node as root.
9. Click Next.
11. Enter the Node name or IP address and click Add. The node is added to
the list.
15. Select the node for which the documentation will be installed.
17. After you specify license keys and they are validated, click Next.
18. You must manually ensure that RecoverNow is shut down; the wizard
does not perform this task. Click Next.
Once the installation wizard has verified that RecoverNow has been
shut down, the Shutdown Verification Complete screen displays.
If you have efixes, the Fix Removal Required screen displays. You must
manually remove the efixes; the wizard does not perform this task.
Refer to page 92 for information on how to remove efixes.
After you have upgraded the RecoverNow Portal Application, you can log
into VSP. See “Logging in to Vision Solutions Portal” on page 127.
a. Mount the CD
b. Run bin/AIX/machine_id.bin
2. Send the following information in email to Customer Accounting at:
[email protected]
The email should contain the following:
• Company Name
• Product(s) for which you are requesting a license
• Machine ID
• Operating system
• Operating system version
• Node name (hostname)
2. Extract and copy the appropriate directory and files from your PC to the
AIX server. Depending upon your OS and bit-level, choose one of the
following directory paths for a 32 or 64 bit kernel:
• esFiles/52/32/
• esFiles/52/64/
• esFiles/53/32/
• esFiles/53/64/
• esFiles/61/64/
• esFiles/71/64/
For example: cd 53
• esFiles/52/32/
• esFiles/52/64/
• esFiles/53/32/
• esFiles/53/64/
• esFiles/61/64/
• esFiles/71/64/
11. Select the current directory as the INPUT device/directory and enter a
dot (.).
User Roles
The installation process creates the scrt group in /etc/group, identifying the
category of users allowed to access the portal application.
IMPORTANT
The root user is always allowed access to the portal application.
Log Files
There are two types of log files:
If you receive the License validation failed message, ensure that the
information specified in license.inf is correct and that the output of the
hostname command matches the hostname in license.inf. If the problem
persists, email or contact Customer Support. Refer to the readme.txt file for
contact information.
License Expiration
When the license expires for RecoverNow, the application on the
production server is not affected. However, data replication to the recovery
server will be stopped, and you will no longer be able to use the Continuous
Data Protection functionality. You can check the license file for the data
replication component for information about the expiration of the license.
The file is named: /usr/scrt/run/node_license.properties.
NOTE
RecoverNow verifies the contents of the license file. Do not alter
this file.
IMPORTANT
You must shut down RecoverNow and unload the drivers before
you run the uninstall program. Use the following command:
rtstop -FC <Context ID>
4. Click Next.
5. Enter the node name—Points the installer to the target node where you
want to uninstall RecoverNow.
6. Click Next.
7. Enter the node name or IP address, and click Add to have RecoverNow
software uninstalled on this node.
8. Click Next.
11. Select the components you want to uninstall from each node and click
Next.
12. You must manually ensure that RecoverNow is shut down; the wizard
does not perform this task. Click Next.
Once the uninstall wizard has verified that RecoverNow has been
shut down, the Shutdown Verification Complete screen displays.
If you have efixes, the Fix Removal Required screen displays. You must
manually remove the efixes; the wizard does not perform this task.
IMPORTANT
You must shut down RecoverNow and unload the drivers before
you run the uninstall program. Use the following command:
rtstop -FC <Context ID>
1. Enter smit
NOTE
The RecoverNow software can only be installed on an AIX
machine. Only VSP and the portal application can be installed on
a Windows machine.
The Installation Wizard runs and displays the Vision Solutions Portal
Welcome screen.
8. Click Next.
The Terms And Conditions screen displays. Read and accept the terms
of the License Agreement.
9. Click Next.
10. Once the VSP and portal application status is verified, the Installation
Options screen displays.
11. You can decide to start the portal server when Windows starts and after
the installation completes.
NOTE
If the portal application is currently active, the Shut Down Vision
Solutions Portal screen displays. The installation wizard will
enable you to shut down VSP.
15. After you have installed the RecoverNow portal application, you can
log into VSP. Select the highlighted machine-name address to launch
VSP. See “Logging in to Vision Solutions Portal” on page 127.
2. Click Uninstall.exe.
The Uninstall Wizard for Vision Solutions Portal runs and displays the
Vision Solutions Portal Welcome screen.
3. Click Next.
4. Once the VSP and portal application status is verified, the Options
screen displays.
6. When the shut down of VSP completes, the Ready to Uninstall screen
displays.
7. Click Next.
Post-Installation Tasks
These sections contain the post-installation tasks:
http://server:port
The server is the IP address or host name for the node on which the
VSP server is installed and active. The default port number is 8410. For
example, if the VSP server was installed on node vsp-53, you would
copy the following URL into the address field in your browser window:
http://vsp-53:8410
2. The portal appears showing the Log In page. Log in using your user ID
and password. Depending on the platform, the user ID and password
may be case-sensitive.
NOTE
If you have a problem logging into VSP refer to the
Troubleshooting chapter in Getting Started with Vision Solutions
Portal (2.0.00.07_VSP_Getting_Started.pdf) packaged
with RecoverNow for AIX.
After you have logged in, the portal opens to the Home page. A default
portal connection exists for the node on which you logged in.
For detailed information on working with VSP, refer to Getting Started with
Vision Solutions Portal (2.0.00.07_VSP_Getting_Started.pdf)
packaged with RecoverNow for AIX.
2. Note the entry that precedes it. The identifier for this entry is an
argument in the mkitab command. The mkitab command inserts the
RecoverNow boot command after the identifier into the /etc/inittab file.
This causes RecoverNow to start automatically before the protected
application during a reboot.
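For example, assuming the entry that precedes the insert point is cron, the call might look like the following sketch; the rtstart identifier, runlevel, and start command shown are assumptions, not values documented here:

```shell
# Show the current /etc/inittab entries to find the preceding identifier
lsitab -a

# Insert a RecoverNow start entry after the "cron" entry.
# The "rtstart" identifier, runlevel 2, and the command are illustrative only.
mkitab -i cron "rtstart:2:once:/usr/scrt/bin/rtstart -C 34 > /dev/console 2>&1"
```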
Related Topics
After you install RecoverNow, VSP and the portal application, you can
use the RecoverNow Replication Group Wizard or the Command line to
configure new replication groups and change, rename, and delete existing
ones.
2. Click Configuration
3. Click New.
This starts the Replication Group Configuration wizard and the New
Replication Group Servers panel displays.
Field Description
Servers—Section for specifying the host name or IP address for the servers in
this replication group.
Production: Select from the list of portal connections that are associated
with the instance. The host name from the portal connection is used for the
server. This ensures the newly configured replication group will appear in
the instance.
Recovery host name or IP address: Specify a host name or IP address for the
server in the recovery role.
Replicated host name or IP address: Specify a host name or IP address for
the server in the replicated role.
4. Click Next.
The New Replication Group Login panel displays. Use this panel to
log in to the failover server specified in the previous panel. If you have
not already logged in to all of the nodes, this panel displays.
5. Specify the username and password and click Log In. Log in to each
server to retrieve information.
NOTE
The user must be either root or a user that is in the scrt group.
Passwords cannot be blank.
The New Replication Group login panel contains the following fields:
Field Description
Log in Status: Displays the list of servers in the replication group and their
login status.
Server: Displays the name of the server being logged in to. This is the host
name from the portal connection if a portal connection was used to specify
the server.
User: Specify the user ID to log in with. This defaults to the user ID from
the portal connection if a portal connection was used to specify the server.
7. Click Next.
The New Replication Group Names panel contains the following fields:
Field Description
NOTE
Only displays when the Failover server field on
the Servers panel has a value other than Do Not
Failover.
8. Click Next.
9. Click Add.
10. The Add Logical Volumes dialog displays. For detailed information
refer to the RecoverNow online help.
Column Description
Logical Volume: Displays the logical volumes configured for this replication
group.
Volume Group - Production: Displays the name of the volume group on the
production server for the logical volume being protected.
Volume Group - Recovery: Displays the name of the volume group on the
recovery server where the replica for the logical volume being protected is
located. When there is a replicated server, the Volume Group - Recovery and
Volume Group - Replicated are set to the Default Volume Group and cannot
be changed.
Volume Group - Replicated: Displays the name of the volume group on the
replicated server where the replica for the logical volume being protected is
located. Displayed only if the replication group has a replicated server
configured. When there is a replicated server, the Volume Group - Recovery
and Volume Group - Replicated are set to the Default Volume Group and
cannot be changed.
Size (GB): Displays the size in gigabytes (GB) of the logical volume.
Type: Displays the type of logical volume. For example, raw, jfs, jfs2, and
jfs2log.
Mount Point: Displays the mount point for the logical volume. Typically
around 20 characters, but it can be as long as 2048. If the length exceeds 76,
the text is truncated with an ellipsis in the middle.
12. Click Next on the New Replication Group Logical Volumes panel.
The New Replication Group Replication IP Addresses panel displays.
Use this panel to specify IP labels or addresses that will be used
specifically for replication. By default, replication uses the IP addresses
of the servers. There are two options:
NOTE
Production and Recovery server sections are always displayed.
The Replicated server section is only displayed if a replicated
server is configured.
Field Description
Replication IP Addresses: Indicates if the user wants to use the server IP
addresses for replication or specify alternates. Possible values:
• Use server IP addresses for replication
• Use specified IP addresses for replication
Recovery Server - Host name: Identifies the name of the server in the
recovery role.
Recovery Server - IP address: Identifies the resolved IP address from the
host name. Possible values are any valid IPv4 and IPv6 addresses.
The New Replication Group Containers panel displays. Use this panel
to configure how data is moved between servers. Containers are used by
internal processes and replication to move the changed data between
servers. A larger total container size provides a larger rollback window.
Smaller sized containers may replicate more frequently. Specify the
quantity and size of the containers, the default volume group where the
containers are located, and the number of logical volumes to use to
balance IO and improve replication performance.
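For example, the total container space per server is simply the container quantity times the container size; the quantity below is an arbitrary example, while 16 MB is the documented default size:

```shell
# 64 containers (example value) of 16 MB each (the default size)
NUM_CONTAINERS=64
CONTAINER_MB=16
echo "total: $(( NUM_CONTAINERS * CONTAINER_MB )) MB"   # prints: total: 1024 MB
```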
Field Description
Number of containers on replicated: This value is the same as the recovery
server value and cannot be changed. Updating the recovery server value also
updates this value. Only displayed if a replicated server is configured.
Size of each container: Specify the size of each container in MB. Since the
size must match on all configured servers, the recovery and replicated values
are display only. Possible values are 2, 4, 8, 16, 32, 64, 128, 256, and 512.
Default is 16.
Total size: Displays the total space required for the containers on each
server.
Default volume group: Specify the default volume group for each server.
Use alternate volume groups or physical volumes for replication containers:
Indicates if you want to specify volume groups and physical volumes that
will be used for the containers used for replication. If checked, the
Replication Containers panel is displayed. If not checked, the Replication
Containers panel is skipped. This box is unchecked by default.
NOTE
The Production and Recovery server sections are always
displayed. The Replicated server section is only displayed if a
replicated server is configured.
Use the New Replication Group Replication Containers panel to select the
volume groups and physical volumes where you want to locate the
containers used specifically for replication.
Field Description
Total container size: Displays the total space (in MB) required for the
containers on the server.
Volume Group: Displays the list of volume groups on the specified server.
Add: Use this button to add the volume group to the list and default the
physical volume to Any.
Remove: Removes the volume group from the list. This action is available
after you add a volume group.
The New Replication Group Container Options panel displays. Use this
panel to specify if you want to use compression during replication. Specify
if you also want to send partially filled containers after a specified amount
of time. Otherwise, containers are sent when they become full.
Field Description
The New Replication Group Snapshot Buffer panel contains the following
fields:
Field Description
Snapshot Buffers - Size: Indicates how much space to reserve for the
snapshot buffer. The value is a percent of the size of the logical volumes that
have been selected to protect. Valid values are integers from 1 to 100.
Default is 10.
Warning threshold: Indicates how full the snapshot buffer must be before
you are warned that it is filling up. Valid values are integers from 1 to 100.
Default is 75.
The New Replication Group Tivoli Storage Manager panel displays. The
Tivoli Storage Manager (TSM) can archive containers, which allows you to
roll back or take snapshots farther back in time. Full backups of the server
where the TSM client is running can also be performed.
The New Replication Group Tivoli Storage Manager panel contains the
following fields:
Field Description
TSM client - Server: Displays the name of the server where the TSM client
is running. This is always the recovery server. Enabled only when Enable
Tivoli Storage Manager (TSM) is checked.
TSM client - User ID: Specify the user ID for TSM to use to log into the
server where the TSM client is running. Enabled only when Enable Tivoli
Storage Manager (TSM) is checked.
TSM client - Password: Specify the password for TSM to use to log into the
server where the TSM client is running. Enabled only when Enable Tivoli
Storage Manager (TSM) is checked.
TSM client - Options file: Specify the location of the TSM options file on
the TSM client server. The default location is
/usr/tivoli/tsm/client/ba/bin/dsm.opt. You can specify any valid
path and file name. Enabled only when Enable Tivoli Storage Manager
(TSM) is checked.
TSM client - Domain: Specify the domain for TSM to use. Enabled only
when Enable Tivoli Storage Manager (TSM) is checked.
TSM server: Specify the host name or IP address of the server where the
TSM server software is running. Valid values are any valid IPv4 and IPv6
addresses or host name. Enabled only when Enable Tivoli Storage Manager
(TSM) is checked.
The content of this panel is the same as the Configuration Summary section
in the Replication Group Configuration window. Refer to the RecoverNow
online help for additional information for the Replication Group
Configuration window. Refer to “Replication Group Configuration
Window” on page 169.
IMPORTANT
When you initially configure a replication group the Finish button
will be enabled, so that you can save the configuration to the
servers and initialize the configuration.
When the new configuration is saved and validated, you can view the
progress of the configuration initialization in the Configuration
Initialization Progress section in the Replication Group Configuration
window. As each step is successfully completed, a checkmark appears next
to the step. When you create a new configuration, RecoverNow runs
commands for each step. The table below describes steps and commands
that are run when you create a new configuration.
Notes:
Related topics
• You cannot change the replication group name or context ID with the
Change Replication Group wizard.
2. Click Configuration.
3. Click Change from the Actions dropdown. There are two possible
results:
4. Click Next.
The Change Replication Group Servers panel displays. Use this panel to
log into the failover server, specified in the previous panel. This panel
displays if you are not logged in.
5. Specify the username and password and click Log In. Log in to each
server to retrieve information. When you run commands, context IDs are
used to identify the replication group. The context IDs specified have
been defaulted to unique IDs on the servers in this replication group.
6. Click Next.
7. Click Next.
8. Click Next.
Use the Remove Logical Volumes dialog, shown below, to remove the
selected logical volumes from the replication group. These logical
volumes will no longer be protected. For detailed information refer to
the RecoverNow online help.
IMPORTANT
If you make configuration changes on the Container panel you
could lose CDP. For example, if you change the Size of each
container field, you will lose CDP. This is reflected in the
Change Replication Group Summary panel, as shown on
page 161.
IMPORTANT
When you decide to make configuration changes to the replication
group, they can only be saved if the replication group is stopped.
Use the Replication Group portlet on the Replication page to stop
the replication group.
16. To save and initialize this configuration on the specified servers, click
Finish. To cancel, click Cancel. The content of this panel is the same as
the Configuration Summary section in the Replication Group
Configuration window. To view a summary of the changes you made,
refer to “Replication Group Configuration Window” on page 169, and
to the RecoverNow online help for additional information.
When the configuration is saved and validated, you can view the progress of
the configuration initialization in the Configuration Initialization Progress
section in the Replication Group Configuration window. As each step is
successfully completed, a checkmark appears next to the step. When you
change an existing configuration, RecoverNow runs commands for each
step.
The step and command sequence can change depending upon the
configuration changes that you make. The table below describes steps and
commands that are run when you change a configuration:
NOTE
The steps that are run depend on what is changed in the
configuration.
Related Topic:
2. Click Configuration.
3. Click Rename from the Actions dropdown. There are two possible
results:
• Specify the new replication group name, and press OK. To view a
summary of the changes you made, refer to “Replication Group
Configuration Window” on page 169.
2. Click Configuration.
3. Click Delete from the Actions dropdown. There are two possible
results:
Initialize a Context
Use the command line to initialize a context on the production, recovery,
and replicated servers. Execute the following command on each server.
scsetup -C <Context ID> -M
When changing either of the scconfig options (-a, -b), the command returns
output displaying the values for both options. If the frequency to check
value is changed to zero, the command output will also display “Send
Partial Container Automatically is not active”.
Because RecoverNow manages Logical Volumes, not all the AIX LVM
commands are supported. The table below shows the LVM commands
supported when the RecoverNow drivers are loaded.
chlv OK
chlvcopy OK
chvg OK
defragfs OK
exportvg OK
extendlv OK
extendvg OK
importvg OK
lslv OK
lsvg OK
mirscan OK
mklv OK
mkvg OK
readlvcopy OK
redefinevg OK
reducevg OK
rmlvcopy OK
splitlvcopy OK
splitvg OK
synclvodm OK
unmirrorvg OK
varyonvg OK
The default port assignments shown are not added to the /etc/services file
when RecoverNow is installed.
sc<Context ID>aa_channel 5747/tcp # Archive Agent
sc<Context ID>ra_channel 5748/tcp # Restore Agent (scrt_rs)
sc<Context ID>ca_channel 5749/tcp # Restore Client Agent
sc<Context ID>aba_dchannel 5750/tcp # Assured Backup Agent
sc<Context ID>lca_dchannel 5751/tcp # Log Control Agent
sc<Context ID>aa_achannel 5752/tcp # Archive Agent
sc<Context ID>aba_channel 5753/tcp # Assured Backup Agent
sc<Context ID>lca_channel 5754/tcp # Log Control Agent
The default port assignment shown below for the Configuration Daemon is
added to the /etc/services file when RecoverNow is installed.
scconfigd 7835/tcp
• Change the port number for the scconfigd entry in the /etc/services file
on all servers to an unused port number.
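One way to make that edit is sketched below; port 7935 is an arbitrary example, and you should review the result before replacing the live file on each server:

```shell
# Rewrite the scconfigd line to use port 7935 instead of the default 7835
sed 's|^scconfigd[[:space:]].*|scconfigd 7935/tcp|' /etc/services > /tmp/services.new
diff /etc/services /tmp/services.new
# then, as root: cp /tmp/services.new /etc/services
```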
2. tail -f /var/log/EchoStream/es_syslog.out
To start RecoverNow:
4. Click OK, to start the replication group for the selected servers.
The Start Replication Group dialog remains displayed until the action
completes successfully.
NOTE
Applications must be stopped before stopping replication. File
systems are unmounted and data is no longer protected when
replication is stopped.
To stop RecoverNow:
4. Click OK, to stop the replication group for the selected servers.
The Stop Replication Group dialog remains displayed until the action
completes successfully.
NOTE
Steps 1 and 2 are done for the first start after RecoverNow is
configured.
1. Stop any applications that are using the RecoverNow PVS (Production
Volume Set) LVs (Logical Volumes).
2. Make sure that any filesystems associated with the PVS LVs are
unmounted. Use the AIX “umount” command.
rtumnt -C <Context ID>
5. Start RecoverNow.
rtstart -C <Context ID>
This displays:
# rtstart -C 34
Loading Double-Take RecoverNow Recovery Server Drivers
Starting scrt_aba
Overview
RecoverNow enables you to restore a complete copy of the data on the
production server to any time in the past. You can quickly restore a
database that has crashed and roll back the data to a point before a
logical corruption occurred.
RecoverNow lets you create and use snapshots for production restores
and snapshot-based backups to media such as tape. You can also create a
snapshot when you want to use a copy of the data on the recovery server.
Having a snapshot—a read or write copy of the data—on the recovery
server enables you to investigate and use the data without affecting the
operation of the production server. For example, you can:
• Test whether the snapshot is the correct one to use for rolling back the
data on the production server.
• Navigate to the Snapshot Details portlet, and select Create from the
dropdown.
3. Date and Time—Specify the date and time, within the rollback window,
from where the snapshot will be created. The default is the most recent
date in the rollback window. It is only displayed when Point in Time is
selected.
6. Click OK.
NOTE
The DATEMSK environment variable must be set to the full path
of a file that contains the date format template.
The following shows two examples based on different date formats,
where:
• %m–Month
• %d–Day
• %y–Year
• %H–Hour
• %M–Minute
• %S–Second
IMPORTANT
The date format must be specified exactly as it is in the
DATEMSK environment variable. For example, if you do not
specify seconds (%S) in DATEMSK, you cannot specify it in the
command.
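A minimal sketch of such a template file, matching dates like 05/15/09 09:33:40; the /tmp/datemsk path is an arbitrary example:

```shell
# Create a one-line date format template and point DATEMSK at it
printf '%%m/%%d/%%y %%H:%%M:%%S\n' > /tmp/datemsk
export DATEMSK=/tmp/datemsk
cat "$DATEMSK"   # prints: %m/%d/%y %H:%M:%S
```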
To create a snapshot:
1. On the recovery server, make sure all snapshot filesystems are
unmounted before trying to release the snapshot.
rtumnt -C <Context ID>
2. Make sure that a snapshot does not already exist on the recovery server.
scrt_ra -C <Context ID> -W
If you enter 05/15/09 09:33:40 as the date and 34 for the Context ID,
the output is similar to the following:
scrt_ra -C34 -D "05/15/09 09:33:40"
You have requested a virtual incremental LFC restore to
time (1159536820) Fri May 15 09:33:40 2009
c(ontinue) or a(bort)? c
Making SNAP /dev/rsnc1lif_bk_1, 66.306
Making SNAP /dev/rsnc1lif_bk_2, 66.310
Making SNAP /dev/rsnc1lif_bk_3, 66.314
Making SNAP /dev/rsnc1dbmf_bk_1, 66.318
Making SNAP /dev/rsntestlv, 66.450
After you install RecoverNow, VSP and the portal application, you can use
the RecoverNow Replication Group Wizard or the command line to
configure new replication groups and change, rename, and delete existing
ones. Refer to “Configuring Replication Groups” on page 131.
• On the recovery server, the replica LV (“pt” LV) “MAX LPs:” size is
not exceeded.
NOTE
To manually extend the replica (“pt” LV), use the
extend_replica_lv command to force the expansion of a Replica
LV (Logical Volume):
extend_replica_lv -C <Context ID> -L <PVS LV>
• On the recovery server, the write journal associated with the file system
is not extended.
The filesystem can only be extended within the limits of the state map size.
The default state map size is 33280, which allows a filesystem to expand to
1 TB. The state map limit is calculated from the state map size and the
region size; the default region size is 4 MB.
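The calculation itself is not spelled out in the text; the following sketch infers it from the quoted values (state map size in bytes, one bit per region), so treat the formula as an assumption:

```shell
# Inferred formula: max filesystem size = state map bytes x 8 x region size
SMAP_BYTES=33280   # default state map size, assumed to be in bytes
REGION_MB=4        # default region size
MAX_MB=$((SMAP_BYTES * 8 * REGION_MB))
echo "max filesystem size: ${MAX_MB} MB"   # prints: max filesystem size: 1064960 MB
# 1064960 MB is roughly 1 TB, matching the stated default limit.
```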
NOTE
If you cannot unmount filesystems, specify the fuser -c
command to locate the processes that are holding the filesystem(s)
open.
5. On the production and recovery servers, use rtattr to change the state
map size. The new size must be a multiple of 512.
rtattr -C <Context ID> -o smb<LV> -a Size -v <new size>
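Since the new size must be a multiple of 512, you can round a desired size up before passing it to rtattr; this is a generic sketch with an invented target value:

```shell
# Round a desired state map size up to the next multiple of 512.
WANT=66000                          # hypothetical desired size
SIZE=$(( (WANT + 511) / 512 * 512 ))
echo "$SIZE"                        # prints: 66048
```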
10. On the production and recovery servers, use rtdr to create a failover
context if one was configured.
rtdr -C <Primary Context ID> -F <Failover Context ID>
setup
11. On the production server, use scconfig to wipe the state maps clean
(0.000% dirty).
scconfig -WC <Context ID>
13. On the production server, use the “smit chfs” command to extend the
filesystem.
4. Stop RecoverNow.
/usr/scrt/bin/rtstop -C <Context ID> -Fk
NOTE
Any snapshot journal can be increased in size since all snapshot
volumes are available for use during snapshot creation/use.
7. Use the following command to find the snapshot journal volume sizes:
/usr/scrt/bin/rtattr -C <Context ID> -t
SCRT/containers/WJournal -a Size
8. Select one of the snapshot journals and increase the size of the volume.
/usr/scrt/bin/rtattr -C <Context ID> -o <ObjectName>
-a Size -v <new size value in bytes>
10. Remove the existing snapshot journal volume that you want to increase.
rmlv -f <ObjectAttributeValue from step 9 on page 197
minus the /dev/r prefix>
For example, if there are four logical volumes /lvfs1, /lvfs2, /lvfs3, and
/lvfs4 that use the same jfslog /dev/fsloglv00 and RecoverNow protects
only /lvfs1, you need a jfslog dedicated to the protected filesystem.
NOTE
When using inline logs with jfs2 filesystems, there is no jfslog
isolation requirement.
1. If a jfslog exists in the volume group that is not currently part of the
PVS, you can assign that jfslog to the filesystem that is being removed
from the PVS.
umount /jfsold
where nonrtjfslog is a jfslog that exists in the volume group but is not
part of the PVS.
where <newjfslog> is the name of the jfslog that you are creating for the
non-protected filesystem to use.
errnotify:
en_name = "SCRT_LFC_READ_ERROR"
en_class = "S"
en_type = "PERM"
en_method = "/home/scrt/enm_1 $1"
errnotify:
en_name = "SCRT_LFC_WRITE_ERROR"
en_class = "S"
en_type = "PERM"
en_method = "/home/scrt/enm_1 $1"
errnotify:
en_name = "SCRT_NETWORK_ERROR"
en_class = "S"
en_type = "PERM"
en_method = "/home/scrt/enm_1 $1"
errnotify:
en_name = "SCRT_ABORT_ERROR"
en_class = "S"
en_type = "PERM"
en_method = "/home/scrt/enm_1 $1"
ODM Commands
The errnotify stanzas are added to the errnotify ODM using the
following command.
odmadd /tmp/ern_1
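End to end, the stanza file can be generated and loaded as sketched below; only the first stanza is shown (the others from the text are appended the same way), and the odmadd step itself requires AIX, so it appears as a comment:

```shell
# Build the errnotify stanza file referenced by the odmadd command above.
cat > /tmp/ern_1 <<'EOF'
errnotify:
        en_name = "SCRT_LFC_READ_ERROR"
        en_class = "S"
        en_type = "PERM"
        en_method = "/home/scrt/enm_1 $1"
EOF
# On AIX, load the stanzas into the errnotify ODM class:
# odmadd /tmp/ern_1
grep -c '^errnotify:' /tmp/ern_1   # prints: 1
```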
• Mark state map bitmaps dirty for all devices or specified devices while
RecoverNow is active.
where:
-C is the context
-B marks statemap bitmaps dirty for all devices or specified devices while
RecoverNow is active and performs a full synchronization of the Production
Volume Set or one LV in the Production Volume Set.
NOTE
You can also use the -L option with the scconfig command for
specific logical volumes to mark the state map zero percent dirty.
3. Save the protected data on the production server to tape or disk. You
must save at the LV level in block sequence.
– Save to tape:
dd if=/dev/db2 of=/dev/rmt0 bs=1024
– Save to disk:
dd if=/dev/db2 of=/dev/db2bu bs=16m
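Because the save must be block-for-block, it is worth verifying the copy; the following generic sketch uses placeholder paths (not the LV names above) to show a checksum comparison:

```shell
# Generic sketch: block-copy a source with dd, then verify with cksum.
# /etc/hosts and /tmp/hosts.bu are placeholders for the real devices.
dd if=/etc/hosts of=/tmp/hosts.bu bs=4k 2>/dev/null
src=$(cksum < /etc/hosts)
dst=$(cksum < /tmp/hosts.bu)
[ "$src" = "$dst" ] && echo "copy verified"   # prints: copy verified
```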
6. Restore the data from tape or disk to the Replica on the recovery server.
NOTE
The restore must be done to the pt LVs.
7. Start RecoverNow on the recovery server. The changes made after the
save to tape or disk synchronize to the recovery server.
Overview
RecoverNow supports Live Partition Mobility for partitions running AIX
5300-07 or later, AIX 6.1 or later, or AIX 7.1 or later on POWER6 or
POWER7 technology-based systems. Live Partition Mobility allows a
partition that is replicating with RecoverNow to be migrated to another
system without interrupting replication.
IMPORTANT
The es_migrate script is registered at installation time if the
partition is capable of migration, in which case you do not have to
register it manually. Verify that the es_migrate script is
registered; if it is not, register it.
Migrating a Partition
Each time you migrate a partition, RecoverNow is licensed to run on the
migrated partition for up to 30 consecutive days. If the license is valid for
less than 30 days, RecoverNow is licensed to run for the number of days
remaining on the original partition. When you migrate back to the original
partition, the license expiration date is returned to its original value.
IMPORTANT
Vision Solutions recommends that you do not make changes to the
RecoverNow configuration when the partition has been migrated.
If the partition has been migrated and configuration changes are
necessary, you must load and work with the configuration from the
migrated partition. When you change a configuration where one or
more nodes have been migrated, always rediscover the Host ID
after loading the configuration.
Overview
You can apply the redo and undo logs from the recovery server to roll back
application data on the production server and restore the data to an earlier
point in time. When you roll back the application data, the information is
synchronized with the replica on the recovery server like any other
write action.
Keep in mind that the majority of database recovery operations are repairs,
typically performed by DataBase Administrators (DBAs); the remaining
minority are resurrections, done by IT administrators. In the burning repair
scenario, for example, RecoverNow makes the DBA's job more precise.
[Figure: The production server and the recovery server are connected over a
LAN. Each server has a data tap; the application data storage and undo logs
on the production server are replicated to the replica storage and undo logs
on the recovery server.]
In the resurrection scenario, the database has crashed and will not be coming
back up. If you need to roll back 55 seconds, you do not need to restore the
last full backup and replay a full week of database redo (roll forward)
logs. In this case, RecoverNow can roll back a totally crashed and burned
database in a matter of moments, many orders of magnitude faster.
Keep in mind that during a production restore, you do not need the database
instance up. In fact, the database cannot be up while RecoverNow rolls its
image around on disk. By definition, the database is down since it crashed.
As a result, the best and fastest way to restore your database is while it is
down.
[Figure: The production server and the recovery server are connected over a
LAN. The recovery server holds the replica storage and the snapshot
copy-on-write (COW) containers used to restore the application data storage
on the production server.]
In this repair scenario, only some pieces of data are corrupted but the
production database is still running. After you restore a historical,
non-burning image of the database on the snapshot, you can pull pieces out
of it and put them back into the live production database to repair the bad
pieces of data. You can put out the fire with information. RecoverNow
restores automatically on the back-end snapshot.
• “Step 10: Mount the Volumes for the Context” on page 219
EXAMPLE:
In this example, the Context ID is 1. The volume that needs to be replaced is
/dev/rrtctx1. Display information about the volume.
sclist -C 1 -t pdfc
The output shows that /dev/rrtctx1 belongs to volume group rtvg1. The
volume size is shown in bytes (1073741824).
Object: <loglv00>, Type: <SCRT/containers/PDFC>, Serial <355>:
or
scrt_ra -C <Context ID> -t <LFCID>
The allowable formats for times entered with the -D option are specified in
the file identified by the DATEMSK environment variable.
EXAMPLE:
In this example, the LFCID is 158. The Context ID is 1.
scrt_ra -C 1 -t 158
IMPORTANT
Do not try to validate the information by running the application
on the production server after the data is restored. If the data turns
out to be incorrect, you need to roll back the data included in all
previous attempts to restore data.
If the data is valid, stop the application. Note the Context ID and the time or
LFCID used to create the snapshot.
If the data is not valid, then stop the application and create the snapshot
again as described in “Step 3: Create a Snapshot on the Recovery Server”
on page 215. Continue until you have a snapshot with valid data.
EXAMPLE:
The Context ID in this example is 1. Enter the following command:
scrt_ra -C 1 -W
or
scrt_ra -C <Context ID> -t <LFCID>
EXAMPLE:
dd if=/dev/rrtctx1 ibs=1024k |
-------------------------------------------------------------
--- Dynamic SuperTransaction recovery complete ---
-------------------------------------------------------------
You will be prompted for the time or LFCID that you used to create the
snapshot.
EXAMPLE:
scrt_rc -C 1
rc>
At this prompt, enter l for LFCID or t for “time.” Use the same information
that you used to create the snapshot in “Step 3: Create a Snapshot on the
Recovery Server” on page 215.
> l
You may need to use fsck on the file systems before they can be mounted. If
this is necessary, unmount the volumes using the rtumnt command, run
fsck, and then mount the volumes using the rtmnt command shown above.
NOTE
Even if you decide later that you would like to pick a better time or
an LFCID, you can roll forward or roll back to that point. This is
possible because RecoverNow keeps all of the change information
on the recovery server.
Before starting the rollback, validate the rollback location using a snapshot
and stop all applications that are using the logical volumes. Use the
Replication Group portlet on the Replication page to stop the replication
group. During the rollback, file systems must be unmounted. For rollback
progress, see the Production Server Rollback portlet on the Recovery page.
NOTE
Data changed on the production server between the rollback
location and your most recent changes will be lost.
4. Click Next.
5. Specify the location that the data will be rolled back to. Select either
Point in Time, Container ID, or Event Marker:
rc>
6. Enter c to continue.
Rolling LFC restore status
--------------------------
Production at LFCID 34
Production at LFCID 32
Production at LFCID 30
Production restored to LFCID 30.
Backingstore remains stable at LFCID 36
rc>
7. If you are satisfied that LFCID 30 is correct, then enter commit. You
should see output similar to the following:
committing....
RestoreServer is down, exit code 0.
Restore Client session complete.
3. On the recovery server, start the production restore shell. Enter the
following command:
scrt_rc -C <Context ID>
rc>
Available VFBs:
--------------------------------------------------------
1201269025 (LFCID: 1342): Fri Jan 25 08:50:25 2008
1193078943 (LFCID: 1986): Mon Oct 22 14:49:03 2007
1192817678 (LFCID: 1456): Fri Oct 19 14:14:38 2007
rc>
If you are restoring from archived data, then the LFCID that you choose
identifies data that is no longer stored on the recovery server.
7. Enter an LFCID (for example, 1360).
8. Enter c to continue.
NOTE
Although you have rolled back the production server to LFCID
1360, the snapshot on the recovery server still contains
information up to and including LFCID 1464. This allows you to
verify that LFCID 1360 is the one you want to roll back to. If it is
not, enter abort. Determine the proper LFCID and start the
production restore shell as described in step 3 on page 228.
10. On the production server, mount the protected volumes. Enter the
following command:
rtmnt -C <Context ID> -f
Introduction
RecoverNow provides highly available data recovery and protection
through failover, resync, and failback operations on both local and remote
recovery servers, providing solutions for local failovers, as well as remote
failovers.
• resync— bring the production server data up to date with the recovery
server data
These operations use remote mirror Backing Store File Container (BSFC)
replicas as the “failover-production” data. This requires only a
configuration change and no data movement, which reduces the time
required to resync the volumes after switching sites.
Use of the statemap tracking mechanism increases the efficiency of the data
resync, as it minimizes the amount of data transfer that must take place. The
Production application continues to run during the resync operation.
Since the failover process puts the volumes into a suspended state, changes
are tracked within a statemap. Assuming that recording these changes is
enabled, only the changed data is sent to the production site to synchronize
the volumes. This reduces the time required to complete the failback
operation.
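As a rough illustration of why the statemap matters (the numbers below are invented, not RecoverNow output), with 4 MB regions only the dirty regions are resent:

```shell
# Invented figures showing how statemap tracking shrinks a resync.
REGION_MB=4
TOTAL_REGIONS=262144      # e.g. a 1 TB volume at 4 MB per region
DIRTY_REGIONS=120         # regions changed while failed over
echo "full sync: $((TOTAL_REGIONS * REGION_MB)) MB"   # prints: full sync: 1048576 MB
echo "resync:    $((DIRTY_REGIONS * REGION_MB)) MB"   # prints: resync:    480 MB
```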
NOTE
You can choose to configure the recovery server or a replicated
server, if you have one in your configuration, as the failover server
where you temporarily run the production system.
• Names of all special files associated with devices (access points to LVs)
• Filesystem information
Similar to a primary context, the failover context has all the information for
all the servers in a configuration. However, the difference is that the
configuration settings the failover context contains are derived from the
primary context and that it shares several attributes with the primary
context.
NOTE
You need to create only one Primary Context for your production
server, but you must set up one Failover Context for each
additional server in the configuration.
Use the RecoverNow disaster recovery and failover rtdr command to:
Syntax
rtdr -C <ID> -fhmnqv failover | resync | failback
rtdr -C <ID> -F <ID> [-s <hostname>] [-fhmnv] setup
IMPORTANT
Use the -f attribute with the rtdr command with caution. If you
do not resync, this attribute forces a failback and leaves the
Production Volume Set (PVS) file systems and replica out of sync.
NOTE
If you have changed a primary context you must delete the
corresponding failover context and recreate the failover context.
• Failover
• Resync
• Failback
where:
NOTE
If there is more than one target server and you do not want to use
the default for the Failover Server, then use the -s <Hostname>
option on the rtdr command to specify the Failover Server. Refer
to “Syntax” on page 235 for the rtdr command.
2. Mount filesystems:
rtmnt -C <Context ID>
4. Unmount filesystems:
rtumnt -C <Context ID>
2. Mount filesystems:
rtmnt -C <Context ID>
4. Unmount filesystems:
rtumnt -C <Context ID>
You can perform failover operations after you validate the replica.
Run a Procedure
The Procedures portlet and Steps portlet on the Procedures page will
guide you through each step.
1. Use one of the following methods to run a procedure. Use the method
that is most convenient for you based on the page you are currently
on:
2. Use one of the following methods to run the next step in the procedure
or retry a failed step:
NOTE
Before failing over, all applications using the logical volumes
must be stopped. You may want to validate that your applications
run properly on the failover server by using a snapshot and
running your applications with the snapshot.
The Steps portlet shows the steps for the Planned Failover procedure
selected in the Procedures portlet.
The following table shows what steps are run to perform a planned failover.
The Sequence Number shows the sequence number of the step in the
procedure. Sequence numbers start at 10 and increment by 10; their
purpose is to help identify problem steps when communicating with
customer care.
Sequence Number | Step | Dialog for this step | Command for this step
NOTE
When the procedure completes, and you are ready to move
production back to the configured production server, run the
Failback procedure.
NOTE
If a replicated server is not configured for this replication group,
and this is the last step in the procedure, when this step completes,
applications can be started on the production server.
The Steps portlet shows the steps for the Unplanned Failover procedure
selected in the Procedures portlet.
The following table shows what steps are run to perform an unplanned
failover. The Sequence Number shows the sequence number of the step in
the procedure. The sequence begins with integers starting at 10 and is
incremented by 10.
Sequence Number | Step | Dialog for this step | Command for this step
40 | Failover the replication group. Server roles change. | “Resume Unplanned Failover dialog: Failover replication group: Server roles change” on page 248 | rtdr -C <Context id> -q failover
70 | Start replication on the new production server | “Resume Planned Failover dialog: Start replication on new production server” on page 245 | rtdr -C <Context id> -q resync
2. Click OK to confirm the snapshot location and to start the
unplanned failover.
4. You may need to use more than one snapshot to find a valid location.
You can
NOTE
The servers in the replication group name have changed now that
the replication group has failed over.
NOTE
If a replicated server is not configured for this replication group,
and this is the last step in the procedure, when this step completes,
applications can be started on the production server.
The Steps portlet shows the steps for the Failback procedure selected in the
Procedures portlet.
The following table shows what steps are run to perform a failback. The
Sequence Number shows the sequence number of the step in the procedure.
The sequence begins with integers starting at 10 and incremented by 10.
The purpose of sequence numbers is to help identify problem steps when
communicating with customer care.
Table 4. Failback
Sequence Number | Step | Dialog for this step | Command for this step
10 | Unmount the file systems on the current production server | “Run Failback Procedure dialog: Unmount file systems on current production server” on page 251 | rtumnt -C <Context id>
NOTE
After this step completes, applications can be started on the
configured production server.
• Failover Operations
NOTE
If you have changed a primary context you must delete the
corresponding failover context and recreate the failover context.
• Failover
• Resync
• Failback
where:
NOTE
If there is more than one target server and you do not want to use
the default for the Failover Server, then use the -s <Hostname>
option on the rtdr command to specify the Failover Server. Refer
to “rtdr” on page 264 for the rtdr command.
2. Mount filesystems:
rtmnt -C <Context ID>
4. Unmount filesystems:
rtumnt -C <Context ID>
2. Mount filesystems:
rtmnt -C <Context ID>
4. Unmount filesystems:
rtumnt -C <Context ID>
You can perform failover operations after you validate the replica.
Failover Operations
This section describes how to perform the failover operations from the
production server to the recovery or replicated server when the production
server has failed.
IMPORTANT
Do not perform a failover restore on an invalidated restore target.
After validating the replica, the failover procedures are failover,
resync, and failback.
NOTE
Do not execute scrt_ra -X -F if you want to fail over to the
latest point in time.
NOTE
This stops the aba for the primary context.
Performing Resync
A resync operation is required when the Production Volumes and Recovery
Replica Volumes diverge. This occurs after a failure of the production
server and a failover to the recovery or replicated server. When the
application is started on the recovery or replicated server the updates result
in a divergence from the data on the production server.
After restoring the original production server, use the resync operation to
ensure that the production server data is current with the recovery server
data.
• On recovery server:
start lca for failover context
IMPORTANT
The resync operation assumes the original production data was not
lost, and is available in its entirety after the production server is
revived. If the production data is lost, the statemap on the recovery
or replicated server must be marked as dirty prior to resync. This
forces a complete region recovery and initializes the production
data.
Wait until the region recovery completes before performing the failback
procedure, as described in “Performing Failback” on page 258.
NOTE
Performing a complete region recovery should be avoided since
this will require production down time and significant network
resources.
You can mark only specific state maps dirty, using the -L option of the
scconfig command. Refer to Chapter 14, “CLI Commands,” on page 261
for more information.
Performing Failback
IMPORTANT
Before you fail back, ensure that you stop the application on the
recovery server.
The data can be restored to the production server from the recovery or the
replicated server using the following procedure after the necessary volume
groups and logical volumes have been recreated.
7. If the state maps are not clean on the recovery server, wait until all the
data is synchronized to the production server.
12. On the production server execute the following command which will
remove the snapshot created in step 8.
scrt_ra -WC <failover context>
13. On the recovery and replicated server execute the following command
which will failback to the primary context.
rtdr -qC <Failover Context ID> failback
14. On the production server execute the following command which will
failback to the primary context.
rtdr -qC <Failover Context ID> failback
extend_replica_lv
Usage
You can use the extend_replica_lv command to force the expansion of a
Replica LV (Logical Volume) that is associated with a specified PVS
(Production Volume Set) LV, so that the Replica LV will be equal in size to
the PVS LV. This command will only run on the production server and the
LCA must be active.
Syntax
extend_replica_lv -C <Context ID> -L <PVS LV>
extend_replica_lv -h help
-C <Context ID>
-L <PVS LV>
-h Help, prints this usage
NOTE
This command is only required for PVS LVs that are extended and
have no associated filesystem, or PVS LVs that have an associated
filesystem with an outline log and the filesystem is extended.
rtattr
Syntax
rtattr -C ID [-a attribute] [-o object] [-t type]
rtattr -C ID -a attribute -v value {-o object | -t type}
rtattr -h
-a Attribute for query/edit (ObjectAttributeName)
-C <Context ID>
-h Help, prints this usage
-o Object for query/edit (ObjectName)
-t Type for query/edit (ObjectType)
-v Value for edit (ObjectAttributeValue)
You can use the -v parameter with the commands to edit. If you do not
specify the -v parameter, only query is available.
Usage
Use this command to query and change attributes in the RecoverNow ODM
(Object Data Manager) files:
• SCCuAttr
• SCCuObj
• SCCuRel
Example 1
View all the machine hostids:
rtattr -C <Context ID> -a HostId
SCCuAttr:
ObjectName = "backup"
ConfigObjectSerial = 4
ObjectType = "SCRT/info/host"
ObjectAttributeName = "HostId"
ObjectAttributeValue = "0xc0a801f7"
ObjectAttributeType = "ulong"
SerialNumber = 4006
ObjectNlsIndex = 0
SC_reserved = 0
ContextID = 1
SCCuAttr:
ObjectName = "replica"
ConfigObjectSerial = 8
ObjectType = "SCRT/info/host"
ObjectAttributeName = "HostId"
ObjectAttributeValue = "0xc0a801f2"
ObjectAttributeType = "ulong"
SerialNumber = 8006
ObjectNlsIndex = 0
SC_reserved = 0
ContextID = 1
SCCuAttr:
ObjectName = "production"
ConfigObjectSerial = 16
ObjectType = "SCRT/info/host"
ObjectAttributeName = "HostId"
ObjectAttributeValue = "0xc0a801f3"
ObjectAttributeType = "ulong"
SerialNumber = 16006
ObjectNlsIndex = 0
SC_reserved = 0
ContextID = 1
Example 2
View only the production server’s hostid:
rtattr -C <Context ID> -o production -a HostId
SCCuAttr:
ObjectName = "production"
ConfigObjectSerial = 16
ObjectType = "SCRT/info/host"
ObjectAttributeName = "HostId"
ObjectAttributeValue = "0xc0a801f3"
ObjectAttributeType = "ulong"
SerialNumber = 16006
ObjectNlsIndex = 0
SC_reserved = 0
ContextID = 1
rtdr
Syntax
rtdr -C <ID> [-fhmnqv] failover | resync | failback
rtdr -C <ID> -F <ID> [-s <hostname>] [-fhmnv] setup
-C Context ID (of the "primary" context)
-F Failover Context ID
-f Forced execution (use with caution)
-h Help, prints usage
-m Man style help
-n No execution, just print commands
-q quiet, do not ask for confirmation
-s Select failover site server from multiple recovery
servers (default is first replication hop's server.)
NOTE
The -f option prompts for confirmation unless combined with -q.
Usage
This command manages RecoverNow's disaster recovery processes as well
as failover and failback operations. Given a primary context <X>
configured on both a “Production” and a “Recovery” Server, note:
NOTE
A failover context associated with a configured primary context
must be created and setup prior to executing a failover. The
failover Context ID is arbitrary, but must be unique on the
associated servers.
• Prior to failover, you should validate the data integrity of the Replica:
scrt_ra -C <X> -X
NOTE
Do not perform a failover restore on an invalidated restore target.
After validating the replica, the disaster recovery procedure is
failover, resync, then failback.
After failover, start the application on the recovery server. It will be the
acting production environment until failback. All data modifications
will be tracked and shipped back to the original production server by
resync.
After reviving the original production server, use resync to bring the
production server data up to date with the recovery server data.
NOTE
Resync assumes the original production data was not lost, and is
available in its entirety after the production server is revived. In
the event that production data was lost, statemap on the recovery
server must be dirtied prior to resync to force a complete region
recovery, and re-initialize the production data.
• To dirty all statemaps, in the failover context on the recovery server (the
acting production server):
rtstop -C <X> -F
scconfig -C <X> -M
rtstart -C <X>
rtmark
Syntax
rtmark [-C ID] [-s <num>|-d <str>] [-iV] [<file>|-]
rtmark -rC <Context ID>
rtmark -h
-C ID Event is specific to Context ID.
-d <str> Date string, overrides event time.
-h Help, display this message.
-i Interactive query for event attributes.
-r Copies event marks from the production server to the
Recovery Server
-s <num> Seconds since epoch, overrides event time.
-V Print version.
<file> File containing the event mark attributes.
Usage
Event Markers are tags that mark points in time or points in process that are
significant to you for the purposes of recovery. An Event Marker can be
selected as the Recovery Point Objective (RPO) during a data restore. They
are typically needed for applications which cannot take advantage of
RecoverNow’s Any Point-In-Time (APIT) data restores along with
applications which do not have live transactional durability on disk.
The following is an example of a script that could be called with as many
arbitrary event attributes as you want, in addition to the time and date
attribute automatically assigned by rtmark. The customer-defined
attributes between the cat line and the second EOF would also be added to
the event. The entire event would be replicated to the recovery server and
available for viewing and selection during restores.
#!/usr/bin/ksh
cat <<-EOF | /usr/scrt/bin/rtmark -C <Context ID> -
name = test1
description = "This is a test."
owner = dave
priority = 2
another_attribute = "Just another attribute"
EOF
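The trailing “-” on the rtmark line tells it to read the event attributes from stdin. The same heredoc pattern, with a generic filter standing in for rtmark, behaves like this:

```shell
# `grep -c '='` stands in for /usr/scrt/bin/rtmark -C <Context ID> -
# to show the stdin attribute stream; it simply counts attribute lines.
cat <<EOF | grep -c '='
name = test1
description = "This is a test."
owner = dave
EOF
```

The command prints 3, one per attribute line; rtmark would instead record each line as an event attribute.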
rtmnt
Syntax
rtmnt [-C ID][-fn]
Parameters
Parameter Description
Usage
This command is used to mount all file systems associated with the context
specified.
See Also
rtumnt, sccfgd_putcfg
rtstart
Syntax
rtstart [-C ID][-BMnNp]
Parameters
Parameter Description
Usage
This command is used to load the RecoverNow data tap and to start the data
replication processes. On the production server, rtstart also mounts the
protected file systems.
See Also
rtstop
rtstop
Syntax
rtstop [-C ID][-FfknS]
Parameters
Parameter Description
Usage
This command is used to stop the data replication process of RecoverNow,
and optionally to unload the data tap.
See Also
“rtstart” on page 269
rtumnt
Syntax
rtumnt [-C ID][-Dfn]
Parameters
Parameter Description
Usage
This command is used to unmount all the volumes associated with the
specified context.
See Also
“rtmnt” on page 268 and “sccfgd_putcfg” on page 283
sclist
Syntax
sclist -t TYPE [-bR] [-A ATTR [ ... ]] [-R] [-C ID] [-d X]
sclist -t TYPE -o ATTR=VALUE [-bR] [-A ATTR [ ... ]] [-C ID]
[-d X]
sclist -a [-A ATTR [ ... ]] [-C ID] [-d X]
sclist -r SERIAL [-r SERIAL ...] [-b | -c] [-C ID] [-d X]
sclist -O SERIAL [-O SERIAL ...] [-C ID] [-d X]
sclist [-BeiIjJlLmMpPstSTvVX] [-I] [-D[D]] [-C ID] [-d X]
sclist -h [-z]
sclist -fZ
-a Query all objects
-A ATTR Query specific attribute (repeatable)
-b Be Brief, useful for scripting output
-B List of StateMap bitmap devices
-c Expansive, if possible, expand on output.
-C ID Operate on Context ID.
-o ATTR=val Query within type list for attribute ATTR equal to val
-t type Type of object to query (enter sclist -hz for list of object
types)
Usage
This command provides information about containers used in RecoverNow.
See Also
“sccfgd_putcfg” on page 283
scconfig
Usage
Use this command to manage DataTap devices and drivers.
Syntax
scconfig -l [-cfinERtv] [-C ID] [-d X] [-I name]
scconfig -u | -U [-finEv] [-C ID] [-d X] [-I name]
scconfig -r [-nv] [-L name ...] [-C ID] [-d X] [-I name]
scconfig -M | -W | -P | -B [-nv] [-L name ...] [-C ID] [-d X]
scconfig -S [-C ID] [-d X] [-nv]
scconfig -s [-C ID] [-d X]
scconfig -t | -q | -Q | -h
scconfig -C ID -a seconds [-b percent]
scconfig -V
NOTE
The functionality to send partial containers automatically is also
provided by the Replication Group Configuration Wizard on the
Replication Group Container Options panels page 145 and
page 157 in the section “Send partially filled containers
automatically.” For additional details, refer to “Send partially
filled containers automatically” on page 146.
See Also
“sccfgd_putcfg” on page 283
scsetup
Makes or removes the Logical Volumes (LVs) used by RecoverNow in a
specific protection context, such as LFCs. Note however that scsetup will
not remove production LVs in the PVS or their associated replica LVs. Run
this command after defining and saving a context configuration using the
RecoverNow Replication Group Wizard.
After you have defined a context, scsetup creates a log file and containers
(logical volumes) in the specified volume group.
Syntax
scsetup -M [-ijlnprsv] [-C ID] [-d X] [-o role] [ -t TYPE ]
scsetup -R [-inv] [-C ID] [-d X] [-o role] [ -t TYPE ]
scsetup -E [-cinv] [-C ID] [-d X] [-o role]
scsetup -I [-cinv] [-C ID] [-d X] [-o role]
scsetup -L [-inv] [-C ID] [-d X]
scsetup -X [-inv] [-C ID] [-d X]
scsetup -F [-inv] [-C ID] [-d X]
scsetup -h
-C ID Operate on Context ID.
-c Clear destination device files prior to export/import.
-d X Debug level of X (0-9).
-E Export production volumes.
-F Failover preparation. PDFC LV names are moved to BSFC LV
names, and vice versa.
-i Ignore volume manager errors.
-I Import production volumes.
-s Skip setting or clearing bitmaps for statemap (if there are any)
Usage
Preparation for RecoverNow data protection.
NOTE
Type must be of SCRT/container/*, and specified as the associated
“Class” name (see “sclist -hz” for a list).
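A minimal sketch of how the make and remove operations described above might be scripted after the wizard has saved a configuration. The commands are only assembled and echoed here (a dry run, nothing is executed); the /usr/scrt/bin path and Context ID 1 are assumptions based on examples elsewhere in this guide.

```shell
# Dry-run sketch: build the scsetup invocations for one context.
# Context ID 1 and the /usr/scrt/bin path are assumed, not verified.
CID=1
MAKE_CMD="/usr/scrt/bin/scsetup -M -C ${CID}"    # create log file and containers
REMOVE_CMD="/usr/scrt/bin/scsetup -R -C ${CID}"  # remove those LVs again
echo "${MAKE_CMD}"
echo "${REMOVE_CMD}"
```

In a live environment the echoed strings would be executed directly, typically with -M first and -R only when tearing the context down.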
scrt_ra
Syntax
scrt_ra -t <> [-C ID] [-V <>] [-d X] [-fFlLv]
scrt_ra -D <> [-C ID] [-V <>] [-d X] [-fFlLv]
scrt_ra -S <> [-C ID] [-V <>] [-d X] [-fFlLv]
scrt_ra -V <> [-d X] [-aflLv]
Parameters
Parameter Description
-v Verbose.
Usage
This command is the Restore Agent and is used to create snapshots on the
recovery server.
See Also
“rtmnt” on page 268, and “rtumnt” on page 271
scrt_rc
Syntax
scrt_rc [-C ID] [-d X] [-p X] [-h[v]] [-v] [-V]
-d Debug level of X (0-9)
-h Help, display this message
-C Operate on Context ID (default is 17)
-p Ping agent X (aba|lca|rs); rc is 0 if up
-v Verbose help
-V Print version
Usage
The restore client is an interactive command line interface, or shell, for
production data restore. To enter the shell, type scrt_rc -C<ID> at the
UNIX command prompt on the recovery (a.k.a. backup) server.
NOTE
The -p option of the scrt_rc command will not start the shell,
but will instead return the agent status.
• LFC level
• Date/Time
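The -p status check lends itself to scripting, since the exit status (rather than screen output) reports whether the queried agent is up. The sketch below only assembles the command lines for each agent (a dry run, nothing is executed); Context ID 17 is the documented default.

```shell
# Dry-run sketch: build one scrt_rc ping command per agent.
# In a live check, each echoed line would be executed and its
# exit status tested (0 means the agent is up).
CID=17
STATUS_CMDS=$(for agent in aba lca rs
do
    echo "scrt_rc -C${CID} -p ${agent}"
done)
echo "${STATUS_CMDS}"
```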
Session Termination
A restore session may be terminated either with an abort or a commit
command. When aborted, all restored devices are brought back to
pre-session levels. When committed, all restored devices remain at the last
target of the session.
A commit does not remove any forward or reverse incremental data from
the RecoverNow time line which allows for a subsequent restore to a time
after the committed target, if necessary. In fact, the restore itself is included
in the time line which allows it to be undone.
Process Overview
RecoverNow performs a production data restore by writing reverse block
incremental data directly into the raw Logical Volumes (LVs) of the
Production Volume Set (PVS), rolling those LVs back in time as a single
consistency group. The PVS is treated as a consistency group since it
encapsulates the entire storage footprint of the protected application. The
application's referential data integrity is always maintained.
All block I/O during the restore occurs at the Logical Volume Manager
(LVM) layer, below all file systems and/or databases associated with the
protected application. In RecoverNow, the reverse block incremental data is
recorded in odd-numbered LFCs, the Before Image Log File
Containers (BILFCs), which are also raw LVs and reside on the
backup/recovery server, or in external tape archives, if any.
The length of the restore window is a function of how many BILFCs are
available to RecoverNow, the size of the BILFC, and the average
application write rate. Tape archives are used to extend the restore window.
During a restore, the PVS LVs must be opened exclusively for writing by
RecoverNow. No other application may have the LVs opened for writing.
All associated databases and file systems must be unmounted.
Two agent daemons work together to perform a production restore. On the
production server, the Log Creation Agent (LCA) receives BILFC
transmissions and makes the BILFC writes to the PVS. On the
General Procedure
1. Ensure required agent daemons are running.
Procedure Notes
1. The rtumnt -Cx command will perform a switch [scconfig -Cx -S]
automatically.
2. One After Image Log File Container (AILFC) may be sent during the
restore to fine tune to the nearest second. BILFCs are optimized for I/O
throughput, while AILFCs maintain individual write fidelity.
3. Backup and recovery server are synonymous.
scrt_vfb
The Tivoli Storage Manager must be defined in the RecoverNow
configuration before using this command.
Syntax
scrt_vfb [-bdDflLnUVrR] [-s <path to validation script>] [ -C ID ]
Usage
This command is used to create a virtual full backup.
Parameters
Parameter Description
-h Help.
-V Create VDevs.
sccfgd_cron_schedule
This command manages entries in cron for RecoverNow Virtual Full
Backups (VFB). The Tivoli Storage Manager must be defined in the
RecoverNow configuration before using this command.
Syntax
sccfgd_cron_schedule <Op> <Context_id> [<sched_type>]
[<cron_info>] [<vfb_opts>]
where:
Op [a|q|d] for add, query, or delete, respectively
sched_type [once|daily|weekly|monthly]
Usage
This command is used to schedule a virtual full backup.
Examples
sccfgd_cron_schedule add 3 daily 15:3:*:*:*
sccfgd_cron_schedule delete 3
sccfgd_cron_schedule query 3
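The cron_info argument in the add example above is colon-separated, while a crontab line uses space-separated fields; a schedule helper presumably performs that translation. The sketch below shows the conversion only. The assumption that the colon fields map one-for-one onto crontab fields in the same order is mine, not documented behavior.

```shell
# Sketch (assumption: cron_info fields map directly onto crontab
# fields in the same order). Convert the colon form to crontab form.
CRON_INFO="15:3:*:*:*"
CRON_FIELDS=$(echo "${CRON_INFO}" | tr ':' ' ')
echo "${CRON_FIELDS}"
```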
sccfgd_putcfg
Syntax
sccfgd_putcfg primary_context_ID failover_context_ID
Parameters
Parameter Description
primary_context_ID Primary Context ID (an existing context that has been
created for normal operation)
Usage
This command is used to load the RecoverNow configuration file into the
RecoverNow ODM by creating and loading a failover context configuration
based on a previously loaded primary context configuration.
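As an illustration, the context IDs 1 (primary) and 11 (failover) used below match the ODM stanzas shown later in this chapter. The command is only assembled and echoed (a dry run), since sccfgd_putcfg itself is part of the product installation.

```shell
# Dry-run sketch: load a failover context (11) based on an existing
# primary context (1). The IDs follow the stanzas in this chapter.
PRIMARY_CTX=1
FAILOVER_CTX=11
CMD="sccfgd_putcfg ${PRIMARY_CTX} ${FAILOVER_CTX}"
echo "${CMD}"
```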
sccfgchk
Syntax
sccfgchk -C <Context ID>
Parameters
Parameter Description
-v Verbose
Usage
This command is used to check a configuration before RecoverNow is
started. Issue this command on each node after the configuration is
initialized and before it is started.
sztool
Syntax
sztool
Parameters
sztool script
Command Description
Options
sztool If issued for the very first time, the working directory,
diskinfo file, and sztool.cfg file are generated. You should
review the diskinfo file and then modify sztool.cfg
accordingly. You can then re-run sztool.
sztool -l Once the log file has been created, this command prints the
calculation results for different LFC sizes based on the
existing log file. For example, sztool -l32 prints the
results when the LFC size is 32M, and sztool -l16 -l512
prints all the calculation results from 16MB to
512MB. There must be no space between -l and the LFC
size number. Output is to the screen only; there is no delay
or sleep.
Usage
You can use the Sizing Tool (sztool) to calculate configuration values before
RecoverNow is installed. It is also useful to run the tool after RecoverNow
is installed to determine if the number of LFCs or WJ percentage needs to
be adjusted. For more information, refer to Chapter 3, “Using the Sizing
Tool to Calculate LFC Size” on page 43.
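The -l16 -l512 form above steps through a range of candidate LFC sizes. The sketch below prints such a series, assuming (my assumption, not documented) that sztool doubles the size at each step between the lower and upper bounds.

```shell
# Sketch (assumption: power-of-two stepping between the -l bounds):
# list the candidate LFC sizes sztool would evaluate.
LOW=16
HIGH=512
SIZES=""
size=${LOW}
while [ "${size}" -le "${HIGH}" ]
do
    SIZES="${SIZES}${size}M "
    size=$((size * 2))
done
echo "${SIZES}"
```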
In this mode:
• There are two production nodes with shared disks between them.
NOTE
You must manually copy and load the RecoverNow configuration
onto the failover production server.
8. On the primary production server, create a file with the Primary Context
ID configuration.
10. On the failover production server use rthostid to obtain its “HostId”.
/usr/scrt/bin/rthostid
11. On the failover production server edit the production HostId stanza in
the /tmp/C<Primary Context ID>.cfg file. Replace the contents
of the “ObjectAttributeValue” field with the output from the
"rthostid" command.
SCCuAttr:
ObjectName = "production"
ConfigObjectSerial = 15
ObjectType = "SCRT/info/host"
ObjectAttributeName = "HostId"
ObjectAttributeValue = "6CABA7DF"
ObjectAttributeType = "ulong"
SerialNumber = 15006
ObjectNlsIndex = 0
SC_reserved = 0
ContextID = 1
12. On the failover production server edit the backup HostId stanza in the
/tmp/C<Failover Context ID>.cfg file replacing the content of
the “ObjectAttributeValue” field with the output from the
“rthostid” command.
SCCuAttr:
ObjectName = "backup"
ConfigObjectSerial = 4
ObjectType = "SCRT/info/host"
ObjectAttributeName = "HostId"
ObjectAttributeValue = "5FBBC3EF"
ObjectAttributeType = "ulong"
SerialNumber = 4006
ObjectNlsIndex = 0
SC_reserved = 0
ContextID = 11
46,50,54,57,64,67,70,73..75,82..93,95...
51,54,57,61,65,69,82...
es_ha_config 1 82
The Primary Context ID is 1, and the new device major number 82, is
available on both production servers.
21. Before starting RecoverNow on the production server, you must stop
your application and unmount the protected filesystems.
if ! /usr/scrt/bin/rtumnt -C${CID} ;
then
echo "${DATE_CMD}: rtumnt failed" >$RTSTOPLOG 2>&1
for i in `/usr/scrt/bin/sclist -C${CID} -f`
do
fuser -k $i
done
if ! /usr/scrt/bin/rtumnt -C${CID} ;
then
echo "${DATE_CMD}: rtumnt failure" >$RTSTOPLOG 2>&1
exit 1
fi
fi
IMPORTANT
The RecoverNow drivers must be loaded before the RecoverNow
protected filesystems are mounted or written to. This is managed
during the execution of the RecoverNow startup process.
PowerHA determines which filesystems to mount based on the
information provided in the Resource Group configuration. If no
filesystems are specified, PowerHA will mount all filesystems in
all Volume Groups defined in the resource group. This
scenario is not preferred for a RecoverNow environment because
RecoverNow would not be started before the filesystems are
mounted.
/usr/scrt/bin/rtstart -C<PrimaryContextID>.
• Verify that the volume groups defined in the Resource Group are
online in concurrent mode.
• Verify that the /usr/scrt/run/c<Primary Context ID> and
/usr/scrt/run/c<Failover Context ID> filesystems are
mounted.
• Verify that all of the RecoverNow protected filesystems are
mounted.
• Verify that the Service IP Address is aliased on the Ethernet
Network Interface.
• Verify that RecoverNow is replicating to the recovery server. View
the log files: /var/log/EchoStream/scrt_lca-<Primary
Context ID>.out on the production server and
/var/log/EchoStream/scrt_aba-<Primary Context
ID>.out on the recovery server.
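The last verification step above can be scripted by scanning the LCA log for recent activity. In the sketch below the log path comes from this guide, but the message text in sample_log is a stand-in for illustration only; it is not the actual scrt_lca output format.

```shell
# Sketch: confirm replication activity by scanning the LCA log.
# Primary Context ID 1 is assumed; the log line below is a stand-in.
LOG=/var/log/EchoStream/scrt_lca-1.out
sample_log() {
cat <<'EOF'
LFC 305000 sent to recovery server
EOF
}
# In a live check, "cat ${LOG}" would replace sample_log here.
if sample_log | grep -q 'sent to recovery server'
then
    echo "replication active"
fi
```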
NOTE
Both scenarios require that you manually perform Failover
operations.
Unplanned Failover
In this scenario, both Highly Available Production servers are unavailable
due to a disaster. For example, an entire site is lost due to a disaster such as
a flood or hurricane.
--------------------------------------------------
Start: 1300122094 (LFCID: 305000): Mon Mar 14 13:01:34 2011
End: 1300112259 (LFCID: 304502): Mon Mar 14 10:17:39 2011
Available VFBs:
-------------------------------------------------------
No recorded VFBs.
Once you have located an optimal restore point, remove the snapshot.
Proceed to step 3 to Backup the replica or to step 4 to perform a failover
restore or a Failover to the Latest Point in the Data.
If none of the protected data was lost on the production server, refer to
“Performing Resync” on page 256. A resync operation is required when the
Production Volumes and Recovery Replica Volumes diverge.
Performing Failback
Before you failback, ensure that you stop your application on the recovery
server. Refer to “Performing Failback” on page 258.
Planned Failover/Resync/Failback
In this scenario, the administrator has a scheduled maintenance period and
switches operations that run on the production server to the designated
recovery server.
Performing Failover
Perform the following steps for a planned failover:
Verify that the failover process completed; the failover output will
display:
---Failover Context ID <Failover ContextID> is
enabled. ---
Performing Resync
A resync operation is required when the Production Volumes and Recovery
Replica Volumes diverge. This occurs after a failover to the Recovery or
replicated server. Refer to “Performing Resync” on page 256.
Performing Failback
Before you failback, ensure that you stop your application on the recovery
server.
6. Verify that the resync operation has completed; use esmon to check the
LFC usage count:
esmon <FailoverContextID>
Example:
esmon 740
Mar 22 17:13:11 Total LFC Size=3025M, Free size=3014M, Used Size=11M, Usage=1/100
Used Size=11M - 11 megabytes of data has not been sent to the Production Server
The failback process will transfer the used LFCs to the production
server.
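The Used Size figure in the esmon output is what tells you whether the resync has drained the LFCs. The sketch below extracts it from the guide's own example line; the parsing is an illustration based on that one sample, not a documented output contract.

```shell
# Sketch: pull the Used Size (in MB) out of an esmon status line.
# A value above 0 means failback will still transfer LFC data.
LINE="Mar 22 17:13:11 Total LFC Size=3025M, Free size=3014M, Used Size=11M, Usage=1/100"
USED=$(echo "${LINE}" | sed 's/.*Used Size=\([0-9]*\)M.*/\1/')
echo "${USED}"
if [ "${USED}" -gt 0 ]
then
    echo "failback will still transfer ${USED}M of LFC data"
fi
```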
In this mode:
• There are two recovery nodes with shared disks between them.
• A Resource Group can only be moved between the two recovery nodes,
and the RecoverNow roles of the nodes never change.
IMPORTANT
You must manually copy and load the RecoverNow configuration
onto the Failover recovery server.
8. On the primary recovery server, create a file with the Primary Context
ID configuration.
9. On the primary recovery server, create a file with the Failover Context
ID configuration.
• odmget -q ContextID=<Failover Context ID> SCCuObj
SCCuAttr SCCuRel >/tmp/C<Failover Context ID>.cfg
• Copy this configuration file to /tmp on the Failover Recovery
Server.
10. On the Failover recovery server use rthostid to obtain its “HostId”.
/usr/scrt/bin/rthostid
11. On the Failover recovery server edit the backup HostId stanza in the
/tmp/C<Primary Context ID>.cfg file. Replace the contents of
the ObjectAttributeValue field with the output from the
rthostid command.
SCCuAttr:
ObjectName = "backup"
ConfigObjectSerial = 15
ObjectType = "SCRT/info/host"
ObjectAttributeName = "HostId"
ObjectAttributeValue = "6CABA7DF"
ObjectAttributeType = "ulong"
SerialNumber = 15006
ObjectNlsIndex = 0
SC_reserved = 0
ContextID = 1
12. On the Failover recovery server edit the production HostId stanza in the
/tmp/C<Failover Context ID>.cfg file replacing the contents of
the ObjectAttributeValue field with the output from the
rthostid command.
SCCuAttr:
ObjectName = "production"
ConfigObjectSerial = 4
ObjectType = "SCRT/info/host"
ObjectAttributeName = "HostId"
ObjectAttributeValue = "5FBBC3EF"
ObjectAttributeType = "ulong"
SerialNumber = 4006
ObjectNlsIndex = 0
SC_reserved = 0
ContextID = 11
if ! /usr/scrt/bin/rtumnt -C${CID} ;
then
echo "${DATE_CMD}: rtumnt failed" >$RTSTOPLOG 2>&1
for i in `/usr/scrt/bin/sclist -C${CID} -f`
do
fuser -k $i
done
if ! /usr/scrt/bin/rtumnt -C${CID} ;
then
echo "${DATE_CMD}: rtumnt failure" >$RTSTOPLOG 2>&1
exit 1
fi
fi
IMPORTANT
The RecoverNow drivers must be loaded before the RecoverNow
protected filesystems are mounted or written to. This is managed
during the execution of the RecoverNow startup process.
PowerHA determines which filesystems to mount based on the
information provided in the Resource Group configuration. If no
filesystems are specified, PowerHA will mount all filesystems in
all Volume Groups defined in the resource group. This
scenario is not preferred for a RecoverNow environment because
RecoverNow would not be started before the filesystems are
mounted.
/usr/scrt/bin/rtstart -C<PrimaryContextID>
• Verify that the volume groups defined in the Resource Group are
online in concurrent mode.
• Verify that the /usr/scrt/run/c<Primary Context ID> and
/usr/scrt/run/c<Failover Context ID> filesystems are
mounted.
• Verify that the Service IP Address is aliased on the Ethernet
Network Interface.
• Verify that RecoverNow is replicating to the recovery server. View
the log files: /var/log/EchoStream/scrt_lca-<Primary
Context ID>.out on the production server and
/var/log/EchoStream/scrt_aba-<Primary Context
ID>.out on the recovery server.
NOTE
Both scenarios require that you manually perform Failover
operations.
Unplanned Failover
In this scenario, the production server is unavailable due to a disaster. For
example, an entire site is lost due to a disaster such as a flood or hurricane.
--------------------------------------------------
Start: 1300122094 (LFCID: 305000): Mon Mar 14 13:01:34 2011
End: 1300112259 (LFCID: 304502): Mon Mar 14 10:17:39 2011
Available VFBs:
--------------------------------------------------
No recorded VFBs.
Once you have located an optimal restore point, remove the snapshot.
Proceed to step 3 to Backup the replica, or to step 4 to perform a
failover restore or a failover to the Latest Point in the Data.
3. On the active recovery server, if you have TSM or SysBack, backup the
replica. This provides additional data protection by keeping complete
copies of the data on archive media such as tape. Refer to Chapter 11,
“Working with Archived Data” on page 227.
If none of the protected data was lost on the production server, refer to
“Performing Resync” on page 256.
Performing Failback
Before you failback, ensure that you stop your application on the active
recovery server. Refer to “Performing Failback” on page 258.
Planned Failover/Resync/Failback
In this scenario, the administrator has a scheduled maintenance period and
switches operations that run on the production server to the active recovery
server.
Performing Failover
1. On the production server stop your application.
Verify that the failover process completed; the failover output will
display:
---Failover Context ID <Failover ContextID> is
enabled. ---
Performing Resync
A resync operation is required when the Production Volumes and Recovery
Replica Volumes diverge. This occurs after a failover to the recovery or
replicated server. When the application is started on the recovery or
replicated server, the updates to the Replica Volumes result in a divergence
from the data on the production server. Refer to “Performing Resync” on
page 256.
Performing Failback
1. Before you failback, ensure that you stop your application on the active
recovery server.
2. Verify that the resync operation has completed; use esmon to check the
LFC usage count:
esmon <FailoverContextID>
Example:
esmon 740
Mar 22 17:13:11 Total LFC Size=3025M, Free size=3014M, Used Size=11M, Usage=1/100
Used Size=11M - 11 megabytes of data has not been sent to the Production Server
The failback process will transfer the used LFCs to the production
server.
Prerequisites
Before you begin, keep in mind the following:
There may be special cases where you would delay bringing the
Production_Server Resource Group offline on the recovery server.
Normally, this would mean that no data is being synchronized to the
production server. In that case, you can manually execute the rtdr -C
<Primary Context ID> resync command on the production server. This
starts the ABA but it will not be monitored by PowerHA for AIX.
Failback Procedure
1. Move the Production_Server Resource Group to the Production node.
2. Bring the Recovery_Server Resource Group online on the Recovery
node.
• Names used, such as Cluster Node and Resource Group names, are
arbitrary; the integrator can choose any names.
• Notification scripts are not provided and are the responsibility of the
integrator.
The following RecoverNow scripts are provided for the PowerHA for AIX
configuration. These scripts require the parameter -C <Primary Context ID>
and, for the first two scripts, optionally -P if called from the
Production_Server Resource Group. These scripts will log to "/usr/scrt/log"
if the "HACMP Log File Parameters" have "Debug Level" set to "high".
/usr/scrt/bin/production_failback_acquire
/usr/scrt/bin/production_failover_release
/usr/scrt/bin/ABA_Monitor
/usr/scrt/bin/LCA_Monitor
###############################################################################
# Main Entry Point
################################################################################
PROGNAME=${0##*/}
[[ ${VERBOSE_LOGGING} == high ]] &&
{
rm -f /tmp/${PROGNAME}.out
exec 1> /tmp/${PROGNAME}.out
exec 2>&1
PS4='[${PROGNAME}][${LINENO}]'
set -x
}
printf "$(date) ******** Begin ${PROGNAME} ********\n"
/usr/scrt/bin/production_failback_acquire -C <Primary Context ID> -P
if ((${?}!=0))
then
printf "$(date) Production Server start failed.\n"
exit 1
fi
printf "$(date) Production Server start successful.\n"
# Run your site-specific application start script here; the check
# below tests its exit status.
if ((${?}!=0))
then
printf "$(date) Double-Take_RecoverNow_85_Application_Start failed.\n"
exit 1
fi
printf "$(date) Double-Take_RecoverNow_85_Application_Start successful.\n"
################################################################################
###############################################################################
# Main Entry Point
###############################################################################
PROGNAME=${0##*/}
[[ ${VERBOSE_LOGGING} == high ]] &&
{
rm -f /tmp/${PROGNAME}.out
exec 1> /tmp/${PROGNAME}.out
exec 2>&1
PS4='[${PROGNAME}][${LINENO}]'
set -x
}
printf "$(date) ******** Begin ${PROGNAME} ********\n"
# Run your site-specific application stop script here; the check
# below tests its exit status.
if ((${?}!=0))
then
printf "$(date) Double-Take_RecoverNow_85_Application_Stop failed.\n"
exit 1
fi
printf "$(date) Double-Take_RecoverNow_85_Application_Stop successful.\n"
/usr/scrt/bin/production_failover_release -C <Primary Context ID> -P
if ((${?}!=0))
then
printf "$(date) Production Server stop failed.\n"
exit 1
fi
printf "$(date) Production Server stop successful.\n"
###############################################################################
Note: The value for "Stabilization Interval" depends on the time required
to reset the LFCs on Failover. This depends on the number of LFCs
on the Recovery Server and the system performance. With 20,000
LFCs it could typically take up to 15 minutes.
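The 15-minute / 20,000-LFC figure above can be scaled to a rough Stabilization Interval estimate for other LFC counts. The linear scaling below is my simplification; actual timing also depends on system performance, as the note says.

```shell
# Rough linear estimate: 20,000 LFCs take about 900 seconds to reset,
# so scale proportionally for the configured LFC count.
LFC_COUNT=5000
EST_SECONDS=$((LFC_COUNT * 900 / 20000))
echo "estimated stabilization: ${EST_SECONDS} seconds"
```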