Assure MIMIX For AIX User Guide
Version 5.2
Notices
Trademarks
See www.precisely.com for information about our valuable trademarks.
The following are trademarks or registered trademarks of their respective organizations or companies:
• AIX, AIX 5L, AS/400, DB2, eServer, FlashCopy, IBM, Informix, i5/OS, iSeries, MQSeries, OS/400, Power,
PowerHA, System i, System i5, System p, System x, System z, and WebSphere—International Business
Machines Corporation.
• Adobe and Acrobat Reader—Adobe Systems, Inc.
• HP-UX—Hewlett-Packard Company.
• Teradata—Teradata Corporation.
• Intel—Intel Corporation.
• Linux—Linus Torvalds.
• Excel, Internet Explorer, Microsoft, Windows, and Windows Server—Microsoft Corporation.
• Mozilla and Firefox—Mozilla Foundation.
• Java, Solaris, Oracle—Oracle Corporation.
• Red Hat—Red Hat, Inc.
• Sybase—Sybase, Inc.
• UNIX and UNIXWare—the Open Group.
All other brands and product names are trademarks or registered trademarks of their respective owners.
Support
Getting technical support: Customers with a valid maintenance contract can get technical assistance via
Support. There you will find product downloads and documentation for the products to which you are entitled,
as well as an extensive knowledge base.
Note: Some screenshots in this User Guide may not reflect the most recent product names.
Contents
Chapter 1 Overview of Data Replication Concepts 15
Introduction............................................................................................................... 15
Roles in the system architecture .............................................................................. 16
Assure MIMIX for AIX replication groups .............................................................. 16
Assure MIMIX for AIX clusters .............................................................................. 16
Assure MIMIX for AIX file container ...................................................................... 16
Assure MIMIX for AIX datatap............................................................................... 17
Assure MIMIX for AIX journal ................................................................................ 18
Support for non-root user ownership and permissions for protected Logical Volumes ........ 19
Support for data encryption in Assure MIMIX DR for AIX ..................................... 19
Assure MIMIX for AIX agents ................................................................................... 20
LCA agent ............................................................................................................. 20
Super transaction and RWB in the LCA log file.................................................. 20
ABA agent ............................................................................................................. 20
AA agent................................................................................................................ 21
RA agent ............................................................................................................... 21
Replication................................................................................................................ 22
Journal configuration ................................................................................................ 22
Production journal ................................................................................................. 23
Recovery journal ................................................................................................... 23
Recovery log sizing ............................................................................................ 23
Log file configuration ................................................................................................ 24
Log size estimate .................................................................................................. 24
Number of logs ...................................................................................................... 24
Assure MIMIX for AIX snapshots ............................................................................. 25
Recovery .................................................................................................................. 26
Assure Unified Interface ........................................................................................... 26
Assure MIMIX for AIX with the Assure UI portal....................................................... 27
Assure MIMIX for AIX with the command line interface ........................................... 27
Chapter 2 Planning your Environment 29
Allocating space for Assure MIMIX for AIX logs and journals .................................. 29
Guidelines for production journal size ...................................................................... 30
Production journal size estimate 1 ........................................................................ 31
Production journal size estimate 2 ........................................................................ 31
Production journal size estimate 3 ........................................................................ 31
Production journal size best estimate.................................................................... 31
Guidelines for recovery journal size ......................................................................... 32
Recovery journal size estimate 1 .......................................................................... 32
Recovery journal size estimate 2 .......................................................................... 32
Recovery journal size estimate 3 .......................................................................... 32
Guidelines for volume groups................................................................................... 33
Guidelines for selecting volumes to be protected..................................................... 33
Guidelines for snapshot journal size ........................................................................ 34
Guidelines for log size .............................................................................................. 34
Determining storage requirements ........................................................................... 35
Application information checklist .............................................................................. 37
Database information checklist ................................................................................ 37
Domain name server checklist ................................................................................. 38
Introduction
The Assure MIMIX for AIX User Guide describes how to install, configure,
maintain, and administer the data replication software.
A combination of High Availability clustering and data replication ensures an
effective and efficient disaster recovery solution. Assure MIMIX HA for AIX is
the clustering component that automates the detection and recovery of
applications and their dependent resources from various system and human
failures. Assure MIMIX DR for AIX is the replication component that provides
Continuous Data Protection (CDP) and a high level of data protection from both
disasters and data corruption. Assure MIMIX HA for AIX and Assure MIMIX
DR for AIX, working together, comprise the Assure MIMIX for AIX solution.
• Provides “Support for non-root user ownership and permissions for protected
Logical Volumes” on page 19.
• The applications and associated files and volumes that you want to protect.
• Archiving systems.
Refer to “Using the Assure UI portal to configure replication groups” on page 108.
• Use the Replication Group Configuration Wizard. Refer to the New Replication
Group Container Options panel, and the field, “Send partially filled containers
automatically” on page 153.
• Use the scconfig command from the command line. Refer to “scconfig” on
page 329.
To update the Assure MIMIX DR for AIX ODM and the recovery server, two
completely separate configuration changes are required and must be executed in the
following order:
• The first configuration change is to remove the affected Logical Volumes from
the configuration. Refer to “Change a replication group’s configuration” on
page 130.
• The second configuration change is to add the removed Logical Volumes to the
configuration. Refer to “Change a replication group’s configuration” on
page 130.
NOTE
Assure MIMIX DR for AIX uses TLS 1.2 as the protocol for encryption
and self-signed certificates for authentication.
LFCs are created unencrypted on the production server and placed into the LFC
pool. Before transfer, the LFCs are encrypted and then sent to the recovery
server. There, the Assured Backup Agent (ABA) decrypts the LFCs and stores them
in the received LFC pool before applying them.
You can enable or disable encryption in the Configuration Wizard with the Use
encryption during replication option, which is on the New Replication Group
Container Options panel. See page 120 and page 121 for details on the New
Replication Group Container Options panel. You can also use this option on the
Change Replication Group Container Options panel. See page 143.
• Log Creation Agent (LCA)—The primary agent that runs on the production
server.
• Assured Backup Agent (ABA)—A primary agent that runs on the recovery
server.
• Archive Agent (AA)—A primary agent that runs on the recovery server.
• Restore Agent (RA)—A primary agent that runs on the recovery server.
LCA agent
Shipping logs from the production server to the recovery server is the responsibility
of the LCA. The LCA reads from the journal any redo log information that has been
closed, or sealed, and this information is then shipped over one or more IP networks
to an agent that runs on the recovery server. Both agents bind and communicate over
the same socket. Socket port addresses can either be default addresses or they can be
programmatically selected.
ABA agent
On the other side of the socket and running on the recovery server, the ABA is
collecting log information. The ABA receives the redo log information in the time
order it was created on the production server, and then stores this information in
recovery logs. Remember, these are block storage devices that do not interact with
resident file systems. As the ABA receives the data, it dynamically creates
these recovery logs.
Before the modifications are applied to the replica, yet another block storage device
is written to with information that would allow the replica to step backward in time.
This storage device is called the undo log and appears to be nothing more than a
logical volume to the volume manager. Once the undo log information is saved on
disk, the redo log can be applied to the replica to bring it up to date with the data on
the production server.
AA agent
The AA, or Archive Agent, also runs on the recovery server. It is used to extend
Assure MIMIX for AIX’s rollback capabilities by recording redo and undo logs to
media. The AA currently works with Tivoli Storage Manager (TSM), using the
TSM API to send archive requests. When logs are archived, they are always
spooled in pairs: a redo log and its matching undo log are stored together on
media, as determined by the TSM configuration. This pairing gives Assure MIMIX
for AIX the ability to restore data to any point in time: the data is first
unwound with a coarse-grained undo log, then fine-grained redo log information
is applied until the desired point in time is reached.
RA agent
Restoration is handled by the RA, which runs on the recovery server. It does
not, however, run continuously like the other agents; it can be executed on
demand from the command line or through the GUI. The RA deals with the
following types of restore operations: virtual restores, which occur on the
recovery server, and production restores, in which all volumes defined in a
context are rolled back together on the production server.
On the production server, write operations to storage are split by the datatap. One
copy of the data is delivered to the protected volume. The other copy of data is
combined with metadata and stored in the redo log. The LCA sweeps through the
production journal and reads any log that has been filled, or sealed, then transmits
the log file over TCP/IP to the ABA which stores the log file to disk on the recovery
server.
NOTE
Data does not pass through the datatap on the recovery server.
The ABA sweeps through the log files in time order, uses the metadata to read
from the replica the blocks that the working log file will change, and stores
this information in the undo log. The ABA then reads from the redo log and
applies the modifications in block order to the replica.
Journal configuration
Assure MIMIX for AIX uses the following journals:
Production journal
The production journal holds redo log buffers until the logs are transferred to the
recovery server. Then the logs are available to receive new application write data.
Sizing the journal properly prevents the recovery server from falling so far behind
the production server that dynamic recovery must occur for the recovery server to
catch up. If the journal is too small, then transfers between the production server and
the recovery server are performed more frequently than is efficient. If the journal is
too big, then the recovery server may fall so far behind the production server that
dynamic recovery must occur.
Recovery journal
The recovery journal is on the recovery server and holds the redo and undo logs
that act as Assure MIMIX for AIX’s internal rollback window. If you are using external
archive media such as tape, then the size of the journal on the recovery server is not
critical to the ability to restore data. The larger the recovery journal, the larger the
internal rollback window, which implies faster access to redo and undo logs during
production restores in that window.
The size of the recovery journal is proportional to write throughput and the required
internal rollback window.
The journal on the recovery server should be at least 256MB. Note that this is twice
the space recommended for the minimum on the production server, because the
recovery server contains both redo logs and undo logs.
By increasing the size of the logs, processing is reduced and the elimination of
common blocks in the undo logs is more efficient. Decreasing log size results in a
more up-to-date replica on the recovery server, because log transfers occur more
frequently.
When you determine the best log size, keep these conditions in mind:
• Maximum log size is one-half of the available RAM but not greater than 512
MB
To calculate log size, you need an estimate of average write throughput, and the
required processing rate. For the required processing rate, if Assure MIMIX for AIX
processes one log every 60 seconds, the replica will be one minute behind the
production system.
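As a rough sketch of this calculation, with assumed example numbers (a 4 MB/s average write throughput and a 60-second processing interval, neither taken from this guide):

```shell
# Log size estimate: average write throughput multiplied by the desired
# processing interval. Both input values below are assumed examples.
THROUGHPUT_MB_S=4     # average write throughput, e.g. observed with iostat
INTERVAL_S=60         # one log processed per minute => replica lags ~1 minute
LOG_SIZE_MB=$((THROUGHPUT_MB_S * INTERVAL_S))
echo "Estimated log size: ${LOG_SIZE_MB} MB"
```

With these example numbers the estimate is 240 MB, which stays under the 512 MB ceiling noted above; a heavier write load or a longer interval would need to be checked against that ceiling.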
Number of logs
number of logs = (journal size) / (log size)
Even though the calculation for the number of log files appears trivial, keep in mind
that the number of log files can affect performance. If enough log files are available
on the production server, Assure MIMIX for AIX does not have to rely on state
maps during an outage, because it has not run out of log files to take in data. A state
map contains information about data changes for each storage device protected by
Assure MIMIX for AIX. It can be used to reconstruct data changes if the underlying
data is corrupted or lost. During peak usage, when an application is writing data
faster than the network can transmit, extra log files enable the system to buffer
during these peak periods without having to rely on state maps, eliminating the risk
of a restore blackout window. On the recovery server, a sufficient number of log
files allows activity to be buffered in the event that the tape drive or library is taken
offline.
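The formula above, worked through with assumed example figures (a 6400 MB production journal and 64 MB logs):

```shell
# number of logs = (journal size) / (log size), both in MB.
# Journal and log sizes are assumed example values.
JOURNAL_SIZE_MB=6400
LOG_SIZE_MB=64
NUM_LOGS=$((JOURNAL_SIZE_MB / LOG_SIZE_MB))
echo "Number of logs: ${NUM_LOGS}"
```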
Snapshots are stored in a different location than the replica so that the replica can
continue to march along in time. The snapshot, however, is frozen with respect to
the replica. Again, using the analogy of a photograph, you can now draw on the
photograph without affecting the original subject. The ability to modify the
snapshot is accomplished by using a copy-on-write log file.
Notice from the above figure that data is passing through the datatap on the recovery
server in the case of reads and writes to snapshot data. Assure MIMIX DR for AIX
uses a different set of device minor numbers when dealing with snapshots, so that
the datatap knows which log files to access in a specific order. For example, when a
write operation is directed at the snapshot it is actually written to the copy-on-write
(COW) log instead. If the data has not been modified, then a read operation would
come from the snapshot. If the data has been modified, then the read would come
from the copy-on-write log. Keep in mind that the snapshot is the representation of
the application data at a specific point in time.
Recovery
Generally, there are two types of recovery restorations. A production restore is a
rollback in time which takes place in the protected volumes on the production
server. The other type of restore, a virtual restore, is a rollback in time which is
executed over a read-writable virtual image of the protected volumes which reside
on the recovery server.
For a production restore, Assure MIMIX DR for AIX must have exclusive I/O
access to the protected volumes. The application must be stopped, and the file
systems must be unmounted. Assure MIMIX DR for AIX is the only process that
should be allowed to write into the protected volumes during a production restore.
The control over the protected volumes and the information stored by the Assure
MIMIX DR for AIX process allow an undo of data corruption faster than the
corruption occurred.
Production restores are useful for a database “crash” where the database will not
come up. By recovering an image of the actual production database to some point in
the past directly on the production disk itself, Assure MIMIX DR for AIX can
roll back a crashed database in minutes rather than hours or days, even for the
most disastrous operational situation a database can encounter.
In contrast, a virtual restore is useful for database repair. In this case, an image of the
database is rolled back to some point in the past on the snapshot which resides on
the recovery server. Select pieces of the data can then be extracted and copied into
the production database.
Related Topics:
• “Using the Assure UI portal to perform failover and failback” on page 275.
• “Using the command line to perform failover and failback for 2-node
non-clustered replication groups” on page 299.
To log into the Assure UI portal, refer to “Logging in to the Assure UI portal” on
page 88.
NOTE
If you have a problem logging into the Assure UI portal, refer to the Assure
Unified Interface User’s Guide packaged with Assure MIMIX for AIX.
Use this chapter to prepare Assure MIMIX for AIX for its initial configuration.
• “Allocating space for Assure MIMIX for AIX logs and journals” on page 29
Allocating space for Assure MIMIX for AIX logs and journals
Assure MIMIX for AIX records write information into log files. Logs are block
storage devices of the same size. Typically, multiple logs are contained in a
Logical Volume. The pool of disk storage that contains the logs is called a journal.
A production journal exists on the production server and a recovery journal exists
on the recovery server.
The production journal is the storage that contains all of the logs. A single log is
transferred to the recovery server when that log is filled. For example, if each LFC
is 64MB and there are 100 production LFCs, then the production journal is
6400MB. When the current LFC is filled with approximately 64MB of write I/O
data (there is some additional metadata), it will be transferred to the recovery
server.
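The example above can be sketched as follows; the 4 MB/s write throughput used to estimate the outage buffer is an assumed figure, not one from this guide:

```shell
# Production journal size from the example: 100 LFCs of 64 MB each.
# The throughput figure is an assumed example used to show how long the
# journal can buffer writes if transfers to the recovery server stop.
LFC_SIZE_MB=64
LFC_COUNT=100
THROUGHPUT_MB_S=4
JOURNAL_MB=$((LFC_SIZE_MB * LFC_COUNT))
BUFFER_MIN=$((JOURNAL_MB / THROUGHPUT_MB_S / 60))
echo "Production journal: ${JOURNAL_MB} MB"
echo "Outage buffer: about ${BUFFER_MIN} minutes"
```

The second figure shows why the number of LFCs matters: it bounds how long the production server can keep absorbing writes before falling back on state maps.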
Half of the logs in the recovery journal are redo logs, and half are undo logs. Undo
logs contain information that moves a disk image of the application back through
time when they are applied. This is called rolling back.
The recovery server also contains the snapshot journal. The snapshot journal is the
space on the recovery server where Assure MIMIX for AIX stores copy-on-write
information and write-cache data for snapshots.
The following table shows the variables that are used for estimating journal sizes
and log sizes. You need these estimates in order to configure Assure MIMIX for
AIX:
Concept Meaning
NOTE
Use a tool such as iostat to estimate throughput. You can also use the
Sizing tool to estimate throughput. For more information, refer to “Using
the Sizing Tool to Calculate LFC Size” on page 41.
The goal of sizing the production journal properly is to prevent the recovery server
from falling so far behind the production server that dynamic recovery must occur
for the recovery server to catch up. If the production journal is too small, then
transfers between the production server and the recovery server are performed more
frequently than is efficient.
• The amount of data in throughput spikes exceeding system bandwidth that
Assure MIMIX for AIX must be able to absorb without falling into dynamic recovery
• Assure MIMIX for AIX must not fall into dynamic recovery when write spikes
exceed bandwidth.
If you are not using external archive media, then the size of the recovery journal is
critical for data protection. Rollbacks cannot extend beyond the logs that exist on
the recovery server. You must estimate average throughput and calculate recovery
journal area based on the length of the desired average restore window.
• Throughput
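As a hedged sketch of that calculation, assuming a 4 MB/s average throughput and an 8-hour restore window (example values only):

```shell
# Recovery journal estimate: average throughput times the desired rollback
# window, doubled because the recovery journal holds both redo and undo logs.
# All input values are assumed examples.
THROUGHPUT_MB_S=4
WINDOW_HOURS=8
REDO_MB=$((THROUGHPUT_MB_S * WINDOW_HOURS * 3600))
RECOVERY_JOURNAL_MB=$((REDO_MB * 2))
echo "Recovery journal estimate: ${RECOVERY_JOURNAL_MB} MB"
```

Doubling the redo space mirrors the earlier guidance that the recovery journal holds both redo and undo logs, which is why its minimum is twice the production minimum.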
NOTE
AIX may refer to the Small volume group type as Normal or Original
volume group type.
• Assure MIMIX for AIX Replication Group managed by an Assure MIMIX for
AIX cluster
IMPORTANT
You can also use the Sizing tool to calculate write journal pool size. For
more information, refer to “Using the Sizing Tool to Calculate LFC Size”
on page 41.
Decreasing log size results in a more up-to-date replica on the recovery server
because log transfers occur more frequently.
When you determine the best log size, keep these conditions in mind:
• Maximum log size is one-half of the available RAM on the recovery server.
• Required processing rate. For example, if Assure MIMIX for AIX processes
one log every 60 seconds, the recovery server will be one minute behind the
production server.
3. Using this space, calculate the LFC size and the number of production and
recovery LFCs.
IP address dependency?
Is DNS enabled?
Is NIS exported?
Release:
• Policy domain
• Policy set
• Management class
• Backup copy
• Archive copy
Storage pools:
• Disk
• Tape
Policy domain:
• Type
• Number
• Shared?
You can use the Sizing Tool to calculate configuration values before Assure
MIMIX for AIX is installed. It is also useful to run the tool after Assure MIMIX
for AIX is installed to determine if the number of LFCs or WJ percentage needs to
be adjusted.
This chapter describes:
• “Running the sizing tool from the Assure MIMIX for AIX Sizing Tool GUI”
on page 42
• “Running the sizing tool script from the command line” on page 49
• “Installing Assure MIMIX for AIX using the installation wizard” on page 69
NOTE
The Assure MIMIX for AIX Installation Wizard and the smit installation
program provide the sztool command for the command line Sizing Tool, and
the sztool_gui command for accessing the Assure MIMIX for AIX Sizing Tool
GUI.
Running the sizing tool from the Assure MIMIX for AIX
Sizing Tool GUI
You must install and enable X Windows on your workstation, and set the
DISPLAY environment variable on the AIX node so that it can launch the
Sizing Tool GUI on the workstation.
See “Installing X11 for the Sizing Tool” on page 417 for more information.
To access the Assure MIMIX for AIX Sizing Tool GUI, type
/usr/scrt/sztool/sztool_gui from the command line.
The Assure MIMIX for AIX Sizing Tool GUI window displays. The first tab,
Introduction, displays by default.
There are four Assure MIMIX for AIX Sizing Tool GUI tabs:
Introduction tab
The Introduction page describes how you use the sizing tool. For detailed
information, click Help. This button displays the URL to access the Precisely
Support web site. From this site you can download documentation that describes
how you use the sizing tool. In addition, you are provided with Support email and
phone numbers. Click Exit to exit the Assure MIMIX for AIX Sizing Tool GUI.
• To select individual LVs to be protected by Assure MIMIX for AIX, use the
check box next to each LV.
• Click the Run Disk Discovery Again button to re-discover the LVs.
The table below describes the parameters that you can modify:
• Collection Interval Count—Specifies how many times you want to collect
data. The default value is 24 (24 hours of data at the default 60-minute
interval).
• Collection Interval Minutes—Specifies how many minutes to wait between each
data collection interval. The default value is 60 minutes.
• Lfc Size (MB)—Specifies the size for the Assure MIMIX for AIX LFC. The
default value is 16 MB.
• Replication Outage Hours—Specifies the hours that the production server
cannot send LFCs to the recovery server. When this occurs, the LFCs begin to
back up on the production server until there are no more LFCs available. Once
Assure MIMIX for AIX runs out of LFCs, it marks the regions that require
synchronization in the state map as dirty. These dirty regions are
automatically synchronized when LFCs become available, and CDP functionality
resumes as soon as the resynchronization completes. More LFCs are required as
outage time increases. The default value is 8 hours.
• CDP Window Hours—Specifies how many hours back in time data can be restored
from the recovery server to the production server. The window size determines
the number of LFCs on the recovery server. The default value is 8 hours.
• Snapshot Duration Hours—Specifies the number of hours you want to keep a
snapshot valid. As the snapshot duration hours increase, you need to increase
the Write Journals disk space. The default value is 8 hours.
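To illustrate how the CDP Window Hours value drives the recovery-side LFC count, here is a hypothetical back-of-the-envelope calculation; the formula and the 2 MB/s throughput are assumptions for illustration, and the Sizing Tool performs the authoritative calculation:

```shell
# Hypothetical illustration: data written during the CDP window divided by
# the LFC size approximates the recovery-side LFC count. The throughput
# value and the formula itself are assumptions, not taken from this guide.
THROUGHPUT_MB_S=2
CDP_WINDOW_HOURS=8       # default CDP Window Hours
LFC_SIZE_MB=16           # default Lfc Size (MB)
DATA_MB=$((THROUGHPUT_MB_S * CDP_WINDOW_HOURS * 3600))
RECOVERY_LFCS=$((DATA_MB / LFC_SIZE_MB))
echo "Approximate recovery LFCs: ${RECOVERY_LFCS}"
```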
• The Run button becomes active when you select one or more LVs and specify
values for the LV parameters.
NOTE
Before you click the Run button, start your application on the selected
LVs and ensure that your application has a heavy load, so that the tool
collects enough data to reflect worst-case activity.
Click the Exit Sztool button to exit from sztool. The backend data
collection job will not be interrupted. You can come back later to check
the results via the View and Analyze Log tab.
• Results—This section shows general information for the sztool script sizing log
file. Click the Display Log button to display the results derived from the
original log file. You can view:
– Production server Number of LFCs and the production server disk space
requirements
– Recovery server Number of LFCs and the recovery server disk space
requirements
– Minimum Write Journal Percentage
• Detailed logs from latest run—This section shows a scrollable text area
containing detailed statistics for the sztool script sizing log file. The log
file name is /tmp/sztool/sztool.log. Click the Display Log button to display
the results derived from the original log file. The columns show:
– Logical Volume
– IO Count
– Kb read
• Try different parameters to get results from the already collected data. You can
edit the parameters shown below to see different log file results.
– Lfc Size (MB) Low. Refer to “Lfc Size (MB)” on page 45.
– CDP Window Hours. Refer to “CDP Window Hours” on page 45.
– Replication Outage Hours. Refer to “Replication Outage Hours” on
page 45.
– Snapshot Duration Hours. Refer to “Snapshot Duration Hours” on page 45.
The sztool script executes against the collected data to display log file
information in the Results section and the Detailed logs from last run
section.
NOTE
This output will not overwrite the /tmp/sztool/sztool.log file contents.
3. Click the Display Log button if you wish to re-display the results derived
from the original log file.
NOTE
A pdf version of the Chart is automatically saved in
/tmp/sztool/DiskWriteChart.pdf.
NOTE
Before you run the Sizing tool you must have performed the installation
steps described in “Installing the sizing tool” on page 41.
2. Review the diskinfo file and determine which LVs Assure MIMIX for AIX
should protect.
The table below describes the configuration file parameters that you can
modify:
Parameter Description
4. Start your business application on the selected LVs. The load of the
business application should be as close to the worst-case scenario as possible
to ensure a meaningful result.
6. When the tool completes, check the log file or the AIX window. At the bottom
of the log file or AIX window, the “<<---------<" lines indicate the
production and recovery server number of LFCs, and the percentage of Write
Journal (WJ). The log file also contains detailed information for each LV IO
statistics for each data collection interval. The standard log file is called
sztool.log. An additional copy of the log file is also created, named
sztool.log-MM_DD_YYYY-HH_MM_SS. For example:
sztool.log-02_19_2010-14_22_19, where HH uses the 24-hour clock.
The sztool script commands:
• sztool—If issued for the very first time, this command generates the
working directory, the diskinfo file, and the sztool.cfg file. Review the
diskinfo file, modify sztool.cfg accordingly, and then re-run sztool.
• sztool -l—Once the log file has been created, this command prints the
calculation results for different LFC sizes based on the existing log file.
For example, sztool -l32 prints the results for an LFC size of 32 MB, and
sztool -l16 -l512 prints all the calculation results from 16 MB to 512 MB.
Do not put spaces between -l and the LFC size number. This command produces
screen output only, with no delay or sleep.
• sztool -x—Executes sztool and prints the file name and line number of each
statement for debugging purposes. For debugging, use sztool_main -x to view
screen output.
Assure MIMIX for AIX supported configurations are determined by the number
of datataps (kernel level software that intercepts application writes) and recovery
servers. Some of the configurations supported by Assure MIMIX for AIX are
presented in this chapter.
• “Replication using 4096 byte or 512 byte disk block sizes” on page 53
You can replicate your source data, as long as the source and target disks have the
same DISK BLOCK SIZE. For example, you can replicate your source data that
resides on disks with a DISK BLOCK SIZE of 512 bytes to target disks with a
DISK BLOCK SIZE of 512 bytes. Additionally, Assure MIMIX DR for AIX
Containers must reside on disks with a DISK BLOCK SIZE of 512 bytes.
You can replicate your source data that resides on disks with a DISK BLOCK
SIZE of 4096 bytes to target disks with a DISK BLOCK SIZE of 4096 bytes.
Additionally, Assure MIMIX DR for AIX Containers must reside on disks with a
DISK BLOCK SIZE of 4096 bytes.
IMPORTANT
A replication group cannot contain a mix of 4096 byte and 512 byte disk
block size types.
[Figure: a production server with a datatap, connected over a LAN to a recovery server]
When developing an Assure MIMIX for AIX solution, the WAN bandwidth as well as
application peak write I/O throughput rates need to be considered.
[Figure: a production server replicating over a WAN to a remote recovery server]
NOTE
The recovery server becomes the failover production server in the
bi-directional configurations. For more information about Assure MIMIX
for AIX failover operations, refer to Chapter 14 “Introduction to Disaster
Recovery” on page 267.
[Figure: Production server replicating over a LAN to a recovery server]
A production and recovery server configuration is supported when you use Assure
MIMIX HA for AIX with Assure MIMIX DR for AIX. The production server is
where the production applications reside. The recovery server maintains data,
initializes rapid recoveries of production data, virtualizes Any Point-In-Time
recovery, and maintains enterprise processes off-loaded from the production server.
The figure above shows normal operation of a production to recovery server
configuration: the Primary Context is active on the production and recovery
servers. In this mode:
• The application started by Assure MIMIX HA for AIX and Assure MIMIX DR
for AIX on the production server writes data to the local storage.
• An Assure MIMIX DR for AIX “Data Tap” intercepts these writes as they are
written to the logical volume and additionally sends them to the recovery server.
The recovery server writes the same, ordered data to another, independent storage
set. These servers may be connected via a LAN or WAN.
• Cascade replication
• Disk Networks
The following sample output shows what the migration report will look like:
################################################################################
EchoCluster 3.6.0.2 -> Assure MIMIX for AIX 5.1 Migration Report:
Volume groups removed from p520-71: ID100, ID298, ID37, ID435, ID502, ID65, ID72
Disk NetworkInterfaces removed from p520-71: ID463
Volume groups removed from p520-72: ID100, ID298, ID37, ID435, ID469, ID65, ID72
Disk NetworkInterfaces removed from p520-72: ID464
Applications (no Assure MIMIX DR for AIX contexts defined) removed: ID268, ID451
Applications (highly available production server configuration) removed: ID320
AppGroups removed: ID271, ID323, ID454
Disk networks removed: ID462
Highly available production server contexts removed: ID539
Upgraded.\data\ID4ccd0527-c558-4c0c-8c35-af20bb975994\104.xml
Overview
This chapter describes Assure MIMIX for AIX, the Assure MIMIX for AIX portal
application and the Assure UI server installation procedures.
Before you begin, ensure that you review support information, system
requirements, and decide on your preferred configuration. Once you have
installed the Assure MIMIX for AIX components you can work with the Assure
UI portal. Refer to “Logging in to the Assure UI portal” on page 88.
• “Installing Assure MIMIX for AIX from the command line” on page 66
• “Using smit to install Assure MIMIX for AIX, the Assure UI server, and the
portal application” on page 78
• “Uninstalling the Assure UI server and the portal application from Windows”
on page 87
NOTE
Assure MIMIX for AIX supports Internet Protocol version 6 (IPv6).
– Before you use the Assure MIMIX for AIX Installation Wizard, you must
add these lines to the /etc/ssh/sshd_config file on each AIX server, and then
restart sshd:
• PermitRootLogin yes
• KexAlgorithms +diffie-hellman-group1-sha1
• Ciphers +blowfish-cbc
– When using the wizard, you must log in as the root user.
– Before you use the Assure MIMIX for AIX Installation Wizard on AIX 7.3,
you must add these lines to the /etc/ssh/sshd_config file on each AIX server,
then restart sshd:
• PermitRootLogin yes
• KexAlgorithms +diffie-hellman-group1-sha1
• Ciphers +3des-cbc
– When using the wizard, you must log in as the root user.
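The sshd changes described above can be scripted. The sketch below appends the required lines and verifies the result; it writes to a scratch copy so it can be tried safely. On a real server the target file is /etc/ssh/sshd_config, and the Ciphers value depends on your AIX level (blowfish-cbc, or 3des-cbc on AIX 7.3).

```shell
# Sketch only: append the wizard prerequisites to a scratch copy of
# sshd_config (use /etc/ssh/sshd_config on a real server).
CONF=/tmp/sshd_config.demo
: > "$CONF"
cat >> "$CONF" <<'EOF'
PermitRootLogin yes
KexAlgorithms +diffie-hellman-group1-sha1
Ciphers +blowfish-cbc
EOF
grep -c 'PermitRootLogin yes' "$CONF"   # prints 1
# On AIX, restart sshd afterwards:
#   stopsrc -s sshd && startsrc -s sshd
```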
• Assure MIMIX for AIX requires 2 MB in root and 125 MB in /usr. The 125 MB
in /usr is for the /usr/scrt directory which contains binaries, libraries,
configuration, and logs.
IMPORTANT
The /home/usr/scrt/ directory is created when the scrt user is created in
AIX. Do not copy any files to this directory because Assure MIMIX for
AIX deletes this directory during the de-installation process.
• The amount of RAM required for the Assure MIMIX for AIX application
depends on the size of the protected data (StateMap size) and log size. The
maximum log size is one half of RAM.
• The disk space requirement for Assure MIMIX for AIX file containers is
approximately 500 MB for a small configuration.
• For each Replication Group, allow approximately 200 MB in /var for log
files.
• Assure MIMIX for AIX requires at least 128 MB for logs on the production
server, and at least 256 MB for logs on the recovery server for LFCs. The LFCs
on the recovery server contain the undo and redo logs.
• The calculation for the undo and redo logs is based on the required recovery
window and the network outage protection size. Refer to “Determining storage
requirements” on page 35. If you use the snapshot journal, ensure that you take
into account its size. Refer to “Guidelines for snapshot journal size” on
page 34.
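As a rough illustration of the rule above that the maximum log size is one half of RAM, a sketch with an assumed figure (8 GB of RAM is an example value, not a recommendation):

```shell
# Sketch with assumed figures: derive the maximum log size from RAM.
RAM_MB=8192                      # example: 8 GB of RAM
MAX_LOG_MB=$((RAM_MB / 2))       # maximum log size is one half of RAM
echo "maximum log size: ${MAX_LOG_MB} MB"   # prints: maximum log size: 4096 MB
```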
• The operating system and maintenance level must be correctly installed on all
nodes. Each node should be at the same operating system version, and within
two release levels of each other.
• You should have at least 110 MB of temporary disk space in /usr/sys for
installation. The Cluster services will need an additional 86 MB of free space
in /usr.
NOTE
ICMP traffic should be allowed on both the host IP address
(HostIPAddress) and all replication IP addresses (IPAddress,
IPAddressAlt1-IPAddressAlt8).
Example
You have an active, reachable IP address 10.10.1.2 (production server), and you
want to ping 192.168.1.2 (recovery server).
To reach the destination, 192.168.1.2, network traffic flows through the 10.10.1.1
router, which will forward the ICMP packets to the 192.168.1.* network.
If you run
# ping 192.168.1.2
and there is no response, the ICMP packets sent by the ping command are
possibly being dropped by the router on the network between the two
systems.
• The required disk space to install and run the Assure UI portal and the Assure
MIMIX for AIX portal application is approximately 1 GB in /opt, of which 375
NOTE
The Assure UI portal supports Internet Protocol version 6 (IPv6).
• The Assure UI portal and the Assure MIMIX for AIX portal application
• Documentation
Note: The Sizing tool is automatically installed when you install the Assure
MIMIX DR for AIX component.
• The wizard includes a Java 8 JRE, which is used during installation. The
Java Runtime Environment (JRE) included in the wizard has a .dll that
depends on C:\Windows\System32\vcruntime140.dll. The vcruntime140.dll
file must be present on the PC where the wizard runs so that the wizard can
open and run properly. The file is part of the Microsoft Visual C++ runtime
library. If the file is missing, you can install or reinstall the Microsoft
Visual C++ Redistributable Package from Microsoft's website.
• Either ssh and scp or rexec and rcp must be allowed. If ssh fails, then rexec
and rcp are used.
• To use rcp the ~root/.rhosts file must have the local host and user name.
• Check /etc/services to find the ports used by exec and shell and check that those
ports are not blocked.
• There is an option to send a ping to the AIX system to determine whether it
is reachable.
• This requires that the "echo" port is not blocked. It is usually defined as
port 7 in /etc/services.
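The port checks above amount to a quick look at /etc/services. The sketch below parses an inline sample in /etc/services format so the snippet is self-contained; on a real system you would point it at /etc/services itself.

```shell
# Sketch: extract the ports for the echo, exec, and shell services from a
# sample in /etc/services format (inline here for self-containment).
cat > /tmp/services.demo <<'EOF'
echo    7/tcp
exec    512/tcp
shell   514/tcp
EOF
awk '{split($2, a, "/"); print $1 "=" a[1]}' /tmp/services.demo
# prints:
#   echo=7
#   exec=512
#   shell=514
```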
1. Request a license for each node as described in Request license keys from AIX
or Windows.
3. Ensure each target node is connected by the network to the production and
recovery server(s). Make sure that host names and IP addresses of Assure
MIMIX for AIX servers are accessible from the workstation.
1. From https://support.precisely.com/
3. On the Welcome screen, click Next and navigate to the License Key Locations
screen.
4. On the License Key Locations screen, click Continue Without License Keys.
5. On the Continue Without License Keys screen, select Contact Precisely to get
new license keys, and click Next.
Licensing notes:
• If you upgrade your system and the system model changes, it is required that
you obtain and apply a new license file. There is a 30-day grace period to obtain
the new license file.
– When an Assure MIMIX for AIX license key expires, replication will be
suspended until a new license key file is applied on the server(s) that
have the expired license key files. Use the Update License Keys wizard to
update license keys for Assure MIMIX for AIX software already
installed on a server. To access the wizard, run Setup.exe. The Assure
MIMIX for AIX Splash screen displays, and you can then select the
License Keys wizard.
– If the processor resources are increased to a value greater than the
value specified in the license key file, the license key becomes not
valid. Replication will be suspended until the processor resources value is
decreased to the value specified in the license key file, or until you obtain
and apply a new license key file with a greater number of CPUs.
For example:
# mkdir -p /usr/sys/inst.images/5.2.4
NOTE
If the installation package cannot be unzipped on your AIX node, download
MX_5240.tgz instead.
# gzip -d MX_5240.tgz
# tar xvf MX_5240.tar
Examples:
install_mx.sh -i -e
install_mx.sh -i -c
• Install Assure MIMIX HA for AIX & Assure MIMIX for AIX
install_mx.sh -i -c -e
install_mx.sh -u -e -p
6. After initiating the installation procedure, you must accept the End User License
Agreement (EULA). You can navigate through it by pressing "Enter" or skip it
by pressing "q". When you reach the end of the EULA, the following message
appears:
You can accept by typing 'yes', and the installation will proceed. If no
valid response is received after ten attempts, the installation process fails.
7. Request license keys from AIX
Follow these steps to request license keys for Assure MIMIX for AIX from an
AIX system:
From https://support.precisely.com/
The following table summarizes the information you must provide Customer
Accounting when requesting a new license key:
Provide 5.2 licensing
information for
Customer Accounting          AIX Command   Example
---------------------------  ------------  ---------------------------
LPAR (partition ID)          uname -f      "03CC409102B99203"
(System Model)
Number of CPUs               lparstat -i   Use the "Online Virtual
                                           CPUs" value
Product Major.Minor number                 5.2
Maintenance expiration                     Optional
Licensing notes:
• When you upgrade your system and the system model changes, you must
obtain and install a new license file. There is a 30-day grace period to obtain
the updated license file.
• When an Assure MIMIX for AIX license key expires, replication stops until
a new license key file is applied to the server(s) containing the expired
license key files.
After you apply the license, run the validate license command:
/usr/scrt/bin/validate_license
IMPORTANT
Although the Installation Wizard is still available, it is no longer included
with the software. Contact Support if you require the Installation Wizard.
1. Download the Assure MIMIX for AIX installation program from the:
• Click Setup.exe, to display the Assure MIMIX for AIX Installation splash
screen. Then, click Install Assure MIMIX for AIX.
• Execute install.bin, to run the Assure MIMIX for AIX Installation Wizard.
The Assure MIMIX for AIX Installation Wizard runs and displays the
Assure MIMIX for AIX Welcome screen.
Click Next.
3. The Terms And Conditions screen displays. Read and accept the terms of the
License Agreement. Click Next.
4. The Select Product screen displays. Select Assure MIMIX for AIX. Click Next.
NOTE
You must log into the node as root.
The Documentation Options screen displays. Select the node for which the
documentation will be installed and click Next.
• If you have license key files from Precisely, click Next. Proceed to step 11
on page 72.
• Click Continue Without License Keys.
9. The Continue Without License Keys screen displays. Select one of the
following and click Next.
10. If you selected to contact Precisely to get new keys, the Contact Precisely screen
displays. Use one of the following methods to procure license keys.
NOTE
Once you procure the license keys from Precisely, click Next on the
License Key Location (see page 71) to continue the installation. Proceed to
step 11 on page 72.
11. The License Key Check screen briefly displays to validate your license keys,
then the Installing Assure MIMIX for AIX screen displays showing the results
of the installation process for the Assure MIMIX for AIX software.
12. When the Assure MIMIX for AIX installation completes, the Install Assure
Unified Interface screen displays. Select the nodes where you want to install
Assure Unified Interface.
Click Next.
NOTE
To enable HTTPS (SSL) refer to the Assure Unified Interface User’s Guide
packaged with Assure MIMIX for AIX.
14. After you have installed Assure MIMIX for AIX, the Assure MIMIX for AIX
portal application, and the Assure UI server, you can log into the Assure UI
portal. You must have a valid user ID and password for the node on which the
Assure UI server is installed. See “Logging in to the Assure UI portal” on
page 88.
NOTE
You must log into the node as root.
a. If you want to install the Assure UI portal on more than one node, enter a
node name or IP address in the Node or IP address field, and click Add.
b. Click Next.
The Retrieving Installation Information screen displays temporarily.
5. After the installation information is successfully retrieved, the Install Assure
Unified Interface screen appears showing the node on which the Assure UI
server will be installed. Click Next.
NOTE
To enable HTTPS (SSL), refer to the Assure Unified Interface User’s
Guide packaged with Assure MIMIX for AIX.
After you have installed the Assure MIMIX for AIX portal application and the
Assure UI server, you can log into the Assure UI portal. You can access the
portal from the Launch Portal button. You must have a valid user ID and
password for the node on which the Assure UI server is installed. See “Logging
in to the Assure UI portal” on page 88.
User roles
The installation process creates the scrt group in /etc/group, identifying the
category of users allowed to access the portal application.
IMPORTANT
The root user is always allowed access to the portal application.
• You must stop your application and Assure MIMIX for AIX on the node(s)
where Assure MIMIX for AIX is being reinstalled.
• If you have Assure MIMIX for AIX fixes (epkg files) installed they must be
uninstalled before you can perform the reinstall. Assure MIMIX for AIX fixes
(epkg files) begin with the letters ES.
Use one of the following methods to list all installed fixes (epkg files).
1. Download the Assure MIMIX for AIX installation program from either the:
• Click Setup.exe, to display the Assure MIMIX for AIX Installation splash
screen. Then, click Install Assure MIMIX for AIX.
• Execute install.bin, to run the Assure MIMIX for AIX Installation Wizard.
3. The Assure MIMIX for AIX Installation Wizard runs and displays the Assure
MIMIX for AIX Welcome screen. Click Next.
4. The Terms And Conditions screen displays. Read and accept the terms of the
License Agreement and click Next.
5. The Select Product screen displays. From the Select Product screen select
Assure MIMIX for AIX and Assure Unified Interface and click Next.
NOTE
You must log into the node as root.
Click Next.
• If you select Use existing license keys, the License Key Check screen
displays.
• If you select Specify the location of license keys, the License Key Location
screen displays.
After you specify license keys and they are validated, click Next.
12. The Product Shutdown Required screen displays. You must manually ensure
that Assure MIMIX for AIX is shut down; the wizard does not perform this task.
Click Next.
14. When the Assure MIMIX for AIX reinstall completes, the Install Assure
Unified Interface screen appears. Click Next to reinstall Assure Unified
Interface and the portal application.
When the Assure Unified Interface and the portal application reinstall
completes, the Installation Complete screen displays.
16. After you have reinstalled Assure MIMIX for AIX, you can log into the Assure
UI portal. You can select one of the highlighted nodes to launch the Assure UI
portal and log in. See “Logging in to the Assure UI portal” on page 88.
• You must stop your application and Assure MIMIX for AIX on the node(s)
being upgraded.
• If you have Assure MIMIX for AIX fixes (epkg files) installed they must be
uninstalled before you can perform the upgrade. Assure MIMIX for AIX fixes
(epkg files) begin with the letters ES and EC.
Use one of the following methods to list all installed fixes (epkg files).
1. Obtain the Assure MIMIX for AIX installation program from either the:
• Click Setup.exe, to display the Assure MIMIX for AIX Installation splash
screen. Then, click Install Assure MIMIX for AIX.
• Execute install.bin, to run the Assure MIMIX for AIX Installation Wizard.
The Assure MIMIX for AIX Installation Wizard runs and displays the
Assure MIMIX for AIX Welcome screen.
Click Next.
3. The Terms And Conditions screen displays. Read and accept the terms of the
License Agreement and click Next.
4. The Select Product screen displays. From the Select Product screen select
Assure MIMIX for AIX and Assure Unified Interface and click Next.
a. Enter the node name—Points the installer to the target node where you want
to install Assure MIMIX for AIX. Alternatively, you may enter an IP
address.
b. Click Next.
7. The Specify Nodes screen displays.
a. Enter the Node name or IP address and click Add. The node is added to the
list.
b. Click Next.
8. The Node Login screen displays. Specify the password and click Log In, then
click Next.
After you specify license keys and they are validated, click Next.
12. The Product Shutdown Required screen displays. You must manually ensure
that Assure MIMIX for AIX is shut down; the wizard does not perform this task.
Click Next.
13. Once the installation wizard has verified that Assure MIMIX for AIX has been
shut down, the Shutdown Verification Complete screen displays. Click Next.
14. If you have efixes, the Efix Removal Required screen displays. You must
manually remove the efixes; the wizard does not perform this task. Refer to
page 76 for information on how to remove efixes.
15. The Verify Efix Removal screen displays. When the efix verification
successfully completes the Efix Removal Verification Complete screen
displays. Click Next.
16. The Installing screen displays. When the installation upgrade completes, the
Install Assure Unified Interface screen displays.
Click Next.
17. The Shut Down Assure Unified Interface screen displays. It is recommended to
shut down the Assure Unified Interface. Click Next.
18. The Installing Assure Unified Interface screen displays. When the Assure
Unified Interface installation completes, the Installation Complete screen
displays.
After you have upgraded the Assure MIMIX for AIX portal application, you can
log into the Assure UI portal from the Launch Portal button. You must have a
valid user ID and password for the node on which the Assure UI server is
installed. See “Logging in to the Assure UI portal” on page 88.
Click Done.
a. Mount the CD
b. Run bin/AIX/machine_id.bin
2. Check the Precisely Support web page https://support.precisely.com/ for the
email address for your geographical location in the Contact Us information
displayed when you have signed into the Support website. Send the following
information in email to Precisely Customer Accounting:
• Company Name
• Product(s) for which you are requesting a license
• Machine ID
• Operating system
• Operating system version
• Node name (hostname)
Step 2: Install Assure MIMIX for AIX, the Assure UI server, and the portal
application
1. Download the Assure MIMIX for AIX installation program from the Precisely
Support web page at https://support.precisely.com/.
2. Extract and copy the appropriate directory and files from your PC to the AIX
server.
NOTE
Before selecting the components you want to install, see "Installing using
native interface commands" on page 80 for more information.
13. After you have made your selections, press OK to initiate smit and install the
selected components.
After you have installed Assure MIMIX for AIX and the Assure MIMIX for AIX
portal application, you can log into the Assure UI portal. See "Logging in to the
Assure UI portal" on page 88.
• “Installing Assure MIMIX for AIX, Assure UI server, and the portal application
on the same node” on page 80
• “Installing the Assure UI server and the portal application on a separate node”
on page 81
• Vision.Common
• Vision.EchoCluster
• Vision.EchoStream
• Vision.MXWS
NOTE
Vision.VSP <version> VSP must be installed before installing
Vision.VSP <version> Link to IBM JDK.
Select the following fileset to install Vision.VSP <version> Link to IBM JDK:
• Vision.VSP.MIMIX.pa
NOTE
Vision.VSP <version> VSP and Vision.VSP <version> Link to IBM
JDK must be installed before installing the Assure MIMIX for AIX portal
application.
• Vision.Common
Select the following fileset to install Vision.VSP <version> Link to IBM JDK:
• Vision.VSP.MIMIX.pa
NOTE
Vision Common, Vision.VSP <version> VSP and Vision.VSP
<version> Link to IBM JDK must be installed before installing the
Assure MIMIX for AIX portal application.
User roles
The installation process creates the scrt group in /etc/group, identifying the
category of users allowed to access the portal application.
IMPORTANT
The root user is always allowed access to the portal application.
Log files
There are two types of log files:
• install.log—Stores all user inputs in the directory from which the installer is run
if you have copied your inputs to an AIX machine. This log file also stores
details of the installation process.
If you receive the "License validation failed" message, ensure that the
information specified in license.inf is correct and that the output of the hostname
command matches the hostname in license.inf. If the problem persists, email or
contact Customer Support. Refer to the readme.txt file for contact information.
License expiration
When the license expires for Assure MIMIX for AIX, the application on the
production server is not affected. However, data replication to the recovery server
will be stopped, and you will no longer be able to use the Continuous Data
Protection functionality. You can check the license file for the data replication
component for information about the expiration of the license. The file is named:
/usr/scrt/run/node_license.properties.
NOTE
Assure MIMIX for AIX verifies the contents of the license file. Do not
alter this file.
• Shut down Assure MIMIX for AIX and unload the drivers using the following
command:
rtstop -FC <Context ID>
1. Download the Assure MIMIX for AIX installation program. Choose from one
of the following methods:
• Click Uninstall Assure MIMIX for AIX on the Assure MIMIX for AIX
Installation splash screen or click uninstall.exe.
• Run uninstall.bin.
3. Run the Uninstall wizard to remove Assure MIMIX for AIX from your nodes.
The Assure MIMIX for AIX Uninstall Wizard runs and displays the Assure
MIMIX for AIX Welcome screen. Click Next.
a. Enter the node name—Points the installer to the target node where you want
to uninstall Assure MIMIX for AIX.
b. Click Next.
5. The Specify Nodes screen displays. Click Next.
6. The Node Login screen re-displays. Log in to the node and press Next.
Select the components you want to uninstall from each node and click Next.
You must manually ensure that Assure MIMIX for AIX is shut down; the wizard
does not perform this task. Click Next.
9. Once the uninstall wizard has verified that Assure MIMIX for AIX has been
shut down, the Shutdown Verification Complete screen displays. Click Next.
11. The Verify Efix Removal screen displays. When the fix verification successfully
completes, the Efix Removal Verification Complete screen displays.
Click Next to uninstall Assure MIMIX for AIX and Assure Unified Interface.
When the uninstall process completes, the Uninstall Complete screen displays.
IMPORTANT
You must shut down Assure MIMIX for AIX and unload the drivers before
you run the uninstall program. Use the following command:
rtstop -FC <Context ID>
1. Enter smit
NOTE
The Assure MIMIX for AIX software can only be installed on an AIX
machine. Only the Assure UI server and the portal application can be
installed on a Windows machine.
The Installation Wizard runs and displays the Assure UI Portal Welcome screen.
Click Next.
You can decide to start the portal server automatically after the installation
completes. Click Next.
NOTE
If the portal application is currently active, the Shut Down Assure Unified
Interface screen displays. The installation wizard will enable you to shut
down Assure Unified Interface.
4. The Installing screen displays, showing the status of the installation. Click Next.
NOTE
To enable HTTPS (SSL) refer to the Assure Unified Interface User’s
Guide.
6. After you have installed the Assure MIMIX for AIX portal application, you can
log into the Assure UI portal. Select the highlighted machine-name address to
launch the Assure UI portal and log in. See “Logging in to the Assure UI portal”
on page 88.
When the shut down of Assure Unified Interface completes, the Ready to
Uninstall screen displays. Click Next.
Post-installation tasks
These sections contain the post-installation tasks:
http://server:port
The server is the IP address or host name for the node on which the Assure UI
server is installed and active. The default port number is 8410. For example, if
the Assure Unified Interface server was installed on node vsp-53, you would
copy the following URL into the address field in your browser window:
http://vsp-53:8410
2. The portal appears showing the Log In page. Log in using your user ID and
password. Depending on the platform, the user ID and password may be
case-sensitive.
NOTE
If you have a problem logging into the Assure UI portal, refer to the Assure
Unified Interface User’s Guide packaged with Assure MIMIX for AIX.
After you have logged in, the portal opens to the Home page. A default portal
connection exists for the node on which you logged in.
NOTE
Refer to the Assure Unified Interface User’s Guide packaged with Assure
MIMIX for AIX.
IMPORTANT
Do not enable automatic startup of Assure MIMIX HA for AIX if your
nodes are in a clustered environment protected by Assure MIMIX HA for
AIX.
After you install Assure MIMIX DR for AIX on your production and recovery
servers, you can enable automatic startup of Assure MIMIX DR for AIX so that it
is started as part of the boot process, but only if your nodes are not in a
clustered environment protected by Assure MIMIX DR for AIX. Assure MIMIX DR
for AIX needs to be started before the application starts so that it can protect
the application data.
1. Using the cat command, determine the entry for the protected application in the
/etc/inittab file.
2. Note the entry that precedes it. The identifier for this entry is an argument in the
mkitab command. The mkitab command inserts the Assure MIMIX DR for AIX
boot command after the identifier into the /etc/inittab file. This causes Assure
MIMIX DR for AIX to start automatically before the protected application
during a reboot.
This command inserts the sccfgd_boot command after the identifier mxws in
the /etc/inittab file. It is recommended to use mxws as the identifier so that
/usr/scrt/bin/sccfgd_boot is executed after
/opt/visionsolutions/mxws/service/bin/start_mxws.sh is started.
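The /etc/inittab insertion described above can be sketched as follows. Because mkitab exists only on AIX, this demo reproduces its effect with awk on a scratch copy of the file; the sccfgd_boot path and the mxws identifier come from the text, while the run level ("2") and action ("once") are assumptions for illustration. On an AIX server you would run mkitab itself with the -i mxws option.

```shell
# Sketch: mimic `mkitab -i mxws "<entry>"` on a scratch copy of /etc/inittab.
# Run level "2" and action "once" are assumed values.
DEMO=/tmp/inittab.demo
printf '%s\n' \
  'init:2:initdefault:' \
  'mxws:2:once:/opt/visionsolutions/mxws/service/bin/start_mxws.sh' > "$DEMO"
NEW='sccfgd_boot:2:once:/usr/scrt/bin/sccfgd_boot'
awk -v n="$NEW" '{print} $0 ~ /^mxws:/ {print n}' "$DEMO" > "$DEMO.new"
cat "$DEMO.new"
# The sccfgd_boot entry now follows the mxws entry, so
# /usr/scrt/bin/sccfgd_boot runs after start_mxws.sh at boot.
```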
Example:
Jan 03 11:18:06 sccfgd_boot
Loading MIMIX DR for AIX context 1
Mounting MIMIX DR for AIX file system /xyz1
The current volume is: /dev/lvxyz1
Primary superblock is valid.
Mounting MIMIX DR for AIX file system /xyz2
The current volume is: /dev/lvxyz2
Primary superblock is valid.
0513-059 The scrt_lca-1 Subsystem has been started. Subsystem
PID is 5243042.
When you start the listener, a new node_config.properties file is created only if
the file does not already exist. To maintain the integrity of an upgraded
node_config.properties file in which any of the default configuration parameters
have been modified:
• Rename the current node_config.properties file.
• When you start the listener, ensure that you move any modifications made to the
original/renamed node_config.properties file to the new node_config.properties
file.
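The rename-and-reapply workflow above can be sketched like this. The paths are scratch placeholders, and ICMP_TYPE is used only as a stand-in for a modified parameter (it is a parameter this file carries); the default value shown is an assumption.

```shell
# Sketch (scratch paths): keep a customized parameter across a listener restart.
CFG=/tmp/node_config.properties
printf 'ICMP_TYPE=8\n' > "$CFG"      # your upgraded, modified file
mv "$CFG" "$CFG.bak"                 # 1) rename the current file
printf 'ICMP_TYPE=0\n' > "$CFG"      # 2) listener recreates a default file
# 3) carry your modification over to the new file
sed -i 's/^ICMP_TYPE=.*/ICMP_TYPE=8/' "$CFG"
cat "$CFG"                           # prints: ICMP_TYPE=8
```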
• stop_listener.sh
• start_listener.sh
Problem
A router blocks ICMP type 0.
Solution
Use the ICMP_TYPE parameter to specify a different ICMP type that will not
be blocked. You must use the same value on both nodes.
Related topics
In service pack 5.2.4 and above, the Assure MIMIX for AIX product includes an
interface that uses the AIX System Management Interface Tool (SMIT) to manage
the product. This chapter describes using this SMIT interface.
• “Nodes” on page 95
• “Operations” on page 98
smit mimix
From this menu, you can select the relevant option and explore further to
accomplish the task.
Scrolling Use the up and down arrows to scroll from line to line.
Cancel/Return PF3 or ESC+3
List PF4 or ESC+4
Entry Fields • ‘*’ in left hand column is a required field.
• ‘#’ in right hand column signifies numeric value
only.
• ‘+’ in right hand column signifies only specified
values allowed.
• ‘[ ]’ entry is changeable – no list
• No ‘[ ]’. Entry is changeable via list (PF4)
More…n More pages/options exist. Use the page up/page down
to navigate.
Red text The requested operation failed.
Green text The requested operation was successful.
All the options can also be accessed using the smit command line.
Nodes
To work with nodes, move the cursor to highlight Nodes on the MIMIX Main
Menu and press enter. The Nodes configuration menu will appear. From here, nodes
can be added, changed, shown, or removed.
Add a node
To add a node,
Change a node
To change a node,
Show nodes
To show nodes, move the cursor to Show Nodes and press enter. All currently
defined nodes and IP addresses are shown.
Remove a node
To remove a node,
Replication Groups
To work with replication groups, on the MIMIX Main Menu move the cursor to
Replication Groups and press enter. The Replication Groups configuration menu
will appear. From here replication groups can be added/configured, removed, or
shown.
NOTE
This action may take several minutes to complete. Return to the previous
menu once the command status is OK.
1. Move the cursor to Add Volumes to a Replication Group and press enter.
2. Select a replication group to add discovered volumes on the resulting Select
Replication Group panel and press enter.
Discovered volumes will appear on the resulting “Select Logical Volumes”
panel.
3. Select one or more volumes by moving the cursor to them and pressing F7.
The Select a Target Volume Group panel appears.
NOTE
A preceding ‘@’ indicates the volume was already added. After all
desired volumes are selected, press enter.
4. Select one or more recovery volume groups by moving the cursor to them
and pressing F7. Once all desired volume groups have been selected, press
enter. The request to add the volumes to the replication group is submitted.
1. Move the cursor to Remove Volumes from a Replication Group and press
enter.
Select one or more volumes by moving the cursor to them and pressing F7. Once all
desired volumes are selected, press enter. The request to remove volumes from the
replication group is submitted.
1. Move the cursor to highlight Remove a Replication Group and press enter.
2. Select a replication group to remove from the resulting Select Replication
Group panel, and press enter. The request to remove the replication group is
submitted.
Operations
To do replication group operations such as start, stop, failover or failback, on the
Main Menu move the cursor to Operations and press enter. The Operations menu
will appear.
NOTE
Stopping a replication group will attempt to unmount protected
filesystems, potentially resulting in an application/database outage.
• Sync - Stop processes will synchronize all dirty regions and then stop a
Replication Group. This means that agents will be stopped and drivers
unloaded.
• No-Sync - Stop processes stop the replication group. This means that
agents will be stopped and drivers unloaded. Processing will not
synchronize dirty regions prior to stopping.
• Agent-Only - Stop processes will stop the agents and keep drivers
loaded.
3. Select the pertinent mode and press enter. The request to stop the replication
group is submitted.
• Normal - Failback processes will synchronize all dirty regions from the
configured recovery (currently production) to the configured production
(currently recovery), then perform failback so that replication takes place
from the configured production to the configured recovery.
• After One-Sided Failover - Failback processes will failback the
configured recovery node only. The configured production node still has
primary context active.
LCA/ABA Hard Reset and Synchronize Logical Volumes should be
performed after an After One-Sided Failover failback operation to resume
replication from the configured production to the configured recovery.
Advanced Configurations
The Advanced Configuration menu allows the user to do the following:
To do any of these options, move the cursor to Advanced Configurations from the
MIMIX Main Menu and press enter. The Advanced Configurations menu
appears.
1. Move the cursor to Remove RG From a Local Database and press enter.
2. Select a replication group from the resulting Select Replication Group
panel, and press enter. The request to remove the selected replication group
is submitted.
When a configuration is created, it is stored only on the node where it was created.
As a result, all operations are available only on that node. To run operations from
other nodes, the configuration must also be exported to those nodes.
For example, suppose there are two nodes, aix-01 and aix-02, and replication
groups were created on aix-02. All smit operations (start, stop, failover, and so on)
can be performed only from aix-02. To run “smit mimix” commands on the other
node (aix-01), use this command to export the configuration from aix-02 to aix-01.
After the configuration is exported, all smit operations are available on the node it
was exported to.
Snapshots/Rollbacks
To work with snapshots and rollbacks, on the MIMIX Main Menu move the cursor
to Snapshots/Rollbacks and press enter. The Snapshots/Rollbacks menu appears.
After selecting the mode, press enter. The request to create the snapshot by time is
submitted.
NOTE
The Available Rollback Windows exists just to display the time at which
the replica can be rolled back.
After selecting the mode, press enter. The request to create the snapshot by time is
submitted.
Remove a Snapshot
To remove a snapshot,
1. Move the cursor to Remove a Snapshot and press enter.
2. On the resulting “Select Replication Group” panel, select a replication group
to remove the snapshot for and press enter.
Rollback by Time
To rollback by time,
1. Move the cursor to Rollback by Time and press enter.
2. Select a replication group to rollback from the resulting Select Replication
Group panel, and press enter.
3. Select the Available Rollback Windows, Date and Time from the resulting
Rollback by Time menu, and press enter. The request to create the rollback
is submitted.
NOTE
The Available Rollback Windows exists just to display the time at which
the replica can be rolled back.
Monitoring
To monitor one or more replications groups or change the monitoring refresh rate,
move the cursor to Monitoring and press enter on the MIMIX Main Menu. The
Monitoring menu is displayed.
The status of the replication groups will be shown. The status will update at the
refresh rate specified.
The status of the replication group will be shown. The status will update at the
refresh rate specified.
After you install Assure MIMIX for AIX, the Assure UI server, and the portal
application, you can use the Replication Group wizard to configure new
replication groups and change, rename, and delete existing ones and the Cluster
Configuration wizard to create and configure clusters. See “Create and configure
a cluster” on page 162.
The volume group types of the production and recovery servers must be compatible.
Servers with big or scalable volume group types may be combined, but a production
server with a small volume group may only be combined with a recovery server
with a small volume group.
Production        Recovery: Small   Recovery: Big   Recovery: Scalable
Small             OK                No              No
Big               No                OK              OK
Scalable          No                OK              OK
NOTE
AIX may refer to the Small volume group type as Normal or Original
volume group type.
2. Click Configuration.
For detailed information refer to the Assure MIMIX for AIX online help.
This starts the Replication Group Configuration wizard and the New
Replication Group Servers panel displays.
The New Replication Group Servers panel contains the following fields:
Field Description
Servers—Section for specifying the host name or IP address for the servers in
this replication group.
Production Select from the list of portal connections that are associated
with the instance.
Recovery 1 Select the portal connection, host name, or IP address for
the server in the first recovery server role. Possible values
are any portal connection that is in the instance domain and
has not been selected for the production server.
4. Click Next.
The New Replication Group Names panel contains the following fields:
NOTE
Only displays when the Failover server field on
the Servers panel has a value other than Do Not
Failover.
5. Click Next.
The New Replication Group Default Volume Groups panel contains the following
fields:
Default volume Select the default volume group for each server from the
group list.
Volume group Displays the volume group type of the volume group
type selected for Default volume group. If no volume group
is selected, this value is a dash.
6. Click Next.
Initially, a message indicating that Assure MIMIX for AIX is retrieving logical
volume information from the production server is displayed.
• The Add Logical Volumes dialog will not contain any logical volumes that
are in use by other replication groups on the node.
• AIX standard logical volume types (such as raw, jfs, jfs2, jfslog, and
jfs2log) are allowed.
• AIX system internal logical volume types (such as aio_cache, boot, copy,
paging, and sysdump) are not allowed. Any other non-standard logical volume
types are allowed as “raw” logical volumes. If a logical volume is presented
without an associated filesystem path, it will be treated as a raw logical
volume.
The New Replication Group Logical Volumes panel displays after Assure MIMIX
for AIX retrieves the logical volume information from the production server.
Column Description
Server Displays the name of the server in the recovery role or the
servers in the recovery 1 and recovery 2 roles.
Default Volume The default volume group specified for each server.
Group
Volume group for Specifies where the selected logical volumes will be located
replicated data on the recovery servers when they are replicated.
Volume group The type of the default volume group specified for each
type server.
Volume Group Displays the name and type of the volume group on the
production server for the logical volume.
Size (GB) Displays the size in gigabytes (GB) of the logical volume.
Type Displays the type of logical volume. For example, raw, jfs,
jfs2, and jfs2log.
File System Displays the file system or mount point for the logical
volume. Typically, around 20 characters but can be as long
as 2048. If the length exceeds 76, the text is truncated with
an ellipsis in the middle. Flyover text shows the entire path
name.
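The middle-ellipsis behavior described for long File System paths can be sketched in shell. This is an illustrative approximation only; the exact split point the product uses is an assumption:

```shell
# Illustrative sketch of middle-ellipsis truncation: paths longer than
# 76 characters are shortened with "..." in the middle. The exact split
# point used by the product is not documented here, so this is a guess.
truncate_middle() {
  s=$1
  max=76
  len=${#s}
  if [ "$len" -le "$max" ]; then
    printf '%s\n' "$s"
    return
  fi
  keep=$(( (max - 3) / 2 ))
  head_part=$(printf '%s' "$s" | cut -c1-"$keep")
  tail_part=$(printf '%s' "$s" | cut -c$(( len - keep + 1 ))-"$len")
  printf '%s...%s\n' "$head_part" "$tail_part"
}

truncate_middle /short/path   # prints /short/path unchanged
```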
8. Click OK. The following checks occur to ensure there are no errors.
– Unprotected file systems do not use the same JFS or JFS2 outline logs
as protected file systems.
Refer to “Setting up JFS log isolation” on page 231 for detailed information.
The New Replication Group Logical Volumes panel displays with the
selected logical volumes you want protected.
9. Click Next.
The New Replication Group Replication IP Addresses panel displays. Use this
panel to specify IP labels or addresses that will be used specifically for
replication. By default, replication uses the IP addresses of the servers. There
are two options:
Replication IP Indicates if the user wants to use the server IP address for
Addresses replication or specify alternates. Possible values:
• Use server IP addresses for replication
• Use specified IP addresses for replication
Recovery Server Identifies the name of the server in the recovery role.
- Host name
Recovery Server Identifies the resolved IP address from the host name.
- IP address Possible values are any valid IPv4 and IPv6 addresses.
The New Replication Group Containers panel displays. Use this panel to
configure how data is moved between servers. Containers are used by internal
processes and replication to move the changed data between servers. A larger
total container size provides a larger rollback window. Smaller sized containers
may replicate more frequently. Specify the quantity and size of the containers,
the default volume group where the containers are located, and the number of
logical volumes to use to balance IO and improve replication performance.
Field Description
Size of each Specify the size of each container in MB. Since the size
container must match on all configured servers, the recovery and
replicated values are display only. Possible values are 2, 4,
8, 16, 32, 64, 128, 256, and 512. Default is 16.
Total size Displays the total space required for the containers on each
server.
Default volume The default volume group specified for each server.
group
Use alternate Indicates if you want to specify volume groups and
volume groups physical volumes that will be used for the containers used
or physical for replication. If checked, the Replication Containers panel
volumes for is displayed. If not checked, the Replication Containers
replication panel is skipped. This box is unchecked by default.
containers
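As a worked example of the Total size field (the container quantity below is a hypothetical value; 16 MB is the stated default size):

```shell
# Hypothetical example: total container space is the container quantity
# multiplied by the per-container size, and must fit on every server.
qty=64        # number of containers (example value)
size_mb=16    # size of each container in MB (the default)
total_mb=$(( qty * size_mb ))
echo "total container size: ${total_mb} MB"   # prints "total container size: 1024 MB"
```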
• If you leave the Use alternate volume groups or physical volumes for
replication containers checkbox unchecked on the Containers panel, the
New Replication Group Container Options panel displays. See page 120.
• If you check the Use alternate volume groups or physical volumes for
replication containers checkbox, the New Replication Group Replication
Containers panel displays.
Use the New Replication Group Replication Containers panel to select the volume
groups and physical volumes where you want to locate the containers used
specifically for replication.
Field Description
Total container Displays the total space (in MB) required for the containers
size on the server.
Volume Group Displays the list of volume groups on the specified server.
Add Adds the volume group to the list and defaults the physical
volume to Any.
Remove Removes the volume group from the list. This action is
available after you add a volume group.
The New Replication Group Container Options panel displays. Use this panel to
specify if you want to:
The New Replication Group Container Options panel contains the following fields:
Field Description
Additional Options
The New Replication Group Snapshot Buffer panel displays. When a snapshot is
created, the snapshot buffer is used to hold the changes between the location in the
rollback window where the snapshot was created and the current time. As changes
are replicated, the snapshot buffer fills up.
Field Description
Snapshot Indicates how much space to reserve for the snapshot buffer.
Buffers - Size The value is a percent of the size of the logical volumes that
have been selected to protect. Valid values are integers from
1 to 100. Default is 10.
Warning Indicates how full the snapshot buffer must be before you
threshold are warned that it is filling up. Valid values are integers
from 1 to 100. Default is 75.
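A worked example of the sizing above (the 500 GB protected-space figure is hypothetical; 10% and 75% are the stated defaults):

```shell
# Hypothetical example: the snapshot buffer is a percentage of the space
# of the protected logical volumes; the warning fires when the buffer
# reaches warn_pct percent full.
protected_gb=500   # total size of protected logical volumes (example)
buffer_pct=10      # Snapshot Buffers - Size default
warn_pct=75        # Warning threshold default
buffer_gb=$(( protected_gb * buffer_pct / 100 ))
warn_gb=$(( buffer_gb * warn_pct / 100 ))
echo "buffer: ${buffer_gb} GB, warning at about ${warn_gb} GB used"
```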
The New Replication Group Ports panel displays. Port numbers are used for
communication by the agents on each server. You must change the port number
if it is not unique or it is already assigned to a different service.
The Configuration Wizard reads the /etc/services file to find port numbers that
are unassigned and populates the Port fields with the unassigned values. You
can use the values discovered by the Configuration Wizard or change them.
When the new replication group is deployed, the /etc/services file will be
updated to include the port numbers for the replication group.
• If you enter a port number value that is already in use, the service name that
is using the port will be displayed in an error message.
• When a replication group is deleted through the portal application, the
entries that were previously added to the /etc/services file will be changed by
inserting a " # " at the beginning of each entry associated with the replication
group.
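The port-conflict check the wizard performs can be sketched in shell. A sample file stands in for /etc/services so the example is self-contained; the entry names and the free port number are illustrative:

```shell
# Minimal sketch (assumption): before assigning a port to a replication
# group, check whether it already appears as <port>/tcp in the services
# file. /tmp/services.sample stands in for /etc/services here.
cat > /tmp/services.sample <<'EOF'
sc25aa_channel  5747/tcp
sc25ra_channel  5748/tcp
EOF

port_in_use() {
  # succeeds if the given port appears as <port>/tcp in the sample file
  grep -qw "$1/tcp" /tmp/services.sample
}

if port_in_use 5747; then echo "5747 in use"; fi   # prints "5747 in use"
if ! port_in_use 6747; then echo "6747 free"; fi   # prints "6747 free"
```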
The New Replication Group Ports panel contains the following fields:
Field Description
Archive (AA) Specify the control and data ports used by the Archive
Agent.
Possible values:
• 1024-65535
• The default control port is 5747, the default data port is
5752.
Apply (ABA) Specify the control and data ports used by the Apply Agent.
Possible values:
• 1024-65535
• The default control port is 5753, the default data port is
5750.
Send (LCA) Specify the control and data ports used by the Send Agent.
Possible values:
• 1024-65535
• The default control port is 5754, the default data port is
5751.
Restore client Specify the control and data ports used by the Restore
(CA/RA) client/agent.
Possible values:
• 1024-65535
• The default control port is 5749, the default data port is
5748.
The New Replication Group Tivoli Storage Manager panel displays. The Tivoli
Storage Manager (TSM) can archive containers which allows you to rollback or take
snapshots farther back in time. Full backups of the server where the TSM client is
running can also be performed.
The New Replication Group Tivoli Storage Manager panel contains the following
fields:
TSM client - Displays the name of the server where the TSM client is
Server running. This is always the recovery server. Enabled only
when Enable Tivoli Storage Manager (TSM) is checked.
TSM client - Specify the user ID for TSM to use to log into the server
User ID where the TSM client is running. Enabled only when Enable
Tivoli Storage Manager (TSM) is checked.
TSM client - Specify the password for TSM to use to log into the server
Password where the TSM client is running. Enabled only when Enable
Tivoli Storage Manager (TSM) is checked.
TSM client - Specify the location of the TSM options file on the TSM
Options file client server. The default location is
/usr/tivoli/tsm/client/ba/bin/dsm.opt. You can
specify any valid path and file name. Enabled only when
Enable Tivoli Storage Manager (TSM) is checked.
TSM client - Specify the domain for TSM to use. Enabled only when
Domain Enable Tivoli Storage Manager (TSM) is checked.
TSM server Specify the host name or IP address of the server where the
TSM server software is running. Valid values are any valid
IPv4 and IPv6 addresses or host name. Enabled only when
Enable Tivoli Storage Manager (TSM) is checked.
The content of this panel is the same as the Configuration Summary section in the
Configuration window. Refer to the Assure MIMIX for AIX online help for
additional information for the Configuration Window for replication groups. Refer
to “Configuration window for Replication Groups” on page 152.
IMPORTANT
When you initially configure a replication group, the Finish button will be
enabled so that you can save the configuration to the servers and initialize
the configuration.
IMPORTANT
Assure MIMIX for AIX prevents configurations from being created or
changed if there is not enough space available. Once you have allocated
additional space, you can click “Check Required Space” to verify that you
have sufficient space.
When the new configuration is saved and validated, you can view the progress of
the configuration initialization in the Configuration Initialization Progress section in
the Configuration Window. As each step is successfully completed, a checkmark
appears next to the step. When you create a new configuration, Assure MIMIX for
AIX runs commands for each step. The table below describes steps and commands
that are run when you create a new configuration.
1. Save configuration. This was done when you clicked Finish in the
New Replication Group wizard on the
Summary panel.
5. Set agent port numbers. Shell script that adds the agent port numbers
to the /etc/services file.
Notes:
Related topics
• “Configuration initialization progress (Change Configuration)” on page 146.
• You cannot change the replication group name or context ID with the Change
Replication Group wizard.
Replication Configuration Wizard Panels and Fields Field Support for Dynamic Update
Servers panel
Failover - change from Yes to Do not failover Dynamic change supported
Names panel
No changes can be made.
Use specified IP addresses for replication Dynamic change supported to value “Use
server IP addresses for replication”
Use alternate volume groups or physical volumes for Dynamic change supported
replication containers
Use alternate volume groups or physical volumes for Dynamic change supported
replication containers
Any replication group changes listed in the previous table that are not supported for
dynamic update must be made manually using instructions in “Manually changing
configuration of a replication group” on page 134.
1. On the Production server, stop the LCA replication agent for the replication
group that you are changing.
stopsrc -cs scrt_lca-<primary context id>
2. On the Production server, run esmon to verify that all outstanding LFCs have
been sent to the Recovery servers. Depending on the backlog of containers to be
applied, this could take some time.
esmon <primary context id>
Wait for the message "LFC information is not available" before stopping the
replication group on the Recovery servers.
3. On each of the Recovery servers, stop the Assure MIMIX for AIX replication
group you are changing to unload the Recovery server drivers.
rtstop -FC<primary context id>
Stopping the Assure MIMIX for AIX replication group on the Recovery servers
does not impact the production environment.
4. Use the configuration wizard to make the changes. From the Replication Group
portlet, click Configuration. From the resulting Configuration Window, click the
Change Replication action for the replication group you want to change. Follow
the wizard to make changes.
a. On each of the Recovery servers, start the replication group that you
changed.
rtstart -C<primary context id>
b. On the Production server, start the LCA replication agent for the
replication group that you changed.
startsrc -s scrt_lca-<primary context id>
2. Click Configuration.
The Change Replication Group Servers Login panel displays if you are not
logged in. Use this panel to log into the server specified in the panel.
5. Specify the username and password and click Log In. Log in to each server to
retrieve information. When you run commands, context IDs are used to identify
the replication group. The context IDs specified have been defaulted to unique
IDs on the servers in this replication group.
6. The Change Replication Group Servers panel displays again showing the login
status of each server.
8. Click Next.
a. Click Add to add a logical volume using the Add Logical Volumes
dialog, shown below. For detailed information, refer to the Assure
MIMIX for AIX online help.
IMPORTANT
The Add Logical Volumes dialog will not contain any logical volumes that
are in use by other replication groups on the node.
AIX standard logical volume types (such as raw, jfs, jfs2, jfslog, and
jfs2log) are allowed. AIX system internal logical volume types (such as
aio_cache, boot, copy, paging, and sysdump) are not allowed. Any other
non-standard logical volume types are allowed as “raw” logical volumes.
If a jfs or jfs2 logical volume is presented without an associated filesystem
path, it will not be available for protection.
Click OK.
The Change Replication Group Logical Volumes panel displays with the
selected logical volumes you want protected.
Use the Change Logical Volumes dialog, shown below, to choose a different
volume group for the replica logical volumes, on the recovery servers. By
default, the default volume group specified is used for this purpose. For
detailed information refer to the Assure MIMIX for AIX online help.
• Select the logical volume(s) you wish to change or remove for this
replication group and click Remove.
Use the Remove Logical Volumes dialog, shown below, to remove the
selected logical volumes from the replication group. These logical volumes
will no longer be protected. For detailed information refer to the Assure
MIMIX for AIX online help.
• If you select, Use server IP addresses for replication, the following panel
appears.
• If you select, Use specified IP addresses for replication, the following panel
appears.
The Change Replication Group Containers panel displays. Containers are used
by internal processes and replication to move changed data between servers.
The Change Replication Group Container Options panel displays. Use this
panel to specify if you want to:
• Use encryption during replication. See “Support for data encryption in Assure
MIMIX DR for AIX” on page 19.
The Change Replication Group Ports panel displays. Port numbers are used for
communication by the agents on each server. If you enter a port number value
that is already in use, the service name that is using the port will be displayed in
an error message.
The Change Replication Group Tivoli Storage Manager panel displays. Tivoli
Storage Manager (TSM) can archive containers which allows you to rollback or
The Change Replication Group Summary panel displays and shows a summary
of this replication group's configuration. If you have made configuration
changes on the Containers panel (see page 143), proceed to “Configuration
changes will reset the rollback window (CDP is lost)” on page 146.
IMPORTANT
When you decide to make configuration changes to the replication group,
they can only be saved if the replication group is stopped. Use the
Replication Group portlet on the Replication page to stop the replication
group.
When the configuration is saved and validated, you can view the progress of the
configuration initialization in the Configuration Initialization Progress section in the
Configuration Window. As each step is successfully completed, a checkmark
appears next to the step. When you change an existing configuration, Assure
MIMIX for AIX runs commands for each step.
1. Save configuration. This was done when you clicked Finish in the
Change Replication Group wizard on the
Summary panel.
NOTE
The steps that are run depend on what is changed in the configuration.
Related Topic:
2. Click Configuration.
The Configuration Window displays. For detailed information refer to the Assure
MIMIX for AIX online help.
3. Click Rename from the Actions dropdown. There are two possible results:
2. Click Configuration.
The Configuration Window displays. For detailed information refer to the Assure
MIMIX for AIX online help.
3. Click Delete from the Actions dropdown. There are two possible results:
The Send partially filled containers automatically option enables you to control the
frequency of shipping containers (LFCs) to the recovery server during low I/O
periods.
The container is examined when the frequency to check value is reached. If at that
time the minimum percent filled value has not been reached, the container will not
be sent. When the container reaches the minimum percent filled value, containers
will not be immediately sent. There is a 5 second delay before the containers are
sent. The 5 second delay is provided to account for the possibility that the container
could become completely full. When the container is sent, the frequency to check
value is reset and the entire process starts again.
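The send decision described above can be sketched as follows. The function and value names are illustrative only, not product internals:

```shell
# Sketch (assumption) of the partial-container send decision: when the
# frequency-to-check timer fires, a partially filled container is sent
# only if it has reached the minimum-percent-filled value; the product
# then waits 5 seconds in case the container fills completely.
container_action() {
  fill_pct=$1   # how full the container currently is, in percent
  min_pct=$2    # minimum percent filled setting
  if [ "$fill_pct" -ge "$min_pct" ]; then
    echo "send after 5s delay"
  else
    echo "hold until next check"
  fi
}

container_action 40 50   # prints "hold until next check"
container_action 60 50   # prints "send after 5s delay"
```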
When changing either of the scconfig options (-a, -b), the command returns output
displaying the values for both options. If the frequency to check value is changed to
zero, the command output will also display “Send Partial Container Automatically
is not active”.
Support for LVM commands when the Assure MIMIX for AIX
drivers are loaded
The AIX Logical Volume Manager (LVM) manages Logical Volumes (LVs) that
Assure MIMIX for AIX protects.
chlv OK
chlvcopy OK
chvg OK
defragfs OK
exportvg OK
extendlv OK
extendvg OK
fileplace OK
importvg OK
lslv OK
lsvg OK
lvmstat OK
mirscan OK
mklv OK
mkvg OK
readlvcopy OK
redefinevg OK
reducevg OK
rmlvcopy OK
splitlvcopy OK
splitvg OK
unmirrorvg OK
varyonvg OK
es_swmaj
Usage
Use this command to switch and restore the LV major number for all LVs in an
Assure MIMIX for AIX context. This will allow AIX LVM commands such as
“migratepv” to function when the Assure MIMIX for AIX drivers are loaded.
Syntax
es_swmaj -C <Context ID> [-s | -f | -u | -v]
-s: Switch to major number (raw LVs are excluded)
-f: Switch to major number (including raw LVs)
-u: Restore major number
-v: Verify switch to major number
IMPORTANT
If the “-f” option is selected for raw LVs, they must be open before
executing this command. The raw LVs must remain open and no additional
opens should be done until this command is executed with the “-u” option.
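As an illustration, a typical sequence might look like the following. The context ID and disk names are hypothetical examples; verify the options against your installed version:

```shell
# Hypothetical sequence for context ID 25: switch the LV major numbers,
# run the LVM command, then restore. Context ID and disks are examples.
es_swmaj -C 25 -s         # switch major numbers (raw LVs excluded)
es_swmaj -C 25 -v         # verify the switch
migratepv hdisk1 hdisk2   # example LVM command; disk names illustrative
es_swmaj -C 25 -u         # restore the original major numbers
```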
The default port assignments shown are not added to the /etc/services file when
Assure MIMIX for AIX is installed.
sc<Context ID>aa_channel 5747/tcp # Archive Agent
sc<Context ID>ra_channel 5748/tcp # Restore Agent (scrt_rs)
sc<Context ID>ca_channel 5749/tcp # Restore Client Agent
sc<Context ID>aba_dchannel 5750/tcp # Assured Backup Agent
sc<Context ID>lca_dchannel 5751/tcp # Log Control Agent
sc<Context ID>aa_achannel 5752/tcp # Archive Agent
sc<Context ID>aba_channel 5753/tcp # Assured Backup Agent
sc<Context ID>lca_channel 5754/tcp # Log Control Agent
On the Configuration Wizard Ports panel, when you are creating a new replication
group configuration, the Configuration Wizard reads the /etc/services file to find
port numbers that are unassigned, including the default port assignments if they are
not in use by another application, and populates the Port fields with the unassigned
values. You can use the values discovered by the Configuration Wizard or change
them.
When the new replication group is deployed, the /etc/services file will be updated to
include the port numbers for the replication group.
On the Configuration Wizard Ports panel, when you are changing a replication
group configuration, the Configuration Wizard reads the /etc/services file to find
port numbers that are already assigned to the replication group and populates the
Port fields with the assigned values. If the Configuration Wizard does not find any
port numbers for the replication group in the /etc/services file, port numbers that are
unassigned, including the default port assignments if they are not in use by another
application, will be used to populate the values in the Port fields.
You can use the values discovered by the Configuration Wizard or change them.
When the new replication group is deployed, the /etc/services file will be updated to
include the port numbers for the replication group.
If multiple Context IDs are configured, each Context ID must have a unique port
assignment. Port assignments for Primary and Failover Context IDs must also be
unique.
For example, if the Primary Context ID is 25 and the Failover Context ID is 250,
add the following entries to the /etc/services file on all servers.
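For illustration only, entries for the two contexts follow the sc&lt;Context ID&gt; pattern shown earlier. The port numbers below are hypothetical examples, not defaults; each must be unused on every server, and all eight agent channels must be added for each context:

```
sc25aa_channel      6747/tcp
sc25ra_channel      6748/tcp
sc250aa_channel     6757/tcp
sc250ra_channel     6758/tcp
```

The remaining channels (ca_channel, aba_channel, aba_dchannel, lca_channel, lca_dchannel, aa_achannel) for each context follow the same pattern, each with its own unique port.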
IMPORTANT
Do not include the # Description shown below when adding entries to the
/etc/services file on all servers. For example, do not include, # Assured
Backup Agent.
The default port assignments shown below for the Push Server are added to
the /etc/services file when Assure MIMIX for AIX is installed.
mxws 17835/tcp
mxwsmgmt 17836/tcp
If the default ports assignment for the Push Server are in use by another application,
then the port numbers used by the Push Server must be changed.
• Change the port numbers for the Push Server in the configuration file
/opt/visionsolutions/mxws/service/httpsvr/conf/mxws.properties on all servers
to unused port numbers.
server.port=17835 <replace 17835 with an unused port number>
management.port=17836 <replace 17836 with an unused port number>
You must also update the Push Server port configured for the Assure MIMIX
for AIX portal application in: /opt/visionsolutions/http/vsisvr/httpsvr/
webapps/mimixaix-pa/WEB-INF/classes/com/visionsolutions/mimixaix/mxws.
properties. There is no need to configure the management port for the Push
Server here as it is not used by the Portal Application.
mxws.port=17835 <replace 17835, this must match the server.port
number used in the
/opt/visionsolutions/mxws/service/httpsvr/conf/mxws.properties
configuration file.>
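One way to make these edits is sketched below on a scratch copy of the file. AIX sed has no -i option, so write to a temporary file and move it back; 27835 and 27836 are example replacement ports only:

```shell
# Sketch: change the Push Server ports in a copy of mxws.properties.
# The file path and replacement port numbers are examples only.
f=/tmp/mxws.properties
cat > "$f" <<'EOF'
server.port=17835
management.port=17836
EOF

# Portable in-place edit: write to a temp file, then move it back.
sed -e 's/^server\.port=.*/server.port=27835/' \
    -e 's/^management\.port=.*/management.port=27836/' \
    "$f" > "$f.new" && mv "$f.new" "$f"

grep '^server.port' "$f"   # prints server.port=27835
```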
• Use strvsisvr to start the Assure UI server and the Assure MIMIX for AIX
portal application on the servers where the Assure UI portal was running.
/opt/visionsolutions/http/vsisvr/httpsvr/bin/strvsisvr
NOTE
On Windows, the Push Server portal application configuration will be in
the default install location at C:\Program
Files\VisionSolutions\http\vsisvr\httpsvr\webapps\mimixaix-pa\WEB-INF
\classes\com\visionsolutions\mimixaix\mxws.properties.
To end the Assure UI server, refer to the Assure Unified Interface User’s Guide.
• Change the Push Server port number for the portal application.
mxws.port=17835 <replace 17835, this must match the server.port
number used in the
/opt/visionsolutions/mxws/service/httpsvr/conf/mxws.properties
configuration file on the AIX servers.>
To start the Assure UI server, refer to the Assure Unified Interface User’s Guide.
2. tail -f /var/log/EchoStream/es_syslog.out
After you complete the Assure MIMIX for AIX post-installation tasks, refer to
“Starting and Stopping” on page 187 for information on how to start and stop
Assure MIMIX for AIX.
After you created and configured replication groups you can use the Cluster
Configuration wizard to configure new clusters and add a replication group to a
cluster. In addition you can use the Change Application wizard to change the
existing clustered application settings of a replication group.
NOTE
If you plan to use a highly available host name, append the highly
available host name to the host entries in the /etc/hosts file on both nodes
configured in the cluster before changing the existing clustered application
settings of a replication group.
2. Click Configuration.
For detailed information refer to the Assure MIMIX for AIX online help.
This starts the Configuration wizard and the New Cluster - Cluster Name and
Servers panel displays.
This screen is displayed when there are only 2 servers configured in the
instance.
This screen is displayed when there are more than 2 servers configured in the
instance.
Field Description
Cluster Name and Servers—Section for specifying the name of the cluster, an
optional description, and a portal connection for each server in the cluster.
Cluster name Specify the name of the cluster. Cluster names can be 256
characters long and can contain any alphanumeric character.
Whitespace is not allowed.
Server 1 Select from the list of portal connections that are associated
with the instance. The host name from the portal connection
is used for the server. This ensures the newly configured
cluster will appear in the instance. If there are only two
available portal connections, both drop-downs will default
to the available portal connections. When a portal
connection is selected from one drop-down, it will not be
available in the other drop-down.
If there are more than two servers, possible values are:
• Select...
• Any portal connection that is in the instance domain and
has not been selected for Server 1.
If there are exactly two servers:
• One of the available portal connections.
Server 2 Specify the portal connection for the second server in the
cluster. If there are only two available portal connections,
both drop-downs will default to the available portal
connections. When a portal connection is selected from one
drop-down, it will not be available in the other drop-down.
If there are more than two servers:
• Select...
• Any portal connection that is in the instance domain and
has not been selected for Server 1.
If exactly two servers:
• One of the available portal connections.
The Cluster Name and Servers - Servers panel contains the following fields:
4. Click Next.
The New Cluster - Network and Heartbeats panel displays. Initially, a message
indicates that Assure MIMIX for AIX is retrieving network information from
each server. Use this panel to specify the networks to add and the heartbeat
threshold for each network.
Field Description
Add Action to add networks to the cluster. See the Add Networks
dialog section for details.
Add WAN Action to add a WAN network to the cluster. You can
specify a network name, missed heartbeat threshold and
local area networks. See the Assure MIMIX for AIX online
help for field descriptions.
Remove Action to remove networks from the cluster. See the Assure
MIMIX for AIX online help for field descriptions.
The dialog below is shown when a LAN network is selected. See the Assure
MIMIX for AIX online help for field descriptions.
6. Specify a Network name, select a network and click OK. The New Cluster -
Network and Heartbeats panel displays with the selections you made.
The New Cluster - User Exits panel contains the following fields:
Field Description
Add Action to add a user exit to the cluster. See the Add
User Exit dialog for details.
Remove Action to remove user exits from the cluster. See the Assure
MIMIX for AIX online help for field descriptions.
Resource Type Specifies the Resource type for the user exit. Possible
values:
• Application
• Application Processes
• Cluster Services
• Replication Group
• Hostname
• Network
• Network Interface
• Server
• Service IP Address
• Custom Monitor
• WAN
Event Specifies the event type for the user exit. Possible values:
• Start
• Stop
• Failure
Actions Action to change the scripts for a user exit. Possible values
are Change and Remove. See the Assure MIMIX for AIX
online help for field descriptions.
8. Click Next on the User Exits panel if you do not want to add user exits;
otherwise, click Add. The Add User Exit dialog displays.
The Add User Exit dialog enables you to add user exits for the cluster. Specify
the resource type, event and the path to the user scripts, and click OK.
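A user exit script typically logs its invocation and returns success so cluster processing can continue. The sketch below is a hypothetical skeleton only: the positional arguments and log path are assumptions for illustration, not the documented interface; see the Assure MIMIX for AIX online help for the actual calling conventions.

```shell
#!/bin/sh
# Hypothetical user exit skeleton. The positional arguments ($1, $2) and the
# log path are assumptions, not part of the documented product interface.
RESOURCE=${1:-unknown}   # e.g. "Replication Group"
EVENT=${2:-unknown}      # e.g. "Start"
MSG="user exit invoked: resource=$RESOURCE event=$EVENT"
echo "$MSG" >> /tmp/user_exit.log   # record the invocation
echo "$MSG"
```

A nonzero exit code from a real user exit may affect cluster processing, so keep exit handling deliberate.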
9. The New Cluster - User Exits panel displays with the added user exit.
The Summary panel displays. This panel provides a summary of the cluster
configuration.
By default the Add a replication group to this cluster check box contains a
check mark which will launch the New Replication Group configuration wizard
if no replication groups are configured, or the Add Replication Group to Cluster
configuration wizard when replication groups are configured. Uncheck the Add
a replication group to this cluster check box if you do not want to add a
replication group at this time.
2. From the Configuration window, select Change Application from the Actions
dropdown for a replication group.
The Change Application - Select Cluster panel displays. This panel shows the
selected replication group that is participating in the specified cluster. Cluster
services will be stopped while configuration changes are deployed.
The Change Application - Select Cluster panel contains the following fields:
Field Description
Cluster Specifies the name of the cluster that this replication group
is assigned to.
3. Click Next. The Change Application - Application panel displays. Specify the
application name, description, and an optional highly available host name that
follows the application when a failover occurs to the recovery server.
Field Description
Enable the cluster to manage application availability - Indicates if this
replication group has start and stop scripts and application monitoring
defined.
Start script Specifies the command that starts the application process.
The start script is also responsible for ensuring that the
correct number of instances of a process is running, and for
starting instances as needed. The Start script can be a script
or a binary file. The field includes the absolute path to the
command. Possible values:
• valid path
• blank
Stop script Specifies the command that stops the application's process
or processes anytime the application is stopped. The Stop
command can be a script or a binary file. The field includes
the absolute path to the command.
Possible values:
• valid path
• blank
Processes to monitor - Specifies the name of the process started by the Start
script and monitored by the clustering component. For possible values see the
description for match criteria.
match criteria Specifies the criteria to use to find the process to monitor.
Possible values:
• Simple match—The specified string is matched against
the output of ps -eo args. If the supplied string
appears as a part of any output lines, then the process is
assumed to be started. If you are monitoring more than
one process, enter each additional process name
separated by a semi-colon. In that case, the application is
reported to be started only if all specified processes are
found.
• Regex match—The specified string is a regular
expression matched against the output of ps -eo
args. If the specified string appears as a part of any
output lines, then the process is assumed to be started. If
you are monitoring more than one process, enter each
additional process string separated by a semi-colon. In
that case, the application is reported to be started only if
all specified processes are found. The regular expression
match uses POSIX Regular Expression matching.
Precisely recommends that you use the Simple Match
option and match the string exactly as it appears in the
ps -eo args output.
• PID file—The specified string is taken as the path of the
process's PID (process identification number) file. The
process is reported to be started only if the PID file
exists, and the PID contained in it matches a running
process.
• Script—The specified string is taken as a shell command,
whose exit code indicates the status of the process.
Exit code value Status of the process
Less than zero Agent Error
0 Running
1 Stopped
Greater than 1 Failed
Delay before starting - Specifies the time in seconds that the clustering
component will wait after the application starts before it begins monitoring
the process. Possible values are 0-999 or dash.
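As a concrete illustration of the PID file match criteria and the exit-code convention above, a minimal status check might look like the following. This is a sketch only: the PID file path is a hypothetical example, not a product default.

```shell
#!/bin/sh
# Sketch of the "PID file" match criteria: the process counts as Running only
# if the PID file exists AND the PID it contains matches a live process.
# /tmp/example_app.pid is a hypothetical path, not a product default.
PIDFILE=${PIDFILE:-/tmp/example_app.pid}
if [ ! -f "$PIDFILE" ]; then
    STATUS=Stopped                          # no PID file
elif kill -0 "$(cat "$PIDFILE")" 2>/dev/null; then
    STATUS=Running                          # PID belongs to a live process
else
    STATUS=Failed                           # stale PID file
fi
echo "$STATUS"
```

A real "Script" match criterion would translate these states into the exit codes in the table above (0 for running, 1 for stopped, greater than 1 for failed).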
5. Click Next. The Change Application - Custom Monitors panel displays. Custom
monitors are optional and are used to monitor any resource.
Field Description
Add Displays the Add Custom Monitor dialog you can use to
add a custom monitor.
Remove Removes the selected custom monitor. This action will
remove a row from the table.
Monitor Specifies the name of the custom monitor. The name can be
up to 32 characters in length and can contain any characters
except space.
7. Click Next. The Change Application Summary panel displays. The following is
a summary of the application configuration for the replication group being
changed.
NOTE
If you plan to use a highly available host name, append the highly
available host name to the host entries in the /etc/hosts file on both nodes
configured in the cluster before adding an existing non-clustered
replication group to a cluster.
2. From the Configuration window, select Add to Cluster from the Actions
dropdown for a replication group.
The Add Replication Group to Cluster - Select Cluster panel displays. Adding a
replication group to a cluster configures an application for the cluster to manage.
The Add Replication Group to Cluster - Select Cluster panel contains the
following fields:
Field Description
Cluster This list contains valid clusters for this replication group.
Clusters must have the same 2 nodes as the replication
group in order to be included in the list. Possible values:
If more than one cluster:
• Select…
• Valid cluster for this replication group.
If only one cluster:
• The one cluster can be selected.
3. Click Next. The Add Replication Group to Cluster - Application panel displays.
Specify the application name, description, and an optional highly available host
name that follows the application when a failover occurs to the recovery server.
4. Click Next. The Add Replication Group to Cluster - Application Processes and
Monitoring panel displays. Enabling the cluster to manage the availability of the
application is recommended. Specify scripts to start and stop the application
processes and which processes to monitor for availability.
Field Description
Enable the cluster to manage application availability - Indicates if this
replication group has start and stop scripts and application monitoring
defined.
Start script Specifies the command that starts the application process.
The start script is also responsible for ensuring that the
correct number of instances of a process is running, and for
starting instances as needed. The Start script can be a script
or a binary file. The field includes the absolute path to the
command. Possible values:
• valid path
• blank
Stop script Specifies the command that stops the application's process
or processes anytime the application is stopped. The Stop
command can be a script or a binary file. The field includes
the absolute path to the command.
Possible values:
• valid path
• blank
Processes to monitor - Specifies the name of the process started by the Start
script and monitored by the clustering component. For possible values see the
description for match criteria.
match criteria Specifies the criteria to use to find the process to monitor.
Possible values:
• Simple match—The specified string is matched against
the output of ps -eo args. If the supplied string
appears as a part of any output lines, then the process is
assumed to be started. If you are monitoring more than
one process, enter each additional process name
separated by a semi-colon. In that case, the application is
reported to be started only if all specified processes are
found.
• Regex match—The specified string is a regular
expression matched against the output of ps -eo
args. If the specified string appears as a part of any
output lines, then the process is assumed to be started. If
you are monitoring more than one process, enter each
additional process string separated by a semi-colon. In
that case, the application is reported to be started only if
all specified processes are found. The regular expression
match uses POSIX Regular Expression matching.
Precisely recommends that you use the Simple Match
option and match the string exactly as it appears in the
ps -eo args output.
• PID file—The specified string is taken as the path of the
process's PID (process identification number) file. The
process is reported to be started only if the PID file
exists, and the PID contained in it matches a running
process.
• Script—The specified string is taken as a shell command,
whose exit code indicates the status of the process.
Exit code value Status of the process
Less than zero Agent Error
0 Running
1 Stopped
Greater than 1 Failed
Delay before starting - Specifies the time in seconds that the clustering
component will wait after the application starts before it begins monitoring
the process. Possible values are 0-999 or dash.
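A rough sketch of the Simple match behavior described above: every semicolon-separated name must appear in the `ps -eo args` output for the application to be reported as started. The process name below is a stand-in for illustration; a real configuration would list the application's own process names.

```shell
#!/bin/sh
# Sketch of the "Simple match" criteria: all semicolon-separated names must be
# present in `ps -eo args` output. "ps" is a stand-in process name here; a real
# configuration would use something like "db_writer;db_listener".
NAMES="ps"
STARTED=yes
OLDIFS=$IFS
IFS=';'
for name in $NAMES; do
    # Mark the application not started if any listed name is missing.
    ps -eo args | grep -- "$name" >/dev/null || STARTED=no
done
IFS=$OLDIFS
echo "started=$STARTED"
```

A Regex match would use the same loop with `grep -E` and a POSIX extended regular expression instead of a literal string.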
Field Description
Add Displays the Add Custom Monitor dialog you can use to
add a custom monitor.
Remove Removes the selected custom monitor. This action will
remove a row from the table.
Monitor Specifies the name of the custom monitor. The name can be
up to 32 characters in length and can contain any characters
except space.
6. Click Next. The Add Replication Group to Cluster - Service IP Addresses panel
displays. Service IP addresses are optional and follow the application when a
failover occurs to the recovery server.
The Add Replication Group to Cluster - Service IP Addresses panel contains the
following fields. For information on the Add Service IP Address dialog refer to
the Assure MIMIX for AIX online help.
Assure MIMIX for AIX provides capabilities to start up and shut down both the
production and the recovery servers.
• “Using the command line to start and stop Assure MIMIX for AIX” on
page 189
Using the Assure UI portal to start and stop Assure MIMIX for
AIX
This topic contains two sections:
• “Using the Assure UI portal to start Assure MIMIX for AIX” on page 187
• “Using the Assure UI portal to stop Assure MIMIX for AIX” on page 188
The Start Replication Group dialog remains displayed until the action completes
successfully.
NOTE
Applications must be stopped before stopping replication. File systems are
unmounted and data is no longer protected when replication is stopped.
The Stop Replication Group dialog remains displayed until the action completes
successfully.
Using the command line to start and stop Assure MIMIX for AIX
This topic contains two sections:
• “Using the command line to start Assure MIMIX for AIX” on page 189
• “Using the command line to stop Assure MIMIX for AIX” on page 191
NOTE
Step 2 and Step 3 are only needed for the first start after Assure MIMIX for
AIX is configured.
2. Stop any applications that are using the Assure MIMIX for AIX PVS
(Production Volume Set) LVs (Logical Volumes).
On the recovery server, start Assure MIMIX for AIX by performing the following
steps:
This displays:
# rtstart -C 74
Loading Assure MIMIX DR for AIX Recovery Server Drivers
Starting scrt_aba
NOTE
ALL file systems must be mounted BEFORE loading the drivers for
non-Disruptive Startup Mode. A full synchronization will occur for
non-clustered replication groups with mounted file systems. This may take
an extended period of time.
• From the Assure UI portal, select “Mounted” in the File Systems column
drop-down menu in the Start Replication Group dialog.
NOTE
Refer to “Support for LVM commands when the Assure MIMIX for AIX
drivers are loaded” on page 153. The table shows the LVM commands
supported when the Assure MIMIX for AIX drivers are loaded.
This feature is intended to be used only once during the initial configuration, the
loading of the Assure MIMIX for AIX drivers and the full synchronization of data.
Before stopping the replication group, the user's application must be stopped first.
When the replication group is stopped, the file systems will be unmounted by Assure
MIMIX for AIX and the drivers will be unloaded. Refer to “rtstop” on page 324.
• From the command line, use the rtstart command without the -r option on the
production and recovery server to start the replication group. Refer to “rtstart”
on page 323.
rtstart -C <Context ID>
• From the Assure UI portal, select Not Mounted in the File Systems column
drop-down menu in the Start Replication Group dialog.
3. Stop Assure MIMIX for AIX, synchronize data, and unload the drivers on the
production server.
rtstop -FSC <Context ID>
4. On the recovery server, stop Assure MIMIX for AIX and unload the drivers:
rtstop -FC <Context ID>
Overview
Assure MIMIX for AIX enables you to restore a complete copy of the data on the
production server to any time in the past. You can quickly restore a database that
has crashed and roll back the data to a point before the corruption occurred.
Assure MIMIX for AIX enables you to create and use snapshots for production
restores and snapshot-based backups to media such as tape. You can also create a
snapshot when you want to use a copy of the data on the recovery server. Having a
snapshot—a read or write copy of the data—on the recovery server enables you to
investigate and use the data without affecting the operation of the production
server. For example, you can:
• Repair individual objects such as files, tables, and records on the production
server without stopping the application or replacing all of the application files.
• Test whether the snapshot is the correct one to use for rolling back the data on
the production server.
• Navigate to the Recovery page, and select Create Snapshot from the Actions
dropdown.
• Navigate to the Snapshot Details portlet, and select Create from the
dropdown to the right of Snapshot.
• Point in Time—Specify the date and time, within the rollback window, from
where the snapshot will be created. This is displayed when Point in Time is
selected in the Location in rollback window field. This value is always
displayed in the dropdown. This only applies to the Date and time field.
• Container ID—Identify the container ID, within the rollback window, from
where the snapshot will be created. This is displayed when Container ID is
selected in the Location in rollback window field. This value is always
displayed in the dropdown. This only applies to the Container ID field.
Container IDs must be even numbers in the range specified in the rollback
window. When a change is made on the production server an even container
ID is created to represent that change. The even container ID is then applied
to the recovery server. When you create a snapshot, you use the even
container ID that contains the change.
• Event Marker—Select one of the event markers, within the rollback window,
for where the snapshot will be created. Only one event marker can be
selected at a time.
• Most Recently Applied Container—Specify the most recent confirmed apply
(committed) container, within the rollback window, from where the snapshot
will be created. This option is used when all the data is synchronized and no
changes are being made. You can use this option to create a snapshot at that
point in which you know you have stable and consistent data.
3. Date and Time—Specify the date and time, within the rollback window, from
where the snapshot will be created. The default is the most recent date in the
rollback window. It is only displayed when Point in Time is selected.
7. Click OK.
To access the Increase Snapshot Buffer Size dialog, navigate to the Snapshot Details
portlet, and select Increase Size from the dropdown to the right of Snapshot.
IMPORTANT
The snapshot buffers can be increased before or after a snapshot is created.
If the snapshot buffer has reached 100% the Increase Size option will not
be available.
The Increase Snapshot Buffer Size dialog contains the following fields:
Field Description
Replication group - Identifies the servers and logical volumes that are being
replicated. Replication occurs from the production server to the recovery
server.
Current size Identifies the configured snapshot buffer space for this
server. Integers are in MB units. The displayed value is
from the configuration.
You can also use the command line to increase the snapshot buffer size. For
information, refer to “rn_temp_journal” on page 312.
1. On the recovery server, make sure all snapshot file systems are unmounted
before trying to release the snapshot.
rtumnt -C <Context ID>
3. On the recovery server, enter the following command to create a snapshot based
on the current redo log:
scrt_ra -C <Context ID> -X
4. Mount the volumes on the recovery server. Enter the following command:
NOTE
The DATEMSK environment variable must be set to the full-path of a file
that contains the date format template.
export DATEMSK=/tmp/mdm
The following shows two examples based on the different date formats.
If the content of /tmp/mdm is...    Then the date format for DATEMSK is...
where:
• %m–Month
• %d–Day
• %y–Year
• %H–Hour
• %M–Minute
• %S–Second
IMPORTANT
The date format must be specified exactly as it is in the DATEMSK
environment variable. For example, if you do not specify seconds (%S) in
DATEMSK, you cannot specify it in the command.
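For illustration, the template file and environment variable could be set up as follows. The format line matches the %m/%d/%y %H:%M:%S style used in the example later in this section; treat it as one plausible template, not a required format.

```shell
#!/bin/sh
# Create a DATEMSK date format template file and export DATEMSK so that
# commands such as scrt_ra -D can parse the date argument. The format shown
# matches the "06/15/18 09:33:40" example used in this chapter.
cat > /tmp/mdm <<'EOF'
%m/%d/%y %H:%M:%S
EOF
export DATEMSK=/tmp/mdm
echo "DATEMSK=$DATEMSK"
```

Remember that the date passed on the command line must then match this template exactly, including seconds if %S is present.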
To create a snapshot:
1. On the recovery server, make sure all snapshot file systems are unmounted
before trying to release the snapshot.
rtumnt -C <Context ID>
3. On the recovery server, enter the following command to create a snapshot based
on a specific date and time.
scrt_ra -C <Context ID> -D date
If you enter 06/15/18 09:33:40 as the date and 34 for the Context ID, the
output is similar to the following:
scrt_ra -C34 -D "06/15/18 09:33:40"
You have requested a virtual incremental LFC restore to
time (1529055220) Fri Jun 15 09:33:40 2018
c(ontinue) or a(bort)? c
Making SNAP /dev/rsnc1lif_bk_1, 66.306
Making SNAP /dev/rsnc1lif_bk_2, 66.310
Making SNAP /dev/rsnc1lif_bk_3, 66.314
Making SNAP /dev/rsnc1dbmf_bk_1, 66.318
Making SNAP /dev/rsntestlv, 66.450
Making SNAP /dev/rsnrtlog, 66.454
Making SNAP /dev/rtestlv, 66.448
Making SNAP /dev/rrtlog, 66.452
Trying to match LFC to time: 1529055220
Found time 1159536820 in-between LFC #12 and LFC #14
Fetching LFC #17
Applying LFC #17
Fetching LFC #15
Applying LFC #15
Fetching LFC #13
Applying LFC #13
Fetching LFC #14
Applying LFC #14
The BackingStore Vdevs are at level 12+. ABA is at level 18.
Use the scrt_ra command to display the available event mark rollback window.
scrt_ra -C <Context ID> -ve
Available event marks for rollback:
-------------------------------------------------------------
Event Name Date/Time (Epoch Time) Description
---------- ---------------------- -----------
1. On the recovery server, ensure all snapshot file systems are unmounted before
trying to release the snapshot.
rtumnt -C <Context ID>
3. On the recovery server, create a snapshot based on a specific event mark. Use
the target seconds (Epoch Time) value from the "Available event marks for
rollback" output. The event mark should precede the sequence of the current
redo log, but should not be so far in the past that it has been archived to media
such as tape.
Example, where <target seconds> is the Epoch Time value:
scrt_ra -C <Context ID> -S<target seconds>
If the target seconds is 1529601179, then the output is similar to the following:
Jun 21 17:12:59 Making SNAP /dev/rsnlvFS_C71_83, 62.6
Jun 21 17:12:59 Making SNAP /dev/rsnlvFS_C71_82, 62.10
Jun 21 17:12:59 Making SNAP /dev/rsnlvFS_C71_81, 62.14
Jun 21 17:12:59 Making SNAP /dev/rsnlvFS_C71_80, 62.18
Jun 21 17:12:59 Making SNAP /dev/rsnlvFS_C71_89, 62.22
Jun 21 17:12:59 Making SNAP /dev/rsnlvFS_C71_88, 62.26
Jun 21 17:12:59 Making SNAP /dev/rsnlvFS_C71_87, 62.30
Jun 21 17:12:59 Making SNAP /dev/rsnlvFS_C71_86, 62.34
Jun 21 17:12:59 Making SNAP /dev/rsnlvFS_C71_85, 62.38
Jun 21 17:12:59 Making SNAP /dev/rsnlvFS_C71_84, 62.42
Jun 21 17:12:59 Making SNAP /dev/rsnc71lif_bk_1, 62.46
Jun 21 17:12:59 Making SNAP /dev/rsnc71lif_bk_2, 62.50
Jun 21 17:12:59 Making SNAP /dev/rsnc71lif_bk_3, 62.54
Jun 21 17:12:59 Making SNAP /dev/rsnc71dbmf_bk_1, 62.58
Jun 21 17:12:59 Making SNAP /dev/rlvFS_C71_83, 62.4
Jun 21 17:12:59 Making SNAP /dev/rlvFS_C71_82, 62.8
1. On the recovery server, make sure all snapshot file systems are unmounted
before trying to release the snapshot.
rtumnt -C <Context ID>
3. On the recovery server, create a snapshot based on a LFCID. The LFCID should
precede the sequence of the current redo log, but should not be so far in the past
that it has been archived to media such as tape.
scrt_ra -C <Context ID> -t <LFCID>
For information on database repair and database resurrection, refer to Chapter 12,
“Working with Assure MIMIX for AIX Applications” on page 247.
• “Using the Assure UI portal to extend logical volumes and file systems” on
page 211
• “Using the command line to extend an Assure MIMIX for AIX-protected file
system” on page 222
• “Increasing the snapshot journal space on the recovery server” on page 225
• “Removing a File system from an Assure MIMIX for AIX protected jfslog”
on page 232
• “Setting up error notification” on page 232
• “Verify that replica logical volumes are synchronized with logical volumes on
the production server” on page 239
• “Using IBM Power Systems Live Partition Mobility with Assure MIMIX for
AIX” on page 241
After you install Assure MIMIX for AIX, the Assure UI server, and the portal
application, you can use the Replication Group wizard or the command line to
configure new replication groups and change, rename, and delete existing ones.
Refer to “Configuring Replication Groups” on page 107.
• On the production server, the protected logical volumes' (LV) “MAX LPs:” size
is not exceeded.
• On the recovery server, the replica LV (“pt” LV) “MAX LPs:” size is not
exceeded.
• On the protected and replica Volume Groups (VGs), there is enough free space.
Refer to “Using the Assure UI portal to extend logical volumes and file systems” on
page 211 or “Using the command line to extend an Assure MIMIX for
AIX-protected file system” on page 222 for additional information.
Category Event / Description Severity New Subscription Defaults OID for SNMP Trap
Checks for conditions that prevent the Assure UI portal from monitoring the
instance, such as not being able to connect to the instance or the portal
connection may not be configured correctly. There is a network error or the
configuration daemon is not running. Also, you may have specified the
incorrect password in the portal connection.
Event for instance problems when the Assure UI portal starts monitoring the
instance. Other subscribed events, if used, provide additional monitoring.
Category Event / Description Severity New Subscription Defaults OID for SNMP Trap
Event for when the size of a protected logical volume on recovery server 1 or
recovery server 2 does not match the size on the production server.
Node | Licenses that are expired or not valid. | Action required |
0 and cannot be changed | Yes | .1.3.6.1.4.1.47240.1.3.0.8
Checks for licenses that will expire in the next 30 days. Replication will be
stopped when the license expires and cannot be started until the license is
updated on the server. While the Assure UI server is up and running, this
event will be sent out once every 24 hours until the problem is corrected or
until the license expires. Users can replace the expiring license with a valid
license on the node. At the next license check interval, the warning status
will be cleared.
Category Event / Description Severity New Subscription Defaults OID for SNMP Trap
Checks for procedures with a status of action required, such as a step has
failed.
Checks for procedures with a status of stopped and waiting for the next step
to be started.
Checks for replication groups with an overall status of action required, such
as snapshot buffer is full, synchronization did not complete, production
server rollback failed, or status is unknown.
Category Event / Description Severity New Subscription Defaults OID for SNMP Trap
Checks for applications that are not being managed by the cluster. In the
Clusters portlet on the Summary page use the Manage Applications action to
enable cluster management of the application.
• Using the Extend Logical Volume Size Wizard. See “Using the Extend Logical
Volume Size wizard” on page 211.
• Using the Extend File System Size Wizard. See “Using the Extend File System
Size wizard” on page 218.
2. Select a logical volume, and click Extend from the Actions dropdown.
Logical Volume Size - The current step in the Extend Logical Volume wizard.
Logical Volume Name of the logical volume. This can be any logical volume
in the replication group.
Type Type of logical volume. Examples are raw, jfs, jfs2, and
jfs2log.
New size The new size of the logical volume in MB. The size must be
greater than the current size and not greater than the
maximum size.
Maximum size without configuration update - If a size is chosen that is
greater than this value, replication must be stopped and the configuration
updated. Values are specified as integers.
Server Host name of the servers that are part of this replication
group.
Volume group Name of the volume group on the production server and the
volume group for the replica on the recovery server for the
logical volume.
Maximum size The maximum size to which the logical volume can be grown.
3. Click Next.
The second and final panel of the Extend Logical Volume wizard panel
displays.
4. Click Finish.
The logical volume will be extended, replication will be stopped, and the
configuration will be updated.
Extend Logical Volume - The current step in the Extend Logical Volume wizard.
Cluster Displays the name of the cluster to which this replication
group belongs. Dash (-) if none.
Current logical volume size - The current size of the logical volume or file
system.
New logical volume size - The new size of the file system or logical volume
that was specified on the previous panel.
1. Increase the maximum number of logical partitions on S (S is the production
server, recovery server, recovery server 1, or recovery server 2): chlv -x
<new LPs> <LV>. Run if the maximum logical partitions value for the file
system or logical volume will not support the new size.
Steps run when a configuration must be changed to increase the state map
and region size
The table below shows the order of the steps and commands that are run when a
configuration must be changed to increase the state map and region size. These steps
are only run as a result of the Extend Logical Volume wizard. Steps 1 through 4 are
run before the wizard is dismissed. The other steps will be displayed in the
Configuration Window.
1. Increase the maximum number of logical partitions on S (S is the production
server, recovery server, recovery server 1, or recovery server 2): chlv -x
<new LPs> <LV>. Run if the maximum logical partitions value for the file
system or logical volume will not support the new size.
2. Unmount the file system on the production server: rtumnt -C <Context ID>.
Run if there is a mount point.
3. Extend the logical volume on the production server: extendlv <LV name>
<LP(s)>. Run if there is no mount point.
4. Extend the file system on the production server: chfs -a size=<new size>M
<file system>. Run if there is a mount point.
20. Start replication on the recovery server: rtstart -C <Context ID>. Always
run. If more than one recovery server is configured, the label is: Start
replication on recovery server 1.
22. Extend the replica logical volume on the recovery servers:
/usr/scrt/bin/extend_replica_lv -C <Context ID> -L <PVS LV>. Run if more
than one recovery server is configured.
2. Select a file system, and click Extend from the Actions dropdown.
Field Description
File System Size The current step in the Extend File System wizard.
Logical Volume Name of the logical volume. This can be any logical volume
in the replication group.
Type Type of logical volume. Examples are raw, jfs, jfs2, and
jfs2log.
New size The new size of the file system in MB. The size must be
greater than the current size and not greater than the
maximum size.
Maximum size without configuration update - If a size is chosen that is
greater than this value, replication must be stopped and the configuration
updated. Values are specified as integers.
Server Host name of the servers that are part of this replication
group.
Volume group Name of the volume group on the production server and the
volume group for the replica on the recovery servers for the
logical volume.
Maximum size The maximum size to which the logical volume can be grown.
3. Click Next.
The second and final panel of the Extend File System wizard panel displays.
The file system will be extended, replication will be stopped, and the
configuration will be updated.
Extend File System - The current step in the Extend File System wizard.
New size The new size of the file system in MB that was specified on
the previous panel.
1. Increase the maximum number of logical partitions on S (S is the production
server, recovery server, recovery server 1, or recovery server 2): chlv -x
<new LPs> <LV>. Run if the maximum logical partitions value for the file
system or logical volume will not support the new size.
A file system can only be extended to a size within the limits of the region size
(blocks). The default region size of 8192 blocks supports a maximum file system size
of 3.984375 TB (4177920 MB).
The following table provides maximum file system sizes for several combinations of
Region Size in blocks and SMBitmap size in bytes. The SMBitmap size is set to
131072 bytes and should never be changed.
The unit of the region size is the number of blocks. Each block is 512 bytes. If
the RegionSize will be exceeded, refer to "Extending a protected file system
beyond the limit of the region (block) size" on page 224.
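The documented figures are mutually consistent if each SMBitmap bit tracks one region and 512 bytes of the bitmap are reserved. That 512-byte reservation is an inference from the 8192-block example above, not a documented internal; verify with Support before relying on it. A quick shell sketch of the arithmetic:

```shell
# Sanity-check of the documented limit: an 8192-block region is
# 8192 * 512 bytes = 4 MiB, and a 131072-byte SMBitmap with a
# (presumed) 512-byte header leaves one bit per region.
region_blocks=8192
smbitmap_bytes=131072
region_mb=$(( region_blocks * 512 / 1048576 ))     # MB covered per bitmap bit
usable_bits=$(( (smbitmap_bytes - 512) * 8 ))      # regions the bitmap can track
echo "$(( region_mb * usable_bits )) MB"           # prints: 4177920 MB
```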
Assure MIMIX for AIX-protected file systems can be extended when Assure
MIMIX for AIX is active using the smit chfs command. When the file system is
extended:
• On the production server, the replica ("pt" LV) is automatically extended when
data is written to the expanded area.
NOTE
To manually extend the replica (“pt” LV), use the extend_replica_lv
command to force the expansion of a Replica LV (Logical Volume):
• On the recovery server, the write journal associated with the file system is not
extended.
1. On the production server, stop applications that are using the Assure MIMIX for
AIX protected logical volumes.
2. On the recovery server, stop applications that are using logical volumes from an
active snapshot.
3. On the production server, use rtstop to unmount protected file systems, transfer
any current LFC data to the recovery server and unload the Assure MIMIX for
AIX Production Server Drivers.
/usr/scrt/bin/rtstop -FSC <Context ID>
NOTE
If you cannot unmount file systems, use the fuser -c command to
locate the processes that are holding the file system(s) open.
4. On the recovery server, use rtstop to unmount any snapshot file systems and
unload the Assure MIMIX for AIX recovery server drivers.
/usr/scrt/bin/rtstop -FC <Context ID>
NOTE
If you cannot unmount file systems, use the fuser -c command to
locate the processes that are holding the file system(s) open.
5. On the production and recovery servers, use rtattr to change the region size. See
the Region Size (blocks) column in the table above.
rtattr -C <Context ID> -o smf_<LV> -a Size -v <new size>
6. On the production and recovery servers, use scsetup to delete device special
files from /dev.
scsetup -XC <Context ID>
9. On the production and recovery servers, use scsetup to create new configured
logical volumes.
scsetup -MC <Context ID>
10. On the production and recovery servers, use rtdr to create a failover context if
one was configured.
rtdr -C <Primary Context ID> -F <Failover Context ID> setup
11. On the production server, use scconfig to wipe the state maps clean (0.000%
dirty).
scconfig -WC <Context ID>
12. On the Production and Recovery Servers, use rtstart to start Assure MIMIX for
AIX.
rtstart -C <Context ID>
13. On the production server, use the smit chfs command to extend the file system.
NOTE
Any snapshot journal can be increased in size since all snapshot volumes
are available for use during snapshot creation/use.
8. Select one of the snapshot journals and increase the size of the volume.
/usr/scrt/bin/rtattr -C <Context ID> -o <ObjectName> -a Size
-v <new size value in bytes>
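Since the -v value for rtattr is given in bytes, it can help to compute it from a target size in MB first. A minimal sketch (the 512 MB figure is an arbitrary example, not taken from this procedure):

```shell
# The Size attribute value is passed in bytes; derive it from MB.
new_size_mb=512
new_size_bytes=$(( new_size_mb * 1024 * 1024 ))
echo "$new_size_bytes"    # value to pass with -v (536870912 for 512 MB)
```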
10. Remove the existing snapshot journal volume that you want to increase.
rmlv -f <ObjectAttributeValue from step 9 on page 226 minus
the /dev/r prefix>
Starting in version 5.1.00.01, the Assure MIMIX for AIX portal application can
detect when a configured logical volume has been removed from the system while
the replication group is active. When this condition is detected, all actions to stop
the replication group from the Assure UI portal will be disabled, including the
planned failover and failback procedures.
IMPORTANT
Do not attempt to stop the replication group from the command line;
unloading the drivers will fail and you will not be able to restart the
replication group. If the production server is rebooted you will not be able
to restart the replication group.
Contact Support if you need assistance with these instructions.
1. On the production server, stop all applications associated with the replication
group.
2. On the production server, remove the device entries in the /dev/ directory for
the logical volume(s) that were removed from the system. Substitute the
example lv names with the actual lv names.
3. On the production server, remake the jfs or jfs2 logical volume(s) and
associated filesystem(s) or logical volume(s) with no associated file system
that was removed from the system.
IMPORTANT
The Logical Volume size, type and location must match the original size,
type and location it had before it was removed from the system, and the
filesystem must be recreated on the original volume group and mount
point it had before it was removed from the system.
IMPORTANT
Do not use the “S” option in the stop command.
6. On the production server, remove the jfs or jfs2 logical volume and
associated filesystem or logical volume with no associated file system that
was recreated in step 3 on page 227.
7. From the Assure UI portal, use the Configuration Wizard to remove the
logical volume(s) that were removed in step 6 on page 227 from the
replication group's configuration.
a. Click Configuration in the Replication Groups portlet to launch the
Configuration Wizard.
b. Click Add.
c. Select the Volume group for replicated data from the Select dropdown
list. Select the same volume group that originally contained the
replicated data.
d. Select only the removed logical volume(s) and the JFS outline log that
need to be protected.
e. Click OK; the logical volumes are added to the list of logical volumes
to be protected.
f. Click Next and navigate to the Summary panel.
g. Click Finish to remake the configuration.
8. From the Assure UI portal, start the replication group and verify the
replication group is active before starting your application(s).
1. Stop the cluster from managing the application associated with the
replication group.
a. From the Assure UI portal, click on the Clusters tab.
b. Click on the Select dropdown in the Clusters portlet.
c. Click Manage Applications.
d. In the Manage Application dialog click the No radio button then click
OK.
2. On the production server, stop all applications associated with the replication
group.
3. On the production server, remove the device entries in the /dev/ directory for
the logical volume(s) that were removed from the system. Substitute the
example lv names with the actual lv names.
IMPORTANT
The logical volume size, type and location must match the original size,
type and location the logical volume had before it was removed from the
system. Additionally, the filesystem must be recreated on the original
volume group and mount point it had before it was removed from the
system.
IMPORTANT
Do not use the “S” option in the stop command.
7. On the production server, remove the jfs or jfs2 logical volume and
associated filesystem or logical volume with no associated file system that
was recreated in step 4 on page 229.
8. From the Assure UI portal, use the Configuration Wizard to remove the
logical volume(s) that were removed in step 7 on page 229, from the
replication group’s configuration.
a. Click the Configuration action in the Replication Groups portlet to
launch the Configuration Wizard.
b. Click the Change Replication action button for the replication group
being changed.
c. Navigate to the Logical Volumes dialog. If the logical volume being
removed has an associated filesystem that uses a JFS outline log,
select all filesystems that use the outline log and the outline log.
d. Click the Remove action.
e. Click OK to remove the selected entries from the configuration.
If any of the filesystems that were removed from the configuration because they
use the same outline log still require protection, add them back to the
configuration.
The Add Logical Volumes dialog, (invoked from the Logical Volumes panel of the
New or Change Replication Group Configuration Wizard), checks to ensure that
unprotected file systems do not use the same jfs or jfs2 outline logs that a protected
file system uses.
For example, if there are 4 jfs2 logical volumes /lvfs1, /lvfs2, /lvfs3, /lvfs4 that use
the same jfs2log /dev/fsloglv00, and Assure MIMIX for AIX only protects /lvfs1,
you need to have separate jfs2logs for the protected and unprotected logical
volumes.
NOTE
When using inline logs with jfs2 file systems, there is no jfs2log isolation
requirement.
1. Unmount the file system that you will assign to the new jfslog.
umount /fs1
Assumption: The file system /jfslog is being removed from the Assure MIMIX for
AIX PVS. The Assure MIMIX for AIX protected file systems are unmounted, the
Assure MIMIX for AIX processes are stopped and the Assure MIMIX for AIX data
tap is unloaded.
1. If a jfslog exists in the volume group that is not currently part of the PVS, you
can assign that jfslog to the file system that is being removed from the PVS.
umount /jfslog
where nonrtjfslog is a jfslog that exists in the volume group but is not part of
the PVS.
where <newjfslog> is the name of the jfslog that you are creating for the
non-protected file system to use.
To enable AIX Error Notification for one or more of the Assure MIMIX for AIX
errors, you must add a stanza to the errnotify ODM. When the error is logged, the
error notification daemon will call the notify method specified in the errnotify
ODM stanza. The errnotify ODM is located in the /etc/objrepos directory.
For example, the following errnotify stanzas are added to the file /tmp/ern_1.add
for error labels SCRT_LFC_READ_ERROR, SCRT_LFC_WRITE_ERROR,
SCRT_NETWORK_ERROR, and SCRT_ABORT_ERROR.
errnotify:
en_name = "SCRT_LFC_READ_ERROR"
en_class = "S"
en_type = "PERM"
en_method = "/home/scrt/enm_1 $1"
errnotify:
en_name = "SCRT_LFC_WRITE_ERROR"
en_class = "S"
en_type = "PERM"
en_method = "/home/scrt/enm_1 $1"
errnotify:
en_name = "SCRT_NETWORK_ERROR"
en_class = "S"
en_type = "PERM"
en_method = "/home/scrt/enm_1 $1"
errnotify:
en_name = "SCRT_ABORT_ERROR"
en_class = "S"
en_type = "PERM"
en_method = "/home/scrt/enm_1 $1"
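The en_method script itself (/home/scrt/enm_1 in the stanzas above) is site-specific and its contents are not shown in this guide. The following is a hypothetical minimal sketch, written to /tmp for illustration, that simply logs the error-log sequence number the error notification daemon passes as $1:

```shell
# Hypothetical notify method: record the errpt sequence number passed as $1.
cat > /tmp/enm_1 <<'EOF'
#!/bin/sh
seq="$1"                                   # error-log sequence number from errdemon
echo "$(date) SCRT error, sequence $seq" >> /tmp/scrt_errors.log
EOF
chmod +x /tmp/enm_1

/tmp/enm_1 1234                            # simulate a notification
tail -1 /tmp/scrt_errors.log               # shows the logged entry
```

On AIX the script could additionally run errpt -a -l "$seq" to capture the full error report, or send an alert to an operator.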
ODM commands
The errnotify stanzas are added to the errnotify ODM using the following command.
odmadd /tmp/ern_1.add
To list the errnotify stanzas added for Assure MIMIX for AIX errors, use the
following command.
odmget -q"en_name LIKE SCRT_*" errnotify|more
To delete all the errnotify stanzas added for Assure MIMIX for AIX errors:
odmdelete -q"en_name LIKE SCRT_*" -o errnotify
NOTE
Synchronize actions are not available when a synchronize action is already
in progress or the replication group processes are not running.
• To synchronize all logical volumes in the replication group use the following
command:
scconfig -C<Context ID> -B
Keep in mind the following while scconfig -B is running:
– Do not cycle Assure MIMIX for AIX or the servers.
– To ensure that you do not lose your terminal session, run the command with
nohup (nohup scconfig -B &).
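The nohup pattern keeps the synchronization alive even if the terminal session drops. A generic sketch of the pattern, with sleep/echo standing in for the product-specific scconfig command:

```shell
# Run a long-lived command immune to hangups, capturing output to a file.
# 'sh -c ...' stands in for: nohup scconfig -C<Context ID> -B &
nohup sh -c 'sleep 1; echo done' > /tmp/sync.out 2>&1 &
wait $!                # in practice you would disconnect instead of waiting
cat /tmp/sync.out      # check progress and output later
```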
In release 5.2.01.00, this option is only available from the command line.
• To synchronize all logical volumes in the replication group, use the following
command:
scconfig -C<Context ID> -T
If this command is interrupted and then re-executed, synchronization of every
logical volume, whether partially synchronized, 0.000% synchronized, or
100.000% synchronized, is started from the beginning.
To access the Verify Logical Volume dialog, select Verify from the toolbar on the
Logical Volumes portlet.
Refer to the Assure MIMIX for AIX online help for specific information for this
dialog.
NOTE
You can also use the -L option with the scconfig command for specific
logical volumes to mark the state map zero percent dirty.
3. Save the protected data on the production server to tape or disk. You must save
at the LV level in block sequence.
– Save to tape:
dd if=/dev/db2 of=/dev/rmt0 bs=1024
– Save to disk:
dd if=/dev/db2 of=/dev/db2bu bs=16m
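Because the save must be a block-level image, verifying the copy with checksums is a cheap safeguard. A generic sketch using ordinary files (substitute the raw LV and backup devices from the examples above):

```shell
# Create a small test image, copy it block-wise with dd, and compare checksums.
dd if=/dev/urandom of=/tmp/src.img bs=1024 count=64 2>/dev/null
dd if=/tmp/src.img of=/tmp/dst.img bs=16k 2>/dev/null
src_sum=$(cksum < /tmp/src.img)
dst_sum=$(cksum < /tmp/dst.img)
[ "$src_sum" = "$dst_sum" ] && echo "copy verified"
```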
6. Restore the data from tape or disk to the Replica on the recovery server.
NOTE
The restore must be done to the pt LVs.
7. Start Assure MIMIX for AIX on the recovery server. The changes made after
the save to tape or disk synchronize to the recovery server.
rtstart -C <Context ID>
Overview
Assure MIMIX for AIX supports Live Partition Mobility for partitions running AIX
5300-07 or later, AIX 6.1 or later, or AIX 7.1 or later on POWER6 or POWER7
technology-based systems. Live Partition Mobility allows a partition that is
replicating with Assure MIMIX for AIX to be migrated to another system without
interrupting replication.
To make Assure MIMIX for AIX Migration-aware, when you install Assure
MIMIX for AIX, the es_migrate script is automatically registered on systems
capable of Live Partition Mobility, with the Dynamic Logical Partition (DLPAR)
function of AIX.
-------------------------------------------------------------------
/usr/lib/dr/scripts/all/es_migrate MIMIX DR for AIX Partition Migration Script
Vendor:Vision Solutions, Version:5.1.0.1, Date:2016.10.21
Script Timeout:10, Admin Override Timeout:0
Memory DR Percentage:100
Resources Supported:
Resource Name: pmig Resource Usage: Partition migration
-------------------------------------------------------------------
Migrating a partition
Each time you migrate a partition, if the Assure MIMIX for AIX license is valid,
Assure MIMIX for AIX will be licensed to run on the migrated partition for up to 30
consecutive days. When a partition is migrated back to the original server it was
migrated from, the license expiration date is returned to its original value.
IMPORTANT
When you migrate a partition back to the original server it was migrated
from, you may not get the partition ID the partition originally had. In this
case, the expiration date in the license file will be changed to expire in 30
days.
IMPORTANT
Assure MIMIX for AIX configuration changes are not allowed if a
partition has been migrated from its original server. To make configuration
changes, the partition must be migrated back to its original server.
Extending protected file systems is allowed while in the failover mode. In
addition, follow the steps in "Extending a protected file system beyond the
limit of the region (block) size" on page 224 if applicable due to the increase
in size.
• Kills all processes associated with a mounted file system that is configured in an
Assure MIMIX for AIX Context ID.
If "rtstop" fails for a Context ID, that Context ID is marked failed and processing
continues with "rtstop" on any remaining Context IDs. After all the Context IDs are
processed, the "exit" status is set to "0", or to a value equal to the number of
Context IDs that were marked as failed. The reason for failure is recorded in the
"/usr/scrt/log/rn_shutdown.out" file. A non-"0" exit will abort the AIX shutdown.
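The exit-status convention can be illustrated with a small stand-alone sketch. Here stop_ctx is a stand-in for the per-context rtstop call, and the context IDs and the failing context are invented for illustration:

```shell
# Illustration of the rn_shutdown exit-status convention: stop each
# context in turn, count failures, and exit with the failure count
# (a non-zero count aborts the AIX shutdown).
stop_ctx() { [ "$1" -ne 2 ]; }   # stand-in for rtstop; context 2 "fails"

failed=0
for ctx in 1 2 3; do
  stop_ctx "$ctx" || failed=$((failed + 1))
done
echo "exit status: $failed"      # prints: exit status: 1
```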
IMPORTANT
To gracefully stop replication groups that are not configured in an Assure
MIMIX for AIX clustered environment, that are running in a production or
recovery role on the node where the AIX “shutdown” command is
executed, use the /usr/scrt/bin/rn_shutdown script in conjunction
with “/usr/EchoCluster/bin/AIX/gc_shutdown.sh.”
• Stops all configured user applications protected in an Assure MIMIX for AIX
clustered environment on the node where the AIX “shutdown” command is
executed.
IMPORTANT
Each application must be configured with an application stop script.
• Stops all resources (such as the Service IP Address) configured with the user's
application on the node where the AIX "shutdown" command is executed.
• Stops the Assure MIMIX DR for AIX replication group associated with the user's
application on both cluster nodes.
• Stops cluster services on the node where the AIX “shutdown” command is
executed.
With Assure MIMIX DR for AIX version 5.0 drivers loaded on AIX 6.1 nodes
without APAR IV50876 installed, or AIX 7.1 nodes without APAR IV50842
installed, the AIX commands topas -V, fileplace, and lvmstat will not function
correctly. Assure MIMIX DR for AIX will prevent a system crash by trapping the
FP_IOCTLs and returning ENOTSUP.
If your AIX 6.1 nodes have APAR IV50876 installed, or your AIX 7.1 nodes have
APAR IV50842 installed, Assure MIMIX DR for AIX version 5.0 will not trap the
FP_IOCTLs. The AIX commands topas -V, fileplace and lvmstat will function
correctly.
This chapter describes working with Assure MIMIX for AIX applications.
Overview
You can apply the redo and undo logs from the recovery server to roll back
application data on the production server and restore the data to an earlier point in
time. When you roll back the application data, the information is synchronized with
the replica on the recovery server, similar to any other write action.
Keep in mind that the majority of all database recovery operations require
database repairs that are typically performed by Database Administrators (DBAs).
[Figure: Production server and recovery server connected by a LAN. Each server runs a data tap; the production server holds the application data storage and undo logs, and the recovery server holds the replica storage and undo logs.]
In the resurrection scenario, the database has crashed and will not be coming back up.
If you only need to roll back 55 seconds, you do not need to restore the last day's full
backup and replay a full week of database redo (roll forward) logs. In this case,
Assure MIMIX for AIX can roll back a totally crashed and burned database in a
matter of moments, many orders of magnitude faster.
Keep in mind that during a production restore, you do not need the database instance
up. In fact, the database cannot be up while Assure MIMIX for AIX rolls its image
around on disk. By definition, the database is down since it crashed. As a result, the
best and fastest way to restore your database is while it is still down. The Assure
MIMIX for AIX production restore operation provides this capability.
[Figure: Production server and recovery server connected by a LAN. The production server holds the application data storage; the recovery server holds the snapshot containers, replica storage, and copy-on-write (COW) data.]
In this repair scenario, only some pieces of data are corrupted but the production
database is still running. After you restore a historical, non-burning image of the
database on the snapshot, you can pull pieces out of it and put them back into the
live production database to repair the bad pieces of data. You can put out the fire
with information. Assure MIMIX for AIX restores automatically on the back-end
snapshot.
Using the Assure MIMIX for AIX snapshot to visualize complete historical images
of the database enables DBAs to forgo spending time on a daily basis saving pieces
of the database for possible use at a later time. Also, DBAs have access to the exact
historical image they want (typically identified by reviewing logs, etc.).
• “Step 1: Stop Assure MIMIX for AIX on the production server” on page 250
• “Step 8: Restart Assure MIMIX for AIX on the production server” on page 253
• “Step 10: Mount the volumes for the context” on page 255
EXAMPLE:
In this example, the Context ID is 1. The volume that needs to be replaced is
/dev/rrtctx1. Display information about the volume.
sclist -C 1 -t pdfc
The output shows that /dev/rrtctx1 belongs to volume group rtvg1. The volume size
is shown in bytes (1073741824).
Object: <loglv00>, Type: <SCRT/containers/PDFC>, Serial <355>:
HostName (string) = <production> (No Default)
FileName (string) = </dev/rloglv00> (No Default)
HeaderFileName (string) = </usr/scrt/loglv00.hdr> (No Default)
Size (longlong) = <16777216> (No Default)
DiskGroupHint (string) = <rtvg1> (No Default)
Type (string) = <jfslog> (raw)
Location (string) = <No Value> (No Default)
AccessGroupID (int) = <0> (0)
Object: <rtlv2>, Type: <SCRT/containers/PDFC>, Serial <362>
HostName (string) = <production> (No Default)
FileName (string) = </dev/rrtlv2> (No Default)
HeaderFileName (string) = </usr/scrt/rtlv2.hdr> (No Default)
To create the snapshot, enter one of the following commands on the recovery server:
scrt_ra -C <Context ID> -D <time>
or
scrt_ra -C <Context ID> -t <LFCID>
The allowable formats for times entered with the -D option are specified in the file
identified by the DATEMSK environment variable.
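DATEMSK is the standard POSIX getdate() mechanism: it names a file of template lines, each a strptime-style format that input times are matched against. A hypothetical template file (the path and format lines are examples, not shipped defaults):

```shell
# Create a getdate() template file and point DATEMSK at it (example path).
cat > /tmp/datemsk <<'EOF'
%m/%d/%y %H:%M:%S
%Y-%m-%d %H:%M:%S
EOF
export DATEMSK=/tmp/datemsk
# A time such as "2018-05-14 15:38:03" would now match the second template.
```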
EXAMPLE:
In this example, the LFCID is 158. The Context ID is 1.
scrt_ra -C 1 -t 158
IMPORTANT
Do not try to validate the information by running the application on the
production server after the data is restored. If the data turns out to be
incorrect, you need to rollback the data included in all previous attempts to
restore data.
If the data is valid, stop the application. Note the Context ID and the time or LFCID
used to create the snapshot.
If the data is not valid, then stop the application and create the snapshot again as
described in “Step 3: Create a snapshot on the recovery server” on page 251.
Continue until you have a snapshot with valid data.
EXAMPLE:
The Context ID in this example is 1. Enter the following command:
scrt_ra -C 1 -W
or
scrt_ra -C <Context ID> -t <LFCID>
EXAMPLE:
dd if=/dev/rrtctx1 ibs=1024k | rsh production_server_name -l root
dd of=/dev/rrtctx1 obs=1024k
Restarting Assure MIMIX for AIX on the production server begins a region
synchronization between the production server and the recovery server. Monitor the
system log or the Assure MIMIX for AIX log until the following message appears:
-------------------------------------------------------------
--- Dynamic SuperTransaction recovery complete ---
-------------------------------------------------------------
You will be prompted for the time or LFCID that you used to create the snapshot.
rc>
At this prompt, enter l for LFCID or t for “time.” Use the same information that you
used to create the snapshot in “Step 3: Create a snapshot on the recovery server” on
page 251.
> l
You may need to use fsck on the file systems before they can be mounted. If this is
necessary, then unmount the volumes using the rtumnt command, run fsck, and then
mount the volumes using the rtmnt command shown above.
NOTE
Even if you decide later that you would like to pick a better time or a
LFCID, you can roll forward or rollback to that point. This is possible
because Assure MIMIX for AIX keeps all of the change information on
the recovery server.
Before starting the rollback, validate the rollback location using a snapshot and stop
all applications that are using the logical volumes. Use the Replication Group portlet
on the Replication page to stop the replication group. During the rollback, file
systems must be unmounted. For rollback progress, see the Production Server
Rollback portlet on the Recovery page.
NOTE
Data changed on the production server between the rollback location and
your most recent changes will be lost.
• Select Rollback, from the Action dropdown menu on the Recovery portlet.
• Click Rollback on the Production Server Rollback portlet.
NOTE
For 3-node configurations the first panel that displays is Select Replica.
Refer to “Broadcast replication configuration” on page 55 for additional
information.
NOTE
For 2-node configurations, the first panel that displays is Unmount File
Systems.
• Point in Time—Specify the date and time, within the rollback window,
where the production server will be rolled back to.
• Container ID—Specify the container ID, within the rollback window, where
the production server will be rolled back to.
• Event Marker—Specify the event marker, within the rollback window,
where the production server will be rolled back to.
2. Unmount the protected volumes on the production server. Enter the following
command:
rtumnt -C <Context ID>
rc>
5. You can choose to roll back the production server to a specific time or a specific
LFCID. If the current redo LFCID is 36 and you choose to roll back to LFCID
30, you should see output similar to the following:
You have requested an incremental LFC restore
from Tue May 15 15:02:08 2018 (1526396528)
to Mon May 14 15:38:03 2018 (1526312283),
LFCIDs 36+ to 30.
c(ontinue) or a(bort)?
6. Enter c to continue.
Rolling LFC restore status
--------------------------
Production at LFCID 34
Production at LFCID 32
Production at LFCID 30
Production restored to LFCID 30.
Backingstore remains stable at LFCID 36
rc>
NOTE
Although you have rolled back the production server to LFCID 30, the
snapshot on the recovery server still contains information up to and
including LFCID 36. This allows you to verify that LFCID 30 is the one
you want to roll back to. If it is not, enter abort. Determine the proper
LFCID and start the production restore shell as described in step 3.
8. On the production server, mount the protected volumes. Enter the following
command:
rtmnt -C <Context ID> -f
Assure MIMIX for AIX supports media management products such as Tivoli
Storage Manager on AIX platforms. This chapter contains:
• Create a snapshot on the recovery server. You may want to do this if you need
to restore the data on the recovery server or if you want additional copies of
the data for data mining.
• Create additional copies of the data on alternate devices, such as logical
volumes.
NOTE
The Tivoli Storage Manager must be defined in the Assure MIMIX for
AIX configuration before using this command.
2. Unmount the protected volumes on the production server. Enter the following
command:
rtumnt -C <Context ID>
3. On the recovery server, start the production restore shell. Enter the following
command:
scrt_rc -C <Context ID>
rc>
Available VFBs:
--------------------------------------------------------
1529916625 (LFCID: 1342): Mon Jun 25 08:50:25 2018
1526957343 (LFCID: 1986): Tue May 22 14:49:03 2018
1526609678 (LFCID: 1456): Fri May 18 14:14:38 2018
rc>
If you are restoring from archived data, then the LFCID that you choose
identifies data that is no longer stored on the recovery server.
NOTE
Although you have rolled back the production server to LFCID 1360, the
snapshot on the recovery server still contains information up to and
including LFCID 1464. This allows you to verify that LFCID 1360 is the
one you want to roll back to. If it is not, enter abort. Determine the proper
LFCID and start the production restore shell as described in step 3 on
page 264.
10. On the production server, mount the protected volumes. Enter the following
command:
rtmnt -C <Context ID> -f
Introduction
Assure MIMIX for AIX provides highly available data recovery and protection
through failover, resync, and failback operations on both local and remote
recovery servers, providing solutions for local failovers, as well as remote
failovers.
In a local failover operation, the production server fails over to a local (on-site)
recovery server, whereas in a remote failover operation, the production server fails
over to a remote (off-site) recovery server. Failing over to a remote server is
defined as Disaster Recovery, since it applies to the situation where an entire site
is lost due to a disaster such as a flood or hurricane. Disaster recovery is also used
for situations where failing over to a local server is not adequate for keeping data
available and applications running.
• “How Assure MIMIX for AIX Disaster Recovery works” on page 268
• resync— bring the production server data up to date with the recovery server
data
These operations use remote mirror Backing Store File Container (BSFC) replicas
as the "failover-production" data. This requires only a configuration change and no
data movement, which reduces the time required to resync the volumes after
switching sites.
Use of the statemap tracking mechanism increases the efficiency of the data resync,
as it minimizes the amount of data transfer that must take place. The Production
application continues to run during the resync operation.
Since the failover process puts the volumes into a suspended state, changes are
tracked within a statemap. Assuming that recording these changes is enabled, only
the changed data is sent to the production site to synchronize the volumes. This
reduces the time required to complete the failback operation.
NOTE
You can choose to configure the recovery server as the failover server
where you temporarily run the production system.
This transition requires that the configuration of the failover operation is specified in
a failover context. You create a failover context to configure a context that is
specific to your failover operations.
The failover context provides Assure MIMIX for AIX with the configuration
information used to operate as if the original recovery server were actually a
production server, and the original production server, when it comes back on-line,
were actually the recovery server.
You set up the failover context in association with the primary context for the
production server. This failover context enables you to perform the system failover
and failback operations, including managing all the required transitions from the
Production environment to the recovery server.
To accomplish this, Assure MIMIX for AIX uses the block-ordered replica
maintained on the recovery server as the failover disk, on which it brings up the
production server.
A failover context is the same as its associated primary context, except that all
relationships and flows are reversed. Many of the logical volumes used by the
primary context, such as LFCs, are shared with the failover context. However, some
of the volumes are not shared and therefore may be in different locations depending
on which server production is operating on: for example, on the production server
during normal operations, or on the recovery server during failover operations.
After you setup a failover context by using the rtdr command, all other attributes
used with the rtdr command can then take the primary Context ID. This provides
an environment in which it appears as if you are dealing with one context, either in a
primary or a failover mode. The underlying configurations are really two different,
tightly related, contexts. You only need to remember the primary Context ID to
failover, resync, and failback.
Similar to a primary context, the failover context has all the information for all the
servers in a configuration. However, the difference is that the configuration settings
the failover context contains are derived from the primary context and that it shares
several attributes with the primary context.
In the same configuration, to change the failover server to the recovery server,
change the configuration associated with the failover context by specifying the
hostname of the recovery server in the following command:
rtdr -C <Primary Context ID> -F <Failover Context ID> setup
Use the Assure MIMIX for AIX disaster recovery and failover rtdr command to:
Syntax
rtdr -C <ID> -fhmnqv failover | resync | failback
rtdr -C <ID> -F <ID> [-s <hostname>] [-fhmnv] setup
NOTE
If you have changed a primary context you must delete the corresponding
failover context and recreate the failover context.
2. Creating a failover context using the rtdr command. This command creates a
failover context definition associated with the primary context. Refer to
“Preparing before a Failover” on page 273.
• Failover
• Resync
• Failback
where:
<Primary Context ID> is the primary context on the production server, and
<Failover Context ID> is the failover context on the recovery server.
NOTE
If there is more than one target server and you do not want to use the
default for the Failover Server, then use the -s <Hostname> option on
the rtdr command to specify the Failover Server. Refer to “Syntax” on
page 271 for the rtdr command.
You can perform failover operations after you validate the replica.
• “Using the Assure UI portal to perform failover and failback” on page 275
• “Using the command line to perform failover and failback for 2-node
non-clustered replication groups” on page 299
• “Using the command line to perform failover and failback for 2-node
clustered replication groups” on page 307
• “Failover and failback for 2-node clustered replication groups” on page 283
Run a procedure
The Procedures portlet and Steps portlet on the Procedures page will guide you
through each step.
1. Use one of the following methods to run a procedure. Use the method that is
most convenient for you based on the page you are currently on:
2. Use one of the following methods to run the next step in a procedure (if
required) or retry a failed step:
• From the Replication Group portlet, select Resume Procedure from the
Actions dropdown for a specific replication group.
• From the Procedures portlet, select Resume from the Actions dropdown for
a specific procedure and replication group.
• From the Steps portlet, select Resume from the Action toolbar.
NOTE
Before failing over, all applications using the logical volumes must be
stopped. You may want to validate that your applications run properly on
the failover server by using a snapshot and running your applications with
the snapshot.
Sequence Number: 10
Step: Unmount the file systems on the current production server.
Dialog for this step: “Run Planned Failover Procedure dialog for 2-node non-clustered replication groups: Unmount file systems on current production server” on page 278
Command for this step: rtumnt -R -C <Primary Context id>
NOTE
When the procedure completes, and you are ready to move production
back to the configured production server, run the failback procedure.
The Steps portlet shows the steps that you must manually run to perform an
unplanned failover for 2-node non-clustered replication groups. Each step is run,
with pauses after each step.
Table 5 on page 279 shows what steps you must manually run to perform an
unplanned failover for non-clustered replication groups.
Sequence number: 40
Step: Failover the replication group. Server roles change.
Dialog for this step: “Resume Unplanned Failover Procedure dialog for 2-node non-clustered replication groups: Failover replication group: Server roles change” on page 281
Command for this step: rtdr -q -C <Primary Context id> failover
3. Mount the file systems and run your applications with the snapshot to validate
the rollback location.
4. You may need to use more than one snapshot to find a valid location.
You can decide whether or not you have a validated rollback location and how to
proceed. If you decide to create another snapshot, the procedure returns to the
Create snapshot step and waits for you to resume the procedure. The dialog for this
step is the Resume - Create Snapshot dialog and not the Run Procedure dialog.
• If you select Yes, click OK to delete the snapshot and go to the next step in
the procedure. Note that you will need to use the Resume action.
• If you select No, click OK to return to the previous step and create another
snapshot with a different rollback location. Note that you will need to use the
Run action.
NOTE
The servers in the replication group have changed now that the replication
group has failed over.
The Steps portlet shows the steps for the failback procedure selected in the
Procedures portlet.
Table 6 on page 282 shows what steps you must manually run to perform a failback
for 2-node non-clustered replication groups.
Sequence number: 10
Step: Unmount the file systems on the current production server.
Dialog for this step: “Run Failback Procedure dialog for 2-node non-clustered replication groups: Unmount file systems on current production server” on page 283
Command for this step: rtumnt -R -C <Failover Context id>
NOTE
After this step completes, applications can be started on the configured
production server.
NOTE
The Run Planned Failover Procedure dialog displays. This is the only
dialog displayed when running a Planned Failover procedure for a 2-node
clustered replication group. All the steps are run continuously, without
user interaction.
After the procedure completes successfully, the replication starts from the new
production server to the new recovery server. When you are ready to move the
production server back to the configured production server, run the Failback
procedure. See “Failback for 2-node clustered replication groups” on page 288.
The Steps portlet shows the steps that are run by the cluster in the case of unplanned
failover. Steps 10 and 20 need to be run by the user one step at a time. Steps 30-50
are run by the cluster continuously.
Table 7 on page 286 shows the steps that are run by the cluster.
Command for this step (continued from the previous row), followed by:
rtmnt -f -C <Primary Context id>
3. Mount the file systems and run your applications with the snapshot to validate
the rollback location.
4. You may need to use more than one snapshot to find a valid location.
You can decide whether or not you have a validated rollback location and how to
proceed. If you decide to create another snapshot, the procedure returns to the
Create snapshot step and waits for you to resume the procedure. The dialog for this
step is the Resume - Create Snapshot dialog and not the Run Procedure dialog.
• If you select No, return to the previous step and create another snapshot with a
different rollback location. Note that you will need to use the Run action.
NOTE
The servers in the replication group have changed now that the replication
group has failed over.
NOTE
The Run Failback Procedure dialog displays. This is the only dialog
displayed when running a Failback procedure for a clustered replication
group. The steps are run automatically by the cluster.
NOTE
Before failing over, all applications using the logical volumes must be
stopped. You may want to validate that your applications run properly on
the failover server by using a snapshot and running your applications with
the snapshot.
IMPORTANT
In the first step of the Planned Failover Procedure dialog, you can select
either Recovery 1 or Recovery 2 to be the New production server.
Table 8: Run Planned Failover for 3-node non-clustered broadcast replication groups
Sequence Number: 10
Step: Unmount the file systems on the current production server. Note: You can select the failover recovery server on this dialog.
Dialog for this step: “Run Planned Failover Procedure dialog for 3-node non-clustered broadcast replication groups: Unmount file systems on current production server” on page 292
Command for this step: rtumnt -R -C <Primary Context id>
NOTE
When the procedure completes, and you are ready to move production
back to the configured production server, run the failback procedure.
After this step completes and you are ready to move production back to the
configured production server, run the Failback procedure.
IMPORTANT
In the first step of the Unplanned Failover Procedure dialog, you can select
either Recovery 1 or Recovery 2 to be the New production server.
Table 9 on page 294 shows the commands that are run on the servers, while you
execute the procedure steps with the Assure UI portal.
Table 9: Run Unplanned Failover for 3-node non-clustered broadcast replication groups
Sequence Number: 30
Step: Rollback the failover server.
Dialog for this step: “Resume Unplanned Failover Procedure dialog for 3-node non-clustered broadcast replication groups: Rollback failover server” on page 296
Command for this step:
(if by Point in Time or Event Marker) scrt_ra -F -S <timestamp> -C <Primary Context id>
(if by Container ID) scrt_ra -F -t <Container ID> -C <Primary Context id>
NOTE
When the procedure completes, and you are ready to move production
back to the configured production server, run the failback procedure.
To create a snapshot:
3. Mount the file systems and run your applications with the snapshot to validate
the rollback location.
4. You may need to use more than one snapshot to find a valid location.
You can decide whether or not you have a validated rollback location and how to
proceed. If you decide to create another snapshot, the procedure returns to the
Create snapshot step and waits for you to resume the procedure. The dialog for this
step is the Resume - Create Snapshot dialog and not the Run Procedure dialog.
• If you select Yes, click OK to delete the snapshot and go to the next step in
the procedure.
• If you select No, click OK to return to the previous step and create another
snapshot with a different rollback location.
When this step completes, applications can be started on the new production server.
Table 10 on page 298 shows the commands that are run on the servers, while you
execute the procedure steps with the Assure UI portal.
Sequence Number: 10
Step: Unmount the file systems on the current production server.
Dialog for this step: “Run Failback Procedure dialog for 3-node non-clustered broadcast replication groups: Unmount file systems on current production server” on page 298
Command for this step: rtumnt -R -C <Primary Context id>
After replication stops and before you continue, it is recommended that you create a
snapshot on the failover server and validate that your applications run properly.
After this step completes, applications can be started on the configured production
server.
NOTE
If you have changed a primary context you must delete the corresponding
failover context and recreate the failover context.
2. Create a failover context using the rtdr command. This command creates a
failover context definition associated with the primary context. Refer to
“Preparing before a Failover” on page 300. The rtdr command supports the
following operations:
• Failover
• Resync
• Failback
NOTE
If there is more than one target server and you do not want to use the
default for the Failover Server, then use the -s <Hostname> option on
the rtdr command to specify the Failover Server. Refer to “rtdr” on
page 315.
2. Mount filesystems:
rtmnt -C <Context ID>
4. Unmount filesystems:
rtumnt -C <Context ID>
You can perform failover operations after you validate the replica.
Failover operations
This section describes how to perform the failover operations from the production
server to the recovery server when the production server has failed.
IMPORTANT
Do not perform a failover restore on an invalidated restore target. After
validating the replica, the failover procedures are failover, resync, and
failback.
NOTE
Do not execute scrt_ra -X -F if you want to failover to the latest
point in time.
NOTE
This stops the ABA for the primary context.
Performing resync
A resync operation is required when the Production Volumes and Recovery Replica
Volumes diverge. This occurs after a failure of the production server and a failover
to the recovery server. When the application is started on the recovery server the
updates result in a divergence from the data on the production server.
After restoring the original production server, use the resync operation to ensure that
the production server data is current with the recovery server data.
To resynchronize the revived production server from the recovery server, on all
servers (production, recovery):
rtdr -qC <Primary Context ID> resync
• On the recovery server, this starts the LCA for the failover context.
IMPORTANT
The resync operation assumes the original production data was not lost,
and is available in its entirety after the production server is revived. If the
production data is lost, the statemap on the recovery server must be
marked as dirty prior to resync. This forces a complete region recovery
and initializes the production data.
On the recovery server, which is now the acting production server, execute:
rtstop -C <Failover Context ID> -F
scconfig -C <Failover Context ID> -M
rtstart -C <Failover Context ID> -M
Wait for the region recovery to complete before performing the failback procedure,
as described in “Performing failback” on page 304.
NOTE
Performing a complete region recovery should be avoided since this will
require production down time and significant network resources.
You can mark only specific state maps dirty, using the -L option of the scconfig
command. Refer to “CLI Assure MIMIX DR for AIX Commands” on page 309 for
more information.
Performing failback
IMPORTANT
Before you failback ensure that you stop the application on the recovery
server.
The data can be restored to the production server from the recovery server using
the following procedure, after the necessary volume groups and logical volumes
have been recreated.
2. On the production server execute the following command which will start
scrt_aba:
rtdr -qC <Failover Context> resync
3. On the recovery server execute the following command which will unmount
any filesystems if the context has filesystems and unload the drivers.
rtstop -FC <failover context>
4. On the recovery server execute the following command which will mark the
state maps dirty.
scconfig -MC <failover context>
5. On the recovery server execute the following command which will synchronize
the data to the production server.
rtdr -qC <Failover Context ID> resync
6. On the recovery server execute the following command which will show the
state maps.
scconfig -PC <failover context>
7. If the state maps are not clean on the recovery server wait until all the data is
synchronized to the production server.
8. On the production server execute the following command which will create a
snapshot to allow the integrity of the data to be checked.
9. On the production server execute the following command if the context has
filesystems. This command will mount the filesystems.
rtmnt -C <failover context>
11. On the production server execute the following command which will unmount
any filesystems which were mounted in step 9.
rtumnt -C <failover context>
12. On the production server execute the following command which will remove
the snapshot created in step 8.
scrt_ra -WC <failover context>
13. On the recovery server execute the following command which will failback to
the primary context.
rtdr -qC <Failover Context ID> failback
14. On the production server execute the following command which will failback to
the primary context.
rtdr -qC <Failover Context ID> failback
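Step 7 above (waiting for the statemaps to become clean) can be sketched as a polling loop. Because scconfig is a product command, it is stubbed here with a shell function so the loop shape can be shown without the product installed; the "dirty" match pattern is an assumption, since the exact scconfig -P output format is not reproduced in this guide.

```shell
# Stub standing in for the real scconfig; remove this function on an actual
# recovery server so the product command is used instead.
scconfig() { echo "StateMap: clean"; }

CTX=101   # hypothetical failover context ID
tries=0
# Keep polling while any statemap still reports dirty.
while scconfig -PC "$CTX" | grep -qi dirty; do
    tries=$((tries + 1))
    sleep 5
done
echo "statemaps clean after $tries extra checks"
```

Once the loop exits, all data has been synchronized to the production server and the procedure can continue with the snapshot in step 8.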
Usage
You can use the esmon command to view the current utilization of the configured
production server Log File Containers (LFC) for a specified Context ID.
Syntax
esmon <Context ID>
where:
• Total LFC Size = 6000 MB: The number of LFCs configured (400), multiplied
by container size (16 MB), minus the space required for metadata.
NOTE
In the Assure UI Replication Group Details portlet the Total size would be
6.3 GB: The number of containers configured (400), multiplied by
container size (16 MB) = 6400 MB converted to 6.25 GB and rounded to
6.3 GB.
• Free size = 5931 MB: The Total LFC Size minus Used Size.
• Used Size = 69 MB: 69 MB of data writes stored in LFCs waiting to be applied
to the replica.
• Usage = 2/100: The percentage of LFCs used. Used Size divided by Total LFC
Size.
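The esmon figures above can be reproduced with simple shell arithmetic. The 400-container/16 MB configuration and 69 MB used come from the example; the 400 MB metadata overhead is an assumption chosen to match the documented 6000 MB total.

```shell
CONTAINERS=400     # number of LFCs configured (from the example)
CONTAINER_MB=16    # container size in MB (from the example)
METADATA_MB=400    # assumed metadata overhead: 6400 MB raw - 6000 MB total
USED_MB=69         # data writes waiting to be applied (from the example)

TOTAL_MB=$(( CONTAINERS * CONTAINER_MB - METADATA_MB ))       # 6000 MB
FREE_MB=$(( TOTAL_MB - USED_MB ))                             # 5931 MB
USAGE_PCT=$(( (USED_MB * 100 + TOTAL_MB - 1) / TOTAL_MB ))    # rounds up to 2

echo "Total LFC Size = ${TOTAL_MB} MB"
echo "Free size = ${FREE_MB} MB"
echo "Usage = ${USAGE_PCT}/100"
```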
Usage
You can use the extend_replica_lv command to force the expansion of a Replica LV
(Logical Volume) that is associated with a specified PVS (Production Volume Set)
LV, so that the Replica LV will be equal in size to the PVS LV. This command runs
only on the production server, and the LCA must be active.
Syntax
extend_replica_lv -C <Context ID> -L <PVS LV>
extend_replica_lv -h help
-C <Context ID>
-L <PVS LV>
-h Help, prints this usage
NOTE
This command is only required for PVS LVs that are extended and have no
associated file system, or PVS LVs that have an associated file system with
an outline log and the file system is extended.
es_swmaj
Usage
Use this command to switch and restore the LV major number for all LVs in an
Assure MIMIX for AIX context. This will allow AIX LVM commands such as
"migratepv" to function when the Assure MIMIX for AIX drivers are loaded.
Syntax
es_swmaj -C <Context ID> [-s | -f | -u | -v]
-s: Switch to AIX major number (raw LVs are excluded)
-f: Switch to AIX major number (including raw LVs)
-u: Restore Assure MIMIX DR for AIX major number
-v: Verify switch to AIX major number
IMPORTANT
If the "-f" option is selected for raw LVs, they must be open before
executing this command. The raw LVs must remain open and no additional
opens should be done until this command is executed with the "-u" option.
Usage
You can use the rn_temp_journal command to create a temporary write journal
which will extend the snapshot buffer size available for a snapshot. This command
is executed on the node that has the recovery server role. It can be executed before
or after a snapshot is created. When the snapshot is deleted the temporary write
journal may be removed.
NOTE
This command can be issued up to a maximum of 64 times to add
additional space on different volume groups.
Syntax
The following parameters are used:
-C <Context ID>
-v <Volume Group for temp LV>
-s <Size of temp LV in Logical Partitions>
-r Remove temporary Write Journals.
-h help
When creating a temporary write journal LV and allocating and configuring the
associated storage pool, the -C, -v, and -s parameters are required. To deallocate
a storage pool and remove the temporary write journal LV, the -C and -r parameters
are required.
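The two parameter combinations can be illustrated as a dry run that prints the invocations instead of executing them; the context ID, volume group, and size below are hypothetical.

```shell
run() { echo "WOULD RUN: $*"; }   # print instead of executing

# Create: -C, -v, and -s are required (32 logical partitions in volume
# group tempvg, both illustrative values).
run rn_temp_journal -C 1 -v tempvg -s 32

# Remove: -C and -r are required once the snapshot has been deleted.
run rn_temp_journal -C 1 -r
```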
IMPORTANT
If the snapshot buffer has reached 100%, the command will fail and you
will see the following message:
[rn_temp_journal] ERROR: The Write Journal extension cannot be
created for Context ID (nn). A Write Journal extension cannot be created
when the Write Journal pool is full. Delete the snapshot, create the Write
Journal extension and create the snapshot. [rn_temp_journal] ERROR:
Failed to create the Write Journal extension.
Usage
Use this command to query and change attributes in the Assure MIMIX for AIX
Object Data Manager (ODM) files:
• SCCuAttr
• SCCuObj
• SCCuRel
Syntax
rtattr -C ID [-a attribute] [-o object] [-t type]
rtattr -C ID -a attribute -v value {-o object | -t type}
rtattr -h
-a Attribute for query/edit (ObjectAttributeName)
-C <Context ID>
-h Help, prints this usage
-o Object for query/edit (ObjectName)
-t Type for query/edit (ObjectType)
-v Value for edit (ObjectAttributeValue)
You can use the -v parameter with the commands to edit. If you do not specify the
-v parameter, only query is available.
Example 1
View all the machine hostids:
rtattr -C <Context ID> -a HostId
SCCuAttr:
ObjectName = "backup"
ConfigObjectSerial = 4
ObjectType = "SCRT/info/host"
ObjectAttributeName = "HostId"
ObjectAttributeValue = "0xc0a801f7"
ObjectAttributeType = "ulong"
SerialNumber = 4006
ObjectNlsIndex = 0
SC_reserved = 0
ContextID = 1
SCCuAttr:
ObjectName = "replica"
ConfigObjectSerial = 8
ObjectType = "SCRT/info/host"
ObjectAttributeName = "HostId"
ObjectAttributeValue = "0xc0a801f2"
ObjectAttributeType = "ulong"
Example 2
View only the production server’s hostid:
rtattr -C <Context ID> -o production -a HostId
SCCuAttr:
ObjectName = "production"
ConfigObjectSerial = 16
ObjectType = "SCRT/info/host"
ObjectAttributeName = "HostId"
ObjectAttributeValue = "0xc0a801f3"
ObjectAttributeType = "ulong"
SerialNumber = 16006
ObjectNlsIndex = 0
SC_reserved = 0
ContextID = 1
Usage
This command manages Assure MIMIX for AIX's disaster recovery processes as
well as failover and failback operations. Given a primary context <X> configured
on both a “Production” and a “Recovery” Server, note:
• To failback to the production server, first on the recovery server, and then on the
production server execute:
rtdr -C <X> failback
NOTE
A failover context associated with a configured primary context must be
created and setup prior to executing a failover. The failover Context ID is
arbitrary, but must be unique on the associated servers.
• To create and setup a failover context, on both the production and recovery
servers:
rtdr -C <X> -F <Y> setup
• Prior to failover, you should validate the data integrity of the Replica.
To validate data integrity of the Replica, create a snapshot of the replica, then
analyze it with the application itself. To create the snapshot, on the recovery
server:
scrt_ra -C <X> -X
• Given a corrupt replica, and a validated restore point in time, use a failover
restore to roll the actual data replica, not a snapshot of the replica, back to the
validated point in time. To perform a failover restore, on the recovery server
execute:
scrt_ra -C <X> -F [-D | -S | -t]
NOTE
Do not perform a failover restore on an invalidated restore target. After
validating the replica, the disaster recovery procedure is failover, resync,
then failback.
After failover, start the application on the recovery server. It will be the acting
production environment until failback. All data modifications will be tracked
and shipped back to the original production server by resync.
After reviving the original production server, use resync to bring the production
server data up to date with the recovery server data.
After the resync completes, use failback to return to the original production and
recovery server roles. Both the production and recovery servers must be live to
execute resync and failback.
NOTE
Resync assumes the original production data was not lost, and is available
in its entirety after the production server is revived. In the event that
production data was lost, statemap on the recovery server must be dirtied
prior to resync to force a complete region recovery, and re-initialize the
production data.
• To dirty all statemaps, in the failover context on the recovery server (the acting
production server):
rtstop -C <X> -F
scconfig -C <X> -M
rtstart -C <X>
After a system failover, Assure MIMIX for AIX cannot rollback to a point in time
before the failover. Likewise, after a system failback, Assure MIMIX for AIX
cannot rollback to a point in time before the failback. For this reason, replica data
integrity should be validated with a snapshot prior to executing a failover.
Syntax
rtdr -C <ID> [-fhmnqv] failover | resync | failback
rtdr -C <ID> -F <ID> [-s <hostname>] [-fhmnv] setup
-C Context ID (of the "primary" context)
-F Failover Context ID
-f Forced execution (use with caution)
-h Help, prints usage
-m Man style help
-n No execution, just print commands
-q quiet, do not ask for confirmation
-s Select failover site server from multiple recovery
servers (default is first replication hop's server.)
<hostname> must be a configured SCRT/info/host HostName
attribute.
-v Verbose output
Usage
Event Markers are tags that mark points in time or points in process that are
significant to you for the purposes of recovery. An Event Marker can be selected as
the Recovery Point Objective (RPO) during a data restore. They are typically
needed for applications which cannot take advantage of Assure MIMIX for AIX’s
Any Point-In-Time (APIT) data restores along with applications which do not have
live transactional durability on disk.
The following is an example of a script that could be called, with as many
arbitrary event attributes as you want, in addition to the time and date attribute
automatically assigned by rtmark. The customer-defined attributes between the cat
line and the second EOF are also added to the event. The entire event is copied to
the recovery server, and is available for viewing and selection during restores.
#!/usr/bin/ksh
cat <<-EOF | /usr/scrt/bin/rtmark -C <Context ID> -
name = test1
description = "This is a test."
owner = dave
priority = 2
another_attribute = "Just another attribute"
EOF
Syntax
rtmark [-C ID] [-s <num>|-d <str>] [-iV] [<file>|-]
rtmark -rC <Context ID>
rtmark -h
-C ID Event is specific to Context ID.
-d <str> Date string, overrides event time.
-h Help, display this message.
-i Interactive query for event attributes.
-r Copies event marks from the production server to the
Recovery Server
-s <num> Seconds since epoch, overrides event time.
-V Print version.
<file> File containing the event mark attributes.
The following examples show the parameters you need to pass from your application,
and a script that creates an event marker file and calls rtmark:
#example call in your application
<Context ID> <name> <description>
# example event mark script that creates the emf_1 file and calls rtmark
rm -f /tmp/emf_1
printf "name = ${2}\n" >/tmp/emf_1
printf "description = ${3}\n" >>/tmp/emf_1
sync
/usr/scrt/bin/rtmark -C ${1} /tmp/emf_1
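A self-contained variant of the script above can be tried without the product installed: it builds the attribute file with printf, then prints the rtmark call instead of executing it. The attribute values and context ID are illustrative.

```shell
EMF=/tmp/emf_demo.$$   # temporary event marker attribute file

# Build the attribute file, one "name = value" pair per line.
printf 'name = %s\n'        "nightly_close"              >  "$EMF"
printf 'description = %s\n' "Before nightly batch close" >> "$EMF"
printf 'owner = %s\n'       "ops"                        >> "$EMF"
sync

# On a server with Assure MIMIX for AIX installed, the next line would be
# executed rather than echoed.
echo "WOULD RUN: /usr/scrt/bin/rtmark -C 1 $EMF"
```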
NOTE
Event markers are lost on failover and failback because the rollback
window is reset. The production server’s event_marker file is cleared
during a failback to the configured production server. The failover server’s
event_marker file is cleared during a failover to the configured failover
server.
Usage
This command is used to mount all file systems associated with the context
specified.
Syntax
rtmnt [-C ID][-fn]
Parameters
Parameter Description
Usage
This command is used to load the Assure MIMIX for AIX data tap and to start the
data replication processes. On the production server, rtstart also mounts the
protected file systems.
Syntax
rtstart [-C ID][-BMnNpr]
Parameters
Parameter Description
Syntax
rtstop [-C ID][-FfhnkS]
Parameters
Parameter Description
Usage
This command is used to:
Syntax
rtumnt [-C ID][-Dfn]
Parameters
Parameter Description
Usage
This command provides information about containers used in Assure MIMIX for
AIX.
Syntax
sclist -t TYPE [-bR] [-A ATTR [ ... ]] [-R] [-C ID] [-d X]
sclist -t TYPE -o ATTR=VALUE [-bR] [-A ATTR [ ... ]] [-C ID] [-d X]
sclist -a [-A ATTR [ ... ]] [-C ID] [-d X]
sclist -r SERIAL [-r SERIAL ...] [-b | -c] [-C ID] [-d X]
sclist -O SERIAL [-O SERIAL ...] [-C ID] [-d X]
sclist [-BeiIjJlLmMpPstSTvVX] [-I] [-D[D]] [-C ID] [-d X]
sclist -h [-z]
sclist -fZ
-a Query all objects
-A ATTR Query specific attribute (repeatable)
-b Be Brief, useful for scripting output
-B List of StateMap bitmap devices
-c Expansive, if possible, expand on output.
-C ID Operate on Context ID.
-d X Debug level of X (0-9).
-D Query all driver/device objects from ODM.
(-D for just drivers, and -DD for devices & drivers.)
-e List pooling/journal configuration.
-f File system list.
-h Print Help Message
-i Query driver objects
-I Query whether drivers are loaded
-j Query the Journal objects
-J List of History Journal exported devices
-l Query the Logdev objects
-L List of LogDev access devices
-m Query the StateMap objects
-M List of StateMap access devices
-o ATTR=VAL Query within type list for attribute ATTR equal to
VAL.
-O SERIAL Query specific object with serial number SERIAL
-p Query the Passthru objects
-P List of PassThru access devices
-r SERIAL List given objects relationships.
-R List relationships in attribute list.
-s Query the SCID objects
-S List of SCID devices
-v List of Volume group/disk info for config
-V List of Write Journal exported devices
-t TYPE What objects to query ('sclist -hz' for list)
Parameter Description
-C ID Operate on Context ID
-o ATTR=val Query within type list for attribute ATTR equal to val
-t type Type of object to query (enter sclist –hz for list of object
types)
Usage
Use this command to manage DataTap devices and drivers.
Syntax
scconfig -l [-cfinERtv] [-C ID] [-d X] [-I name]
scconfig -u | -U [-finEv] [-C ID] [-d X] [-I name]
scconfig -r [-nv] [-L name ...] [-C ID] [-d X] [-I name]
scconfig -M | -W | -P | -B [-G] [-nv] [-L name ...] [-C ID] [-d X]
scconfig -S [-C ID] [-d X] [-nv]
scconfig -s [-C ID] [-d X]
scconfig -t | -q | -Q | -h
scconfig -C ID -a seconds [-b percent]
scconfig -V
Parameters
Parameter Description
scsetup
Usage
Makes or removes the Logical Volumes (LVs) used by Assure MIMIX for AIX in a
specific protection context, such as LFCs. Note however that scsetup will not
remove production LVs in the PVS or their associated replica LVs. Run this
command after defining and saving a context configuration using the Replication
Group wizard.
After you have defined a context, scsetup creates a log file and containers (logical
volumes) in the specified volume group.
Syntax
scsetup -M [-ijlnprsv] [-C ID] [-d X] [-o role] [ -t TYPE ]
scsetup -R [-inv] [-C ID] [-d X] [-o role] [ -t TYPE ]
scsetup -E [-cinv] [-C ID] [-d X] [-o role]
scsetup -I [-cinv] [-C ID] [-d X] [-o role]
scsetup -L [-inv] [-C ID] [-d X]
scsetup -X [-inv] [-C ID] [-d X]
scsetup -F [-inv] [-C ID] [-d X]
scsetup -h
-C ID Operate on Context ID.
-c Clear destination device files prior to export/import.
-d X Debug level of X (0-9).
-E Export production volumes.
-C ID Operate on Context ID
-s Skip setting or clearing bitmaps for statemap (if there are any)
Notes:
Removes Assure MIMIX for AIX data protection, allowing direct access to
production data.
-j Initialize COW journal containers (fill with zeros).
-L Perform a logform on the statemap log
(implied with -M operation).
-l Skip forming statemap log (if one exists).
-M Make configured LV's from scratch.
-n Don't execute, just print commands.
-o role Operate on specific role { prod | back }.
NOTE
Type must be of SCRT/container/*, and specified as the associated “Class”
name (see “sclist -hz” for a list).
scrt_ra
Usage
This command is the Restore Agent. It is used to display the available rollback
windows and to create snapshots on the recovery server.
Available VFBs:
---------------------------------------------------------
No recorded VFBs.
---------------------------------------------------------
Event Name Date/Time (Epoch Time) Description
Parameters
Parameter Description
-v Verbose.
Usage
The restore client is an interactive command line interface, or shell, for production
data restore. To enter the shell, type scrt_rc -C <ID> at the UNIX command prompt on
the recovery (a.k.a. backup) server.
Entering the restore client shell starts a production restore session. Ultimately, this
session should be terminated with either the commit or abort command. Problems
during the restore can be resolved with the recovery command.
NOTE
The -p option of the scrt_rc command will not start the shell, but will
instead return the agent status.
Syntax
scrt_rc [-C ID] [-d X] [-p X] [-h[v]] [-v] [-V]
-d Debug level of X (0-9)
-h Help, display this message
-C Operation on Context ID (default is 17)
-p Ping agent X (aba|lca|rs), ref is 0 if up
-v Verbose help
-V Print version
• LFC level
• Date/Time
Session termination
A restore session may be terminated either with an abort or a commit command.
When aborted, all restored devices are brought back to pre-session levels. When
committed, all restored devices remain at the last target of the session.
A commit does not remove any forward or reverse incremental data from the Assure
MIMIX for AIX time line which allows for a subsequent restore to a time after the
committed target, if necessary. In fact, the restore itself is included in the time line
which allows it to be undone.
All block I/O during the restore occurs at the Logical Volume Manager (LVM) layer,
below all file systems and/or databases associated with the protected application. In
Assure MIMIX for AIX, the reverse block incremental data is recorded in
odd-numbered LFCs, the Before Image Log File Containers (BILFCs), which are also
raw LVs and reside on the backup/recovery server, or in external tape archives,
if any.
The length of the restore window is a function of how many BILFCs are available to
Assure MIMIX for AIX, the size of the BILFC, and the average application write
rate. Tape archives are used to extend the restore window.
During a restore, the PVS LVs must be opened exclusively for writing by Assure
MIMIX for AIX. No other application may have the LVs opened for writing. All
associated databases and file systems must be unmounted.
Prior to a production restore, LCA (scrt_lca) and ABA (scrt_aba) must be running in
daemon mode. If external tape archiving is enabled, AA (scrt_aa) should also be
running in daemon mode. RC (scrt_rc) will spawn RS (scrt_rs) automatically, and
stop RS automatically at the conclusion of the restore.
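The agent prerequisites above can be checked with the -p ping option documented for scrt_rc. The following dry-run sketch only prints the commands (so it doubles as a checklist); context ID 1 is an assumed example value.

```shell
# Print the pre-restore agent checks for context ID 1. On a live system,
# running each printed command returns exit status 0 when that agent is up.
for agent in lca aba; do
  printf 'scrt_rc -C1 -p %s\n' "$agent"
done
```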
General procedure
1. Ensure required agent daemons are running.
4. On the production server - Sync Assure MIMIX for AIX [scconfig -Cx -S]
2. One After Image Log File Container (AILFC) may be sent during the restore to
fine tune to the nearest second. BILFCs are optimized for I/O throughput, while
AILFCs maintain individual write fidelity.
Usage
This command is used to create a virtual full backup.
NOTE
The Tivoli Storage Manager must be defined in the Assure MIMIX for
AIX configuration before using this command.
Syntax
scrt_vfb [-bdDflLnUVrR] [-s <path to validation script>] [ -C ID ]
Parameters
Parameter Description
-h Help.
-V Create VDevs.
Usage
This command schedules a virtual full backup by managing entries in cron for
Assure MIMIX for AIX Virtual Full Backups (VFBs).
NOTE
The Tivoli Storage Manager must be defined in the Assure MIMIX for
AIX configuration before using this command.
Syntax
sccfgd_cron_schedule <Op> <Context_id> [<sched_type>] [<cron_info>]
[<vfb_opts>]
where:
Op [a|q|d] (add, query, or delete, respectively)
sched_type [once|daily|weekly|monthly]
Examples
sccfgd_cron_schedule add 3 daily 15:3:*:*:*
sccfgd_cron_schedule delete 3
sccfgd_cron_schedule query 3
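The <cron_info> token packs five colon-separated fields. Assuming they map positionally onto cron's minute, hour, day-of-month, month, and weekday columns (so the daily example above would fire at 03:15), the token converts to a crontab time specification with tr:

```shell
# Unpack the colon-separated <cron_info> token into cron's five time fields.
# The positional mapping (minute:hour:dom:month:dow) is an assumption here.
echo '15:3:*:*:*' | tr ':' ' '
# prints: 15 3 * * *
```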
Usage
This command is used to load the Assure MIMIX for AIX configuration file into the
Assure MIMIX for AIX ODM by creating and loading a failover context
configuration based on a previously loaded primary context configuration.
Syntax
sccfgd_putcfg primary_context_ID failover_context_ID
Parameters
Parameter Description
Usage
This command is used to check a configuration before Assure MIMIX for AIX is
started. Issue this command on each node after the configuration is initialized and
before it is started.
Syntax
sccfgchk -C <Context ID>
Parameters
Parameter Description
-v Verbose
Usage
You can use the Sizing Tool (sztool) to calculate configuration values before Assure
MIMIX for AIX is installed. It is also useful to run the tool after Assure MIMIX for
AIX is installed to determine if the number of LFCs or WJ percentage needs to be
adjusted. For more information, refer to Chapter 3, “Using the Sizing Tool to
Calculate LFC Size” on page 41.
Syntax
sztool
Parameters
sztool script
Command Description
Options
sztool If issued for the very first time, the working directory,
diskinfo file and sztool.cfg file are generated. You should
review the diskinfo file and then modify sztool.cfg,
accordingly. You can then re-run sztool.
sztool -l When the log file has been created, this command prints the
calculation results for different LFC sizes based on the
existing log file. For example, sztool -l32 prints the
results for an LFC size of 32MB; sztool -l16 -l512 prints
all the calculation results from 16MB to 512MB. There can
be no space between -l and the LFC size number. Output goes
to the screen only; there is no delay or sleep.
Command syntax
All commands follow the same basic syntax:
cli.sh [<general options>] <command> [<command options>]
[<name_or_id>]
where:
• [<general options>] are the supported general options (see General options).
General options
The following general options are available:
-v | --verbose Sets the logging level to INFO. If used twice, sets logging
level to DEBUG.
• deployConfig
• deconfigure
• validateLicense
• startCluster
• showCluster
• stopCluster
• supportSnapshot
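A typical first bring-up strings several of these commands together. The sketch below writes the sequence to a file and only syntax-checks it; the configuration file path (/tmp/cluster.xml) and node ID (nodeA) are hypothetical values.

```shell
# Hypothetical bring-up sequence using the commands listed above.
# Written to a file and syntax-checked only, not executed.
cat > /tmp/cluster_bringup.sh <<'EOF'
#!/bin/sh
cli.sh -v deployConfig /tmp/cluster.xml nodeA || exit 1
cli.sh -v validateLicense || exit 1
cli.sh -v startCluster || exit 1
cli.sh -v showCluster
EOF
sh -n /tmp/cluster_bringup.sh && echo "syntax OK"
```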
deployConfig
Deploy the XML configuration file to the local node. If a node ID is not specified,
an attempt is made to guess it from the local IP addresses.
Syntax
cli.sh [<general options>] deployConfig <file> [<node_id>]
Example
cli.sh -v deployConfig file1 nodeB
deconfigure
Remove the current configuration from the local node.
Syntax
cli.sh [<general options>] deconfigure
Example
cli.sh -v deconfigure
validateLicense
Display license expiration on the local node.
Syntax
cli.sh [<general options>] validateLicense
Example
cli.sh -v validateLicense
Syntax
cli.sh [<general options>] startCluster
Example
cli.sh -v startCluster
showCluster
Outputs a status summary showing the active configuration version, the Global
Administration Daemon (GAD), which is the node that is coordinating the nodes in
the cluster, and the connectivity to the other node in the cluster.
Syntax
cli.sh [<general options>] showCluster
Example
cli.sh -v showCluster
stopCluster
Stop cluster services on the local node, or on all nodes if the allNodes option is
specified.
Syntax
cli.sh [<general options>] stopCluster <command options>
Example
cli.sh -v stopCluster -a
supportSnapshot
Retrieves a support snapshot from the node and stores it in a file.
Example
cli.sh supportSnapshot
startApplication
Starts the application on the local node.
Syntax
cli.sh [<general options>] startApplication [<command options>]
[<name_or_id>]
Command
Options Description
Example
cli.sh -C startApplication +m Application-1
clearApplication
Clears errors on all local resources of an application.
Syntax
cli.sh [<general options>] clearApplication [<name_or_id>]
Example
cli.sh -v clearApplication Application-1
setApplicationPolicy
Defines the level at which the application will be managed by the cluster. Possible
values are:
• MAINTAIN_ACTIVE
Example
cli.sh -v setApplicationPolicy -s MyApplication MAINTAIN_ACTIVE
Example
cli.sh -v setApplicationPolicy MyApplication MAINTAIN_ACTIVE
moveApplication
Moves an application to the specified node, if possible. This action is carried out
regardless of the autoMove flag.
Syntax
cli.sh [<general options>] moveApplication [<command options>]
[<name_or_id>]
stopApplication
Stops an application on the local node by stopping the application’s resources. This
operation implicitly clears the autoStart and autoMove flags, effectively ‘pinning’
the application.
Syntax
cli.sh [<general options>] stopApplication [<command options>]
[<nid>]
Example
cli.sh -v stopApplication Application-1
stopAllLocalApplications
Stops all applications running on this node. Returns 0 on success.
Syntax
cli.sh stopAllLocalApplications [-t <time>]
Example
cli.sh -v stopAllLocalApplications -t 300
showApplication
Prints a clear-text rendering of the status of a single application. If name_or_id is
omitted, the status of all applications is shown.
Syntax
cli.sh [<general options>] showApplication <command options>
[<name_or_id>]
Example
cli.sh showApplication -v Application-1
startResource
Starts an individual resource on the local node.
Syntax
cli.sh [<general options>] startResource [<command options>]
[<name_or_id>]
Command
Options Description
Examples
cli.sh -v startResource -f Resource-1
clearResource
Clears errors or manual stops on a local resource. If name_or_id is omitted, then all
local resources are cleared.
Syntax
cli.sh [<general options>] clearResource [<name_or_id>]
Example
cli.sh -v clearResource Resource-1
Syntax
cli.sh [<general options>] showResource [<name_or_id>]
Example
cli.sh -v showResource Resource-1
stopResource
Stop a local resource. If the noAutoStart flag is specified, the resource will be
unavailable to the cluster until cleared. Otherwise, the cluster may start the resource
if the application requires it. If name_or_id is omitted, then all local resources are
stopped.
Syntax
cli.sh [<general options>] stopResource [<command options>]
[<name_or_id>]
Command
Options Description
Example
cli.sh -v stopResource -f Resource-1
showEvents
Dump the local event log.
Syntax
cli.sh [<general options>] showEvents [<command options>]
[<name_or_id>]
Example
cli.sh -v showEvents -5m
NOTE
If you do not specify a time interval, the entire event log is dumped.
setLogLevel
Modifies the Log4J logging level, which remains in effect until the listener is restarted.
Syntax
cli.sh [<general options>] setLogLevel [<command options>]
Example
cli.sh -v setLogLevel DEBUG
performNetDiscovery
Perform a network discovery and dump the results. This command must be run
simultaneously on all nodes.
<nid> represents a numerical node ID that must be unique between the nodes
participating in the discovery.
Syntax
cli.sh [<general options>] performNetDiscovery [<nid>]
Example
cli.sh -v performNetDiscovery aix_22
performDiscovery
Discover all other resources and dump the results.
Example
cli.sh -v performDiscovery
exit
Terminate the listener. Resources won't be stopped, and no effort is made to
coordinate with other nodes. If the cluster was active, it will be restored when the
listener is restarted.
Syntax
cli.sh [<general options>] exit
Example
cli.sh -v exit
• There are two production nodes with shared disks between them.
• Replication is to a third node that lies outside the cluster.
• Multiple Assure MIMIX for AIX Replication Group configurations are
supported and can be online on different nodes.
Option 2:
• There are two production nodes with shared disks between them.
• Replication is to a third node that lies outside the cluster.
• Multiple Assure MIMIX for AIX Replication Group configurations are
supported, however they must be online on the same node.
Option 3:
In this mode:
• There are two production nodes with shared disks between them.
• A Resource Group can only be moved between the two production nodes, and
the Assure MIMIX for AIX roles of the nodes never change.
• The file systems protected by Assure MIMIX for AIX should not be
automatically mounted at system restart.
• The file systems protected by Assure MIMIX for AIX should not be mounted or
unmounted when a Resource Group is brought online or offline. The mounting
and unmounting of the protected filesystems will be performed by Assure
MIMIX for AIX.
• Do not use the Assure MIMIX for AIX Enable Automatic Startup feature, as
documented in the User Guide.
• All volume groups associated with an Assure MIMIX for AIX configuration
must be enhanced concurrent mode volume groups on both PowerHA
production nodes. Use C-SPOC to create/change the volume group(s), selecting
Fast Disk Takeover or Disk Heart Beat for the Enable Fast Disk Takeover or
Concurrent Access option.
• Assure MIMIX for AIX file containers and the Assure MIMIX for AIX
configuration file systems /usr/scrt/run/c<Primary Context ID> and
/usr/scrt/run/c<Failover ContextID> can co-exist on the same
enhanced concurrent mode volume group(s) that are part of the Assure MIMIX
for AIX configuration. The file systems need to be at least 128MB in size and
must have a separate jfslog, jfs2log, or jfs2 inline log since they will not be part
of the protected file systems.
• Assure MIMIX for AIX must be configured with a Service IP address (an alias
address that follows the application when it is moved between nodes) so that the
recovery server can connect to either production server. To configure a Service
IP Address, change the production server's Initial host adapter IP Address to the
Service IP address used by your application. If your application did not use a
Service IP address, create one. The /etc/hosts file on all nodes must contain the
Service IP Address and associated IP label.
NOTE
In the steps below, APPLICATION_RG is used as the name of the
PowerHA Resource Group that contains the application (production)
logical volumes and file systems that would be protected by Assure
MIMIX for AIX. MIMIX_COMMON_RG is used as the name of the
PowerHA Resource Group containing the Assure MIMIX for AIX
configuration. MIMIX_VG is the name of the shared volume group
containing the logical volumes of Assure MIMIX for AIX containers. You
can use any other names to suit your environment. 1 and 17 are used as the
Primary and Failover Context IDs respectively. If 1 and 17 are already
used by another Assure MIMIX for AIX Replication Group configuration,
change these values to the next available unique IDs, such as 2 and 18.
1. On the primary production server and recovery server, varyonvg all volume
group(s) associated with the Assure MIMIX for AIX configuration.
2. On the primary production server, use PowerHA C-SPOC to create file systems
/usr/scrt/run/c<Primary Context ID> and /usr/scrt/run/c<FailOver Context
ID> with inline logs on one of the shared enhanced concurrent mode volume
groups that contains the production server logical volumes you want protected.
The filesystem must be at least 128 MB in size.
3. On the primary production server, mount the file systems associated with the
Assure MIMIX for AIX configuration.
NOTE
You must manually copy and load the Assure MIMIX for AIX
configuration onto the failover production server.
5. On the primary production server, create a file with the Primary Context ID
configuration.
6. On the primary production server, create a file with the Failover Context ID
configuration.
8. On the failover production server edit the production HostId stanza in the
/tmp/C<Primary Context ID>.cfg file. Replace the contents of the
“ObjectAttributeValue” field with the output from the "rthostid"
command.
SCCuAttr:
ObjectName = "production"
ConfigObjectSerial = 15
ObjectType = "SCRT/info/host"
ObjectAttributeName = "HostId"
ObjectAttributeValue = "6CABA7DF"
ObjectAttributeType = "ulong"
SerialNumber = 15006
ObjectNlsIndex = 0
SC_reserved = 0
ContextID = 1
9. On the failover production server edit the backup HostId stanza in the
/tmp/C<Failover Context ID>.cfg file replacing the content of the
“ObjectAttributeValue” field with the output from the “rthostid”
command.
SCCuAttr:
ObjectName = "backup"
46,50,54,57,64,67,70,73..75,82..93,95...
51,54,57,61,65,69,82...
The Primary Context ID is 1, and the new device major number, 82, is available
on both production servers.
15. On the primary production server, unmount the file systems associated with the
Assure MIMIX for AIX configuration.
• unmount /usr/scrt/run/c<Primary Context ID>
• unmount /usr/scrt/run/c<Failover Context ID>
16. On the primary production server, varyoffvg all volume group(s) associated with
the Assure MIMIX for AIX configuration.
17. On both production servers, the file systems protected by Assure MIMIX for
AIX should not be automatically mounted at system restart. Change the
auto-mount attribute to “no”.
18. On the secondary production server, add the replication group port numbers to
/etc/services.
The port numbers to add can be found in the /etc/services file on the primary
production server.
For example, these port numbers are for primary context id 1 and failover
context id 17:
sc1aba_channel 5779/tcp
sc1aba_dchannel 5783/tcp
sc1lca_channel 5780/tcp
sc1lca_dchannel 5784/tcp
sc1aa_channel 5778/tcp
sc1aa_achannel 5782/tcp
sc1ra_channel 5785/tcp
sc1ca_channel 5781/tcp
sc17aba_channel 5787/tcp
sc17aba_dchannel 5791/tcp
sc17lca_channel 5788/tcp
sc17lca_dchannel 5792/tcp
sc17aa_channel 5786/tcp
sc17aa_achannel 5790/tcp
sc17ra_channel 5793/tcp
sc17ca_channel 5789/tcp
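When copying these entries in step 18, the replication-group services for a context can be pulled out of /etc/services on the primary server with a single grep. The snippet below runs the same grep against a here-document copy of the context 1 example entries, so it is self-contained; on the real systems you would grep /etc/services itself (and repeat for context 17, which follows the same naming pattern).

```shell
# Count the context 1 replication-group entries (example port numbers from
# the step above; your installation's ports will differ).
grep -c -E '^sc1(aba|lca|aa|ra|ca)_[a-z]*channel' <<'EOF'
sc1aba_channel 5779/tcp
sc1aba_dchannel 5783/tcp
sc1lca_channel 5780/tcp
sc1lca_dchannel 5784/tcp
sc1aa_channel 5778/tcp
sc1aa_achannel 5782/tcp
sc1ra_channel 5785/tcp
sc1ca_channel 5781/tcp
EOF
```

The command prints 8, one per service entry for the primary context.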
19. Before starting Assure MIMIX for AIX on the production server, you must stop
your application and unmount the protected file systems.
Application Server—If there are multiple Resource Groups that can fail over
independently, there must be multiple Assure MIMIX for AIX Context IDs created,
one for each Resource Group. Likewise, each of these Resource Groups requires
scripts to manage the startup and shutdown of each Assure MIMIX for AIX
ContextID. The Assure MIMIX for AIX start script needs to be placed at the
beginning of the application server startup sequence, before applications are started.
The Assure MIMIX for AIX stop script needs to be executed at the end of the
application server shutdown sequence, after all applications have been stopped
gracefully.
Copy the sample Assure MIMIX for AIX start and stop scripts to the location of
your choice and change the values for CONTEXT_ID and
FAILOVER_CONTEXT_ID and insert your application start and stop scripts
where indicated.
For example, see the sample Assure MIMIX for AIX start and stop scripts below.
NOTE
Use the IP Label/Address that was used by the Assure MIMIX for AIX
Replication Group wizard.
IMPORTANT
The Assure MIMIX for AIX drivers must be loaded before the Assure
MIMIX for AIX protected file systems are mounted or written to. This is
managed during the execution of the Assure MIMIX for AIX startup
process. PowerHA determines which file systems to mount based on the
information provided in the resource group configuration. If no file
systems are specified, PowerHA will mount all file systems in all volume
groups that are defined in the resource group. This scenario is not
preferred for an Assure MIMIX for AIX environment. Assure MIMIX for
AIX should start before the file systems are mounted.
5. On the recovery server use rtstart to manually start the Assure MIMIX for AIX
Primary Context ID.
/usr/scrt/bin/rtstart -C<PrimaryContextID>
• Verify that the volume groups defined in the Resource Group are online in
concurrent mode.
• Verify that the /usr/scrt/run/c<Primary Context ID> and
/usr/scrt/run/c<Failover Context ID> file systems are mounted.
• Verify that all of the Assure MIMIX for AIX protected file systems are
mounted.
• Verify that the Service IP Address is aliased on the Ethernet Network
Interface.
• Verify that Assure MIMIX for AIX is replicating to the recovery server.
View the log files: /var/log/EchoStream/scrt_lca-<Primary
Context ID>.out on the production server and
/var/log/EchoStream/scrt_aba-<Primary Context ID>.out on
the recovery server.
for lv in $LVS
do
rm -f /dev/recd$lv
rm -f /dev/rrecd$lv
done
exit 0
RTSTOPLOG=/usr/scrt/log/rtstop_c${CONTEXT_ID}.log
echo "${DATE_CMD}: Shutting down ${PRODUCT_NAME} ..." > $RTSTOPLOG
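A skeletal start script illustrates where the pieces described above fit. The context IDs (1 and 17) are the chapter's example values and the rtstart path is the one used throughout this guide; the sample scripts shipped with the product are the authoritative versions. The skeleton is written to a file and syntax-checked only.

```shell
# Skeletal application-server start script: replication starts first, so the
# drivers are loaded before the protected file systems are mounted.
cat > /tmp/mimix_app_start.sh <<'EOF'
#!/bin/sh
CONTEXT_ID=1
FAILOVER_CONTEXT_ID=17
/usr/scrt/bin/rtstart -C${CONTEXT_ID} || exit 1
# <insert your application start commands here>
exit 0
EOF
chmod +x /tmp/mimix_app_start.sh
sh -n /tmp/mimix_app_start.sh && echo "syntax OK"
```

A matching stop script would run the application shutdown first and end with the Assure MIMIX for AIX stop, mirroring the ordering described above.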
Note that if the Replication Group is active, the following warning message is
displayed in the Servers panel:
You will be allowed to continue through the Change Replication Group wizard and
make changes, until the Summary panel is displayed, after which the following
warning message will be displayed:
At this time, the Replication Group must be stopped so that the configuration
changes can be deployed to the production and recovery servers. Do not click the
Finish button yet, and do not use the Assure UI portal to stop the Replication
Group. Instead, use PowerHA to take the Resource Group offline, as described below:
2. On the recovery server execute the following command to stop the Replication
Group:
rtstop -FC <Primary ContextID>
For example:
rtstop -FC1
3. Return to the Summary panel of the Assure UI portal and click the Finish button
when the following message is displayed:
For example:
rtstart -C1
NOTE
You must manually copy and load the updated Assure MIMIX for AIX
configuration onto the secondary production server.
6. Perform the steps in “Updating the Assure MIMIX for AIX configuration on the
secondary production server” on page 371.
IMPORTANT
This must be done before a resource group is moved from the primary
production server to the secondary production server.
Perform the following step to update the Assure MIMIX for AIX configuration on
the recovery server:
NOTE
You must manually copy and load the updated Assure MIMIX for AIX
configuration onto the secondary production server.
2. Perform the steps in “Updating the Assure MIMIX for AIX configuration
on the secondary production server” on page 371.
IMPORTANT
This must be done before a resource group is moved from the primary
production server to the secondary production server.
Perform the following steps to update the Assure MIMIX for AIX configuration on
the secondary production server.
1. On the secondary production server, delete the Primary and Failover Context
IDs for the replication group that was changed.
2. On the primary production server, create a file with the Primary Context ID
configuration.
odmget -q ContextID=1 SCCuObj SCCuAttr SCCuRel >/tmp/C1.cfg
3. On the primary production server, create a file with the Failover Context ID
configuration.
odmget -q ContextID=17 SCCuObj SCCuAttr SCCuRel >/tmp/C17.cfg
5. On the secondary production server edit the production HostId stanza in the
/tmp/C<Primary Context ID>.cfg file. Replace the contents of the
ObjectAttributeValue field with the output from the rthostid command.
SCCuAttr:
ObjectName = "production"
ConfigObjectSerial = 15
ObjectType = "SCRT/info/host"
ObjectAttributeName = "HostId"
ObjectAttributeValue = "9C5FE66FB"
ObjectAttributeType = "ulong"
SerialNumber = 15006
ObjectNlsIndex = 0
SC_reserved = 0
ContextID = 1
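The step 5 edit can be scripted rather than made by hand. The awk sketch below rewrites only the ObjectAttributeValue line of the HostId attribute whose ObjectName is "production"; it is shown against a here-document excerpt of the stanza so the example is self-contained (on the real system you would run it over /tmp/C1.cfg), and the new host ID is an example value standing in for the rthostid output.

```shell
# Replace the production HostId value in a config excerpt. The guard on
# ObjectName limits the rewrite to the "production" stanza.
new_hostid=9C5FE66FB    # example; use the rthostid output on your server
awk -v id="$new_hostid" '
  /ObjectName =/ { prod = ($0 ~ /"production"/) }
  /ObjectAttributeName = "HostId"/ && prod { want = 1 }
  want && /ObjectAttributeValue/ { sub(/"[^"]*"/, "\"" id "\""); want = 0 }
  { print }
' <<'EOF'
SCCuAttr:
        ObjectName = "production"
        ObjectAttributeName = "HostId"
        ObjectAttributeValue = "6CABA7DF"
EOF
```

The same pattern, with "backup" in place of "production", applies to the step 6 edit of the failover context file.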
6. On the secondary production server edit the backup HostId stanza in the
/tmp/C<Failover Context ID>.cfg file replacing the content of the
ObjectAttributeValue field with the output from the rthostid command.
SCCuAttr:
ObjectName = "backup"
ConfigObjectSerial = 4
8. On the primary production server, use es_ha_config to display the device major
number assigned to the Primary Context ID.
/usr/scrt/bin/es_ha_config 1
The Primary Context ID is 1, and the new device major number, 82, is assigned
to the Primary Context ID on both production servers.
12. If the device major number assigned to the Primary Context ID on the primary
production server is not available on the secondary production server, then an
NOTE
Both scenarios require that you manually perform Failover operations.
Unplanned Failover
In this scenario, both Highly Available Production servers are unavailable due to a
disaster. For example, an entire site is lost due to a disaster such as a flood or
hurricane.
1. On the recovery server, make sure all snapshot filesystems are unmounted
before trying to release the snapshot.
/usr/scrt/bin/rtumnt -C <Context ID>
3. On the recovery server, create a Snapshot based on the current redo log and
validate the data integrity. Enter the following command to create a snapshot
based on the current redo log:
/usr/scrt/bin/scrt_ra -C <Primary Context ID> -X
4. Mount the snapshot filesystems on the recovery server. Enter the following
command:
/usr/scrt/bin/rtmnt -C <Context ID>
You should see output similar to the following:
Determining Filesystems to mount...
fsck -fp -y /dev/rtestlv
log redo processing for /dev/rtestlv
syncpt record at 7028
end of log 7028
syncpt record at 7028
syncpt address 7028
number of log records = 1
number of do blocks = 0
number of nodo blocks = 0
/dev/rtestlv (/test): ** Unmounted cleanly - Check
suppressed
Mounting /test...
• If analysis indicates the data is valid, use rtumnt to unmount the snapshot
filesystems, use scrt_ra to remove the snapshot, then proceed to step 6 to
perform a failover to the latest point in the data.
/usr/scrt/bin/rtumnt -C <Context ID>
/usr/scrt/bin/scrt_ra -C <Context ID> -W
• If analysis indicates data corruption, use rtumnt to unmount the snapshot
filesystems and use scrt_ra to remove the snapshot,
/usr/scrt/bin/rtumnt -C <Context ID>
/usr/scrt/bin/scrt_ra -C <Context ID> -W
then create a snapshot based on one of the following to locate and validate an
optimal restore point.
Once you have located an optimal restore point, remove the snapshot. Proceed
to step 5 to back up the replica or to step 6 to perform a failover restore.
5. On the recovery server, if you have TSM or SysBack, back up the replica. This
provides additional data protection by keeping complete copies of the data on
archive media such as tape. Refer to Chapter 13, “Working with Archived Data”
on page 263.
6. On the recovery server, depending on the results from step 4, either roll back
the replica to the validated rollback location before executing a failover, or
fail over using the current redo log.
To roll back the actual data replica, not a snapshot of the replica, to the
validated point in time, enter the following command:
scrt_ra -C <Primary Context ID> -F [-t | -S | -D]
For example, to restore the replica to a previously validated date and time:
scrt_ra -C <Primary Context ID> -F -D "05/15/09 09:33:40"
NOTE
At this point your application is running on the new production server and
both configured production servers are down. When one or both
configured production servers come online, PowerHA will start and bring
the Resource Groups online on one of the configured production servers.
A resync operation is required when the Production Volumes and Recovery Replica
Volumes diverge. This occurs after a failover to the recovery server.
When the application is started on the new production server, the updates to the
Replica Volumes result in a divergence from the data on the new recovery server.
11. On the new recovery server, execute the resync command to start replication on
the new recovery server.
/usr/scrt/bin/rtdr -C <Primary Context ID> resync
12. On the new recovery server verify that the resync process completed.
Example:
13. On the new production server, execute the resync command to start replication
on the new production server.
/usr/scrt/bin/rtdr -C <Primary Context ID> resync
14. On the new production server verify that the resync process completed.
Example:
--- System re-sync is activated. ---
NOTE
At this point your application is running on the new production server and
data is being replicated to the new recovery server.
IMPORTANT
If the failover from the production server to the recovery server
was necessary because all protected data was lost on the production server,
then the data must be restored to the production server before doing a
failback.
15. On the new production server, use the scconfig command to resync all protected
data from the new production server to the new recovery server.
/usr/scrt/bin/scconfig -B -G -C <Failover Context ID>
16. Before performing the failback operation, ensure that you stop your application
on the new production server.
17. On the new production server, use rtumnt to unmount the protected file systems
and transfer any current LFC data to the recovery server.
/usr/scrt/bin/rtumnt -C<FailoverContextID>
18. On the new production server verify that the protected file systems are
unmounted and synchronization of data has successfully completed.
Example:
Determining Filesystems to unmount...
Unmounting /dev/lvFS_1_C1 from /FS_1_C1...
Sync: transferring current LFC to Recovery Server
Waiting for synchronization of data to complete.
All data has been synchronized to the Recovery Server.
20. On the new production server verify that the synchronization of data has
successfully completed.
Example:
No mounted filesystems
Sync: transferring any current LFC data to Recovery Server
Waiting for synchronization of data to complete.
All data has been synchronized to the Recovery Server.
Stopping scrt_lca................
Unloading Assure MIMIX DR for AIX Production Server Drivers
21. On the new recovery server execute the failback command to start the failback
process. Wait for failback to successfully complete before performing failback
on the new production server.
/usr/scrt/bin/rtdr -qC <Primary Context ID> failback
22. On the new recovery server verify that the failback process completed.
Example:
Assure MIMIX DR for AIX Production Server Drivers are already
loaded for Context ID 1.
Starting scrt_lca
fsck -fp -y /dev/rlvFS_1_C1
The current volume is: /dev/lvFS_1_C1
Primary superblock is valid.
Mounting /FS_1_C1...
--- Primary Context ID <1> is enabled. ---
23. On the new production server execute the failback command to start the
failback process. Wait for failback to successfully complete before starting your
application.
/usr/scrt/bin/rtdr -qC <Failover Context ID> failback
24. On the new production server verify that the failback process completed.
Example:
Assure MIMIX DR for AIX Recovery Server Drivers are already
loaded for Context ID 1.
Starting scrt_aba
--- Primary Context ID <1> is enabled. ---
26. Use PowerHA Start Cluster Services to manage the resource groups.
NOTE
This concludes the Unplanned Failover Operations (Failover, Resync, and
Failback).
Planned Failover
In this scenario, the administrator has a scheduled maintenance period and switches
operations that run on the production server to the designated recovery server.
Follow these steps to perform the Planned Failover, Resync and Failback
operations:
3. On the production server, use rtumnt to unmount the protected file systems
and transfer any current LFC data to the recovery server.
4. On the production server verify that the synchronization of data has successfully
completed.
Example:
Sync: transferring current LFC to Recovery Server
Waiting for synchronization of data to complete.
All data has been synchronized to the Recovery Server.
6. On the production server verify that the synchronization of data has successfully
completed.
Example:
All data has been synchronized to the Recovery Server.
Stopping scrt_lca..............
Unloading Assure MIMIX DR for AIX Production Server Drivers
7. On the recovery server execute the failover command to initiate the failover
process:
/usr/scrt/bin/rtdr -C <Primary Context ID> failover
Example:
--- Failover Context ID <Failover ContextID> is enabled. ---
NOTE
At this point the configured recovery server has become the new
production server and the configured production server has become the
new recovery server.
NOTE
At this point your application is running on the new production server and
replication is not active. Perform the resync operation to start replication.
A resync operation is required when the Production Volumes and Recovery Replica
Volumes diverge. This occurs after a failover to the recovery server.
9. On the new recovery server, execute the resync command to start replication on
the new recovery server.
/usr/scrt/bin/rtdr -C <Primary Context ID> resync
10. On the new recovery server verify that the resync process completed.
Example:
-- Failover Context ID <Failover ContextID> is enabled and ready
for re-sync. ---
11. On the new production server, execute the resync command to start replication
on the new production server.
/usr/scrt/bin/rtdr -C <Primary Context ID> resync
12. On the new production server verify that the resync process completed.
Example:
--- System re-sync is activated. ---
NOTE
At this point your application is running on the new production server and
data is being replicated to the new recovery server. To return the
production and recovery server back to their original roles, perform the
failback operation.
13. Before performing the failback operation, ensure that you stop your application
on the new production server.
14. On the new production server, use rtumnt to unmount the protected file systems
and transfer any current LFC data to the recovery server.
/usr/scrt/bin/rtumnt -C<FailoverContextID>
15. On the new production server verify that the protected file systems are
unmounted and synchronization of data has successfully completed.
16. On the new production server, use rtstop to unmount the protected file systems,
transfer any current LFC data to the recovery server, and unload the Assure
MIMIX for AIX production server drivers.
/usr/scrt/bin/rtstop -FSC<FailoverContextID>
17. On the new production server verify that the synchronization of data has
successfully completed.
Example:
No mounted filesystems
Sync: transferring any current LFC data to Recovery Server
Waiting for synchronization of data to complete.
All data has been synchronized to the Recovery Server.
Stopping scrt_lca................
Unloading Assure MIMIX DR for AIX Production Server Drivers
18. On the new recovery server execute the failback command to start the failback
process. Wait for failback to successfully complete before performing failback
on the new production server.
/usr/scrt/bin/rtdr -C <Primary Context ID> failback
19. On the new recovery server verify that the failback process completed.
Example:
Assure MIMIX DR for AIX Production Server Drivers are already
loaded for Context ID 1.
Starting scrt_lca
fsck -fp -y /dev/rlvFS_1_C1
The current volume is: /dev/lvFS_1_C1
Primary superblock is valid.
Mounting /FS_1_C1...
--- Primary Context ID <1> is enabled. ---
20. On the new production server execute the failback command to start the
failback process. Wait for failback to successfully complete before starting your
application.
/usr/scrt/bin/rtdr -qC <Failover Context ID> failback
21. On the new production server verify that the failback process completed.
Example:
Assure MIMIX DR for AIX Recovery Server Drivers are already
loaded for Context ID 1.
Starting scrt_aba
--- Primary Context ID <1> is enabled. ---
NOTE
At this point the production and recovery servers have been returned back
to their original roles and replication is active.
23. Use PowerHA Start Cluster Services to manage the resource groups.
NOTE
This concludes the Planned Failover Operations (Failover, Resync and
Failback).
In this mode:
• There are two production nodes with shared disks between them.
• A Resource Group can only be moved between the two production nodes, and
the Assure MIMIX for AIX roles of the nodes never change.
• The file systems protected by Assure MIMIX for AIX should not be mounted or
unmounted when a Resource Group is brought online or offline. The mounting
and unmounting of the protected filesystems will be performed by Assure
MIMIX for AIX.
• Do not use the Assure MIMIX for AIX Enable Automatic Startup feature, as
documented in the User Guide.
• All volume groups associated with an Assure MIMIX for AIX configuration
must be enhanced concurrent mode volume groups on both PowerHA
production nodes. Use PowerHA C-SPOC to create/change the volume
group(s), selecting Fast Disk Takeover or Disk Heart Beat for the Enable Fast
Disk Takeover or Concurrent Access option.
• Assure MIMIX for AIX file containers and the Assure MIMIX for AIX
configuration file systems /usr/scrt/run/c<Primary Context ID> and
/usr/scrt/run/c<Failover ContextID> can co-exist on the same
enhanced concurrent mode volume group(s) that are part of the Assure
MIMIX for AIX configuration. The file systems need to be at least 128MB in
size and must have a separate jfslog, jfs2log, or jfs2 inline log since they will
not be part of the protected file systems.
By default the Assure MIMIX for AIX File Containers are created on a volume
group (Default VG, selected when using the Replication Group wizard), unless
otherwise specified.
• Assure MIMIX for AIX must be configured with a Service IP address (an alias
address that follows the application when it is moved between nodes) so that the
recovery server can connect to either production server. To configure a Service
IP Address, change the production server's Initial host adapter IP Address to the
Service IP address used by your application. If your application did not use a
Service IP address, create one. The /etc/hosts file on all nodes must contain the
Service IP Address and associated IP label.
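The /etc/hosts requirement above can be spot-checked with a small script. This is a sketch only; the IP address and label shown are placeholders, and for brevity the pattern does not escape the dots in the IP address.

```shell
# Hypothetical check: confirm the Service IP address and its label appear
# together on one /etc/hosts line. IP and label values are placeholders.
# Note: dots in the IP are left unescaped (good enough for a spot check).
check_service_ip() {
    _hosts="$1"; _ip="$2"; _label="$3"
    grep -E "^[[:space:]]*${_ip}[[:space:]].*${_label}" "$_hosts" >/dev/null
}

# check_service_ip /etc/hosts 192.0.2.10 app_svc \
#     && echo "Service IP entry found" || echo "Service IP entry missing"
```

Run the same check on every cluster node, since all nodes must carry the entry.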
• Before starting Assure MIMIX for AIX on the production server, you must stop
your application and unmount the protected file systems.
1. On the primary production server, the secondary production server and the
recovery server, install the Assure MIMIX for AIX software.
This Resource Group will be used to control the configuration required by Assure
MIMIX for AIX.
4. On the primary production server, use PowerHA C-SPOC to create file system
/usr/scrt/run/common with an inline log on volume group MIMIX_VG. The
filesystem must be at least 128 MB in size.
5. On the primary production server, use PowerHA C-SPOC to create file systems
/usr/scrt/run/c<Primary Context ID> and /usr/scrt/run/c<FailOver Context
ID> with inline logs on one of the shared enhanced concurrent mode volume
groups that contains the production server logical volumes you want protected.
The filesystem must be at least 128 MB in size.
8. On the primary production server, use PowerHA to verify and synchronize the
cluster configuration.
10. On the primary production server, mount the filesystems associated with the
Assure MIMIX for AIX configuration.
In this example, 1 and 17 are the configured Primary and Failover Context
IDs, respectively.
11. On the recovery server, create volume group MIMIX_VG, which holds the
logical volumes for the Assure MIMIX for AIX containers and the volume
group(s) that will contain the replicated data. Please specify the AUTO ON
option for the volume groups so that they are varied on automatically at system
startup.
12. Use the Replication Group wizard to create the Assure MIMIX for AIX
replication group configuration.
Refer to the section “Configuring Replication Groups” on page 107. Make sure
to add a portal connection, using the Service IP Address as the Host or IP
Address for the production server. This will allow the Assure UI portal to
connect to either the primary or secondary production server, depending on
which server is online. While configuring the replication IP addresses, select the
option to use server IP addresses for replication. This will set the Service IP
address as the replication IP address.
Also, while configuring the production server logical volumes you want
protected, do not select the logical volumes for file systems
/usr/scrt/run/c<primary context id>,
/usr/scrt/run/c<failover context id> and
/usr/scrt/run/common.
After the creation of the Assure MIMIX for AIX replication group
configuration, perform the following steps.
For example:
• /usr/scrt/bin/es_ha_production_setup 1 17
For example:
• /usr/scrt/bin/es_ha_production_setup 1 17
15. On the primary production server, execute the following command to get the
Instance value for the Primary Context ID.
• rtattr -C<Primary Context ID> -o production -a Instance
SCCuAttr:
ObjectName = "production"
ConfigObjectSerial = 15
ObjectType = "SCRT/info/host"
ObjectAttributeName = "Instance"
ObjectAttributeValue = "4"
ObjectAttributeType = "uint"
SerialNumber = 150011
ObjectNlsIndex = 0
SC_reserved = 0
ContextID = 1
16. On the primary production server, execute the following command to obtain the
device major number used by Primary ContextID:
• es_ha_config <Primary Context ID>
For example:
• /usr/scrt/bin/es_ha_config 1
Current major number: 73
NOTE
The Assure MIMIX for AIX major numbers must match on both the
primary and secondary production servers. In case of a change in the major
number on one of the production servers, the other must also be updated
with the same number to match. Otherwise, Assure MIMIX for AIX will
not function properly.
For example:
• /usr/sbin/lvlstmajor
46,50,54,57,64,67,70,73..75,82..93,95...
NOTE
If the device major number used by Primary Context ID 1 on the primary
production server is displayed in the list of unused device major numbers
on the secondary production server, move Resource Group
MIMIX_COMMON_RG to the secondary production server.
If the device major number used by the Primary Context ID on the primary
production server is not displayed in the list of unused device major numbers on
the secondary production server, use the lvlstmajor AIX command on both
production servers to find a matching unused device major number that is
unused on both production servers. Then, update the configuration on the
primary production server by running the following command before moving
Resource Group MIMIX_COMMON_RG to the secondary production server.
For example:
• /usr/scrt/bin/es_update_major <Primary Context ID>
<Instance> <New Major Number>
For example:
• /usr/scrt/bin/es_ha_update_hostid 1 17
For example:
• /usr/scrt/bin/es_update_major 1 4 73
Note that the instance in the above command was obtained in step 15 on
the primary production server.
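Finding a major number that is free on both production servers amounts to intersecting the two lvlstmajor listings, after expanding range notation such as 73..75 and the open-ended trailing "...". The helper below is a hypothetical aid, not an Assure MIMIX tool; the cap of 128 applied to open-ended ranges is an assumption made only to keep the list finite.

```shell
# Hypothetical helper: expand lvlstmajor-style output such as
# "46,50,54,73..75,95..." into one number per line. An open "..." range
# is capped at 128 (an assumed bound, not an AIX limit).
expand_majors() {
    printf '%s\n' "$1" | awk -F',' '{
        cap = 128
        for (i = 1; i <= NF; i++) {
            tok = $i
            if (tok ~ /\.\.\.$/) {          # open range, e.g. "95..."
                sub(/\.\.\.$/, "", tok)
                for (n = tok + 0; n <= cap; n++) print n
            } else if (tok ~ /\.\./) {      # closed range, e.g. "73..75"
                split(tok, r, /\.\./)
                for (n = r[1] + 0; n <= r[2] + 0; n++) print n
            } else if (tok != "") {
                print tok + 0
            }
        }
    }'
}

# Majors unused on BOTH servers: intersect the two expanded lists.
common_free_majors() {
    _a=$(mktemp); _b=$(mktemp)
    expand_majors "$1" | sort > "$_a"
    expand_majors "$2" | sort > "$_b"
    comm -12 "$_a" "$_b"
    rm -f "$_a" "$_b"
}

# Example: common_free_majors "46,50,73..75" "50,73,80..82" prints 50 and 73.
```

Feed each function argument the captured lvlstmajor output from one server, then pick any number the intersection prints for es_update_major.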
20. On the secondary production server, add the replication group port numbers to
/etc/services.
The port numbers to add can be found in the /etc/services file on the primary
production server.
For example, these port numbers are for primary context id 1 and failover
context id 17:
sc1aba_channel 5779/tcp
sc1aba_dchannel 5783/tcp
sc1lca_channel 5780/tcp
sc1lca_dchannel 5784/tcp
sc1aa_channel 5778/tcp
sc1aa_achannel 5782/tcp
sc1ra_channel 5785/tcp
sc1ca_channel 5781/tcp
sc17aba_channel 5787/tcp
sc17aba_dchannel 5791/tcp
sc17lca_channel 5788/tcp
sc17lca_dchannel 5792/tcp
sc17aa_channel 5786/tcp
sc17aa_achannel 5790/tcp
sc17ra_channel 5793/tcp
sc17ca_channel 5789/tcp
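Copying the sc<context id>* entries from the primary server's /etc/services to the secondary can be scripted. The sketch below is a hypothetical helper that assumes the primary's file has already been copied to the secondary (for example with scp); it appends only entries whose service names are not yet present.

```shell
# Hypothetical helper: append replication-group service entries for the
# given context IDs from a copy of the primary's /etc/services ($1) into
# the local services file ($2), skipping names already present.
sync_mimix_services() {
    _src="$1"; _dst="$2"; shift 2
    for _ctx in "$@"; do
        # Entries for a context are named sc<ctx><letters>..., e.g. sc1aa_channel.
        grep "^sc${_ctx}[a-z]" "$_src" | while read -r _line; do
            _name=$(printf '%s\n' "$_line" | awk '{print $1}')
            grep -q "^${_name}[[:space:]]" "$_dst" || \
                printf '%s\n' "$_line" >> "$_dst"
        done
    done
}

# sync_mimix_services /tmp/primary_services /etc/services 1 17
```

The `sc1` pattern does not match `sc17` entries (the character after the digit must be a letter), so each context ID is handled independently.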
21. On the secondary production server, use PowerHA to move resource group
MIMIX_COMMON_RG to the primary production server.
22. On the primary production server, unmount the filesystems associated with the
Assure MIMIX for AIX configuration.
23. On the primary production server, use PowerHA to add the filesystem resources
/usr/scrt/run/c1 and /usr/scrt/run/c17 to the resource group
NOTE
The PowerHA Resource Group attribute Filesystems (empty is ALL for
volume groups specified) cannot be empty, because the file systems
protected by Assure MIMIX for AIX must be mounted by Assure MIMIX
for AIX.
24. On the primary production server, use PowerHA to add or change Application
Controller Scripts. For your convenience, sample start and stop scripts are
provided in the /usr/scrt/samples directory.
Copy the sample Assure MIMIX for AIX start and stop scripts to the location of
your choice and change the values for CONTEXT_ID and
FAILOVER_CONTEXT_ID and insert your application start and stop scripts
where indicated.
See the sample Assure MIMIX for AIX start and stop scripts below.
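Setting CONTEXT_ID and FAILOVER_CONTEXT_ID in the copied sample scripts can be done with sed. This sketch assumes the sample scripts assign those variables on lines of the form CONTEXT_ID=<n>; the source and destination paths in the usage comment are illustrative.

```shell
# Hypothetical sketch: copy a sample start/stop script and set both
# context ID variables. Paths and ID values below are illustrative.
customize_mimix_script() {
    _src="$1"; _dst="$2"; _ctx="$3"; _fctx="$4"
    cp "$_src" "$_dst"
    # Rewrite the assignment lines, assuming they start at column one.
    sed -e "s/^CONTEXT_ID=.*/CONTEXT_ID=${_ctx}/" \
        -e "s/^FAILOVER_CONTEXT_ID=.*/FAILOVER_CONTEXT_ID=${_fctx}/" \
        "$_dst" > "${_dst}.tmp" && mv "${_dst}.tmp" "$_dst"
    chmod +x "$_dst"
}

# customize_mimix_script /usr/scrt/samples/start_script \
#     /usr/local/ha/app_start 1 17
```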
25. On the primary production server, use PowerHA to Add Parent/Child
Dependency between resource groups, as follows: MIMIX_COMMON_RG as
the Parent Resource Group and the APPLICATION_RG as the Child Resource
Group.
26. On the primary production server, use PowerHA to add Online on the Same
Node Dependency between resource groups MIMIX_COMMON_RG,
APPLICATION_RG and any other resource groups associated with Assure
MIMIX for AIX.
27. On the primary production server, use PowerHA to verify and synchronize the
cluster configuration.
for lv in $LVS
do
rm -f /dev/recd$lv
rm -f /dev/rrecd$lv
done
exit 0
RTSTOPLOG=/usr/scrt/log/rtstop_c${CONTEXT_ID}.log
echo "${DATE_CMD}: Shutting down ${PRODUCT_NAME} ..." > $RTSTOPLOG
Note that if the Replication Group is active, the following warning message is
displayed in the Servers panel:
You will be allowed to continue through the Change Replication Group wizard and
make changes, until the Summary panel is displayed, after which the following
warning message will be displayed:
At this time, the Replication Group must be stopped so that the configuration
changes can be deployed to the production and recovery servers. Please do not click
Finish yet.
2. On the recovery server execute the following command to stop the Replication
Group:
For example:
rtstop -FC1
3. Return to the Summary panel of the Assure UI portal and click the Finish button
when the following message is displayed:
For example:
rtstart -C1
Perform the following step to update the Assure MIMIX for AIX configuration on
the recovery server:
• “Extending a protected file system beyond the limit of the region (block) size”
on page 224
NOTE
Both scenarios require that you manually perform Failover operations.
Unplanned failover
In this scenario, both Highly Available Production servers are unavailable due to a
disaster. For example, an entire site is lost due to a disaster such as a flood or
hurricane.
1. On the recovery server, make sure all snapshot filesystems are unmounted
before trying to release the snapshot.
/usr/scrt/bin/rtumnt -C <Context ID>
3. On the recovery server, create a Snapshot based on the current redo log and
validate the data integrity. Enter the following command to create a snapshot
based on the current redo log:
/usr/scrt/bin/scrt_ra -C <Primary Context ID> -X
4. Mount the snapshot filesystems on the recovery server. Enter the following
command:
/usr/scrt/bin/rtmnt -C <Context ID>
You should see output similar to the following:
Determining Filesystems to mount...
fsck -fp -y /dev/rtestlv
log redo processing for /dev/rtestlv
syncpt record at 7028
end of log 7028
syncpt record at 7028
syncpt address 7028
number of log records = 1
number of do blocks = 0
number of nodo blocks = 0
/dev/rtestlv (/test): ** Unmounted cleanly - Check
suppressed
Mounting /test...
• If analysis indicates the data is valid, use rtumnt to unmount the snapshot
filesystems and scrt_ra to remove the snapshot, then proceed to step 6 to
perform a failover to the latest point in the data.
/usr/scrt/bin/rtumnt -C <Context ID>
/usr/scrt/bin/scrt_ra -C <Context ID> -W
• If analysis indicates data corruption, use rtumnt to unmount the snapshot
filesystems and scrt_ra to remove the snapshot.
/usr/scrt/bin/rtumnt -C <Context ID>
/usr/scrt/bin/scrt_ra -C <Context ID> -W
Then, create a snapshot based on one of the following to locate and validate an
optimal restore point.
Once you have located an optimal restore point, remove the snapshot. Proceed
to step 5 to Backup the replica or to step 6 to perform a failover restore.
5. On the recovery server, if you have TSM or SysBack, back up the replica. This
provides additional data protection by keeping complete copies of the data on
archive media such as tape. Refer to Chapter 13, “Working with Archived Data”
on page 263.
6. On the recovery server, depending on the results from step 4, either roll back
the replica to the validated rollback location before executing a failover, or
fail over using the current redo log.
To roll back the replica to the validated rollback location from step 4, use:
scrt_ra -C <Primary Context ID> -F [-t | -S | -D].
For example, to restore the replica to a previously validated date and time:
scrt_ra -C <Primary Context ID> -F -D "05/15/09 09:33:40"
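The -D argument takes a quoted timestamp in the "MM/DD/YY HH:MM:SS" form shown above. A small format check before invoking scrt_ra can catch typos; the regular expression below is a sketch derived only from that example format.

```shell
# Hypothetical pre-check: verify a restore timestamp matches the
# "MM/DD/YY HH:MM:SS" shape used in the scrt_ra -D example above.
valid_restore_ts() {
    printf '%s\n' "$1" | grep -E \
      '^(0[1-9]|1[0-2])/(0[1-9]|[12][0-9]|3[01])/[0-9]{2} ([01][0-9]|2[0-3])(:[0-5][0-9]){2}$' \
      >/dev/null
}

# valid_restore_ts "05/15/09 09:33:40" && \
#     scrt_ra -C <Primary Context ID> -F -D "05/15/09 09:33:40"
```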
NOTE
At this point the configured recovery server has become the new
production server and the configured production server has become the
new recovery server.
NOTE
At this point your application is running on the new production server and
both configured production servers are down. When one or both
configured production servers come online, PowerHA will start and bring
the Resource Groups online on one of the configured production servers.
A resync operation is required when the Production Volumes and Recovery Replica
Volumes diverge. This occurs after a failover to the recovery server.
When the application is started on the new production server, the updates to the
Replica Volumes result in a divergence from the data on the new recovery server.
11. On the new recovery server, execute the resync command to start replication on
the new recovery server.
/usr/scrt/bin/rtdr -C <Primary Context ID> resync
12. On the new recovery server verify that the resync process completed.
Example:
-- Failover Context ID <Failover ContextID> is enabled and ready
for re-sync. ---
13. On the new production server, execute the resync command to start replication
on the new production server.
/usr/scrt/bin/rtdr -C <Primary Context ID> resync
Example:
--- System re-sync is activated. ---
NOTE
At this point your application is running on the new production server and
data is being replicated to the new recovery server.
IMPORTANT
If the failover from the production server to the recovery server
was necessary because all protected data was lost on the production server,
then the data must be restored to the production server before doing a
failback.
15. On the new production server, use the scconfig command to resync all protected
data from the new production server to the new recovery server.
/usr/scrt/bin/scconfig -B -G -C <Failover Context ID>
16. Before performing the failback operation, ensure that you stop your application
on the new production server.
17. On the new production server, use rtumnt to unmount the protected file systems
and transfer any current LFC data to the recovery server.
/usr/scrt/bin/rtumnt -FC<FailoverContextID>
18. On the new production server verify that the protected file systems are
unmounted and synchronization of data has successfully completed.
Example:
Determining Filesystems to unmount...
Unmounting /dev/lvFS_1_C1 from /FS_1_C1...
Sync: transferring current LFC to Recovery Server
Waiting for synchronization of data to complete.
All data has been synchronized to the Recovery Server.
19. On the new production server, use rtstop to unmount the protected file systems,
transfer any current LFC data to the recovery server and unload the Assure
MIMIX for AIX production server drivers.
/usr/scrt/bin/rtstop -FSC<FailoverContextID>
20. On the new production server verify that the synchronization of data has
successfully completed.
Example:
21. On the new recovery server execute the failback command to start the failback
process. Wait for failback to successfully complete before performing failback
on the new production server.
/usr/scrt/bin/rtdr -C <Primary Context ID> failback
22. On the new recovery server verify that the failback process completed.
Example:
Assure MIMIX DR for AIX Production Server Drivers are already
loaded for Context ID 1.
Starting scrt_lca
fsck -fp -y /dev/rlvFS_1_C1
The current volume is: /dev/lvFS_1_C1
Primary superblock is valid.
Mounting /FS_1_C1...
--- Primary Context ID <1> is enabled. ---
23. On the new production server execute the failback command to start the
failback process. Wait for failback to successfully complete before starting your
application.
/usr/scrt/bin/rtdr -C <Failover Context ID> failback
24. On the new production server verify that the failback process completed.
Example:
Assure MIMIX DR for AIX Recovery Server Drivers are already
loaded for Context ID 1.
Starting scrt_aba
--- Primary Context ID <1> is enabled. ---
NOTE
At this point the production and recovery servers have been returned back
to their original roles and replication is active.
26. Use PowerHA Start Cluster Services to manage the resource groups.
Planned Failover
In this scenario, the administrator has a scheduled maintenance period and switches
operations that run on the production server to the designated recovery server.
3. On the production server, use rtumnt to unmount the protected file systems
and transfer any current LFC data to the recovery server.
4. On the production server verify that the synchronization of data has successfully
completed.
Example:
Sync: transferring current LFC to Recovery Server
Waiting for synchronization of data to complete.
All data has been synchronized to the Recovery Server.
5. On the production server, use rtstop to unmount the protected file systems,
transfer any current LFC data to the recovery server and unload the Assure
MIMIX for AIX production server drivers.
/usr/scrt/bin/rtstop -FSC<PrimaryContextID>
6. On the production server verify that the synchronization of data has successfully
completed.
Example:
All data has been synchronized to the Recovery Server.
Stopping scrt_lca..............
Unloading Assure MIMIX DR for AIX Production Server Drivers
7. On the recovery server execute the failover command to initiate the failover
process:
/usr/scrt/bin/rtdr -C <Primary Context ID> failover
Example:
--- Failover Context ID <Failover ContextID> is enabled. ---
NOTE
At this point the configured recovery server has become the new
production server and the configured production server has become the
new recovery server.
NOTE
At this point your application is running on the new production server and
replication is not active. Perform the resync operation to start replication.
A resync operation is required when the Production Volumes and Recovery Replica
Volumes diverge. This occurs after a failover to the recovery server.
When the application is started on the recovery server, the updates to the Replica
Volumes result in a divergence from the data on the production server.
9. On the new recovery server, execute the resync command to start replication on
the new recovery server.
/usr/scrt/bin/rtdr -C <Primary Context ID> resync
10. On the new recovery server verify that the resync process completed.
11. On the new production server, execute the resync command to start replication
on the new production server.
/usr/scrt/bin/rtdr -C <Primary Context ID> resync
12. On the new production server verify that the resync process completed.
Example:
--- System re-sync is activated. ---
NOTE
At this point your application is running on the new production server and
data is being replicated to the new recovery server. To return the
production and recovery server back to their original roles, perform the
failback operation.
13. Before performing the failback operation, ensure that you stop your application
on the new production server.
14. On the new production server, use rtumnt to unmount the protected file systems
and transfer any current LFC data to the recovery server.
/usr/scrt/bin/rtumnt -FC<FailoverContextID>
15. On the new production server verify that the protected file systems are
unmounted and synchronization of data has successfully completed.
Example:
Determining Filesystems to unmount...
Unmounting /dev/lvFS_1_C1 from /FS_1_C1...
Sync: transferring current LFC to Recovery Server
Waiting for synchronization of data to complete.
All data has been synchronized to the Recovery Server.
16. On the new production server, use rtstop to unmount the protected file systems,
transfer any current LFC data to the recovery server and unload the Assure
MIMIX for AIX production server drivers.
/usr/scrt/bin/rtstop -FSC<FailoverContextID>
17. On the new production server verify that the synchronization of data has
successfully completed.
Example:
18. On the new recovery server execute the failback command to start the failback
process. Wait for failback to successfully complete before performing failback
on the new production server.
/usr/scrt/bin/rtdr -C <Primary Context ID> failback
19. On the new recovery server verify that the failback process completed.
Example:
Assure MIMIX DR for AIX Production Server Drivers are already
loaded for Context ID 1.
Starting scrt_lca
fsck -fp -y /dev/rlvFS_1_C1
The current volume is: /dev/lvFS_1_C1
Primary superblock is valid.
Mounting /FS_1_C1...
--- Primary Context ID <1> is enabled. ---
20. On the new production server execute the failback command to start the
failback process. Wait for failback to successfully complete before starting your
application.
/usr/scrt/bin/rtdr -C <Failover Context ID> failback
21. On the new production server verify that the failback process completed.
Example:
Assure MIMIX DR for AIX Recovery Server Drivers are already
loaded for Context ID 1.
Starting scrt_aba
--- Primary Context ID <1> is enabled. ---
NOTE
At this point the production and recovery servers have been returned back
to their original roles and replication is active.
23. Use PowerHA Start Cluster Services to manage the resource groups.
Option 3:
• Have PowerHA for AIX monitor an Assure MIMIX for AIX context. PowerHA
for AIX will monitor the LCA and ABA Assure MIMIX for AIX agents and
provide notification in the event of failure.
• If the Assure MIMIX for AIX production server fails, provide the ability to
fail over Assure MIMIX for AIX and the user's application to the recovery server
after verifying the Replica data.
• Provide the ability to control Assure MIMIX for AIX by manipulating the
location and status of the PowerHA for AIX resource groups.
Prerequisites
Before you begin, keep in mind the following:
• Assure MIMIX for AIX must be operational and configured with Primary and
Failover contexts.
• scconfigd must be running on the Assure MIMIX for AIX production and
recovery servers.
• Starting Assure MIMIX for AIX and the associated user applications in
/etc/inittab is not recommended. If the production server fails and the user
applications are brought up on the recovery server, it would not be desirable to
have them start when the production server is restored to service.
When the Production_Server resource group is brought online on the recovery node,
it executes the rtdr -C <Primary Context ID> failover command. When the
Production_Server resource group is brought offline on the recovery server,
after the recovery node becomes available, it sends the rtdr -C <Primary Context ID>
resync command to the recovery node before stopping.
There may be special cases where you would delay bringing the Production_Server
resource group offline on the recovery server. Normally, this would mean that no
data is being synchronized to the production server. In that case, you can manually
execute the rtdr -C <Primary Context ID> resync command on the production
server. This starts the ABA but it will not be monitored by PowerHA for AIX.
If the Recovery_Server resource group fails it will not move, because there is only
one node in the participating node list.
Ensure that you have the Recovery_Server resource group online to prevent an
automatic failover of the Production_Server resource group to the recovery server.
Failback procedure
1. Move the Production_Server resource group to the recovery node.
• Names used, such as Cluster Nodes and resource groups, are arbitrary; the
integrator can choose any names.
• Notification scripts are not provided and are the responsibility of the integrator.
The following Assure MIMIX for AIX scripts are provided for the PowerHA for
AIX configuration. These scripts require the -C <Primary Context ID> parameter
and, for the first two scripts, the optional -P flag when called from the
Production_Server resource group. The scripts log to /usr/scrt/log if the
HACMP Log File Parameters have Debug Level set to high.
/usr/scrt/bin/production_failback_acquire
/usr/scrt/bin/production_failover_release
/usr/scrt/bin/ABA_Monitor
/usr/scrt/bin/LCA_Monitor
###############################################################################
# Main Entry Point
################################################################################
PROGNAME=${0##*/}
[[ ${VERBOSE_LOGGING} == high ]] &&
{
rm -f /tmp/${PROGNAME}.out
exec 1> /tmp/${PROGNAME}.out
exec 2>&1
PS4='[${PROGNAME}][${LINENO}]'
set -x
}
printf "$(date) ******** Begin ${PROGNAME} ********\n"
/usr/scrt/bin/production_failback_acquire -C <Primary Context ID> -P
if ((${?}!=0))
then
printf "$(date) Production Server start failed.\n"
exit 1
fi
printf "$(date) Production Server start successful.\n"
if ((${?}!=0))
then
printf "$(date) MIMIX_DR_for_AIX_85_Application_Start failed.\n"
exit 1
fi
printf "$(date) MIMIX_DR_for_AIX_85_Application_Start successful.\n"
################################################################################
###############################################################################
# Main Entry Point
###############################################################################
PROGNAME=${0##*/}
[[ ${VERBOSE_LOGGING} == high ]] &&
{
rm -f /tmp/${PROGNAME}.out
exec 1> /tmp/${PROGNAME}.out
exec 2>&1
PS4='[${PROGNAME}][${LINENO}]'
set -x
}
printf "$(date) ******** Begin ${PROGNAME} ********\n"
if ((${?}!=0))
then
printf "$(date) MIMIX_DR_for_AIX_85_Application_Stop failed.\n"
exit 1
fi
printf "$(date) MIMIX_DR_for_AIX_85_Application_Stop successful.\n"
/usr/scrt/bin/production_failover_release -C <Primary Context ID> -P
if ((${?}!=0))
then
printf "$(date) Production Server stop failed.\n"
exit 1
fi
printf "$(date) Production Server stop successful.\n"
###############################################################################
Note: The value for "Stabilization Interval" depends on the time required
to reset the LFCs on Failover. This depends on the number of LFCs
on the Recovery Server and the system performance. With 20,000
LFCs it could typically take up to 15 minutes.
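The note gives one data point: about 15 minutes for 20,000 LFCs. Assuming reset time scales roughly linearly with the LFC count (an assumption, not a documented formula), a rough Stabilization Interval estimate in seconds is:

```shell
# Back-of-envelope estimate: scale the note's 15 minutes per 20,000 LFCs
# linearly. The linear model is an assumption, not a documented formula.
estimate_stabilization_seconds() {
    _lfcs="$1"
    # 15 min * 60 s / 20000 LFCs = 0.045 s per LFC
    awk -v n="$_lfcs" 'BEGIN { printf "%d\n", n * 900 / 20000 }'
}

# estimate_stabilization_seconds 20000   # prints 900 (15 minutes)
```

Round the result up generously when setting the interval, since the note says actual time also depends on system performance.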
To run the GUI version of the Sizing Tool, you need to install X11 software on
your laptop and configure the X11 software to allow access from the remote AIX
node. Actual sizing data will be collected by the backend Sizing Tool scripts on
the AIX node.
3. Select X11 packages from the listing screen. Search for X11 and select
xorg-server, xinit, xlaunch, openssh and any other packages you prefer.
4. On the Start program screen, select Start program on this computer and select
xterm as the local program.
5. Run the Environment Setup on the AIX backend node (which will be the
production node).
6. After installation on the laptop, define the following settings on the AIX
backend node: