Cluster Server Agent for Hitachi TrueCopy/HUR/Hewlett-Packard XP Continuous Access Installation and Configuration Guide
7.0
August 2016
The software described in this book is furnished under a license agreement and may be used
only in accordance with the terms of the agreement.
Legal Notice
Copyright © 2016 Veritas Technologies LLC. All rights reserved.
Veritas and the Veritas Logo are trademarks or registered trademarks of Veritas Technologies
LLC or its affiliates in the U.S. and other countries. Other names may be trademarks of their
respective owners.
This product may contain third party software for which Veritas is required to provide attribution
to the third party (“Third Party Programs”). Some of the Third Party Programs are available
under open source or free software licenses. The License Agreement accompanying the
Software does not alter any rights or obligations you may have under those open source or
free software licenses. Please see the Third Party Legal Notice Appendix to this Documentation
or TPIP ReadMe File accompanying this product for more information on the Third Party
Programs.
The product described in this document is distributed under licenses restricting its use, copying,
distribution, and decompilation/reverse engineering. No part of this document may be
reproduced in any form by any means without prior written authorization of Veritas Technologies
LLC and its licensors, if any.
The Licensed Software and Documentation are deemed to be commercial computer software
as defined in FAR 12.212 and subject to restricted rights as defined in FAR Section 52.227-19
"Commercial Computer Software - Restricted Rights" and DFARS 227.7202, et seq.
"Commercial Computer Software and Commercial Computer Software Documentation," as
applicable, and any successor regulations, whether delivered by Veritas as on premises or
hosted services. Any use, modification, reproduction release, performance, display or disclosure
of the Licensed Software and Documentation by the U.S. Government shall be solely in
accordance with the terms of this Agreement.
http://www.veritas.com
Technical Support
Technical Support maintains support centers globally. Technical Support’s primary
role is to respond to specific queries about product features and functionality. The
Technical Support group also creates content for our online Knowledge Base. The
Technical Support group works collaboratively with the other functional areas within
the company to answer your questions in a timely fashion.
Our support offerings include the following:
■ A range of support options that give you the flexibility to select the right amount
of service for any size organization
■ Telephone and/or Web-based support that provides rapid response and
up-to-the-minute information
■ Upgrade assurance that delivers software upgrades
■ Global support purchased on a regional business hours or 24 hours a day, 7
days a week basis
■ Premium service offerings that include Account Management Services
For information about our support offerings, you can visit our website at the following
URL:
www.veritas.com/support
All support services will be delivered in accordance with your support agreement
and the then-current enterprise technical support policy.
Customer service
Customer service information is available at the following URL:
www.veritas.com/support
Customer Service is available to assist with non-technical questions, such as the
following types of issues:
■ Questions regarding product licensing or serialization
■ Product registration updates, such as address or name changes
■ General product information (features, language availability, local dealers)
■ Latest information about product updates and upgrades
■ Information about upgrade assurance and support contracts
■ Advice about technical support options
■ Nontechnical presales questions
■ Issues that are related to CD-ROMs, DVDs, or manuals
Support agreement resources
If you want to contact us regarding an existing support agreement, please contact
the support agreement administration team for your region as follows:
Japan [email protected]
Chapter 1
Introducing the agent for
Hitachi
TrueCopy/HUR/Hewlett-Packard
XP Continuous Access
This chapter includes the following topics:
■ Supported software
■ Supported hardware
TrueCopy resource online also has safe and exclusive access to the configured
devices.
You can use the agent in replicated data clusters and global clusters that run VCS.
The agent supports TrueCopy in all fence levels that are supported on a particular
array.
The agent supports different fence levels for different arrays:
The agent also supports parallel applications, such as Storage Foundation for
Oracle RAC.
The Hitachi TrueCopy/HUR/HP-XP Continuous Access agent also supports Hitachi
Universal Replicator for asynchronous replication on two sites.
Supported software
For information on the software versions that the agent for Hitachi
TrueCopy/HUR/Hewlett-Packard XP Continuous Access supports, see the Veritas
Services and Operations Readiness Tools (SORT) site:
https://sort.veritas.com/agents.
Supported hardware
The agent for Hitachi TrueCopy provides support for the following:
■ The agent supports Hitachi TrueCopy replication, provided that the host, HBA,
and array combination is in Hitachi's hardware compatibility list.
■ The agent for Hitachi TrueCopy does not support other Hewlett-Packard
replication solutions under the Continuous Access umbrella such as Continuous
Access Storage Appliance (CASA). The agent only supports Continuous Access
XP.
In environments using Storage Foundation for Oracle RAC, the arrays must support
SCSI-3 persistent reservations.
Introducing the agent for Hitachi TrueCopy/HUR/Hewlett-Packard XP Continuous Access 12
Typical Hitachi TrueCopy/HUR/Hewlett-Packard XP Continuous Access setup in a VCS cluster
Replication link
■ In a replicated data cluster environment, all hosts are part of the same cluster.
You must connect them with dual, dedicated networks that support LLT.
In a global cluster environment, you must attach all hosts in a cluster to the
same Hitachi TrueCopy array.
■ In parallel applications like Storage Foundation for Oracle RAC, all hosts that
are attached to the same array must be part of the same GAB membership.
Storage Foundation for Oracle RAC is supported with TrueCopy only in a global
cluster environment and not in a replicated data cluster environment.
Hitachi TrueCopy/HUR/Hewlett-Packard XP
Continuous Access agent functions
The VCS enterprise agent for Hitachi TrueCopy monitors and manages the state
of replicated devices that are attached to VCS nodes.
The agent performs the following functions:
Function Description
online If the state of all local devices is read-write enabled, the agent
makes the devices writable by creating a lock file on the local
host.
If one or more devices are not in a writable state, the agent runs
the horctakeover command to enable read-write access to
the devices.
If the S-VOL devices are in the COPY state, the agent runs the
horctakeover command after one of the following:
If the S-VOL devices are in the PAIR state, the agent runs the
pairdisplay command without the -l option to retrieve the
state of the remote site devices. If it finds that the P-VOL devices
are in PAIR state, the agent proceeds with the failover. But if
the remote RAID manager is down, then the agent honors the
SplitTakeover attribute configuration before performing failover.
offline The agent removes the lock file that was created for the resource
by the online entry point. The agent does not run any TrueCopy
commands because taking the resource offline is not indicative
of an intention to give up the devices.
monitor Verifies the existence of the lock file to determine the resource
status. If the lock file exists, the agent reports the status of the
resource as online. If the lock file does not exist, the agent
reports the status of the resource as offline.
open Removes the lock file from the host on which this entry point is
called. This functionality prevents potential concurrency violation
if the group fails over to another node.
Note that the agent does not remove the lock file if the agent
starts after the following command:
hastop -force
info Reports the current role and status of the devices in the device
group. This entry point can be used to verify the device state
and to monitor dirty track trends.
action The agent supports the following actions using the hares
-action command from the command line:
Function Description
action\PreSwitch Ensures that the remote site cluster can come online during a
planned failover within a GCO configuration without data loss.
The VCS engine on the remote cluster invokes the PreSwitch
action on all the resources of the remote Service Group during
a planned failover using the hagrp -switch command. For
this, the PreSwitch attribute must be set to 1. The option -nopre
indicates that the VCS engine must switch the service group
regardless of the value of the PreSwitch service group attribute.
action\vxdiske Reports the mapping between the physical disk name and the
volume manager disk name for all connected disks.
action\GetCurrentRPO Fetches the current point in time RPO. The agent performs this
action function on the disaster recovery (DR) system where the
ComputeDRSLA attribute is set to 1. The RPO is computed in
seconds.
Note: The agent does not compute the RPO when the group
is frozen.
The agent does not store the computed RPO; make a note of
the RPO for future reference.
Note: The agent uses the following internal action functions to compute the RPO:
StartRPOComputation, StopRPOComputation, StartWriter, and ReportRPOData.
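The lock-file convention that the online, offline, and monitor functions share can be illustrated with a minimal Python sketch. The class name, method names, and lock-file path are hypothetical and are not part of the agent. The sketch models only the lock file and omits the RAID Manager commands that the real agent runs.

```python
import os
import tempfile


class LockFileResource:
    """Toy model of the lock-file convention (hypothetical, not agent code)."""

    def __init__(self, lock_path):
        # Hypothetical location; the real agent chooses its own path.
        self.lock_path = lock_path

    def online(self):
        # Online creates the lock file on the local host.
        with open(self.lock_path, "w") as handle:
            handle.write("locked\n")

    def offline(self):
        # Offline only removes the lock file; no replication command is run.
        if os.path.exists(self.lock_path):
            os.remove(self.lock_path)

    def monitor(self):
        # Monitor reports status purely from the existence of the lock file.
        return "ONLINE" if os.path.exists(self.lock_path) else "OFFLINE"


def demo():
    """Walk the resource through offline -> online -> offline."""
    with tempfile.TemporaryDirectory() as workdir:
        res = LockFileResource(os.path.join(workdir, "htc_resource.lock"))
        states = [res.monitor()]
        res.online()
        states.append(res.monitor())
        res.offline()
        states.append(res.monitor())
        return states
```

Calling demo() walks one resource through the three states in a temporary directory, so nothing is left behind on the host.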
If one or more devices are not in a writable state, the agent runs the horctakeover
command to enable read-write access to the devices. If the horctakeover command
exits with an error (exit code > 5), for example, due to a timeout, then the agent
flushes and freezes the group to indicate that user-intervention is required to identify
the cause of the error.
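The exit-code rule above can be expressed as a small Python sketch. The function name and the returned labels are hypothetical. Treating every exit code of 5 or lower as "continue" is an assumption drawn from the "exit code > 5" wording, not a statement of the agent's full behavior.

```python
def handle_horctakeover_exit(exit_code):
    """Map a horctakeover exit code to the follow-up described in the text.

    Labels are illustrative only. Codes of 5 or lower are assumed
    (not confirmed by the guide) to let the online operation continue.
    """
    if exit_code > 5:
        # Error case: the group is flushed and frozen for user intervention.
        return "flush_and_freeze_group"
    return "continue_online"
```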
For S-VOL devices in any state other than SSWS and SSUS, the agent honors the
SplitTakeover attribute and runs the horctakeover command to make the devices
writable.
The time required for failover depends on the following conditions:
■ The health of the original primary.
■ The RAID Manager timeouts as defined in the horcm configuration file for the
device group.
If the S-VOL devices are in the SSUS state and if the RoleMonitor attribute is set
to 1, the agent runs the pairdisplay command without the -l option, to determine
if the S-VOL is in a writable state. The agent behavior when devices are in S-VOL
SSUS state is as follows:
■ If S-VOL devices are in SSUS writable state, the agent proceeds with online
without failover.
■ If S-VOL devices are in SSUS read only state, the agent honors the SplitTakeover
attribute and accordingly proceeds with failover to make the devices writable.
■ If the agent cannot connect to the remote RAID manager, the agent faults the
resource.
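The three SSUS cases above can be summarized in a short Python sketch. It assumes that the RoleMonitor attribute is set to 1, and it reads "honors the SplitTakeover attribute" as "fail over only when SplitTakeover is 1". The function name and return labels are hypothetical.

```python
def online_action_for_ssus(remote_reachable, svol_writable, split_takeover):
    """Return the online behavior for S-VOL devices in the SSUS state.

    Assumes RoleMonitor = 1. Labels are illustrative, not agent output.
    """
    if not remote_reachable:
        # The remote RAID manager cannot be contacted.
        return "fault_resource"
    if svol_writable:
        # SSUS and already writable: no takeover is needed.
        return "online_without_failover"
    # SSUS and read-only: the SplitTakeover attribute decides.
    return "failover" if split_takeover == 1 else "no_failover"
```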
If the S-VOL devices are in the COPY state, the agent waits for one of the
following:
■ The synchronization from the primary completes. The agent then runs the
horctakeover command.
■ The OnlineTimeout period of the entry point expires. In this case, the agent
does not run the horctakeover command and the resource faults.
If S-VOL devices are in the PAIR state, the agent issues the pairdisplay command
without the -l option to get the replication link state. If it finds that the P-VOL devices
are in the PAIR state, the agent proceeds with failover. If remote horcm is down,
the SplitTakeover attribute is honored before issuing the horctakeover command.
The agent validates that the value of OnlineTimeout for the HTC type is sufficient
to run the horctakeover command. If the agent finds that the value is
insufficient, it logs an appropriate error message.
Chapter 2
Installing and removing the
agent for Hitachi
TrueCopy/HUR/Hewlett-Packard
XP Continuous Access
This chapter includes the following topics:
Set up replication and the required hardware infrastructure. For information about
setting up Oracle RAC environment, refer to the Storage Foundation for Oracle
RAC Configuration and Upgrade Guide.
See “Typical Hitachi TrueCopy/HUR/Hewlett-Packard XP Continuous Access setup
in a VCS cluster” on page 12.
AIX cd1/aix/vcs/replication/htc_agent/agent_version/pkgs/
Linux cd1/linux/generic/vcs/replication/htc_agent/agent_version/rpms/
Solaris cd1/solaris/dist_arch/vcs/replication/htc_agent/agent_version/pkgs/
If you downloaded the individual agent tar file, navigate to the pkgs directory
(for AIX and Solaris) or the rpms directory (for Linux).
Installing and removing the agent for Hitachi TrueCopy/HUR/Hewlett-Packard XP Continuous Access 20
Installing the agent for Hitachi TrueCopy/HUR/Hewlett-Packard XP Continuous Access
4 Log in as a superuser.
5 Install the package.
Note: On successful installation of the agent, if VCS is running, the agent types
definition is automatically added to the VCS configuration.
2 Disable any publishers that are not reachable, because the package installation
may fail if any of the already added repositories is unreachable.
# pkg set-publisher --disable <publisher name>
where the publisher name is obtained using the pkg publisher command.
3 Add a file-based repository in the system.
# pkg set-publisher -g /tmp/install/VRTSvcstc.p5p Symantec
Copy the map file to the secondary site and then import the volume group on the
secondary using the map file. Run the following command:
vgimport [-s] [-v] [-m] /vg04map.map vg04
This must be done as part of an initial setup process before VCS starts controlling
the replication.
type HTC (
static keylist RegList = { ComputeDRSLA, SplitTakeover,
LinkMonitor, RoleMonitor, FreezeSecondaryOnSplit,
AllowOnlineOnSimplex }
static keylist SupportedActions = { localtakeover,
pairresync, pairresync-swaps, pairdisplay, vxdiske,
PreSwitch, ReportRPOData, StartWriter, GetCurrentRPO,
StartRPOComputation, StopRPOComputation }
static int InfoInterval = 300
static keylist LogDbg = { DBG_1, DBG_2, DBG_3 }
static int OpenTimeout = 180
static str ArgList[] = { BaseDir, GroupName, Instance,
SplitTakeover, LinkMonitor, RoleMonitor, FreezeSecondaryOnSplit,
AllowOnlineOnSimplex, ComputeDRSLA, AdvancedOpts }
str BaseDir = "\"/HORCM/usr/bin\""
str GroupName
int Instance
int SplitTakeover
int LinkMonitor
int RoleMonitor
int FreezeSecondaryOnSplit
boolean AllowOnlineOnSimplex = 0
temp str VCSResLock
temp str TargetFrozen
int ComputeDRSLA
temp boolean Tagging = 0
str AdvancedOpts{} = { AllowAutoFailoverInterval="-1" }
temp str PVOLStateTime
)
Attribute Description
Type-Dimension: string-scalar
Default: /HORCM/usr/bin
Configuring the agent for Hitachi TrueCopy/HUR/Hewlett-Packard XP Continuous Access 26
Configuration concepts for the Hitachi TrueCopy/HUR/Hewlett-Packard XP Continuous Access agent
Type-Dimension: string-scalar
Instance The Instance number of the device that the agent manages.
Multiple device groups may have the same instance number.
Type-Dimension: integer
Type-Dimension: integer-scalar
Default: 0
LinkMonitor An integer that determines the action the agent takes when the
replication link is disconnected. Depending on the value of this
attribute, the agent takes the following action:
Type-Dimension: integer-scalar
Default: 0
If this attribute is set to 1, the agent monitors the status of the HTC
volumes every time a monitor cycle runs. In addition, the HTC
resource comes online only when any of the following conditions
are met:
Type-Dimension: integer-scalar
Default: 0
FreezeSecondaryOnSplit A flag that determines if the agent must freeze the service group
in the remote cluster when the TrueCopy replication link is either
split or suspended.
The value 1 indicates that the agent must freeze the service group
in the remote cluster when the replication link is split or suspended.
Type-Dimension: integer-scalar
Default: 0
AllowOnlineOnSimplex A flag that determines if the agent must allow a resource to come
online when the TrueCopy devices are in SMPL (Simplex) state.
This attribute is honored only when the agent attempts to bring a
resource online.
The value false indicates that the agent must not allow a resource
to come online when TrueCopy devices are in SMPL (Simplex)
state.
Type-Dimension: boolean-scalar
Default: false
Type-Dimension: integer-scalar
Default: 0
AdvancedOpts Used at the time of monitoring. This attribute enables the agent
to execute a custom script during the monitor cycle of the resource.
AdvancedOpts{} =
{ AllowAutoFailoverInterval="-1" }
Type-Dimension: string-association
AllowAutoFailoverInterval The agent uses this attribute to fail over automatically only if all
the following conditions are met:
The failover takes place only if the value of this attribute is greater
than the last registered PAIR state time difference.
Note: This attribute is applicable only when the fence level is
NEVER.
Default: -1
Type-Dimension: string-association
SplitTakeover attribute = 0
The default value of the SplitTakeover attribute is 0.
The default value indicates that the agent does not permit a failover to S-VOL
devices if the P-VOL devices are in the PSUE state, or if the agent cannot connect
to the remote site RAID manager, or if the S-VOL devices are in the SSUS state.
If a failover occurs when the replication link is disconnected, data loss may occur
because the S-VOL devices may not be in sync.
If the S-VOL devices are in the PAIR state, the agent attempts to contact the RAID
manager at the P-VOL side to determine the status of the arrays.
If the P-VOL devices are in the PAIR state, the agent proceeds with failover. But if
the P-VOL side is down, the agent attempts to honor the SplitTakeover attribute
configuration before proceeding with failover.
If a device group is made up of multiple devices, then, in case of a link failure, the
state of each device changes on an individual basis. This change is not reflected
on the device group level. Only those devices to which an application made a write
after a link failure change their state to PSUE. Other devices in the same device
group retain their state to PAIR.
SplitTakeover attribute = 1
If there is a replication link failure, or if the primary array fails, or if a pair is
suspended, the agent allows failover to the S-VOL devices.
FreezeSecondaryOnSplit attribute = 0
If the value of the FreezeSecondaryOnSplit attribute is 0, the agent unfreezes the
remote site service group if it is already frozen. Hence, even if there is a replication
link failure, or if the primary array fails, or if a pair is suspended, the agent allows
failover to the S-VOL devices.
■ The ActionTimeout value of the HTC type should be more than twice the value of
the remote RAID manager timeout.
Greater than 0
Conditions:
■ The fence level is NEVER.
■ The remote RAID manager is not reachable.
■ The last known remote state within T seconds is PAIR, where T is the
AllowAutoFailoverInterval value.
Result: The failover is triggered. The last known PAIR status of P-VOL is
recorded and is used to determine the failover action.

Greater than 0
Conditions:
■ The fence level is NEVER.
■ The remote RAID manager is not reachable.
■ The last known remote state is PAIR, but it was recorded more than T seconds
ago, where T is the AllowAutoFailoverInterval value.
Result: The failover is not triggered. The last known PAIR status of P-VOL is
recorded and is used to determine the failover action.

Greater than 0
Conditions:
■ The fence level is NEVER.
■ The remote RAID manager is not reachable.
■ The last known remote state is not PAIR.
Result: The failover is not triggered. The last known PAIR status of P-VOL is
recorded and is used to determine the failover action.
■ The use of this attribute provides a tradeoff between minimum downtime and
data consistency. You may achieve a smaller downtime at the cost of possible
data loss or corruption. The tradeoff exists because, in the fence level NEVER,
if the remote HORCM is down, there is no way to figure out whether the
replication link is healthy and the latest data is available for failover.
■ In this scenario, if the takeover fails, the service group goes into the Freeze state.
■ If the SplitTakeover attribute is set to 1, the agent triggers a failover regardless
of the AllowAutoFailoverInterval value.
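The failover rules for AllowAutoFailoverInterval can be sketched in Python as follows. The sketch covers only the situation that the rules describe: the fence level is NEVER and the remote RAID manager is not reachable. The function and parameter names are hypothetical. Returning False for a non-positive interval (such as the default of -1) is an assumption, because the rules shown here cover only values greater than 0.

```python
def auto_failover_allowed(interval, seconds_since_last_pair,
                          last_remote_state, split_takeover=0):
    """Decide whether an automatic failover is triggered.

    Scope: fence level NEVER, remote RAID manager unreachable.
    Names are illustrative and not part of the agent.
    """
    if split_takeover == 1:
        # SplitTakeover = 1 triggers a failover regardless of the interval.
        return True
    if interval <= 0:
        # Assumption: a non-positive interval never allows automatic failover.
        return False
    if last_remote_state != "PAIR":
        return False
    # The interval must be greater than the age of the last PAIR observation.
    return seconds_since_last_pair < interval
```

For example, with an interval of 120 seconds, a PAIR state recorded 60 seconds ago allows the failover, while one recorded 300 seconds ago does not.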
hahb -modify ICMP AYAInterval 45
hahb -modify ICMP AYARetryLimit 2
D = Timeout value specified for the cluster node at the Primary site
Location: HORCM file at the Secondary site
Value: 120 seconds
Description: The HTC agent at the secondary site attempts to get the replication
link state using the pairdisplay command. This operation times out after the
specified interval.
A + (B x C) + D + BufferTime
= 60 + (60 x 3) + 120 + 40
= 400 seconds
You can modify these attribute values (A to D) to reduce the effective failover time.
For example, the turnaround time can be reduced to 180 seconds by tuning the
attribute values as follows:
■ A = 30 seconds
■ B = 45 seconds
■ C = 2 attempts
■ D = 30 seconds
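The failover-time formula can be checked with a short Python sketch. The function and parameter names are hypothetical; only the formula A + (B x C) + D + BufferTime and the sample values come from this guide. The tuned values are evaluated without a BufferTime because the guide does not state which buffer applies to them.

```python
def effective_failover_time(a, b, c, d, buffer_time=0):
    """Return A + (B x C) + D + BufferTime, all in seconds (C is a count)."""
    return a + (b * c) + d + buffer_time


# Values from the worked example: 60 + (60 x 3) + 120 + 40
default_total = effective_failover_time(60, 60, 3, 120, buffer_time=40)

# Tuned values A=30, B=45, C=2, D=30, before any BufferTime is added
tuned_without_buffer = effective_failover_time(30, 45, 2, 30)
```

With the tuned values, A + (B x C) + D comes to 150 seconds, so the remainder of the quoted turnaround time is the buffer.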
Caution: AYAInterval and AYARetryLimit are responsible for GCO link monitoring
and toleration of intermittent network failures. Significantly reducing these values
may falsely flag intermittent network issues as network failures, which may trigger
a failover.
Note: AYAInterval and AYARetryLimit are not used in a replicated data cluster
(RDC) environment, so the effective time for failover in that environment is greatly
reduced.
[Figure: resource dependency graph showing the Listener, Oracle, Volume, IP, DiskGroup, NIC, and HTC resources]
You can configure a resource of type HTC in the main.cf file as:
HTC DG (
GroupName = DG
Instance = 1
)
group HTC (
SystemList = { fred = 0, barney = 1 }
Parallel = 2
ClusterList = { clus1 = 0, clus2 = 1 }
Authority = 1
AutoStartList = { fred, barney }
)
CFSMount htc_mnt (
BlockDevice = "/dev/vx/dsk/TCdg/htcvol"
MountPoint = "/htc"
)
CVMVolDg htc_dg (
CVMVolume = { htcvol }
CVMActivation = sw
CVMDeportOnOffline = 1
CVMDiskGroup = TCdg
ClearClone = 1
)
HTC rep_htc (
GroupName = vg1
Instance = 1
)
group cvm (
SystemList = { fred = 0, barney = 1 }
AutoFailOver = 0
Parallel = 1
AutoStartList = { fred, barney }
)
CFSfsckd vxfsckd (
)
CVMCluster cvm_clus (
CVMTransport = gab
CVMClustName = htc701
CVMTimeout = 200
CVMNodeId = { fred = 0, barney = 1 }
)
CVMVxconfigd cvm_vxconfigd (
Critical = 0
CVMVxconfigdArgs = { syslog }
)
■ If you plan to configure the agent in a replicated data cluster, make sure the
required replication infrastructure is in place and that the application is
configured.
■ Ensure that the HORC manager is configured to access the device groups.
■ Verify that the HTC instance is configured appropriately and is in a running
state.
■ Verify that the HORC manager CLIs execute successfully. This is essential
for the HTC and the HTCSnap agents to be able to fetch HTC-related data
and to successfully perform failover, switchover, and other operations.
Sample output:
CVMVolDg SupportedActions import deport vxdctlenable
Sample output:
CVMVolDg SupportedActions
Global clusters do not require system zones because failover occurs on a remote
cluster if all local targets have been exhausted.
Note: You must not change the replication state of devices from primary to
secondary and from secondary to primary, outside of a VCS setup. The agent for
Hitachi TrueCopy/HUR/HP-XP Continuous Access fails to detect a change in the
replication state if the role reversal is done externally and RoleMonitor is disabled.
The action entry point displays the RPO. The agent does not store the computed
RPO; make a note of the RPO for future reference.
If the RPO is not reported, it indicates that the agent needs more time to finish
computing the RPO. Wait for some more time before you run the
GetCurrentRPO action function again.
3 To stop RPO computation, run the following command:
hares -modify HTC_resource_name ComputeDRSLA 0 -sys system_name
CVMDeportOnOffline
The CVMVolDg agent uses the CVMDeportOnOffline attribute to determine whether
or not to deport a shared disk group when the corresponding CVMVolDg resource
is taken offline.
0 Does not deport the disk group when the CVMVolDg resource
is taken offline.
(Default)
Note: You must set the CVMDeportOnOffline attribute to 1 for all the CVMVolDg
resources that depend on the VCS hardware replicated managed devices, such as
HTC.
Run the following command to verify that the attribute value is set as expected:
# hares -display cvmvoldg_res | grep CVMDeportOnOffline
ClearClone
The HTC agent uses the ClearClone attribute to update the on-disk UDID content
for HTC hardware-replicated devices at a disk group level.
Note: Do not use the ClearClone attribute with hardware clone devices like Hitachi
ShadowImage.
When the DiskGroup or CVMVolDg resources are defined with the ClearClone
attribute set to 1, VCS calls the underlying VxVM command to import the disk group.
VxVM provides the VxDg import option (-c) to update the UDID-related content.
When the -c option is used, the udid_mismatch and the subsequent clone_disk
flags are cleared in a single operation from the disks in the specified disk group.
For shared disk groups, ensure that the -c option is specified in the CVMVolDg
import actions script.
VxVM performs additional checks when using DMP to determine whether the device
is a hardware replicated device, or a hardware clone. This additional safeguard is
not available when using third-party drivers such as MPxIO, MPIO, and EMC
PowerPath.
Note: MPIO and EMC PowerPath are not supported with HTC when it is used in
combination with VxVM or CVM and therefore with the VCS agent for HTC.
■ How VCS recovers from various disasters in an HA/DR setup with Hitachi
TrueCopy/HUR/Hewlett-Packard XP Continuous Access
Global clusters When a site-wide global service group or system fault occurs, VCS
failover behavior depends on the value of the ClusterFailOverPolicy
attribute for the faulted global service group. The Cluster Server agent
for Hitachi TrueCopy/HUR/Hewlett-Packard XP Continuous Access
ensures safe and exclusive access to the configured Hitachi
TrueCopy/HUR/Hewlett-Packard XP Continuous Access devices.
Replicated data When service group or system faults occur, VCS failover behavior
clusters depends on the value of the AutoFailOver attribute for the faulted service
group. The VCS agent for Hitachi TrueCopy/HUR/Hewlett-Packard XP
Continuous Access ensures safe and exclusive access to the configured
Hitachi TrueCopy/HUR/Hewlett-Packard XP Continuous Access devices.
Refer to the Cluster Server Administrator's Guide for more information on the DR
configurations and the global service group attributes.
Table 4-1 Failure scenarios in a global cluster configuration with the Cluster
Server agent for Hitachi TrueCopy/HUR/Hewlett-Packard XP
Continuous Access
Application failure Application cannot start successfully on any hosts at the primary site.
VCS response at the secondary site:
■ Causes global service group at the primary site to fault and displays an alert to
indicate the fault.
■ Does the following based on the ClusterFailOverPolicy global service group attribute:
■ Auto or Connected—VCS automatically brings the faulted global group online at
the secondary site.
■ Manual—No action. You must bring the global group online at the secondary site.
Agent response:
See “Performing failback after a node failure or an application failure” on page 62.
Managing and testing clustering support for Hitachi TrueCopy/HUR/Hewlett-Packard XP Continuous Access 51
How VCS recovers from various disasters in an HA/DR setup with Hitachi TrueCopy/HUR/Hewlett-Packard XP
Continuous Access
Site failure All hosts and the storage at the primary site fail.
■ 1—The agent issues the horctakeover command to make the HTC devices
write-enabled. The HTC devices go into the SSWS (Suspend for Swapping with
S-VOL side only) state. If the original primary site is restored, you must execute the
pairresync-swaps action on the secondary site to establish reverse replication.
■ 0—Agent does not perform failover to the secondary site.
Replication link failure Replication link between the arrays at the two sites fails.
■ 0—No action.
■ 1—The agent periodically attempts to resynchronize the S-VOL side using the
pairresync command.
The agent also logs a warning message to indicate that the replication link is broken.
■ 2—The agent periodically attempts to resynchronize the S-VOL side and also sends
notifications about the disconnected link. Notifications are sent in the form of either
SNMP traps or emails. For information about the VCS NotifierMngr agent, refer to
the Cluster Server Bundled Agents Reference Guide.
If the value of the LinkMonitor attribute is not set to 1 or 2, you must manually
resynchronize the HTC devices after the link is restored.
To manually resynchronize the HTC devices after the link is restored:
■ Before you resync the S-VOL device, you must split off the ShadowImage device
from the S-VOL device at the secondary site.
■ You must initiate resync of the S-VOL device using the agent's pairresync action.
■ After P-VOL and S-VOL devices are in sync, re-establish the mirror relationship
between the Shadow Copy and the S-VOL devices.
If you initiate a failover to the secondary site when resync is in progress, the online
function of the Hitachi TrueCopy/HUR/Hewlett-Packard XP Continuous Access agent
waits for the resync to complete and then initiates a takeover of the S-VOL devices.
Note: If you did not configure Shadow Copy devices and if disaster occurs when resync
is in progress, then the data at the secondary site becomes inconsistent. Veritas
recommends configuring Shadow Copy devices at both the sites.
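The effect of the LinkMonitor attribute on a replication link failure can be summarized in code. The following is a minimal sketch that only restates the values listed above; the class and function names are hypothetical and are not part of the agent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinkFailureResponse:
    auto_resync: bool             # agent periodically attempts pairresync
    sends_notification: bool      # SNMP traps or emails about the broken link
    manual_resync_required: bool  # administrator must resynchronize after restore


def link_failure_response(link_monitor: int) -> LinkFailureResponse:
    """Map a LinkMonitor value to the behavior described in the failure table."""
    if link_monitor == 0:
        return LinkFailureResponse(False, False, True)
    if link_monitor == 1:
        # Value 1 also logs a warning that the replication link is broken.
        return LinkFailureResponse(True, False, False)
    if link_monitor == 2:
        return LinkFailureResponse(True, True, False)
    raise ValueError("LinkMonitor must be 0, 1, or 2")
```

Only the value 0 leaves the resynchronization entirely to the administrator.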
Network failure The network connectivity and the replication link between the sites fail.
VCS response at the secondary site:
■ VCS at each site concludes that the remote cluster has faulted.
■ Does the following based on the ClusterFailOverPolicy global service group attribute:
■ Manual or Connected—No action. You must confirm the cause of the network
failure from the cluster administrator at the remote site and fix the issue.
■ Auto—VCS brings the global group online at the secondary site which may lead
to a site-wide split brain. This causes data divergence between the devices on
the primary and the secondary arrays.
When the network (wac and replication) connectivity is restored, you must manually
resynchronize the data.
Note: Veritas recommends that the value of the ClusterFailOverPolicy attribute
is set to Manual for all global groups to prevent unintended failovers due to
transient network failures.
■ Causes the global service group at the primary site to fault and displays an alert to
indicate the fault.
■ Does the following based on the ClusterFailOverPolicy global service group attribute:
■ Auto or Connected—VCS automatically brings the faulted global service group
online at the secondary site.
■ Manual—No action. You must bring the global group online at the secondary site.
Agent response: The agent does the following based on the SplitTakeover attribute of
the HTC resource:
■ 1—The agent issues the horctakeover command to make the HTC devices
write-enabled. The S-VOL devices go into the SSWS state.
■ 0—The agent faults the HTC resource.
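The interaction between ClusterFailOverPolicy and SplitTakeover during a network failure can be sketched as a lookup. This is a hedged illustration of the rows above, not VCS logic; the function name and the returned labels are hypothetical.

```python
from typing import Optional, Tuple


def network_failure_outcome(policy: str, split_takeover: int) -> Tuple[str, Optional[str]]:
    """Return (VCS action, agent action) for a network failure in a global cluster."""
    if split_takeover not in (0, 1):
        raise ValueError("SplitTakeover must be 0 or 1")
    if policy in ("Manual", "Connected"):
        # VCS takes no action, so the agent's online function is not invoked
        # until an administrator brings the group online.
        return ("no_action", None)
    if policy == "Auto":
        # VCS brings the group online at the secondary site, which may lead to
        # a site-wide split brain; the agent then acts on SplitTakeover.
        agent = "horctakeover_ssws" if split_takeover == 1 else "htc_resource_faults"
        return ("online_at_secondary", agent)
    raise ValueError("ClusterFailOverPolicy must be Manual, Connected, or Auto")
```

The sketch makes the recommendation visible: with the Manual policy, a transient network failure produces no automatic takeover regardless of SplitTakeover.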
Application failure Application cannot start successfully on any hosts at the primary site.
VCS response:
Agent response:
See “Performing failback after a node failure or an application failure” on page 62.
Site failure All hosts and the storage at the primary site fail.
VCS response:
Agent response: The agent does the following based on the SplitTakeover attribute
of the HTC resource:
■ 1— The agent issues the horctakeover command to make the HTC devices
write-enabled. The HTC devices go into the SSWS (Suspend for Swapping with
S-VOL side only) state. If the original primary site is restored, you must execute
the pairresync-swaps action on the secondary site to establish reverse replication.
■ 0 — Agent does not perform failover to the secondary site.
Replication link failure Replication link between the arrays at the two sites fails.
■ 0—No action.
■ 1—The agent periodically attempts to resynchronize the S-VOL side using the
pairresync command.
The agent also logs a warning message to indicate that the replication link is broken.
■ 2—The agent periodically attempts to resynchronize the S-VOL side and also
sends notifications about the disconnected link. Notifications are sent in the form
of either SNMP traps or emails. For information about the VCS NotifierMngr agent,
refer to the Cluster Server Bundled Agents Reference Guide.
If the value of the LinkMonitor attribute is not set to 1 or 2, you must manually
resynchronize the HTC devices after the link is restored.
2 You must initiate resync of S-VOL device using the agent's pairresync action.
3 After P-VOL and S-VOL devices are in sync, reestablish the mirror relationship
between the Shadow Copy and the S-VOL devices.
If you initiate a failover to the secondary site when resync is in progress, the online
function of the Hitachi TrueCopy/HUR/Hewlett-Packard XP Continuous Access agent
waits for the resync to complete and then initiates a takeover of the S-VOL devices.
Note: If you did not configure Shadow Copy devices and if disaster occurs when
resync is in progress, then the data at the secondary site becomes inconsistent. Veritas
recommends configuring Shadow Copy devices at both the sites.
Network failure The LLT and the replication links between the sites fail.
VCS response:
■ VCS at each site concludes that the nodes at the other site have faulted.
■ Does the following based on the AutoFailOver attribute for the faulted service
group:
■ 2—No action. You must confirm the cause of the network failure from the cluster
administrator at the remote site and fix the issue.
■ 1—VCS brings the service group online at the secondary site which leads to a
cluster-wide split brain. This causes data divergence between the devices on
the arrays at the two sites.
When the network (LLT and replication) connectivity is restored, VCS takes all
the service groups offline on one of the sites and restarts itself. This action
eliminates the concurrency violation in which the same group is online at both
sites.
After taking the service group offline, you must manually resynchronize the
data.
Note: Veritas recommends that the value of the AutoFailOver attribute is set
to 2 for all service groups to prevent unintended failovers due to transient
network failures.
Depending on the site whose data you want to retain, run the pairresync or
the pairresync-swap command.
■ Causes the service group at the primary site to fault and displays an alert to indicate
the fault.
■ Does the following based on the AutoFailOver attribute for the faulted service
group:
■ 1—VCS automatically brings the faulted service group online at the secondary
site.
■ 2—You must bring the service group online at the secondary site.
Agent response: The agent does the following based on the SplitTakeover attribute
of the HTC resource:
■ 1—The agent issues the horctakeover command to make the HTC devices
write-enabled. The S-VOL devices go into the SSWS state.
■ 0—The agent does not perform failover to the secondary site.
Event                               Fence level             Recommended action
Link fails and is restored, but     never, async            Run the pairresync action to
application does not fail over.                             resynchronize the S-VOLs.
Link fails and application fails    never, async, or data   Run the pairresync-swaps action to
over to the S-VOL side.                                     promote the S-VOLs to P-VOLs, and
                                                            resynchronize the original P-VOLs.
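The choice between the two agent actions can be sketched as follows. This is an illustration under stated assumptions: it builds a `hares -action` argument list for the HTC resource, and the resource name `htc_res` and system name `sysB` are hypothetical placeholders. The sketch only constructs the command; it does not run it.

```python
from typing import List


def recommended_action(application_failed_over: bool) -> str:
    """Pick the agent action after a replication link failure."""
    # If the application moved to the S-VOL side, the S-VOLs must be promoted.
    return "pairresync-swaps" if application_failed_over else "pairresync"


def hares_action_argv(resource: str, action: str, system: str) -> List[str]:
    """Build the argument list for invoking an agent action on a resource."""
    return ["hares", "-action", resource, action, "-sys", system]


if __name__ == "__main__":
    # Hypothetical resource and system names, for illustration only.
    action = recommended_action(application_failed_over=True)
    print(" ".join(hares_action_argv("htc_res", action, "sysB")))
```

Keeping the decision separate from the command construction makes it easy to review which action would be requested before anything is executed on the cluster.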
VCS brings the global service group online on a node at the secondary site.
■ Verify that the HTC devices at the secondary site are write-enabled and
the device state is PAIR.
2 Fail back the global service group from the secondary site to the primary site.
Perform the following steps:
■ Switch the global service group from the secondary site to the primary site.
VCS brings the global service group online at the primary site.
■ Verify that the HTC devices at the secondary site are write-enabled and
the device state is PAIR.
VCS brings the service group online on a node at the secondary site.
■ Verify that the HTC devices at the secondary site are write-enabled, and
the device state is PAIR.
2 Fail back the service group from the secondary site to the primary site.
Testing disaster recovery after host failure
VCS brings the service group online on a node at the primary site.
■ Verify that the HTC devices at the secondary site are write-enabled, and
the device state is PAIR.
2 Verify that the global service group is online at the secondary site.
3 Verify that the HTC devices at the secondary site are write-enabled and the
device state is PAIR.
Testing disaster recovery after site failure
To test disaster recovery for host failure in replicated data cluster setup
1 Halt the hosts at the primary site.
The value of the AutoFailOver attribute for the faulted service group determines
the VCS failover behavior.
■ 1—VCS brings the faulted service group online at the secondary site.
■ 2—You must bring the service group online at the secondary site.
On a node in the secondary site, run the following command:
3 Verify that the HTC devices at the secondary site are write-enabled and the
device state is SSWS.
2 Verify that the HTC devices at the secondary site are write-enabled and the
device state is SSWS.
3 Verify that the global service group is online at the secondary site.
To test disaster recovery for site failure in replicated data cluster setup
1 Halt all hosts and the arrays at the primary site.
If you cannot halt the array at the primary site, then disable the replication link
between the two arrays.
The value of the AutoFailOver attribute for the faulted global service group
determines the VCS failover behavior.
■ 1—VCS brings the faulted global service group online at the secondary
site.
■ 2—You must bring the global service group online at the secondary site.
On a node in the secondary site, run the following command:
2 Verify that the HTC devices at the secondary site are write-enabled and the
device state is SSWS.
3 Verify that the global service group is online at the secondary site.
VCS brings the global service group online at the primary site.
2 Verify that the HTC devices at the primary site are write-enabled and the device
state is PAIR.
To perform failback after a host failure or an application failure in replicated
data cluster
1 Switch the global service group from the secondary site to any node in the
primary site.
VCS brings the global service group online on a node at the primary site.
2 Verify that the HTC devices at the primary site are write-enabled and the device
state is PAIR.
2 Since the application has made writes on the secondary due to a failover,
resynchronize the primary from the secondary site and reverse the
P-VOL/S-VOL roles with the pairresync-swaps action on the secondary site.
After the resync is complete, the devices in the secondary are P-VOL and the
devices in the primary are S-VOL. The device state is PAIR at both the sites.
3 Bring the global service group online at the primary site. On a node in the
primary site, run the following command:
2 Since the application has made writes on the secondary due to a failover,
resync the primary from the secondary site and reverse the P-VOL/S-VOL
roles with the pairresync-swaps action on the secondary site.
After the resync is complete, the devices in the secondary are P-VOL and the
devices in the primary are S-VOL. The device state is PAIR at both the sites.
3 Bring the global service group online at the primary site. On a node in the
primary site, run the following command:
The HTCSnap agent supports fire drill for storage devices that are managed using
Veritas Volume Manager.
The agent supports fire drill in a Storage Foundation for Oracle RAC environment.
Gold Runs the fire drill on a snapshot of the target array. The replicated
device keeps receiving writes from the primary.
You can use the Gold configuration only with ShadowImage pairs
created without the -m noread flag to the paircreate command.
Silver VCS takes a snapshot, but does not run the fire drill on the snapshot
data. VCS breaks replication and runs the fire drill on the replicated
target device.
You can use the Silver configuration only with ShadowImage pairs
created with the -m noread flag to the paircreate command.
Setting up a fire drill 67
About the HTCSnap agent
Bronze VCS breaks replication and runs the fire drill test on the replicated target.
VCS does not take a snapshot in this configuration.
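The constraint that ties each fire drill configuration to the way the ShadowImage pairs were created can be expressed as a simple check. This is a minimal sketch that restates the descriptions above; the function name is hypothetical.

```python
def configuration_allowed(configuration: str, created_with_noread: bool) -> bool:
    """Check a fire drill configuration against the paircreate -m noread setting."""
    if configuration == "Gold":
        # Gold requires ShadowImage pairs created without the -m noread flag.
        return not created_with_noread
    if configuration == "Silver":
        # Silver requires ShadowImage pairs created with the -m noread flag.
        return created_with_noread
    if configuration == "Bronze":
        # Bronze takes no snapshot, so the flag does not apply.
        return True
    raise ValueError("configuration must be Gold, Silver, or Bronze")
```

A check of this kind could be run before choosing a configuration, so that a mismatch is caught before the fire drill starts.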
■ Suspends replication.
■ Brings the fire drill service group online using the data on the target
array.
Function Description
monitor Verifies the existence of the lock file to make sure the
resource is online.
clean Restores the state of the LUNs to their original state after
a failed online function.
type HTCSnap (
static keylist RegList = { MountSnapshot, UseSnapshot }
static keylist SupportedActions = { clearvm }
static str ArgList[] = { TargetResName, MountSnapshot,
UseSnapshot, RequireSnapshot, ShadowInstance }
str TargetResName
int ShadowInstance
int MountSnapshot
int UseSnapshot
int RequireSnapshot
temp str Responsibility
temp str FDFile
temp str VCSResLock
)
Attribute Description
Type-Dimension: integer-scalar
Type-Dimension: string-scalar
Type-Dimension: integer-scalar
Attribute Description
Type-Dimension: integer-scalar
Note: Set this attribute to 1 only if UseSnapshot is
set to 1.
Type-Dimension: integer-scalar
Note: Set this attribute to 1 only if the UseSnapshot
attribute is set to 1.
Attribute        Gold   Silver   Bronze
MountSnapshot    1      0        0
UseSnapshot      1      1        0
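The attribute values for the three configurations can be captured as a lookup, together with the rule that MountSnapshot may be 1 only when UseSnapshot is 1. This sketch assumes that the three value columns above correspond to Gold, Silver, and Bronze, in that order, which matches the descriptions of those configurations; the names in the code are hypothetical.

```python
from typing import Dict

# Assumed column order: Gold, Silver, Bronze.
FIRE_DRILL_ATTRIBUTES: Dict[str, Dict[str, int]] = {
    "Gold": {"MountSnapshot": 1, "UseSnapshot": 1},
    "Silver": {"MountSnapshot": 0, "UseSnapshot": 1},
    "Bronze": {"MountSnapshot": 0, "UseSnapshot": 0},
}


def attributes_consistent(attrs: Dict[str, int]) -> bool:
    """MountSnapshot may be set to 1 only if UseSnapshot is set to 1."""
    return not (attrs["MountSnapshot"] == 1 and attrs["UseSnapshot"] != 1)
```

All three predefined combinations satisfy the rule, whereas a hand-edited resource with MountSnapshot set to 1 and UseSnapshot set to 0 would be flagged.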
■ If you plan to use Gold or Silver configuration, make sure ShadowImage for
TrueCopy is installed and configured at the target array.
■ For the Gold configuration, you must use Veritas Volume Manager to import
and deport the storage.
■ You can use the Silver configuration only with ShadowImage pairs that are
created with the -m noread flag to the paircreate command. A fire drill uses
the -E flag to split the pairs, which requires a 100% resynchronization. The Silver
mode preserves the snapshots as noread after a split.
■ The name of the ShadowImage device group must be the same as the replicated
device group for both replicated and non-replicated LUNs for which snapshots are
to be taken. The instance number may be different.
■ Make sure the HORCM instance managing the S-VOLs runs continuously; the
agent does not start this instance.
■ For non-replicated devices:
■ You must use Veritas Volume Manager.
On HP-UX, you must use Veritas Volume Manager 5.0 MP1.
■ For Gold configuration to run without the Bronze mode, set the
RequireSnapshot attribute to 1.
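The role of RequireSnapshot when a snapshot fails can be sketched as follows. This is one reading of the requirement above, stated as an assumption: with RequireSnapshot set to 0, a failed snapshot falls back to the Bronze mode, and with 1 the fire drill does not proceed. The function name is hypothetical.

```python
from typing import Optional


def effective_fire_drill_mode(configured: str, snapshot_succeeded: bool,
                              require_snapshot: int) -> Optional[str]:
    """Return the mode the fire drill runs in, or None if it does not run."""
    if configured not in ("Gold", "Silver", "Bronze"):
        raise ValueError("configured must be Gold, Silver, or Bronze")
    if configured == "Bronze":
        return "Bronze"  # no snapshot is taken in this configuration
    if snapshot_succeeded:
        return configured
    if require_snapshot == 1:
        return None  # assumed: no Bronze fallback when a snapshot is required
    return "Bronze"  # assumed: fallback when the snapshot is optional
```

Under this reading, setting RequireSnapshot to 1 guarantees that a Gold fire drill never silently breaks replication.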
See “Creating the fire drill service group using Cluster Manager (Java Console)”
on page 72.
■ Fire Drill Setup wizard
This text-based wizard is available at /opt/VRTSvcs/bin/fdsetup-htc.
See “Creating the fire drill service group using the Fire Drill SetUp Wizard”
on page 74.
Note: If multiple disk groups are dependent on the HTC or the HTCSnap resources
in the application service group, then you must use the text-based Fire Drill Setup
wizard to create the fire drill service group.
Creating the fire drill service group using Cluster Manager (Java Console)
This section describes how to use Cluster Manager (Java Console) to create the
fire drill service group. After creating the fire drill service group, you must set the
failover attribute to false so that the fire drill service group does not fail over to
another node during a test.
To create the fire drill service group
1 Open the Cluster Manager (Java Console).
2 Log on to the cluster and click OK.
3 Click the Service Group tab in the left pane and click the Resources tab in
the right pane.
4 Right-click the cluster in the left pane and click Add Service Group.
5 In the Add Service Group dialog box, provide information about the new
service group.
■ In Service Group name, enter a name for the fire drill service group.
■ Select systems from the Available Systems box and click the arrows to add
them to the Systems for Service Group box.
■ Click OK.
5 Right-click the resource to be edited and click View > Properties View. If a
resource to be edited does not appear in the pane, click Show All Attributes.
6 Edit attributes to reflect the configuration at the remote site. For example,
change the Mount resources so that they point to the volumes that are used
in the fire drill service group.
Creating the fire drill service group using the Fire Drill SetUp Wizard
This section describes how to use the Fire Drill SetUp Wizard to create the fire drill
service group.
See “Fire drill configurations ” on page 66.
To create the fire drill service group
1 Start the Fire Drill SetUp Wizard.
/opt/VRTSvcs/bin/fdsetup-htc
2 Enter the name of the application service group for which you want to configure
a fire drill service group.
3 Select the supported snapshot configurations:
Gold, Silver, or Bronze
4 Choose whether to run a Bronze fire drill, if the snapshot fails with Gold or
Silver configurations.
If snapshot fails, should bronze be used? [y,n,q](n)
10 Schedule fire drill for the service group by adding the following command to
the crontab to be run at regular intervals.
/opt/VRTSvcs/bin/fdsched-htc
11 Make fire drill highly available by adding the following command to the crontab
on every node in this cluster.
fdsched-htc
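A crontab entry for the scheduling step can be generated as shown below. This is a sketch only: the schedule (02:00 every Sunday) is a hypothetical example, and the script path is the one given in the scheduling step above.

```python
FDSCHED = "/opt/VRTSvcs/bin/fdsched-htc"  # path as given in the scheduling step


def crontab_entry(minute: int, hour: int, day_of_week: str = "*",
                  command: str = FDSCHED) -> str:
    """Format a crontab line: minute hour day-of-month month day-of-week command."""
    if not 0 <= minute <= 59:
        raise ValueError("minute must be between 0 and 59")
    if not 0 <= hour <= 23:
        raise ValueError("hour must be between 0 and 23")
    return f"{minute} {hour} * * {day_of_week} {command}"


if __name__ == "__main__":
    # Hypothetical schedule: 02:00 every Sunday.
    print(crontab_entry(0, 2, "0"))
```

The resulting line would be added to the crontab on every node in the cluster, as the last step requires.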
Sample configuration for a fire drill service group
HTCSnap oradg_fd {
TargetResName = "DG"
ShadowInstance = 5
UseSnapshot = 1
RequireSnapshot = 0
MountSnapshot = 1
}
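A stanza like the sample can be rendered from a dictionary of attribute values, which may help when several fire drill resources are prepared consistently. This is an illustrative sketch: the helper name is hypothetical, the delimiters mirror the sample as printed, and the output is not validated against VCS.

```python
from typing import Dict, Union


def render_htcsnap(name: str, attrs: Dict[str, Union[str, int]]) -> str:
    """Render an HTCSnap resource stanza in the layout of the sample."""
    lines = [f"HTCSnap {name} {{"]
    for key, value in attrs.items():
        # String attributes are quoted; integer attributes are written as-is.
        rendered = f'"{value}"' if isinstance(value, str) else str(value)
        lines.append(f"{key} = {rendered}")
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_htcsnap("oradg_fd", {
        "TargetResName": "DG",
        "ShadowInstance": 5,
        "UseSnapshot": 1,
        "RequireSnapshot": 0,
        "MountSnapshot": 1,
    }))
```

Run with the values shown, the helper reproduces the attribute lines of the sample.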
Index
I
installing the agent
    AIX systems 19
    Linux systems 19
    Solaris systems 19
Instance attribute 25
L
LinkMonitor attribute 25
M
MountSnapshot attribute 70
R
Recovery Point Objective (RPO)
    ComputeDRSLA attribute 29
    Configuring RPO computation support 44
    GetCurrentRPO function 16
    Tagging attribute 29
replicated data clusters
    failure scenarios 54
RequireSnapshot attribute 70
resource type definition
    Hitachi TrueCopy agent 24
    HTCSnap agent 68
S
sample configuration 36
split-brain
    handling in cluster 41
SplitTakeover attribute 25
T
TargetFrozen attribute 25
type definition
    Hitachi TrueCopy agent 24
    HTCSnap agent 68
typical setup 12
U
uninstalling the agent
    AIX systems 22
    Linux systems 22
    Solaris systems 22
UseSnapshot attribute 69
V
VCSResLock attribute 25