Managing SGLX 11.19
Tenth Edition
Hewlett-Packard makes no warranty of any kind with regard to this manual, including, but not limited to, the implied warranties of
merchantability and fitness for a particular purpose. Hewlett-Packard shall not be held liable for errors contained herein or direct,
indirect, special, incidental or consequential damages in connection with the furnishing, performance, or use of this material.
Warranty. A copy of the specific warranty terms applicable to your Hewlett-Packard product and replacement parts can be
obtained from your local Sales and Service Office.
Restricted Rights Legend. Use, duplication or disclosure by the U.S. Government is subject to restrictions as set forth in
subparagraph (c) (1) (ii) of the Rights in Technical Data and Computer Software clause at DFARS 252.227-7013 for DOD agencies,
and subparagraphs (c) (1) and (c) (2) of the Commercial Computer Software Restricted Rights clause at FAR 52.227-19 for other
agencies.
Hewlett-Packard Company
19420 Homestead Road
Cupertino, California 95014 U.S.A.
Use of this manual and flexible disk(s) or tape cartridge(s) supplied for this pack is restricted to this product only. Additional
copies of the programs may be made for security and back-up purposes only. Resale of the programs in their present form or
with alterations, is expressly prohibited.
Copyright Notices
Reproduction, adaptation, or translation of this document without prior written permission is prohibited, except as allowed under
copyright laws.
Table of Contents
Preface.......................................................................................................................................21
Service Assistant Daemon: cmserviced...................................................................40
Quorum Server Daemon: qs....................................................................................41
Utility Daemon: cmlockd.........................................................................................41
Cluster SNMP Agent Daemon: cmsnmpd...............................................................41
Cluster WBEM Agent Daemon: cmwbemd.............................................................42
Proxy Daemon: cmproxyd.......................................................................................42
How the Cluster Manager Works ......................................................................................42
Configuration of the Cluster ........................................................................................42
Heartbeat Messages ......................................................................................................43
Manual Startup of Entire Cluster..................................................................................43
Automatic Cluster Startup ...........................................................................................44
Dynamic Cluster Re-formation ....................................................................................44
Cluster Quorum to Prevent Split-Brain Syndrome.......................................................44
Cluster Lock...................................................................................................................45
Use of a Lock LUN as the Cluster Lock........................................................................45
Use of the Quorum Server as a Cluster Lock................................................................46
No Cluster Lock ............................................................................................................48
What Happens when You Change the Quorum Configuration Online.......................48
How the Package Manager Works.....................................................................................49
Package Types...............................................................................................................49
Non-failover Packages.............................................................................................49
Failover Packages.....................................................................................................50
Configuring Failover Packages ..........................................................................50
Deciding When and Where to Run and Halt Failover Packages .......................51
Failover Packages’ Switching Behavior..............................................................51
Failover Policy....................................................................................................53
Automatic Rotating Standby..............................................................................54
Failback Policy....................................................................................................57
On Combining Failover and Failback Policies...................................................60
Using Older Package Configuration Files.....................................................................60
How Packages Run.............................................................................................................61
What Makes a Package Run?.........................................................................................61
Before the Control Script Starts.....................................................................................64
During Run Script Execution........................................................................................64
Normal and Abnormal Exits from the Run Script........................................................66
Service Startup with cmrunserv..................................................................................66
While Services are Running..........................................................................................67
When a Service or Subnet Fails, or a Dependency is Not Met......................................67
When a Package is Halted with a Command................................................................67
During Halt Script Execution........................................................................................68
Normal and Abnormal Exits from the Halt Script........................................................69
Package Control Script Error and Exit Conditions..................................................70
How the Network Manager Works ...................................................................................71
Stationary and Relocatable IP Addresses and Monitored Subnets...............................71
Types of IP Addresses...................................................................................................73
Adding and Deleting Relocatable IP Addresses ..........................................................73
Load Sharing ...........................................................................................................73
Bonding of LAN Interfaces ...........................................................................................74
Bonding for Load Balancing..........................................................................................77
Monitoring LAN Interfaces and Detecting Failure: Link Level....................................78
Monitoring LAN Interfaces and Detecting Failure: IP Level........................................78
Reasons To Use IP Monitoring.................................................................................79
How the IP Monitor Works......................................................................................79
Failure and Recovery Detection Times...............................................................81
Constraints and Limitations.....................................................................................81
Reporting Link-Level and IP-Level Failures.................................................................82
Package Switching and Relocatable IP Addresses........................................................82
Address Resolution Messages after Switching on the Same Subnet ...........................83
VLAN Configurations...................................................................................................83
What is VLAN?........................................................................................................83
Support for Linux VLAN.........................................................................................83
Configuration Restrictions.......................................................................................84
Additional Heartbeat Requirements........................................................................84
Volume Managers for Data Storage....................................................................................84
Storage on Arrays..........................................................................................................85
Monitoring Disks...........................................................................................................86
More Information on LVM............................................................................................86
About Persistent Reservations............................................................................................86
Rules and Limitations....................................................................................................87
How Persistent Reservations Work...............................................................................88
Responses to Failures .........................................................................................................89
Reboot When a Node Fails ...........................................................................................89
What Happens when a Node Times Out.................................................................90
Example .............................................................................................................90
Responses to Hardware Failures ..................................................................................91
Responses to Package and Service Failures .................................................................92
Service Restarts .......................................................................................................92
Network Communication Failure ...........................................................................92
FibreChannel............................................................................................................95
Multipath for Storage ..............................................................................................96
Disk I/O Information ....................................................................................................96
Hardware Configuration Worksheet ............................................................................97
Power Supply Planning .....................................................................................................97
Power Supply Configuration Worksheet .....................................................................98
Cluster Lock Planning........................................................................................................98
Cluster Lock Requirements...........................................................................................98
Planning for Expansion.................................................................................................99
Using a Quorum Server.................................................................................................99
Quorum Server Worksheet .....................................................................................99
Volume Manager Planning ................................................................................................99
Volume Groups and Physical Volume Worksheet......................................................100
Cluster Configuration Planning .......................................................................................100
Heartbeat Subnet and Cluster Re-formation Time .....................................................100
About Hostname Address Families: IPv4-Only, IPv6-Only, and Mixed Mode..........101
What Is IPv4–only Mode?......................................................................................102
What Is IPv6-Only Mode?......................................................................................102
Rules and Restrictions for IPv6-Only Mode.....................................................102
Recommendations for IPv6-Only Mode...........................................................104
What Is Mixed Mode?............................................................................................104
Rules and Restrictions for Mixed Mode...........................................................104
Cluster Configuration Parameters ..............................................................................105
Cluster Configuration: Next Step ...............................................................................123
Package Configuration Planning .....................................................................................123
Logical Volume and File System Planning .................................................................123
Planning for Expansion...............................................................................................125
Choosing Switching and Failover Behavior................................................................125
About Package Dependencies.....................................................................................126
Simple Dependencies.............................................................................................126
Rules for Simple Dependencies.............................................................................127
Dragging Rules for Simple Dependencies........................................................128
Guidelines for Simple Dependencies.....................................................................131
Extended Dependencies.........................................................................................132
Rules for Exclusionary Dependencies..............................................................133
Rules for different_node and any_node Dependencies...................................134
About Package Weights...............................................................................................134
Package Weights and Node Capacities..................................................................134
Configuring Weights and Capacities.....................................................................135
Simple Method.......................................................................................................135
Example 1..........................................................................................................135
Points to Keep in Mind.....................................................................................136
Comprehensive Method.........................................................................................137
Defining Capacities...........................................................................................137
Defining Weights..............................................................................................139
Rules and Guidelines.............................................................................................142
For More Information.............................................................................................142
How Package Weights Interact with Package Priorities and Dependencies.........143
Example 1..........................................................................................................143
Example 2..........................................................................................................143
About External Scripts.................................................................................................143
Using Serviceguard Commands in an External Script..........................................146
Determining Why a Package Has Shut Down.......................................................147
last_halt_failed Flag..........................................................................................147
About Cross-Subnet Failover......................................................................................147
Implications for Application Deployment.............................................................148
Configuring a Package to Fail Over across Subnets: Example..............................149
Configuring node_name...................................................................................149
Configuring monitored_subnet_access............................................................149
Configuring ip_subnet_node............................................................................150
Configuring a Package: Next Steps.............................................................................150
Planning for Changes in Cluster Size...............................................................................150
Enabling Volume Group Activation Protection.....................................................169
Building Volume Groups: Example for Smart Array Cluster Storage (MSA 2000 Series)..............................170
Building Volume Groups and Logical Volumes....................................................171
Distributing the Shared Configuration to all Nodes..............................................172
Testing the Shared Configuration..........................................................................173
Storing Volume Group Configuration Data ..........................................................175
Preventing Boot-Time vgscan and Ensuring Serviceguard Volume Groups Are Deactivated............................176
Setting up Disk Monitoring...................................................................................176
Configuring the Cluster....................................................................................................177
cmquerycl Options......................................................................................................177
Speeding up the Process........................................................................................177
Specifying the Address Family for the Cluster Hostnames...................................178
Specifying the Address Family for the Heartbeat .................................................178
Full Network Probing............................................................................................179
Specifying a Lock LUN................................................................................................179
Specifying a Quorum Server.......................................................................................179
Obtaining Cross-Subnet Information..........................................................................180
Identifying Heartbeat Subnets....................................................................................182
Specifying Maximum Number of Configured Packages ...........................................182
Modifying the MEMBER_TIMEOUT Parameter.........................................................182
Controlling Access to the Cluster................................................................................183
A Note about Terminology....................................................................................183
How Access Roles Work........................................................................................183
Levels of Access......................................................................................................184
Setting up Access-Control Policies.........................................................................185
Role Conflicts....................................................................................................188
Package versus Cluster Roles.................................................................................189
Verifying the Cluster Configuration ...........................................................................189
Cluster Lock Configuration Messages........................................................................190
Distributing the Binary Configuration File ................................................................190
Managing the Running Cluster........................................................................................191
Checking Cluster Operation with Serviceguard Commands.....................................191
Setting up Autostart Features .....................................................................................192
Changing the System Message ...................................................................................193
Managing a Single-Node Cluster................................................................................193
Single-Node Operation..........................................................................................193
Disabling identd..........................................................................................................194
Deleting the Cluster Configuration ............................................................................195
Types of Package: Failover, Multi-Node, System Multi-Node....................................198
Package Modules and Parameters...............................................................................199
Base Package Modules...........................................................................................200
Optional Package Modules....................................................................................202
Package Parameter Explanations.................................................................................204
package_name...........................................................................................................204
module_name...........................................................................................................205
module_version.........................................................................................................205
package_type............................................................................................................205
package_description..................................................................................................205
node_name...............................................................................................................205
auto_run..................................................................................................................206
node_fail_fast_enabled...............................................................................................206
run_script_timeout...................................................................................................207
halt_script_timeout...................................................................................................207
successor_halt_timeout.............................................................................................208
script_log_file...........................................................................................................208
operation_sequence...................................................................................................208
log_level...................................................................................................................208
failover_policy..........................................................................................................209
failback_policy..........................................................................................................209
priority.....................................................................................................................209
dependency_name.....................................................................................................210
dependency_condition...............................................................................................210
dependency_location.................................................................................................211
weight_name, weight_value.......................................................................................211
monitored_subnet.....................................................................................................212
monitored_subnet_access...........................................................................................212
ip_subnet.................................................................................................................213
ip_subnet_node ........................................................................................................214
ip_address................................................................................................................214
service_name............................................................................................................214
service_cmd..............................................................................................................215
service_restart..........................................................................................................215
service_fail_fast_enabled...........................................................................................216
service_halt_timeout.................................................................................................216
vgchange_cmd..........................................................................................................216
vg............................................................................................................................216
File system parameters...........................................................................................216
concurrent_fsck_operations.......................................................................................217
concurrent_mount_and_umount_operations..............................................................217
fs_mount_retry_count...............................................................................................217
fs_umount_retry_count ............................................................................................218
fs_name....................................................................................................................218
fs_directory..............................................................................................................218
fs_type.....................................................................................................................218
fs_mount_opt...........................................................................................................219
fs_umount_opt.........................................................................................................219
fs_fsck_opt................................................................................................................219
pv............................................................................................................................219
pev_.........................................................................................................................219
external_pre_script...................................................................................................220
external_script..........................................................................................................220
user_host..................................................................................................................220
user_name................................................................................................................221
user_role..................................................................................................................221
Additional Parameters Used Only by Legacy Packages........................................221
Generating the Package Configuration File......................................................................222
Before You Start...........................................................................................................222
cmmakepkg Examples.................................................................................................222
Next Step.....................................................................................................................223
Editing the Configuration File..........................................................................................223
Verifying and Applying the Package Configuration........................................................227
Adding the Package to the Cluster...................................................................................228
Creating a Disk Monitor Configuration...........................................................................228
Adding Previously Configured Nodes to a Running Cluster.....................................240
Removing Nodes from Participation in a Running Cluster........................................241
Using Serviceguard Commands to Remove a Node from Participation in a Running Cluster...........................241
Halting the Entire Cluster ...........................................................................................242
Automatically Restarting the Cluster .........................................................................242
Managing Packages and Services ....................................................................................242
Starting a Package .......................................................................................................242
Starting a Package that Has Dependencies............................................................243
Halting a Package .......................................................................................................243
Halting a Package that Has Dependencies............................................................243
Moving a Failover Package .........................................................................................244
Changing Package Switching Behavior ......................................................................244
Maintaining a Package: Maintenance Mode.....................................................................245
Characteristics of a Package Running in Maintenance Mode or Partial-Startup Maintenance Mode...................246
Rules for a Package in Maintenance Mode or Partial-Startup Maintenance Mode....................................247
Additional Rules for Partial-Startup Maintenance Mode.................................247
Dependency Rules for a Package in Maintenance Mode or Partial-Startup Maintenance Mode.........................248
Performing Maintenance Using Maintenance Mode..................................................248
Procedure...............................................................................................................248
Performing Maintenance Using Partial-Startup Maintenance Mode..........................249
Procedure...............................................................................................................249
Excluding Modules in Partial-Startup Maintenance Mode...................................250
Reconfiguring a Cluster....................................................................................................251
Previewing the Effect of Cluster Changes...................................................................252
What You Can Preview..........................................................................................252
Using Preview mode for Commands and in Serviceguard Manager....................253
Using cmeval..........................................................................................................254
Reconfiguring a Halted Cluster ..................................................................................255
Reconfiguring a Running Cluster................................................................................255
Adding Nodes to the Configuration While the Cluster is Running .....................256
Removing Nodes from the Cluster while the Cluster Is Running ........................256
Changing the Cluster Networking Configuration while the Cluster Is Running.......257
What You Can Do...................................................................................................257
What You Must Keep in Mind...............................................................................258
Example: Adding a Heartbeat LAN.......................................................................259
Example: Deleting a Subnet Used by a Package....................................................260
Updating the Cluster Lock LUN Configuration Online.............................................261
Changing MAX_CONFIGURED_PACKAGES.............................................................261
Configuring a Legacy Package.........................................................................................262
Creating the Legacy Package Configuration ..............................................................262
Using Serviceguard Manager to Configure a Package .........................................262
Using Serviceguard Commands to Configure a Package .....................................262
Configuring a Package in Stages......................................................................263
Editing the Package Configuration File............................................................263
Creating the Package Control Script...........................................................................265
Customizing the Package Control Script ..............................................................266
Adding Customer Defined Functions to the Package Control Script ...................267
Adding Serviceguard Commands in Customer Defined Functions ...............267
Support for Additional Products...........................................................................268
Verifying the Package Configuration..........................................................................268
Distributing the Configuration....................................................................................268
Distributing the Configuration And Control Script with Serviceguard Manager....................................269
Copying Package Control Scripts with Linux commands.....................................269
Distributing the Binary Cluster Configuration File with Linux Commands ........269
Configuring Cross-Subnet Failover.............................................................................269
Configuring node_name........................................................................................270
Configuring monitored_subnet_access..................................................................270
Creating Subnet-Specific Package Control Scripts.................................................270
Control-script entries for nodeA and nodeB....................................................271
Control-script entries for nodeC and nodeD....................................................271
Reconfiguring a Package...................................................................................................271
Migrating a Legacy Package to a Modular Package....................................................272
Reconfiguring a Package on a Running Cluster .........................................................272
Reconfiguring a Package on a Halted Cluster ............................................................273
Adding a Package to a Running Cluster.....................................................................273
Deleting a Package from a Running Cluster ..............................................................273
Resetting the Service Restart Counter.........................................................................274
Allowable Package States During Reconfiguration ....................................................274
Changes that Will Trigger Warnings......................................................................278
Responding to Cluster Events ..........................................................................................278
Single-Node Operation ....................................................................................................279
Removing Serviceguard from a System...........................................................................279
Examples......................................................................................................................285
Replacing LAN Cards.......................................................................................................285
Replacing a Failed Quorum Server System......................................................................286
Troubleshooting Approaches ...........................................................................................288
Reviewing Package IP Addresses ...............................................................................288
Reviewing the System Log File ..................................................................................289
Sample System Log Entries ...................................................................................289
Reviewing Object Manager Log Files .........................................................................290
Reviewing Configuration Files ...................................................................................290
Reviewing the Package Control Script .......................................................................290
Using the cmquerycl and cmcheckconf Commands.............................................291
Reviewing the LAN Configuration ............................................................................291
Solving Problems .............................................................................................................291
Name Resolution Problems.........................................................................................292
Networking and Security Configuration Errors....................................................292
Cluster Re-formations Caused by Temporary Conditions..........................................292
Cluster Re-formations Caused by MEMBER_TIMEOUT Being Set too Low.............292
System Administration Errors ....................................................................................293
Package Control Script Hangs or Failures ............................................................294
Package Movement Errors ..........................................................................................295
Node and Network Failures .......................................................................................296
Troubleshooting the Quorum Server...........................................................................296
Authorization File Problems..................................................................................296
Timeout Problems..................................................................................................297
Messages................................................................................................................297
Lock LUN Messages....................................................................................................297
Designing Applications to Run on Multiple Systems .....................................................304
Avoid Node Specific Information ...............................................................................304
Obtain Enough IP Addresses ................................................................................305
Allow Multiple Instances on Same System ...........................................................305
Avoid Using SPU IDs or MAC Addresses .................................................................305
Assign Unique Names to Applications ......................................................................306
Use DNS ................................................................................................................306
Use uname(2) With Care ............................................................................................307
Bind to a Fixed Port ....................................................................................................307
Bind to Relocatable IP Addresses ...............................................................................307
Call bind() before connect() ...................................................................................308
Give Each Application its Own Volume Group .........................................................308
Use Multiple Destinations for SNA Applications ......................................................308
Avoid File Locking ......................................................................................................309
Restoring Client Connections ..........................................................................................309
Handling Application Failures ........................................................................................310
Create Applications to be Failure Tolerant .................................................................310
Be Able to Monitor Applications ................................................................................311
Minimizing Planned Downtime ......................................................................................311
Reducing Time Needed for Application Upgrades and Patches ...............................311
Provide for Rolling Upgrades ...............................................................................312
Do Not Change the Data Layout Between Releases .............................................312
Providing Online Application Reconfiguration .........................................................312
Documenting Maintenance Operations .....................................................................312
Textual Representation of IPv6 Addresses..................................................................327
IPv6 Address Prefix.....................................................................................................328
Unicast Addresses.......................................................................................................328
IPv4 and IPv6 Compatibility.......................................................................................328
IPv4 Compatible IPv6 Addresses...........................................................................329
IPv4 Mapped IPv6 Address...................................................................................329
Aggregatable Global Unicast Addresses...............................................................329
Link-Local Addresses.............................................................................................330
Site-Local Addresses..............................................................................................330
Multicast Addresses...............................................................................................330
Network Configuration Restrictions................................................................................331
Configuring IPv6 on Linux...............................................................................................332
Enabling IPv6 on Red Hat Linux.................................................................................332
Adding persistent IPv6 Addresses on Red Hat Linux................................................332
Configuring a Channel Bonding Interface with Persistent IPv6 Addresses on Red Hat Linux........................332
Adding Persistent IPv6 Addresses on SUSE...............................................................333
Configuring a Channel Bonding Interface with Persistent IPv6 Addresses on SUSE.................................333
Index........................................................................................................................................341
List of Figures
1-1 Typical Cluster Configuration ....................................................................................24
1-2 Typical Cluster After Failover ....................................................................................25
1-3 Tasks in Configuring a Serviceguard Cluster ............................................................28
2-1 Redundant LANs .......................................................................................................32
2-2 Mirrored Disks Connected for High Availability ......................................................36
3-1 Serviceguard Software Components on Linux...........................................................38
3-2 Lock LUN Operation...................................................................................................46
3-3 Quorum Server Operation..........................................................................................47
3-4 Quorum Server to Cluster Distribution......................................................................47
3-5 Package Moving During Failover...............................................................................50
3-6 Before Package Switching...........................................................................................52
3-7 After Package Switching.............................................................................................53
3-8 Rotating Standby Configuration before Failover........................................................55
3-9 Rotating Standby Configuration after Failover..........................................................56
3-10 configured_node Policy Packages after Failover...................................................57
3-11 Automatic Failback Configuration before Failover....................................................58
3-12 Automatic Failback Configuration After Failover......................................................59
3-13 Automatic Failback Configuration After Restart of node1........................................60
3-14 Legacy Package Time Line Showing Important Events..............................................63
3-15 Legacy Package Time Line .........................................................................................65
3-16 Legacy Package Time Line for Halt Script Execution.................................................69
3-17 Bonded Network Interfaces........................................................................................75
3-18 Bonded NICs...............................................................................................................76
3-19 Bonded NICs After Failure.........................................................................................77
3-20 Bonded NICs Configured for Load Balancing............................................................78
3-21 Physical Disks Combined into LUNs..........................................................................85
5-1 Access Roles..............................................................................................................184
E-1 System Management Homepage with Serviceguard Manager................................337
E-2 Cluster by Type.........................................................................................................339
List of Tables
1 .....................................................................................................................................19
3-1 Package Configuration Data.......................................................................................54
3-2 Node Lists in Sample Cluster......................................................................................58
3-3 Error Conditions and Package Movement for Failover Packages..............................70
4-1 Package Failover Behavior .......................................................................................126
5-1 Changing Linux Partition Types...............................................................................164
6-1 Base Modules.............................................................................................................201
6-2 Optional Modules......................................................................................................202
7-1 Types of Changes to the Cluster Configuration .......................................................251
7-2 Types of Changes to Packages ..................................................................................275
D-1 IPv6 Address Types...................................................................................................327
D-2 ...................................................................................................................................328
D-3 ...................................................................................................................................329
D-4 ...................................................................................................................................329
D-5 ...................................................................................................................................329
D-6 ...................................................................................................................................330
D-7 ...................................................................................................................................330
D-8 ...................................................................................................................................330
Printing History
Table 1
Printing Date Part Number Edition
The last printing date and part number indicate the current edition, which applies to
the A.11.19 version of HP Serviceguard for Linux.
The printing date changes when a new edition is printed. (Minor corrections and
updates which are incorporated at reprint do not cause the date to change.) The part
number is revised when extensive technical changes are incorporated.
New editions of this manual will incorporate all material updated since the previous
edition.
HP Printing Division:
Business Critical Computing
Hewlett-Packard Co.
19111 Pruneridge Ave.
Cupertino, CA 95014
Preface
This guide describes how to configure and manage Serviceguard for Linux on HP
ProLiant and HP Integrity servers under the Linux operating system. It is intended for
experienced Linux system administrators. (For Linux system administration tasks that
are not specific to Serviceguard, use the system administration documentation and
manpages for your distribution of Linux.)
The contents are as follows:
• Chapter 1 (page 23) describes a Serviceguard cluster and provides a roadmap for
using this guide.
• Chapter 2 (page 29) provides a general view of the hardware configurations used
by Serviceguard.
• Chapter 3 (page 37) describes the software components of Serviceguard and shows
how they function within the Linux operating system.
• Chapter 4 (page 93) steps through the planning process.
• Chapter 5 (page 153) describes the creation of the cluster configuration.
• Chapter 6 (page 197) describes the creation of high availability packages.
• Chapter 7 (page 229) presents the basic cluster administration tasks.
• Chapter 8 (page 281) explains cluster testing and troubleshooting strategies.
• Appendix A (page 299) gives guidelines for creating cluster-aware applications
that provide optimal performance in a Serviceguard environment.
• Appendix B (page 315) provides suggestions for integrating your existing
applications with Serviceguard for Linux.
• Appendix C (page 319) contains a set of empty worksheets for preparing a
Serviceguard configuration.
• Appendix D (page 327) provides information about IPv6.
• Appendix E (page 335) is an introduction to Serviceguard Manager.
Related Publications
The following documents contain additional useful information:
• HP Serviceguard for Linux Version A.11.19 Release Notes
• HP Serviceguard Quorum Server Version A.04.00 Release Notes
• Clusters for High Availability: a Primer of HP Solutions. Second Edition. HP Press,
2001 (ISBN 0-13-089355-2)
Use the following URL to access HP’s high availability documentation web page:
http://docs.hp.com/hpux/ha
Information about supported configurations is in the HP Serviceguard for Linux
Configuration Guide. For updated information on supported hardware and Linux
distributions refer to the HP Serviceguard for Linux Certification Matrix. Both documents
are available at:
http://www.hp.com/info/sglx
Problem Reporting
If you have any problems with the software or documentation, please contact your
local Hewlett-Packard Sales Office or Customer Service Center.
1 Serviceguard for Linux at a Glance
This chapter introduces Serviceguard for Linux and shows where to find different
kinds of information in this book. It includes the following topics:
• What is Serviceguard for Linux?
• Using Serviceguard Manager (page 26)
• Configuration Roadmap (page 27)
If you are ready to start setting up Serviceguard clusters, skip ahead to Chapter 4
(page 93). Specific steps for setup are in Chapter 5 (page 153).
What is Serviceguard for Linux?
In Figure 1-1 (page 24), node 1 (one of two SPUs) is running package A, and node 2 is running
package B. Each package has a separate group of disks associated with it, containing
data needed by the package's applications, and a copy of the data. Note that both nodes
are physically connected to disk arrays. However, only one node at a time may access
the data for a given group of disks. In the figure, node 1 is shown with exclusive access
to the top two disks (solid line), and node 2 is shown as connected without access to
the top disks (dotted line). Similarly, node 2 is shown with exclusive access to the
bottom two disks (solid line), and node 1 is shown as connected without access to the
bottom disks (dotted line).
Disk arrays provide redundancy in case of disk failures. In addition, a total of four data
buses are shown for the disks that are connected to node 1 and node 2. This
configuration provides the maximum redundancy and also gives optimal I/O
performance, since each package is using different buses.
Note that the network hardware is cabled to provide redundant LAN interfaces on
each node. Serviceguard uses TCP/IP network services for reliable communication
among nodes in the cluster, including the transmission of heartbeat messages, signals
from each functioning node that are central to the operation of the cluster. TCP/IP
services also are used for other types of inter-node communication. (The heartbeat is
explained in more detail in the chapter “Understanding Serviceguard Software.”)
Failover
Under normal conditions, a fully operating Serviceguard cluster simply monitors the
health of the cluster's components while the packages are running on individual nodes.
Any host system running in the Serviceguard cluster is called an active node. When
you create the package, you specify a primary node and one or more adoptive nodes.
When a node or its network communications fails, Serviceguard can transfer control
of the package to the next available adoptive node. This situation is shown in Figure
1-2.
After this transfer, the package typically remains on the adoptive node as long as the
adoptive node continues running. If you wish, however, you can configure the package
to return to its primary node as soon as the primary node comes back online.
Alternatively, you may manually transfer control of the package back to the primary
node at the appropriate time.
Figure 1-2 does not show the power connections to the cluster, but these are important
as well. In order to remove all single points of failure from the cluster, you should
provide as many separate power circuits as needed to prevent a single point of failure
of your nodes, disks, and disk mirrors. Each power circuit should be protected by an
uninterruptible power supply (UPS).
NOTE: For more-detailed information, see Appendix E (page 335), and the section on
Serviceguard Manager in the latest version of the Serviceguard Release Notes. Check
the Serviceguard/SGeRAC/SMS/Serviceguard Manager Plug-in Compatibility and Feature
Matrix and the latest Release Notes for up-to-date information about Serviceguard
Manager compatibility. You can find both documents at http://www.docs.hp.com
-> High Availability -> Serviceguard.
Serviceguard Manager is the graphical user interface for Serviceguard. It is available
as a “plug-in” to the web-based HP System Management Homepage (HP SMH).
You can use Serviceguard Manager to monitor, administer, and configure Serviceguard
clusters.
• You can see properties, status, and alerts for clusters, nodes, and packages.
• You can do administrative tasks such as run or halt clusters, cluster nodes, and
packages.
• You can create or modify a cluster and its packages.
See the latest Release Notes for your version of Serviceguard for Linux for an
introduction to using Serviceguard Manager, and the
Serviceguard/SGeRAC/SMS/Serviceguard Manager Plug-in Compatibility and Feature Matrix
for up-to-date information about Serviceguard Manager compatibility
(http://www.docs.hp.com -> High Availability -> Serviceguard for Linux).
Configuration Roadmap
This manual presents the tasks you need to perform in order to create a functioning
HA cluster using Serviceguard. These tasks are shown in Figure 1-3.
Figure 1-3 Tasks in Configuring a Serviceguard Cluster
HP recommends that you gather all the data that is needed for configuration before you
start. See Chapter 4 (page 93) for tips on gathering data.
NOTE: If you will be using a cross-subnet configuration, see also the Restrictions
(page 33) that apply specifically to such configurations.
Cross-Subnet Configurations
As of Serviceguard A.11.18 it is possible to configure multiple subnets, joined by a
router, both for the cluster heartbeat and for data, with some nodes using one subnet
and some another.
A cross-subnet configuration allows:
• Automatic package failover from a node on one subnet to a node on another
• A cluster heartbeat that spans subnets.
Restrictions
The following restrictions apply:
• All nodes in the cluster must belong to the same network domain (that is, the
domain portion of the fully-qualified domain name must be the same).
• The nodes must be fully connected at the IP level.
• A minimum of two heartbeat paths must be configured for each cluster node.
• There must be less than 200 milliseconds of latency in the heartbeat network.
• Each heartbeat subnet on each node must be physically routed separately to the
heartbeat subnet on another node; that is, each heartbeat path must be physically
separate:
— The heartbeats must be statically routed; static route entries must be configured
on each node to route the heartbeats through different paths.
— Failure of a single router must not affect both heartbeats at the same time.
• IPv6 heartbeat subnets are not supported in a cross-subnet configuration.
• IPv6–only and mixed modes are not supported in a cross-subnet configuration.
For more information about these modes, see “About Hostname Address Families:
IPv4-Only, IPv6-Only, and Mixed Mode” (page 101).
• Deploying applications in this environment requires careful consideration; see
“Implications for Application Deployment” (page 148).
NOTE: See also the Rules and Restrictions (page 30) that apply to all cluster
networking configurations.
Disk Monitoring
You can configure monitoring for disks and configure packages to be dependent on
the monitor. For each package, you define a package service that monitors the disks
that are activated by that package. If a disk failure occurs on one node, the monitor
will cause the package to fail, with the potential to fail over to a different node on which
the same disks are available.
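A package service entry for such a monitor might look like the following excerpt from
a modular package configuration file. This is a minimal sketch: disk_monitor.sh is a
hypothetical script standing in for whatever monitor command your toolkit or
environment provides.
service_name pkg1_disk_mon
service_cmd "/usr/local/cmcluster/scripts/disk_monitor.sh"
service_restart none
If the monitor command exits because a disk has failed, the service fails and the package
can fail over as described above.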
Serviceguard Architecture
The following figure shows the main software components used by Serviceguard for
Linux. This chapter discusses these components in some detail.
Figure 3-1 Serviceguard Software Components on Linux
Serviceguard Daemons
Serviceguard for Linux uses the following daemons:
• cmclconfd—configuration daemon
• cmcld—cluster daemon
• cmnetd—Network Manager daemon
• cmlogd—cluster system log daemon
• cmdisklockd—cluster lock LUN daemon
• cmomd—Cluster Object Manager daemon
• cmserviced—Service Assistant daemon
• qs—Quorum Server daemon
• cmlockd—utility daemon
• cmsnmpd—cluster SNMP subagent (optionally running)
• cmwbemd—WBEM daemon
• cmproxyd—proxy daemon
Each of these daemons logs to the Linux system logging files. The Quorum Server
daemon logs to a user-specified log file, such as /usr/local/qs/log/qs.log.
NOTE: The file cmcluster.conf contains the mappings that resolve symbolic
references to $SGCONF, $SGROOT, $SGLBIN, etc, used in the pathnames in the
subsections that follow. See “Understanding the Location of Serviceguard Files”
(page 153) for details.
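For example, the mappings might look like this. The values shown are illustrative
(typical of a Red Hat installation); the actual paths depend on your distribution:
SGROOT=/usr/local/cmcluster
SGCONF=/usr/local/cmcluster/conf
SGSBIN=/usr/local/cmcluster/bin
SGLBIN=/usr/local/cmcluster/bin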
NOTE: Two of the central components of Serviceguard—Package Manager, and
Cluster Manager—run as parts of the cmcld daemon. This daemon runs at priority 94
and is in the SCHED_RR class. No other process is allowed a higher real-time priority.
The installation of the cmsnmpd rpm configures snmpd and cmsnmpd to start up
automatically. Their startup scripts are in /etc/init.d/. The scripts can be run manually
to start and stop the daemons.
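For example, assuming the default script names (which may vary by distribution and
release):
/etc/init.d/snmpd start
/etc/init.d/cmsnmpd start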
For more information, see the cmsnmpd (1) manpage.
IMPORTANT: When multiple heartbeats are configured, heartbeats are sent in parallel;
Serviceguard must receive at least one heartbeat to establish the health of a node. HP
recommends that you configure all subnets that interconnect cluster nodes as heartbeat
networks; this increases protection against multiple faults at no additional cost.
Heartbeat IP addresses must be on the same subnet on each node, but it is possible to
configure a cluster that spans subnets; see “Cross-Subnet Configurations” (page 32).
See HEARTBEAT_IP, under “Cluster Configuration Parameters ” (page 105), for more
information about heartbeat requirements. For timeout requirements and
recommendations, see the MEMBER_TIMEOUT parameter description in the same
section. For troubleshooting information, see “Cluster Re-formations Caused by
MEMBER_TIMEOUT Being Set too Low” (page 292). See also “Cluster Daemon: cmcld”
(page 39).
NOTE: The lock LUN is dedicated for use as the cluster lock, and, in addition, HP
recommends that this LUN comprise the entire disk; that is, the partition should take
up the entire disk.
The complete path name of the lock LUN is identified in the cluster configuration file.
The operation of the lock LUN is shown in Figure 3-2.
Serviceguard periodically checks the health of the lock LUN and writes messages to
the syslog file if the disk fails the health check. This file should be monitored for early
detection of lock disk problems.
A quorum server can provide quorum services for multiple clusters. Figure 3-4 illustrates
quorum server use across four clusters.
No Cluster Lock
Normally, you should not configure a cluster of three or fewer nodes without a cluster
lock. In two-node clusters, a cluster lock is required. You may consider using no cluster
lock with configurations of three or more nodes, although the decision should be
affected by the fact that any cluster may require tie-breaking. For example, if one node
in a three-node cluster is removed for maintenance, the cluster re-forms as a two-node
cluster. If a tie-breaking scenario later occurs due to a node or communication failure,
the entire cluster will become unavailable.
In a cluster with four or more nodes, you may not need a cluster lock since the chance
of the cluster being split into two halves of equal size is very small. However, be sure
to configure your cluster to prevent the failure of exactly half the nodes at one time.
For example, make sure there is no potential single point of failure such as a single
LAN between equal numbers of nodes, and that you don’t have exactly half of the
nodes on a single power circuit.
Package Types
Three different types of packages can run in the cluster; the most common is the failover
package. There are also special-purpose packages that run on more than one node at
a time, and so do not fail over. They are typically used to manage resources of certain
failover packages.
Non-failover Packages
There are two types of special-purpose packages that do not fail over and that can run
on more than one node at the same time: the system multi-node package, which runs
on all nodes in the cluster, and the multi-node package, which can be configured to
run on all or some of the nodes in the cluster. System multi-node packages are reserved
for use by HP-supplied applications.
The rest of this section describes failover packages.
NOTE: It is possible to configure a cluster that spans subnets joined by a router, with
some nodes using one subnet and some another. This is known as a cross-subnet
configuration. In this context, you can configure packages to fail over from a node on
one subnet to a node on another, and you will need to configure a relocatable IP address
for each subnet the package is configured to start on; see “About Cross-Subnet Failover”
(page 147), and in particular the subsection “Implications for Application Deployment”
(page 148).
When a package fails over, TCP connections are lost. TCP applications must reconnect
to regain connectivity; this is not handled automatically. Note that if the package is
dependent on multiple subnets, normally all of them must be available on the target
node before the package will be started. (In a cross-subnet configuration, all the
monitored subnets that are specified for this package, and configured on the target
node, must be up.)
If the package has a dependency on a resource or another package, the dependency
must be met on the target node before the package can start.
In Figure 3-7, node1 has failed and pkg1 has been transferred to node2. pkg1's IP
address was transferred to node2 along with the package. pkg1 continues to be
available and is now running on node2. Also note that node2 now has access both to
pkg1's disk and pkg2's disk.
Failover Policy
The Package Manager selects a node for a failover package to run on based on the
priority list included in the package configuration file together with the failover_policy
parameter, also in the configuration file. The failover policy governs how the package
manager selects which node to run a package on when a specific node has not been
identified and the package needs to be started. This applies not only to failovers but
also to startup for the package, including the initial startup. The two failover policies
are configured_node (the default) and min_package_node. The parameter is set
in the package configuration file.
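For example, a package that should always start on the eligible node running the fewest
packages would contain this line in its configuration file:
failover_policy min_package_node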
When the cluster starts, each package starts as shown in Figure 3-8.
If a failure occurs, the failing package would fail over to the node running the fewest
packages:
NOTE: Under the min_package_node policy, when node2 is repaired and brought
back into the cluster, it will then be running the fewest packages, and thus will become
the new standby node.
If these packages had been set up using the configured_node failover policy, they
would start initially as in Figure 3-8, but the failure of node2 would cause the package
to start on node3, as shown in Figure 3-10.
If you use configured_node as the failover policy, the package will start up on the
highest-priority eligible node in its node list. When a failover occurs, the package will
move to the next eligible node in the list, in the configured order of priority.
Failback Policy
The use of the failback_policy parameter allows you to decide whether a package will
return to its primary node if the primary node becomes available and the package is
not currently running on the primary node. The configured primary node is the first
node listed in the package’s node list.
The two possible values for this policy are automatic and manual. The parameter is
set in the package configuration file:
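failback_policy automatic
(An illustrative excerpt; this line overrides the default value, manual.)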
As an example, consider the following four-node configuration, in which failover_policy
is set to configured_node and failback_policy is automatic:
node1 panics, and after the cluster reforms, pkgA starts running on node4:
After rebooting, node1 rejoins the cluster. At that point, pkgA will be automatically
stopped on node4 and restarted on node1.
NOTE: Setting the failback_policy to automatic can result in a package failback and
application outage during a critical production period. If you are using automatic
failback, you may want to wait to add the package’s primary node back into the cluster
until you can allow the package to be taken out of service temporarily while it switches
back to the primary node.
NOTE: If you configure the package while the cluster is running, the package does
not start up immediately after the cmapplyconf command completes. To start the
package without halting and restarting the cluster, issue the cmrunpkg or cmmodpkg
command.
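For example, after applying a newly configured package pkg1 (the package and node
names are placeholders), you might run:
cmrunpkg -n node1 pkg1
cmmodpkg -e pkg1
cmmodpkg -e re-enables package switching, which cmrunpkg does not do by itself.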
How does a failover package start up, and what is its behavior while it is running?
Some of the many phases of package life are shown in Figure 3-14.
At any step along the way, an error will result in the script exiting abnormally (with
an exit code of 1). For example, if a package service is unable to be started, the control
script will exit with an error.
NOTE: This diagram is specific to legacy packages. Modular packages also run external
scripts and “pre-scripts” as explained above.
If the run script execution is not complete before the time specified in the
run_script_timeout parameter (page 207), the package manager will kill the script. During
run script execution, messages are written to a log file. For legacy packages, this is in
the same directory as the run script, and has the same name as the run script plus the
extension .log. For modular packages, the pathname is determined by the script_log_file
parameter in the package configuration file (page 208). Normal starts are recorded in
the log, together with error messages or warnings related to starting the package.
NOTE: If a package is dependent on a subnet, and the subnet on the primary node
fails, the package will start to shut down. If the subnet recovers immediately (before
the package is restarted on an adoptive node), the package manager restarts the package
on the same node; no package switch occurs.
NOTE: If you use the cmhaltpkg command with the -n <nodename> option, the
package is halted only if it is running on that node.
The cmmodpkg command cannot be used to halt a package, but it can disable switching
either on particular nodes or on all nodes. A package can continue running when its
switching has been disabled, but it will not be able to start on other nodes if it stops
running on its current node.
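For example, to prevent pkg1 (a placeholder name) from starting on node2 while
leaving it running wherever it currently is:
cmmodpkg -d -n node2 pkg1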
At any step along the way, an error will result in the script exiting abnormally (with
an exit code of 1). If the halt script execution is not complete before the time specified
in the halt_script_timeout (page 207), the package manager will kill the script. During
halt script execution, messages are written to a log file. For legacy packages, this is in
the same directory as the run script, and has the same name as the run script plus the
extension .log. For modular packages, the pathname is determined by the script_log_file
parameter in the package configuration file (page 208). Normal halts are recorded in
the log, together with error messages or warnings related to halting the package.
NOTE: This diagram applies specifically to legacy packages. Differences for modular
scripts are called out above.
Error or Exit Code: Halt Script Timeout
Node Failfast Enabled: YES
Service Failfast Enabled: Either
Linux Status on Primary after Error: system reset
Halt script runs after Error or Exit: N/A
Package Allowed to Run on Primary Node after Error: N/A (system reset)
Package Allowed to Run on Alternate Node: Yes, unless the timeout happened after
the cmhaltpkg command was executed.
NOTE: Serviceguard monitors the health of the network interfaces (NICs) and can
monitor the IP level (layer 3) network.
NOTE: It is possible to configure a cluster that spans subnets joined by a router, with
some nodes using one subnet and some another. This is called a cross-subnet
configuration. In this context, you can configure packages to fail over from a node on
one subnet to a node on another, and you will need to configure a relocatable address
for each subnet the package is configured to start on; see “About Cross-Subnet Failover”
(page 147), and in particular the subsection “Implications for Application Deployment”
(page 148).
Types of IP Addresses
Both IPv4 and IPv6 address types are supported in Serviceguard. IPv4 addresses are
the traditional addresses of the form n.n.n.n where n is a decimal digit between 0
and 255. IPv6 addresses have the form x:x:x:x:x:x:x:x where x is the hexadecimal
value of each of eight 16-bit pieces of the 128-bit address. You can define heartbeat IPs,
stationary IPs, and relocatable (package) IPs as IPv4 or IPv6 addresses (or certain
combinations of both).
Load Sharing
Serviceguard allows you to configure several services into a single package, sharing a
single IP address; in that case all those services will fail over when the package does.
If you want to be able to load-balance services (that is, move a specific service to a less
loaded node without disturbing the others), configure each service in its own package,
each with its own IP address; individual packages can then be moved among nodes
as needed.
The LANs in the non-bonded configuration have four LAN cards, each associated with
a separate non-aggregated IP address and MAC address, and each with its own LAN
name (eth1, eth2, eth3, eth4). When these ports are aggregated, all four ports are
associated with a single IP address and MAC address. In this example, the aggregated
ports are collectively known as bond0, and this is the name by which the bond is known
during cluster configuration.
Figure 3-18 shows a bonded configuration using redundant hubs with a crossover
cable.
In the bonding model, individual Ethernet interfaces are slaves, and the bond is the
master. In the basic high availability configuration (mode 1), one slave in a bond assumes
an active role, while the others remain inactive until a failure is detected. (In Figure
3-18, both eth0 slave interfaces are active.) It is important that during configuration,
the active slave interfaces on all nodes are connected to the same hub. If this were not
the case, then normal operation of the LAN would require the use of the crossover
between the hubs and the crossover would become a single point of failure.
After the failure of a card, messages are still carried on the bonded LAN and are received
on the other node, but now eth1 has become active in bond0 on node1. This situation
is shown in Figure 3-19.
Various combinations of Ethernet card types (single or dual-ported) and bond groups
are possible, but it is vitally important to remember that at least two physical cards (or
physically separate on-board LAN interfaces) must be used in any combination of
channel bonds to avoid a single point of failure for heartbeat connections.
HP recommends that you configure target polling if the subnet is not private to the
cluster.
The IP Monitor section of the cmquerycl output looks similar to this:
…
Route Connectivity (no probing was performed):

IPv4:

1    16.89.143.192
     16.89.120.0
…
The IP Monitor section of the cluster configuration file will look similar to the following
for a subnet on which IP monitoring is configured with target polling.
NOTE: This is the default if cmquerycl detects a gateway for the subnet in question;
see SUBNET under “Cluster Configuration Parameters ” (page 105) for more information.
IMPORTANT: By default, cmquerycl does not verify that the gateways it detects will
work correctly for monitoring. But if you use the -w full option, cmquerycl will
validate them as polling targets.
SUBNET 192.168.1.0
IP_MONITOR ON
POLLING_TARGET 192.168.1.254
To configure a subnet for IP monitoring with peer polling, edit the IP Monitor section
of the cluster configuration file to look similar to this:
SUBNET 192.168.2.0
IP_MONITOR ON
NOTE: This is the default if cmquerycl does not detect a gateway for the subnet in
question; it is equivalent to having no SUBNET entry for the subnet. See SUBNET under
“Cluster Configuration Parameters ” (page 105) for more information.
NOTE: It is possible to configure a cluster that spans subnets joined by a router, with
some nodes using one subnet and some another. This is called a cross-subnet
configuration. In this context, you can configure packages to fail over from a node on
one subnet to a node on another, and you will need to configure a relocatable address
for each subnet the package is configured to start on; see “About Cross-Subnet Failover”
(page 147), and in particular the subsection “Implications for Application Deployment”
(page 148).
When a package switch occurs, TCP connections are lost. TCP applications must
reconnect to regain connectivity; this is not handled automatically. Note that if the
package is dependent on multiple subnets (specified as monitored_subnets in the package
configuration file), all those subnets must normally be available on the target node
before the package will be started.
VLAN Configurations
Virtual LAN configuration (VLAN) is supported in Serviceguard clusters.
What is VLAN?
VLAN is a technology that allows logical grouping of network nodes, regardless of
their physical locations.
VLAN can be used to divide a physical LAN into multiple logical LAN segments or
broadcast domains, helping to reduce broadcast traffic, increase network performance
and security, and improve manageability.
Multiple VLAN interfaces, each with its own IP address, can be configured from a
physical LAN interface; these VLAN interfaces appear to applications as ordinary
network interfaces (NICs). See the documentation for your Linux distribution for more
information on configuring VLAN interfaces.
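For example, on a distribution that provides the iproute2 tools, a VLAN interface might
be created like this (the interface name, VLAN ID, and address are placeholders):
ip link add link eth0 name eth0.10 type vlan id 10
ip addr add 192.168.10.5/24 dev eth0.10
ip link set dev eth0.10 up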
Storage on Arrays
Figure 3-21 shows LUNs configured on a storage array. Physical disks are configured
by an array utility program into logical units, or LUNs, which are seen by the operating
system.
NOTE: LUN definition is normally done using utility programs provided by the disk
array manufacturer. Since arrays vary considerably, you should refer to the
documentation that accompanies your storage unit.
For information about configuring multipathing, see “Multipath for Storage ” (page 96).
NOTE: Persistent Reservations coexist with, and are independent of, activation
protection of volume groups. You should continue to configure activation protection
as instructed under Enabling Volume Group Activation Protection. Subject to the Rules
and Limitations spelled out below, Persistent Reservations will be applied to the cluster's
LUNs, whether or not the LUNs are configured into volume groups.
Rules and limitations include:
• Clusters that have nodes that are VMware guests can use PR, with the following
restrictions:
— Two or more VMware guests acting as nodes in the same cluster cannot run
on the same host.
(A cluster can have multiple VMware guests if each is on a separate host; and
a host can have multiple guests if each is in a different cluster.)
— Packages running on VMware guests must use Raw Device Mapping to access
the underlying physical LUNs.
Responses to Failures
Serviceguard responds to different kinds of failures in specific ways. For most hardware
failures, the response is not user-configurable, but for package and service failures,
you can choose the system’s response, within limits.
A reboot is also initiated by Serviceguard itself under specific circumstances; see
“Responses to Package and Service Failures ” (page 92).
Example
Situation. Assume a two-node cluster, with Package1 running on SystemA and
Package2 running on SystemB. Volume group vg01 is exclusively activated on
SystemA; volume group vg02 is exclusively activated on SystemB. Package IP
addresses are assigned to SystemA and SystemB respectively.
Failure. Only one LAN has been configured for both heartbeat and data traffic. During
the course of operations, heavy application traffic monopolizes the bandwidth of the
network, preventing heartbeat packets from getting through.
Since SystemA does not receive heartbeat messages from SystemB, SystemA attempts
to re-form as a one-node cluster. Likewise, since SystemB does not receive heartbeat
messages from SystemA, SystemB also attempts to re-form as a one-node cluster.
During the election protocol, each node votes for itself, giving both nodes 50 percent
of the vote. Because both nodes have 50 percent of the vote, both nodes now vie for the
cluster lock. Only one node will get the lock.
Outcome. Assume SystemA gets the cluster lock. SystemA re-forms as a one-node
cluster. After re-formation, SystemA will make sure all applications configured to run
on an existing clustered node are running. When SystemA discovers Package2 is not
running in the cluster it will try to start Package2 if Package2 is configured to run
on SystemA.
SystemB recognizes that it has failed to get the cluster lock and so cannot re-form the
cluster. To release all resources related to Package2 (such as exclusive access to volume
group vg02 and Package2's IP address) as quickly as possible, SystemB halts (system
reset).
Responses to Package and Service Failures
In the default case, the failure of the package or of a service within a package causes
the package to shut down by running the control script with the stop parameter, and
then restarting the package on an alternate node. A package will also fail if it is
configured to have a dependency on another package, and that package fails.
You can modify this default behavior by specifying that the node should halt (system
reset) before the transfer takes place. You do this by setting failfast parameters in the
package configuration file.
In cases in which package shutdown might hang, leaving the node in an unknown
state, failfast options can provide a quick failover, after which the node will be cleaned
up on reboot. Remember, however, that a system reset causes all packages on the node
to halt abruptly.
The settings of the failfast parameters in the package configuration file determine the
behavior of the package and the node in the event of a package or resource failure:
• If service_fail_fast_enabled (page 216) is set to yes in the package configuration file,
Serviceguard will reboot the node if there is a failure of that specific service.
• If node_fail_fast_enabled (page 206) is set to yes in the package configuration file,
and the package fails, Serviceguard will halt (reboot) the node on which the package
is running.
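An illustrative excerpt (in a modular package, service_fail_fast_enabled is set within
each service definition, next to that service's service_name and service_cmd entries):
node_fail_fast_enabled no
service_fail_fast_enabled yes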
For more information, see “Package Configuration Planning ” (page 123) and Chapter 6
(page 197).
Service Restarts
You can allow a service to restart locally following a failure. To do this, you indicate a
number of restarts for each service in the package control script. When a service starts,
the variable service_restart is set in the service’s environment. The service, as it executes,
can examine this variable to see whether it has been restarted after a failure, and if so,
it can take appropriate action such as cleanup.
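For example, a modular package might allow its monitor service three local restarts
before the service is considered failed (the service name and script path are
placeholders):
service_name myapp_mon
service_cmd "/usr/local/cmcluster/scripts/myapp_monitor.sh"
service_restart 3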
NOTE: Planning and installation overlap considerably, so you may not be able to
complete the worksheets before you proceed to the actual configuration. In that case,
fill in the missing elements to document the system as you proceed with the
configuration.
Subsequent chapters describe configuration and maintenance tasks in detail.
General Planning
A clear understanding of your high availability objectives will quickly help you to
define your hardware requirements and design your system. Use the following questions
as a guide for general planning:
1. What applications must continue to be available in the event of a failure?
2. What system resources (processing power, networking, SPU, memory, disk space)
are needed to support these applications?
3. How will these resources be distributed among the nodes in the cluster during
normal operation?
4. How will these resources be distributed among the nodes of the cluster in all
possible combinations of failures, especially node failures?
5. How will resources be distributed during routine maintenance of the cluster?
6. What are the networking requirements? Are all networks and subnets available?
7. Have you eliminated all single points of failure? For example:
• network points of failure.
• disk points of failure.
• electrical points of failure.
• application points of failure.
Hardware Planning
Hardware planning requires examining the physical hardware itself. One useful
procedure is to sketch the hardware configuration in a diagram that shows adapter
cards and buses, cabling, disks and peripherals.
You may also find it useful to record the information on the Hardware worksheet
(page 320) indicating which device adapters occupy which slots and updating the details
as you create the cluster configuration. Use one form for each node (server).
SPU Information
SPU information includes the basic characteristics of the server systems you are using
in the cluster.
You may want to record the following on the Hardware worksheet (page 320) :
Server Series Number Enter the series number, for example, DL380 G5.
Host Name Enter the name to be used on the system as the host
name.
Memory Capacity Enter the memory in MB.
Number of I/O slots Indicate the number of slots.
Shared Storage
SCSI can be used for up to four-node clusters; FibreChannel can be used for clusters
of up to 16 nodes.
FibreChannel
FibreChannel cards can be used to connect up to 16 nodes to a disk array containing
storage. After installation of the cards and the appropriate driver, the LUNs configured
on the storage unit are presented to the operating system as device files, which can be
used to build LVM volume groups.
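For example, once a LUN appears as a device file, it can be prepared for LVM like this
(the device name and volume group name are placeholders; see “Creating the Logical
Volume Infrastructure” for the full procedure):
pvcreate /dev/sdc
vgcreate vgdatabase /dev/sdc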
NOTE: Multipath capabilities are supported by FibreChannel HBA device drivers
and the Linux Device Mapper. Check with the storage device documentation for details.
See also “Multipath for Storage ”.
You can use the worksheet to record the names of the device files that correspond to
each LUN for the Fibre-Channel-attached storage unit.
NOTE: With the rapid evolution of Linux, the multipath mechanisms may change,
or new ones may be added. Serviceguard for Linux supports DeviceMapper multipath
(DM-MPIO) with some restrictions; see the Serviceguard for Linux Certification Matrix
at the address provided in the Preface to this manual for up-to-date information.
NOTE: md also supports software RAID; but this configuration is not currently
supported with Serviceguard for Linux.
NOTE: You cannot use more than one type of lock in the same cluster.
IMPORTANT: If you plan to use a Quorum Server, make sure you read the HP
Serviceguard Quorum Server Version A.04.00 Release Notes before you proceed. You can
find them at: http://www.docs.hp.com -> High Availability -> Quorum
Server. You should also consult the Quorum Server white papers at the same location.
NOTE: HP recommends that you use volume group names other than the default
volume group names (vg01, vg02, etc.). Choosing volume group names that represent
the high availability applications they are associated with (e.g., /dev/vgdatabase)
will simplify cluster administration.
NOTE: This applies only to hostname resolution. You can have IPv6 heartbeat and
data LANs no matter what the HOSTNAME_ADDRESS_FAMILY parameter is set to.
(IPv4 heartbeat and data LANs are allowed in IPv4 and mixed mode.)
NOTE: How the clients of IPv6-only cluster applications handle hostname resolution
is a matter for the discretion of the system or network administrator; there are no HP
requirements or recommendations specific to this case.
In IPv6-only mode, all Serviceguard daemons will normally use IPv6 addresses for
communication among the nodes, although local (intra-node) communication may
occur on the IPv4 loopback address.
For more information about IPv6, see Appendix D (page 327).
IMPORTANT: See the latest version of the Serviceguard for Linux release notes for
the most current information on these and other restrictions.
• Red Hat 5 clusters are not supported.
• All addresses used by the cluster must be in each node's /etc/hosts file. In
addition, the file must contain the following entry:
::1 localhost ipv6-localhost ipv6-loopback
For more information and recommendations about hostname resolution, see
“Configuring Name Resolution” (page 156).
• If you use a Quorum Server, you must make sure that the Quorum Server hostname
(and the alternate Quorum Server address specified by QS_ADDR, if any) resolve
to IPv6 addresses, and you must use Quorum Server version A.04.00 or later. See
the latest Quorum Server release notes for more information; you can find them
at docs.hp.com under High Availability —> Quorum Server.
NOTE: The Quorum Server itself can be an IPv6–only system; in that case it can
serve IPv6–only and mixed-mode clusters, but not IPv4–only clusters.
• If you use a Quorum Server, and the Quorum Server is on a different subnet from
the cluster, you must use an IPv6-capable router.
• Hostname aliases are not supported for IPv6 addresses, because of operating
system limitations.
IMPORTANT: Check the latest Serviceguard for Linux release notes for the latest
instructions and recommendations.
• If you decide to migrate the cluster to IPv6-only mode, you should plan to do so
while the cluster is down.
IMPORTANT: See the latest version of the Serviceguard release notes for the most
current information on these and other restrictions.
• Red Hat 5 clusters are not supported.
• The hostname resolution file on each node (for example, /etc/hosts) must
contain entries for all the IPv4 and IPv6 addresses used throughout the cluster,
including all STATIONARY_IP and HEARTBEAT_IP addresses as well as any private
addresses. There must be at least one IPv4 address in this file (in the case of /etc/
hosts, the IPv4 loopback address cannot be removed). In addition, the file must
contain the following entry:
::1 localhost ipv6-localhost ipv6-loopback
For more information and recommendations about hostname resolution, see
“Configuring Name Resolution” (page 156).
• You must use $SGCONF/cmclnodelist, not ~/.rhosts or /etc/hosts.equiv,
to provide root access to an unconfigured node.
See “Allowing Root Access to an Unconfigured Node” (page 155) for more
information.
• Hostname aliases are not supported for IPv6 addresses, because of operating
system limitations.
NOTE: See “Reconfiguring a Cluster” (page 251) for a summary of changes you can
make while the cluster is running.
The following parameters must be configured:
CLUSTER_NAME The name of the cluster as it will appear in the
output of cmviewcl and other commands, and
as it appears in the cluster configuration file.
The cluster name must not contain any of the
following characters: space, slash (/), backslash
(\), and asterisk (*).
IMPORTANT: CAPACITY_NAME,
WEIGHT_NAME, and weight_name must all
match exactly.
NOTE: As of the date of this manual, the Framework for HP Serviceguard Toolkits deals
specifically with legacy packages.
CAUTION: Do not use /etc/fstab to mount file systems that are used by
Serviceguard packages.
For information about creating, exporting, and importing volume groups, see “Creating
the Logical Volume Infrastructure ” (page 165).
Switching behavior, and the parameters in the package configuration file that produce
it:
• Package fails over to the node with the fewest active packages:
— failover_policy set to min_package_node.
• Package fails over to the node that is next on the list of nodes (default):
— failover_policy set to configured_node. (Default)
• All packages switch following a system reboot on the node when a specific service
fails; halt scripts are not run:
— service_fail_fast_enabled set to yes for a specific service.
— auto_run set to yes for all packages.
• All packages switch following a system reboot on the node when any service fails:
— service_fail_fast_enabled set to yes for all services.
— auto_run set to yes for all packages.
Simple Dependencies
A simple dependency occurs when one package requires another to be running on the
same node. You define these conditions by means of the parameters dependency_condition
and dependency_location, using the literal values UP and same_node, respectively. (For
details of these parameters, see the parameter descriptions (page 211).)
NOTE: pkg1 can depend on more than one other package, and pkg2 can depend on
another package or packages; we are assuming only two packages in order to make
the rules as clear as possible.
• pkg1 will not start on any node unless pkg2 is running on that node.
• pkg1’s package_type (page 205) and failover_policy (page 209) constrain the type and
characteristics of pkg2, as follows:
— If pkg1 is a multi-node package, pkg2 must be a multi-node or system
multi-node package. (Note that system multi-node packages are not supported
for general use.)
— If pkg1 is a failover package and its failover_policy is min_package_node,
pkg2 must be a multi-node or system multi-node package.
— If pkg1 is a failover package and its failover_policy is configured_node, pkg2
must be:
◦ a multi-node or system multi-node package, or
◦ a failover package whose failover_policy is configured_node.
• pkg2 cannot be a failover package whose failover_policy is min_package_node.
• pkg2’s node_name list (page 205) must contain all of the nodes on pkg1’s.
— This means that if pkg1 is configured to run on any node in the cluster (*),
pkg2 must also be configured to run on any node.
NOTE: If pkg1 lists all the nodes, rather than using the asterisk (*), pkg2
must also list them.
— Preferably the nodes should be listed in the same order if the dependency is
between packages whose failover_policy is configured_node; cmcheckconf
and cmapplyconf will warn you if they are not.
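For example, a same-node dependency of pkg1 on pkg2 could be expressed in pkg1's
package configuration file like this (dependency_name is an arbitrary identifier):
dependency_name pkg2_up
dependency_condition pkg2 = up
dependency_location same_node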
NOTE: This applies only when the packages are automatically started (package
switching enabled); cmrunpkg will never force a package to halt.
Keep in mind that you do not have to set priority, even when one or more packages
depend on another. The default value, no_priority, may often result in the behavior
you want. For example, if pkg1 depends on pkg2, and priority is set to no_priority
for both packages, and other parameters such as node_name and auto_run are set as
recommended in this section, then pkg1 will normally follow pkg2 to wherever both
can run, and this is the common-sense (and may be the most desirable) outcome.
If pkg1 depends on pkg2, and pkg1’s priority is higher than pkg2’s, pkg1’s node
order dominates. Assuming pkg1’s node order is node1, node2, node3, then:
• On startup:
— pkg1 will select node1 to start on.
— pkg2 will start on node1, provided it can run there (no matter where node1
appears on pkg2’s node_name list).
◦ If pkg2 is already running on another node, it will be dragged to node1,
provided it can run there.
— If pkg2 cannot start on node1, then both packages will attempt to start on
node2 (and so on).
Note that the nodes will be tried in the order of pkg1’s node_name list, and pkg2
will be dragged to the first suitable node on that list whether or not it is currently
running on another node.
• On failover:
— If pkg1 fails on node1, pkg1 will select node2 to fail over to (or node3 if it
can run there and node2 is not available or does not meet all of its dependencies;
etc.)
— pkg2 will be dragged to whatever node pkg1 has selected, and restart there;
then pkg1 will restart there.
• On failback:
— If both packages have moved to node2 and node1 becomes available, pkg1
will fail back to node1 if both packages can run there;
◦ otherwise, neither package will fail back.
Extended Dependencies
To the capabilities provided by Simple Dependencies (page 126), extended dependencies
add the following:
• You can specify whether the package depended on must be running or must be
down.
You define this condition by means of the dependency_condition, using one of the
literals UP or DOWN (the literals can be upper or lower case). We'll refer to the
requirement that another package be down as an exclusionary dependency; see
“Rules for Exclusionary Dependencies” (page 133).
• You can specify where the dependency_condition must be satisfied: on the same
node, a different node, all nodes, or any node in the cluster.
You define this by means of the dependency_location parameter (page 211), using
one of the literals same_node, different_node, all_nodes, or any_node.
IMPORTANT: If you have not already done so, read the discussion of Simple
Dependencies (page 126) before you go on.
The interaction of the legal values of dependency_location and dependency_condition creates
the following possibilities:
• Same-node dependency: a package can require that another package be UP on the
same node.
This is the case covered in the section on Simple Dependencies (page 126).
• Different-node dependency: a package can require that another package be UP
on a different node.
• Any-node dependency: a package can require that another package be UP on any
node in the cluster.
• Same-node exclusion: a package can require that another package be DOWN on
the same node. (But this does not prevent that package from being UP on another
node.)
• All-nodes exclusion: a package can require that another package be DOWN on all
nodes in the cluster.
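For example, an all-nodes exclusion, under which pkg1 cannot run anywhere in the
cluster while pkg3 is up, might look like this in pkg1's package configuration file (the
names are placeholders):
dependency_name pkg3_down
dependency_condition pkg3 = down
dependency_location all_nodes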
Simple Method
Use this method if you simply want to control the number of packages that can run on
a given node at any given time. This method works best if all the packages consume
about the same amount of computing resources.
If you need to make finer distinctions between packages in terms of their resource
consumption, use the Comprehensive Method (page 137) instead.
To implement the simple method, use the reserved keyword package_limit to define
each node's capacity. In this case, Serviceguard will allow you to define only this single
type of capacity, and corresponding package weight, in this cluster. Defining package
weight is optional; for package_limit it will default to 1 for all packages, unless you
change it in the package configuration file.
Example 1
For example, to configure a node to run a maximum of ten packages at any one time,
make the following entry under the node's NODE_NAME entry in the cluster
configuration file:
NODE_NAME node1
...
CAPACITY_NAME package_limit
CAPACITY_VALUE 10
Now all packages will be considered equal in terms of their resource consumption,
and this node will never run more than ten packages at one time. (You can change this
behavior if you need to by modifying the weight for some or all packages, as the next
example shows.) Next, define the CAPACITY_NAME and CAPACITY_VALUE
parameters for the remaining nodes, setting CAPACITY_NAME to package_limit
NOTE: Serviceguard does not require you to define a capacity for each node. If you
define the CAPACITY_NAME and CAPACITY_VALUE parameters for some nodes but
not for others, the nodes for which these parameters are not defined are assumed to
have limitless capacity; in this case, those nodes would be able to run any number of
eligible packages at any given time.
If some packages consume more resources than others, you can use the weight_name
and weight_value parameters to override the default value (1) for some or all packages.
For example, suppose you have three packages, pkg1, pkg2, and pkg3. pkg2 is about
twice as resource-intensive as pkg3, which in turn is about one-and-a-half times as
resource-intensive as pkg1. You could represent this in the package configuration files
as follows:
• For pkg1:
weight_name package_limit
weight_value 2
• For pkg2:
weight_name package_limit
weight_value 6
• For pkg3:
weight_name package_limit
weight_value 3
Now node1, which has a CAPACITY_VALUE of 10 for the reserved CAPACITY_NAME
package_limit, can run any two of the packages at one time, but not all three. If in
addition you wanted to ensure that the larger packages, pkg2 and pkg3, did not run
on node1 at the same time, you could raise the weight_value of one or both so that the
combination exceeded 10 (or reduce node1's capacity to 8).
Comprehensive Method
Use this method if the Simple Method (page 135) does not meet your needs. (Make sure
you have read that section before you proceed.) The comprehensive method works
best if packages consume differing amounts of computing resources, so that simple
one-to-one comparisons between packages are not useful.
IMPORTANT: You cannot combine the two methods. If you use the reserved capacity
package_limit for any node, Serviceguard will not allow you to define any other
type of capacity and weight in this cluster; so you are restricted to the Simple Method
in that case.
Defining Capacities
Begin by deciding what capacities you want to define; you can define up to four different
capacities for the cluster.
You may want to choose names that have common-sense meanings, such as “processor”,
“memory”, or “IO”, to identify the capacities, but you do not have to do so. In fact it
could be misleading to identify single resources, such as “processor”, if packages really
contend for sets of interacting resources that are hard to characterize with a single
name. In any case, the real-world meanings of the names you assign to node capacities
and package weights are outside the scope of Serviceguard. Serviceguard simply
ensures that, for each capacity defined on a node, the combined weight of the packages
running on that node never exceeds that capacity.
Example 2
To define these capacities, and set limits for individual nodes, make entries such as the
following in the cluster configuration file:
CLUSTER_NAME cluster_23
...
NODE_NAME node1
...
CAPACITY_NAME A
CAPACITY_VALUE 80
CAPACITY_NAME B
NOTE: You do not have to define capacities for every node in the cluster. If any
capacity is not defined for any node, Serviceguard assumes that node has an infinite
amount of that capacity. In our example, not defining capacity A for a given node would
automatically mean that node could run pkg1 and pkg2 at the same time no matter
what A weights you assign those packages; not defining capacity B would mean the
node could run pkg3 and pkg4 at the same time; and not defining either one would
mean the node could run all four packages simultaneously.
When you have defined the nodes' capacities, the next step is to configure the package
weights; see “Defining Weights”.
Defining Weights
Package weights correspond to node capacities, and for any capacity/weight pair,
CAPACITY_NAME and weight_name must be identical.
You define weights for individual packages in the package configuration file, but you
can also define a cluster-wide default value for a given weight, and, if you do, this
default will specify the weight of all packages that do not explicitly override it in their
package configuration file.
NOTE: There is one exception: system multi-node packages cannot have weight, so
a cluster-wide default weight does not apply to them.
Example 3
WEIGHT_NAME A
WEIGHT_DEFAULT 20
NOTE: Option 4 means that the package is “weightless” as far as this particular
capacity is concerned, and can run even on a node on which this capacity is completely
consumed by other packages.
(You can make a package “weightless” for a given capacity even if you have defined
a cluster-wide default weight; simply set the corresponding weight to zero in the
package's own configuration file.)
Pursuing the example started under “Defining Capacities” (page 137), we can now use
options 1 and 2 to set weights for pkg1 through pkg4.
Example 4
In pkg1's package configuration file:
weight_name A
weight_value 60
In pkg2's package configuration file:
weight_name A
weight_value 40
In pkg3's package configuration file:
IMPORTANT: weight_name in the package configuration file must exactly match the
corresponding CAPACITY_NAME in the cluster configuration file. This applies to case
as well as spelling: weight_name a would not match CAPACITY_NAME A.
You cannot define a weight unless the corresponding capacity is defined: cmapplyconf
will fail if you define a weight in the package configuration file and no node in the
package's node_name list (page 205) has specified a corresponding capacity in the cluster
configuration file; or if you define a default weight in the cluster configuration file and
no node in the cluster specifies a capacity of the same name.
• Node capacity is defined in the cluster configuration file, via the CAPACITY_NAME
and CAPACITY_VALUE parameters.
• Capacities can be added, changed, and deleted while the cluster is running. This
can cause some packages to be moved, or even halted and not restarted.
• Package weight can be defined in the cluster configuration file, via the WEIGHT_NAME
and WEIGHT_DEFAULT parameters, or in the package configuration file, via the
weight_name and weight_value parameters, or both.
• Weights can be assigned (and WEIGHT_DEFAULTs apply) only to multi-node
packages and to failover packages whose failover_policy (page 209) is
configured_node and whose failback_policy (page 209) is manual.
• If you define weight (weight_name and weight_value) for a package, make sure you
define the corresponding capacity (CAPACITY_NAME and CAPACITY_VALUE)
in the cluster configuration file for at least one node on the package's node_name
list (page 205). Otherwise cmapplyconf will fail when you try to apply the package.
• Weights (both cluster-wide WEIGHT_DEFAULTs, and weights defined in the
package configuration files) can be changed while the cluster is up and the packages
are running. This can cause some packages to be moved, or even halted and not
restarted.
Example 1
• pkg1 is configured to run on nodes turkey and griffon. It has a weight of 1
and a priority of 10. It is down and has switching disabled.
• pkg2 is configured to run on nodes turkey and griffon. It has a weight of 1
and a priority of 20. It is running on node turkey and has switching enabled.
• turkey and griffon can run one package each (package_limit is set to 1).
If you enable switching for pkg1, Serviceguard will halt the lower-priority pkg2 on
turkey. It will then start pkg1 on turkey and restart pkg2 on griffon.
If neither pkg1 nor pkg2 had priority, pkg2 would continue running on turkey and
pkg1 would run on griffon.
Example 2
• pkg1 is configured to run on nodes turkey and griffon. It has a weight of 1
and a priority of 10. It is running on node turkey and has switching enabled.
• pkg2 is configured to run on nodes turkey and griffon. It has a weight of 1
and a priority of 20. It is running on node turkey and has switching enabled.
• pkg3 is configured to run on nodes turkey and griffon. It has a weight of 1
and a priority of 30. It is down and has switching disabled.
• pkg3 has a same_node dependency on pkg2
• turkey and griffon can run two packages each (package_limit is set to 2).
If you enable switching for pkg3, it will stay down because pkg2, the package it depends
on, is running on node turkey, which is already running two packages (its capacity
limit). pkg3 has a lower priority than pkg2, so it cannot drag it to griffon where
they both can run.
NOTE: In the case of the validate entry point, exit values 1 and 2 are treated the
same; you can use either to indicate that validation failed.
The script can make use of a standard set of environment variables (including the
package name, SG_PACKAGE, and the name of the local node, SG_NODE) exported by
the package manager or the master control script that runs the package; and can also
call a function to source in a logging function and other utility functions. One of these
functions, sg_source_pkg_env(), provides access to all the parameters configured
for this package, including package-specific environment variables configured via the
pev_ parameter (page 219).
NOTE: Some variables, including SG_PACKAGE, and SG_NODE, are available only at
package run and halt time, not when the package is validated. You can use
SG_PACKAGE_NAME at validation time as a substitute for SG_PACKAGE.
For more information, see the template in $SGCONF/examples/
external_script.template.
A sample script follows. It assumes there is another script called monitor.sh, which
will be configured as a Serviceguard service to monitor some application. The
monitor.sh script (not included here) uses a parameter
PEV_MONITORING_INTERVAL, defined in the package configuration file, to
periodically poll the application it wants to monitor; for example:
function validate_command
{
typeset -i ret=0
typeset -i i=0
typeset -i found=0
# check PEV_ attribute is configured and within limits
if [[ -z $PEV_MONITORING_INTERVAL ]]
then
sg_log 0 "ERROR: PEV_MONITORING_INTERVAL attribute not configured!"
ret=1
elif (( PEV_MONITORING_INTERVAL < 1 ))
then
sg_log 0 "ERROR: PEV_MONITORING_INTERVAL value ($PEV_MONITORING_INTERVAL) not within legal limits!"
ret=1
fi
# check monitoring service we are expecting for this package is configured
while (( i < ${#SG_SERVICE_NAME[*]} ))
do
case ${SG_SERVICE_CMD[i]} in
*monitor.sh*) # found our script
found=1
break
;;
*)
;;
esac
(( i = i + 1 ))
done
if (( found == 0 ))
then
sg_log 0 "ERROR: monitoring service not configured!"
ret=1
fi
if (( ret == 1 ))
then
sg_log 0 "Script validation for $SG_PACKAGE_NAME failed!"
fi
return $ret
}
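# start_command mirrors stop_command; it is reconstructed here so that the
# start) entry point in the case statement below calls a defined function.
function start_command
{
sg_log 5 "start_command"
# log current PEV_MONITORING_INTERVAL value, PEV_ attribute can be changed
# while the package is running
sg_log 0 "PEV_MONITORING_INTERVAL for $SG_PACKAGE_NAME is $PEV_MONITORING_INTERVAL"
return 0
}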
function stop_command
{
sg_log 5 "stop_command"
# log current PEV_MONITORING_INTERVAL value, PEV_ attribute can be changed
# while the package is running
sg_log 0 "PEV_MONITORING_INTERVAL for $SG_PACKAGE_NAME is $PEV_MONITORING_INTERVAL"
return 0
}
typeset -i exit_val=0
case ${1} in
start)
start_command $*
exit_val=$?
;;
stop)
stop_command $*
exit_val=$?
;;
validate)
validate_command $*
exit_val=$?
;;
*)
sg_log 0 "Unknown entry point $1"
;;
esac
exit $exit_val
last_halt_failed Flag
cmviewcl -v -f line displays a last_halt_failed flag.
NOTE: last_halt_failed appears only in the line output of cmviewcl, not the
default tabular format; you must use the -f line option to see it.
The value of last_halt_failed is no if the halt script ran successfully, or has not
run since the node joined the cluster, or has not run since the package was configured
to run on the node; otherwise it is yes.
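For example (filtering with grep; the exact attribute path in the output may vary by
release):
cmviewcl -v -f line | grep last_halt_failed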
NOTE: This section provides an example for a modular package; for legacy packages,
see “Configuring Cross-Subnet Failover” (page 269).
Suppose that you want to configure a package, pkg1, so that it can fail over among all
the nodes in a cluster comprising NodeA, NodeB, NodeC, and NodeD.
NodeA and NodeB use subnet 15.244.65.0, which is not used by NodeC and NodeD;
and NodeC and NodeD use subnet 15.244.56.0, which is not used by NodeA and
NodeB. (See “Obtaining Cross-Subnet Information” (page 180) for sample cmquerycl
output).
Configuring node_name
First you need to make sure that pkg1 will fail over to a node on another subnet only
if it has to. For example, if it is running on NodeA and needs to fail over, you want it
to try NodeB, on the same subnet, before incurring the cross-subnet overhead of failing
over to NodeC or NodeD.
Assuming nodeA is pkg1’s primary node (where it normally starts), create node_name
entries in the package configuration file as follows:
node_name nodeA
node_name nodeB
node_name nodeC
node_name nodeD
Configuring monitored_subnet_access
In order to monitor subnet 15.244.65.0 or 15.244.56.0, depending on where
pkg1 is running, you would configure monitored_subnet and monitored_subnet_access
in pkg1’s package configuration file as follows:
monitored_subnet 15.244.65.0
monitored_subnet_access PARTIAL
monitored_subnet 15.244.56.0
monitored_subnet_access PARTIAL
Configuring ip_subnet_node
Now you need to specify which subnet is configured on which nodes. In our example,
you would do this by means of entries such as the following in the package configuration
file:
ip_subnet 15.244.65.0
ip_subnet_node nodeA
ip_subnet_node nodeB
ip_address 15.244.65.82
ip_address 15.244.65.83
ip_subnet 15.244.56.0
ip_subnet_node nodeC
ip_subnet_node nodeD
ip_address 15.244.56.100
ip_address 15.244.56.101
NOTE: For more information and advice, see the white paper Securing Serviceguard
at http://docs.hp.com -> High Availability -> Serviceguard ->
White Papers.
NOTE: When you upgrade a cluster from Version A.11.15 or earlier, entries in
$SGCONF/cmclnodelist are automatically updated to Access Control Policies in the
cluster configuration file. All non-root user-hostname pairs are assigned the role of
Monitor.
About identd
HP strongly recommends that you use identd for user verification, so you should
make sure that each prospective cluster node is configured to run it. identd is usually
started from /etc/init.d/xinetd.
(It is possible to disable identd, though HP recommends against doing so. If for some
reason you have to disable identd, see “Disabling identd” (page 194).)
For more information about identd, see the white paper Securing Serviceguard at
http://docs.hp.com -> High Availability -> Serviceguard -> White
Papers, and the identd manpage.
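For example, to check whether identd is running on a prospective node (command
illustrative; the daemon name and its configuration vary by distribution):
ps -e | grep identd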
For example, consider a two node cluster (gryf and sly) with two private subnets
and a public subnet. These nodes will be granting access by a non-cluster node (bit)
which does not share the private subnets. The /etc/hosts file on both cluster nodes
should contain:
15.145.162.131 gryf.uksr.hp.com gryf
10.8.0.131 gryf.uksr.hp.com gryf
10.8.1.131 gryf.uksr.hp.com gryf
15.145.162.132 sly.uksr.hp.com sly
10.8.0.132 sly.uksr.hp.com sly
10.8.1.132 sly.uksr.hp.com sly
NOTE: Serviceguard recognizes only the hostname (the first element) in a fully
qualified domain name (a name like those in the example above). This means, for
example, that gryf.uksr.hp.com and gryf.cup.hp.com cannot be nodes in the
same cluster, as Serviceguard would see them as the same host gryf.
If applications require the use of hostname aliases, the Serviceguard hostname must
be one of the aliases in all the entries for that host. For example, if the two-node cluster
in the previous example were configured to use the alias hostnames alias-node1
and alias-node2, then the entries in /etc/hosts should look something like this:
15.145.162.131 gryf.uksr.hp.com gryf alias-node1
10.8.0.131 gryf2.uksr.hp.com gryf2 alias-node1
10.8.1.131 gryf3.uksr.hp.com gryf3 alias-node1
15.145.162.132 sly.uksr.hp.com sly alias-node2
10.8.0.132 sly2.uksr.hp.com sly2 alias-node2
10.8.1.132 sly3.uksr.hp.com sly3 alias-node2
NOTE: If such a hang or error occurs, Serviceguard and all protected applications
will continue working even though the command you issued does not. That is, only
the Serviceguard configuration commands (and corresponding Serviceguard Manager
functions) are affected, not the cluster daemon or package services.
The procedure that follows shows how to create a robust name-resolution configuration
that will allow cluster nodes to continue communicating with one another if a name
service fails.
1. Edit the /etc/hosts file on all nodes in the cluster. Add name resolution for all
heartbeat IP addresses, and other IP addresses from all the cluster nodes; see
“Configuring Name Resolution” (page 156) for discussion and examples.
NOTE: For each cluster node, the public-network IP address must be the first
address listed. This enables other applications to talk to other nodes on public
networks.
2. If you are using DNS, make sure your name servers are configured in /etc/
resolv.conf, for example:
domain cup.hp.com
search cup.hp.com hp.com
nameserver 15.243.128.51
nameserver 15.243.160.51
3. Edit or create the /etc/nsswitch.conf file on all nodes and add the following
text, if it does not already exist:
• for DNS, enter (two lines):
hosts: files [NOTFOUND=continue UNAVAIL=continue] dns [NOTFOUND=return UNAVAIL=return]
ipnodes: files [NOTFOUND=continue UNAVAIL=continue] dns [NOTFOUND=return
UNAVAIL=return]
If a line beginning with the string hosts: or ipnodes: already exists, then make
sure that the text immediately to the right of this string is (on one line):
files [NOTFOUND=continue UNAVAIL=continue] dns [NOTFOUND=return UNAVAIL=return]
or
files [NOTFOUND=continue UNAVAIL=continue] nis [NOTFOUND=return UNAVAIL=return]
This step is critical, allowing the cluster nodes to resolve hostnames to IP addresses
while DNS, NIS, or the primary LAN is down.
4. Create a $SGCONF/cmclnodelist file on all nodes that you intend to configure
into the cluster, and allow access by all cluster nodes. See “Allowing Root Access
to an Unconfigured Node” (page 155).
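For example, a cmclnodelist for the gryf/sly cluster discussed above, also allowing
user john on the non-cluster node bit (the entries shown are illustrative; the format
is one hostname-user pair per line):
gryf root
sly root
bit john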
NOTE: HP recommends that you also make the name service itself highly available,
either by using multiple name servers or by configuring the name service into a
Serviceguard package.
NOTE: HP recommends that you do the bonding configuration from the system
console, because you will need to restart networking from the console when the
configuration is done.
Sample Configuration
Configure the following files to support LAN redundancy. For a single failover
configuration, only one bond is needed.
1. Create a bond0 file, ifcfg-bond0.
Create the configuration in the /etc/sysconfig/network-scripts directory.
For example, in the file, ifcfg-bond0, bond0 is defined as the master (for your
installation, substitute the appropriate values for your network instead of
192.168.1.1).
Include the following information in the ifcfg-bond0 file:
DEVICE=bond0
IPADDR=192.168.1.1
NETMASK=255.255.255.0
NETWORK=192.168.1.0
BROADCAST=192.168.1.255
ONBOOT=yes
BOOTPROTO=none
USERCTL=no
For Red Hat 5 only, add the following line to the ifcfg-bond0 file:
BONDING_OPTS='miimon=100 mode=1'
2. Create an ifcfg-ethn file for each interface in the bond. All interfaces should
have SLAVE and MASTER definitions. For example, in a bond that uses eth0 and
eth1, edit the ifcfg-eth0 file to appear as follows:
DEVICE=eth0
USERCTL=no
ONBOOT=yes
MASTER=bond0
SLAVE=yes
BOOTPROTO=none
NOTE: During configuration, you need to make sure that the active slaves for the
same bond on each node are connected to the same hub or switch. You can check on this
by examining the file /proc/net/bonding/bond<x>/info on each node. This file
will show the active slave for bond x.
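For example, using the path given above (on many kernel versions the equivalent
information is in the single file /proc/net/bonding/bond0):
grep "Currently Active Slave" /proc/net/bonding/bond0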
Restarting Networking
Restart the networking subsystem. From the console of either node in the cluster, execute
the following command on a Red Hat system:
/etc/rc.d/init.d/network restart
NOTE: It is better not to restart the network from outside the cluster subnet, as there
is a chance the network could go down before the command can complete.
The command prints “bringing up network” status messages.
If there was an error in any of the bonding configuration files, the network might not
function properly. If this occurs, check each configuration file for errors, then try to
restart the network again.
NOTE: Use ifconfig to find the relationship between eth IDs and the MAC
addresses.
For more networking information on bonding, see
/usr/src/linux<kernel_version>/Documentation/networking/bonding.txt.
Restarting Networking
Restart the networking subsystem. From the console of any node in the cluster, execute
the following command on a SUSE system:
/etc/init.d/network restart
NOTE: It is better not to restart the network from outside the cluster subnet, as there
is a chance the network could go down before the command can complete.
If there is an error in any of the bonding configuration files, the network may not
function properly. If this occurs, check each configuration file for errors, then try to
start the network again.
The following example of the fdisk dialog shows that the disk on the device file /dev/
sdc is set to Smart Array type partition, and appears as follows:
fdisk /dev/sdc
Command (m for help): t
Partition number (1-4): 1
HEX code (type L to list codes): 83
Command (m for help): w
To transfer the disk partition format to other nodes in the cluster, use the command:
sfdisk -R <device>
where <device> corresponds to the same physical device as on the first node. For
example, if /dev/sdc is the device name on the other nodes, use the command:
sfdisk -R /dev/sdc
You can check the partition table by using the command:
fdisk -l /dev/sdc
NOTE: fdisk may not be available for SUSE on all platforms. In this case, using
YAST2 to set up the partitions is acceptable.
CAUTION: The minor numbers used by the LVM volume groups must be the same
on all cluster nodes. This means that if there are any non-shared volume groups in the
cluster, create the same number of them on all nodes, and create them before you define
the shared storage. If possible, avoid using private volume groups, especially LVM
boot volumes. Minor numbers increment with each logical volume, and mismatched
numbers of logical volumes between nodes can cause a failure of LVM (and boot, if
you are using an LVM boot volume).
NOTE: Except as noted in the sections that follow, you perform the LVM configuration
of shared storage on only one node. The disk partitions will be visible on other nodes
as soon as you reboot those nodes. After you’ve distributed the LVM configuration to
all the cluster nodes, you will be able to use LVM commands to switch volume groups
between nodes. (To avoid data corruption, a given volume group must be active on
only one node at a time).
For multipath information, see “Multipath for Storage ” (page 96).
In this example, the disk described by device file /dev/sda has already been partitioned
for Linux, into partitions named /dev/sda1 - /dev/sda7. The second internal device
/dev/sdb and the two external devices /dev/sdc and /dev/sdd have not been
partitioned.
NOTE: fdisk may not be available for SUSE on all platforms. In this case, using
YAST2 to set up the partitions is acceptable.
Creating Partitions
You must define a partition on each disk device (individual disk or LUN in an array)
that you want to use for your shared storage. Use the fdisk command for this.
The following steps create the new partition:
1. Run fdisk, specifying your device file name in place of <DeviceName>:
# fdisk <DeviceName>
Respond to the prompts as shown in the following table, to define a partition:

   Prompt                                          Response   Explanation
   1. Command (m for help):                        n          Create a new partition
   2. Command action: e extended,                  p          Make it a primary partition
      p primary partition (1-4)
   3. Partition number (1-4):                      1          Define this as partition 1
   4. First cylinder (1-nn, default 1):            Enter      Accept the default starting cylinder
   5. Last cylinder or +size or +sizeM             Enter      Accept the default, which is the last
      or +sizeK (1-nn, default nn):                           cylinder number
The following example of the fdisk dialog shows that the disk on the device file
/dev/sdc is configured as one partition, and appears as follows:
fdisk /dev/sdc
Command (m for help): n
Command action
   e   extended
   p   primary partition (1-4)
p
Partition number (1-4): 1
First cylinder (1-4067, default 1): Enter
Using default value 1
Last cylinder or +size or +sizeM or +sizeK (1-4067, default 4067): Enter
Using default value 4067
2. Respond to the prompts as shown in the following table to set a partition type:

   Prompt                                          Response   Explanation
   1. Command (m for help):                        t          Change a partition’s type
   2. Partition number (1-4):                      1          Select partition 1
   3. HEX code (type L to list codes):             8e         Set the type to Linux LVM

The following example of the fdisk dialog shows the disk on the device
file /dev/sdc being set to partition type 8e (Linux LVM), and appears as follows:
fdisk /dev/sdc
Command (m for help): t
Partition number (1-4): 1
HEX code (type L to list codes): 8e
NOTE: fdisk may not be available for SUSE on all platforms. In this case, using
YAST2 to set up the partitions is acceptable.
NOTE: At this point, the setup for volume-group activation protection is complete.
Serviceguard adds a tag matching the uname -n value of the owning node to
each volume group defined for a package when the package runs and deletes the
tag when the package halts. The command vgs -o +tags vgname will display
any tags that are set for a volume group.
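For example, after a package has activated vgpkgA on node ftsys9, output like
the following (illustrative; exact columns depend on your LVM version) would
show the tag:
vgs -o +tags vgpkgA
  VG     #PV #LV #SN Attr   VSize VFree VG Tags
  vgpkgA   2   1   0 wz--n- 1.95G 1.45G ftsys9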
The sections that follow take you through the process of configuring volume groups
and logical volumes, and distributing the shared configuration. When you have
finished that process, use the procedure under “Testing the Shared Configuration”
(page 173) to verify that the setup has been done correctly.
Building Volume Groups: Example for Smart Array Cluster Storage (MSA 2000 Series)
NOTE: For information about setting up and configuring the MSA 2000 for use with
Serviceguard, see the HP Serviceguard for Linux Version A.11.19 Deployment Guide on
docs.hp.com under High Availability —> Serviceguard for Linux —>
White Papers.
Use Logical Volume Manager (LVM) on your system to create volume groups that can
be activated by Serviceguard packages. This section provides an example of creating
Volume Groups on LUNs created on MSA 2000 Series storage. For more information
on LVM, see the Logical Volume Manager How To, which you can find at http://
tldp.org/HOWTO/HOWTO-INDEX/howtos.html
Before you start, partition your LUNs and label them with a partition type of 8e (Linux
LVM). Use the type t parameter of the fdisk command to change from the default
of 83 (Linux).
Do the following on one node:
1. Update the LVM configuration and create the /etc/lvmtab file. You can omit
this step if you have previously created volume groups on this node.
vgscan
NOTE: The files /etc/lvmtab and /etc/lvmtab.d may not exist on some
distributions. In that case, ignore references to these files.
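The intervening steps create physical volumes, a volume group, a logical volume,
and a file system; a minimal sketch, assuming hypothetical LUN partitions
/dev/sdc1 and /dev/sdd1 and a volume group named vgpkgA:
pvcreate /dev/sdc1 /dev/sdd1            # initialize the partitions as LVM physical volumes
vgcreate vgpkgA /dev/sdc1 /dev/sdd1     # create the volume group
lvcreate -L 500M -n lvol1 vgpkgA        # create a logical volume
mke2fs -j /dev/vgpkgA/lvol1             # create an ext3 file system on it
mkdir /extra
mount /dev/vgpkgA/lvol1 /extra          # mount it, as assumed by step 5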
5. To test that the file system /extra was created correctly and with high availability,
you can create a file on it, and read it.
echo "Test of LVM" >> /extra/LVM-test.conf
cat /extra/LVM-test.conf
NOTE: Be careful if you use YAST or YAST2 to configure volume groups, as that
may cause all volume groups on that system to be activated. After running YAST
or YAST2, check to make sure that volume groups for Serviceguard packages not
currently running have not been activated, and use LVM commands to deactivate
any that have. For example, use the command vgchange -a n /dev/sgvg00
to deactivate the volume group sgvg00.
NOTE: The minor numbers used by the LVM volume groups must be the same on
all cluster nodes. They will be, provided all the nodes have the same number of
unshared volume groups.
To distribute the shared configuration, follow these steps:
1. Unmount and deactivate the volume group, and remove the tag if necessary. For
example, to deactivate only vgpkgA:
umount /extra
vgchange -a n vgpkgA
vgchange --deltag $(uname -n) vgpkgA
2. To get the node ftsys10 to see the new disk partitioning that was done on
ftsys9, reboot:
reboot
3. Run vgscan to make the LVM configuration visible on the new node and to create
the LVM database in /etc/lvmtab and /etc/lvmtab.d. For example, on
ftsys10:
vgscan
vgchange -a y vgpkgB
mount /dev/vgpkgB/lvol1 /extra
echo "Written by `hostname` on `date`" > /extra/datestamp
cat /extra/datestamp
You should see something like the following, showing the date stamp written by
the other node:
Written by ftsys9.mydomain on Mon Jan 22 14:23:44 PST 2006
Now unmount the volume group again:
umount /extra
vgchange -a n vgpkgB
vgchange --deltag $(uname -n) vgpkgB
NOTE: You do not need to perform these actions if you have implemented
volume-group activation protection as described under “Enabling Volume Group
Activation Protection” (page 169).
SUSE Linux Enterprise Server
Prevent a vgscan at boot time by removing the /etc/rc.d/boot.d/S07boot.lvm
file from all cluster nodes.
NOTE: Be careful if you use YAST or YAST2 to configure volume groups, as that may
cause all volume groups to be activated. After running YAST or YAST2, check that
volume groups for Serviceguard packages not currently running have not been activated,
and use LVM commands to deactivate any that have. For example, use the command
vgchange -a n /dev/sgvg00 to deactivate the volume group sgvg00.
Red Hat
It is not necessary to prevent vgscan on Red Hat.
To deactivate any volume groups that will be under Serviceguard control, add
vgchange commands to the end of /etc/rc.d/rc.sysinit; for example, if volume
groups sgvg00 and sgvg01 are under Serviceguard control, add the following lines
to the end of the file:
vgchange -a n /dev/sgvg00
vgchange -a n /dev/sgvg01
During boot, the volume groups will be activated temporarily, then deactivated by
these vgchange commands; this is expected behavior.
NOTE: HP strongly recommends that you modify the file so as to send heartbeat over
all possible networks.
The manpage for the cmquerycl command further explains the parameters that appear
in this file. Many are also described in Chapter 4: “Planning and Documenting an HA
Cluster ” (page 93). Modify your /etc/cmcluster/clust1.config file as needed.
cmquerycl Options
Speeding up the Process
In a larger or more complex cluster with many nodes, networks or disks, the cmquerycl
command may take several minutes to complete. To speed up the configuration process,
you can direct the command to return selected information only by using the -k and
-w options:
-k eliminates some disk probing, and does not return information about potential
cluster lock volume groups and lock physical volumes.
-w local lets you specify local network probing, in which LAN connectivity is verified
between interfaces within each node only. This is the default when you use cmquerycl
with the -C option.
(Do not use -w local if you need to discover nodes and subnets for a cross-subnet
configuration; see “Full Network Probing”.)
-w none skips network querying. If you have recently checked the networks, this
option will save time.
-w full lets you specify full network probing, in which actual connectivity is verified
among all LAN interfaces on all nodes.
NOTE: This option must be used to discover actual or potential nodes and subnets
in a cross-subnet configuration. See “Obtaining Cross-Subnet Information” (page 180).
It will also validate IP Monitor polling targets; see “Monitoring LAN Interfaces and
Detecting Failure: IP Level” (page 78), and POLLING_TARGET under “Cluster
Configuration Parameters ” (page 105).
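For example, to discover all nodes and subnets for a cross-subnet configuration
(node names illustrative, from the example earlier in this chapter):
cmquerycl -v -w full -n nodeA -n nodeB -n nodeC -n nodeD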
IMPORTANT: The following are standard instructions. For special instructions that
may apply to your version of Serviceguard and the Quorum Server see “Configuring
Serviceguard to Use the Quorum Server” in the latest version of the HP Serviceguard
Quorum Server Version A.04.00 Release Notes, at http://www.docs.hp.com -> High
Availability -> Quorum Server.
A cluster lock LUN or Quorum Server is required for two-node clusters. To obtain a
cluster configuration file that includes Quorum Server parameters, use the -q option
of the cmquerycl command, specifying a Quorum Server hostname or IP address, for
example (all on one line):
cmquerycl -q <QS_Host> -n ftsys9 -n ftsys10 -C <ClusterName>.conf
Route connectivity (full probing performed):

1       15.13.164.0
        15.13.172.0

2       15.13.165.0
        15.13.182.0

3       15.244.65.0

4       15.244.56.0
In the Route connectivity section, the numbers on the left (1-4) identify which
subnets are routed to each other (for example 15.13.164.0 and 15.13.172.0).
NOTE: Remember to tune kernel parameters on each node to ensure that they are set
high enough for the largest number of packages that will ever run concurrently on that
node.
Levels of Access
Serviceguard recognizes two levels of access, root and non-root:
• Root access: Full capabilities; only role allowed to configure the cluster.
As Figure 5-1 shows, users with root access have complete control over the
configuration of the cluster and its packages. This is the only role allowed to use
the cmcheckconf, cmapplyconf, cmdeleteconf, and cmmodnet -a
commands.
In order to exercise this Serviceguard role, you must log in as the root user
(superuser) on a node in the cluster you want to administer. Conversely, the root
user on any node in the cluster always has full Serviceguard root access privileges
for that cluster; no additional Serviceguard configuration is needed to grant these
privileges.
IMPORTANT: A remote user (one who is not logged in to a node in the cluster,
and is not connecting via rsh or ssh) can have only Monitor access to the cluster.
(Full Admin and Package Admin can be configured for such a user, but this usage
is deprecated. As of Serviceguard A.11.18 configuring Full Admin or Package
Admin for remote users gives them Monitor capabilities. See “Setting up
Access-Control Policies” (page 185) for more information.)
NOTE: Once nodes are configured into a cluster, the access-control policies you set
in the cluster and package configuration files govern cluster-wide security; changes to
the “bootstrap” cmclnodelist file are ignored (see “Allowing Root Access to an
Unconfigured Node” (page 155)).
Access control policies are defined by three parameters in the configuration file:
• Each USER_NAME can consist either of the literal ANY_USER, or a maximum of
8 login names from the /etc/passwd file on USER_HOST. The names must be
separated by spaces or tabs, for example:
# Policy 1:
USER_NAME john fred patrick
USER_HOST bit
USER_ROLE PACKAGE_ADMIN
• USER_HOST is the node where USER_NAME will issue Serviceguard commands.
NOTE: The commands must be issued on USER_HOST but can take effect on
other nodes; for example patrick can use bit’s command line to start a package
on gryf (assuming bit and gryf are in the same cluster).
Choose one of these three values for USER_HOST:
— ANY_SERVICEGUARD_NODE - any node on which Serviceguard is configured,
and which is on a subnet with which nodes in this cluster can communicate
(as reported by cmquerycl -w full).
NOTE: You do not have to halt the cluster or package to configure or modify access
control policies.
Here is an example of an access control policy:
USER_NAME john
USER_HOST bit
USER_ROLE PACKAGE_ADMIN
If this policy is defined in the cluster configuration file, it grants user john the
PACKAGE_ADMIN role for any package on node bit. User john also has the MONITOR
role for the entire cluster, because PACKAGE_ADMIN includes MONITOR. If the policy
is defined in the package configuration file for PackageA, then user john on node bit
has the PACKAGE_ADMIN role only for PackageA.
Role Conflicts
Do not configure different roles for the same user and host; Serviceguard treats this as
a conflict and will fail with an error when applying the configuration. “Wildcards”,
such as ANY_USER and ANY_SERVICEGUARD_NODE, are an exception: it is acceptable
for ANY_USER and john to be given different roles.
IMPORTANT: Wildcards do not degrade higher-level roles that have been granted to
individual members of the class specified by the wildcard. For example, you might set
up the following policy to allow root users on remote systems access to the cluster:
USER_NAME root
USER_HOST ANY_SERVICEGUARD_NODE
USER_ROLE MONITOR
This does not reduce the access level of users who are logged in as root on nodes in this
cluster; they will always have full Serviceguard root-access capabilities.
Consider what would happen if these entries were in the cluster configuration file:
# Policy 1:
USER_NAME john
USER_HOST bit
USER_ROLE PACKAGE_ADMIN
# Policy 2:
USER_NAME john
USER_HOST bit
USER_ROLE MONITOR
# Policy 3:
USER_NAME ANY_USER
USER_HOST ANY_SERVICEGUARD_NODE
USER_ROLE MONITOR
In the above example, the configuration would fail because user john is assigned two
roles. (In any case, Policy 2 is unnecessary, because PACKAGE_ADMIN includes the role
of MONITOR.)
Policy 3 does not conflict with any other policies, even though the wildcard ANY_USER
includes the individual user john.
NOTE: Using the -k option means that cmcheckconf only checks disk connectivity
to the LVM disks that are identified in the cluster configuration file. Omitting the -k
option (the default behavior) means that cmcheckconf tests the connectivity of all
LVM disks on all nodes. Using -k can result in significantly faster operation of the
command.
AUTOSTART_CMCLD=1
NOTE: The /sbin/init.d/cmcluster file may call files that Serviceguard stores
in $SGCONF/rc. (See “Understanding the Location of Serviceguard Files” (page 153)
for information about Serviceguard directories on different Linux distributions.) This
directory is for Serviceguard use only! Do not move, delete, modify, or add files in this
directory.
Single-Node Operation
Single-node operation occurs in a single-node cluster, or in a multi-node cluster in
which all but one node has failed, or in which you have shut down all but one node,
which will probably have applications running. As long as the Serviceguard daemon
cmcld is active, other nodes can rejoin the cluster at a later time.
CAUTION: You should not try to restart Serviceguard in this situation; data corruption
might occur if another node were to attempt to start up a new instance of an application that
is still running on the single node. Instead, choose an appropriate time to shut down
and reboot the node. This will allow the applications to shut down and Serviceguard
to restart the cluster after the reboot.
Disabling identd
Ignore this section unless you have a particular need to disable identd.
You can configure Serviceguard not to use identd.
CAUTION: This is not recommended. Consult the white paper Securing Serviceguard
at http://docs.hp.com -> High Availability -> Serviceguard ->
White Papers for more information.
If you must disable identd, do the following on each node after installing Serviceguard
but before each node rejoins the cluster (e.g. before issuing a cmrunnode or cmruncl).
For Red Hat and SUSE:
1. Change the value of the server_args parameter in the file /etc/xinetd.d/
hacl-cfg from -c to -c -i
2. Change the server_args parameter in the /etc/xinetd.d/hacl-probe file to
include the value -i
On a SUSE system, change
server_args = -f /opt/cmom/log/cmomd.log -r /opt/cmom/run
to
server_args = -i -f /opt/cmom/log/cmomd.log -r /opt/cmom/run
On a Red Hat system, change
server_args = -f /usr/local/cmom/log/cmomd.log -r
/usr/local/cmom/run
to
server_args = -i -f /usr/local/cmom/log/cmomd.log -r /usr/local/cmom/run
NOTE: The cmdeleteconf command removes only the cluster binary file $SGCONF/
cmclconfig. It does not remove any other files from the $SGCONF directory.
Although the cluster must be halted, all nodes in the cluster should be powered up
and accessible before you use the cmdeleteconf command. If a node is powered
down, power it up and allow it to boot. If a node is inaccessible, you will see a list of
inaccessible nodes and the following message:
Checking current status
cmdeleteconf: Unable to reach node lptest1.
WARNING: Once the unreachable node is up, cmdeleteconf
should be executed on the node to remove the configuration.
NOTE: This is a new process for configuring packages, as of Serviceguard A.11.18.
This manual refers to packages created by this method as modular packages, and
assumes that you will use it to create new packages.
Packages created using Serviceguard A.11.16 or earlier are referred to as legacy
packages. If you need to reconfigure a legacy package (rather than create a new package),
see “Configuring a Legacy Package” (page 262).
It is also still possible to create new legacy packages by the method described in
“Configuring a Legacy Package”. If you are using a Serviceguard Toolkit, consult the
documentation for that product.
If you decide to convert a legacy package to a modular package, see “Migrating a
Legacy Package to a Modular Package” (page 272). Do not attempt to convert
Serviceguard Toolkit packages.
(Parameters that are in the package control script for legacy packages, but in the package
configuration file instead for modular packages, are indicated by (S) in the tables under
“Optional Package Modules” (page 202).)
IMPORTANT: Before you start, you need to do the package-planning tasks described
under “Package Configuration Planning ” (page 123).
To choose the right package modules, you need to decide the following things about
the package you are creating:
• What type of package it is; see “Types of Package: Failover, Multi-Node, System
Multi-Node” (page 198).
• Which parameters need to be specified for the package (beyond those included in
the base type, which is normally failover, multi-node, or system-multi-node).
See “Package Modules and Parameters” (page 199).
When you have made these decisions, you are ready to generate the package
configuration file; see “Generating the Package Configuration File” (page 222).
IMPORTANT: Multi-node packages must either use a clustered file system such
as Red Hat GFS, or not use shared storage.
To generate a package configuration file that creates a multi-node package,
include -m sg/multi_node on the cmmakepkg command line. See “Generating
the Package Configuration File” (page 222).
• System multi-node packages. System multi-node packages are supported only
for applications supplied by HP.
For more information about types of packages and how they work, see “How the
Package Manager Works” (page 49). For information on planning a package, see
“Package Configuration Planning ” (page 123).
When you have decided on the type of package you want to create, the next step is to
decide what additional package-configuration modules you need to include; see
“Package Modules and Parameters” (page 199).
NOTE: If you are going to create a complex package that contains many modules,
you may want to skip the process of selecting modules, and simply create a configuration
file that contains all the modules:
cmmakepkg -m sg/all $SGCONF/pkg_sg_complex
(The output will be written to $SGCONF/pkg_sg_complex.)
multi_node_all — all parameters that can be used by a multi-node package; includes
the multi_node, dependency, monitor_subnet, service, volume_group, filesystem,
pev, external_pre, external, and acp modules. Use this if you are creating a
multi-node package that requires most or all of the optional parameters available
for this type of package.
NOTE: For more information, see the comments in the editable configuration file
output by the cmmakepkg command, and the cmmakepkg (1m) manpage.
If you are going to browse these explanations deciding which parameters you need,
you may want to generate and print out a configuration file that has the comments for
all of the parameters; you can create such a file as follows:
cmmakepkg -m sg/all $SGCONF/sg-all
or simply
cmmakepkg $SGCONF/sg-all
This creates a file $SGCONF/sg-all that contains all the parameters and comments.
(See “Understanding the Location of Serviceguard Files” (page 153) for the location of
$SGCONF on your version of Linux.)
More detailed instructions for running cmmakepkg are in the next section, “Generating
the Package Configuration File” (page 222).
See also “Package Configuration Planning ” (page 123).
package_name
Any name, up to a maximum of 39 characters, that:
• starts and ends with an alphanumeric character
• otherwise contains only alphanumeric characters or dot (.), dash (-), or underscore
(_)
• is unique among package names in this cluster
module_name
The module name. Do not change it. Used in the form of a relative path (for example
sg/failover) as a parameter to cmmakepkg to specify modules to be used in
configuring the package. (The files reside in the $SGCONF/modules directory; see
“Understanding the Location of Serviceguard Files” (page 153) for the location of
$SGCONF on your version of Linux.)
New for modular packages.
module_version
The module version. Do not change it.
New for modular packages.
package_type
The type can be failover, multi_node, or system_multi_node. You can configure
only failover or multi-node packages; see “Types of Package: Failover, Multi-Node,
System Multi-Node” (page 198).
package_description
The application that the package runs. This is a descriptive parameter that can be set
to any value you choose, up to a maximum of 80 characters. Default value is
Serviceguard Package. New for 11.19
node_name
The node on which this package can run, or a list of nodes in order of priority, or an
asterisk (*) to indicate all nodes. The default is *. For system multi-node packages, you
must specify node_name *.
If you use a list, specify each node on a new line, preceded by the literal node_name,
for example:
node_name <node1>
node_name <node2>
node_name <node3>
The order in which you specify the node names is important. First list the primary
node name (the node where you normally want the package to start), then the first
adoptive node name (the best candidate for failover), and so on, in order of preference.
auto_run
Can be set to yes or no. The default is yes.
For failover packages, yes allows Serviceguard to start the package (on the first available
node listed under node_name) on cluster start-up, and to automatically restart it on an
adoptive node if it fails. no prevents Serviceguard from automatically starting the
package, and from restarting it on another node.
This is also referred to as package switching, and can be enabled or disabled while the
package is running, by means of the cmmodpkg command.
auto_run should be set to yes if the package depends on another package, or is depended
on; see “About Package Dependencies” (page 126).
For system multi-node packages, auto_run must be set to yes. In the case of a multi-node
package, setting auto_run to yes allows an instance to start on a new node joining the
cluster; no means it will not.
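For example, to disable and then re-enable package switching for a running package
pkg1:
cmmodpkg -d pkg1
cmmodpkg -e pkg1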
node_fail_fast_enabled
Can be set to yes or no. The default is no.
yes means the node on which the package is running will be halted (rebooted) if the
package fails; no means Serviceguard will not halt the system.
If this parameter is set to yes and one of the following events occurs, Serviceguard
will halt the system (reboot) on the node where the control script fails:
• A package subnet fails and no backup network is available
• Serviceguard is unable to execute the halt function
• The start or halt function times out
run_script_timeout
The amount of time, in seconds, allowed for the package to start; or no_timeout. The
default is no_timeout. The maximum is 4294.
If the package does not complete its startup in the time specified by run_script_timeout,
Serviceguard will terminate it and prevent it from switching to another node. In this
case, if node_fail_fast_enabled is set to yes, the node will be halted (rebooted).
If no timeout is specified (no_timeout), Serviceguard will wait indefinitely for the
package to start.
If a timeout occurs:
• Switching will be disabled.
• The current node will be disabled from running the package.
NOTE: If no_timeout is specified, and the script hangs, or takes a very long time
to complete, during the validation step (cmcheckconf (1m)), cmcheckconf will
wait 20 minutes to allow the validation to complete before giving up.
halt_script_timeout
The amount of time, in seconds, allowed for the package to halt; or no_timeout. The
default is no_timeout. The maximum is 4294.
If the package’s halt process does not complete in the time specified by
halt_script_timeout, Serviceguard will terminate the package and prevent it from
switching to another node. In this case, if node_fail_fast_enabled (page 206) is set to yes,
the node will be halted (rebooted).
If a halt_script_timeout is specified, it should be greater than the sum of all the values
set for service_halt_timeout (page 216) for this package.
If a timeout occurs:
• Switching will be disabled.
• The current node will be disabled from running the package.
If a halt-script timeout occurs, you may need to perform manual cleanup. See Chapter 8:
“Troubleshooting Your Cluster” (page 281).
Choosing Package Modules 207
successor_halt_timeout
Specifies how long, in seconds, Serviceguard will wait for packages that depend on
this package to halt, before halting this package. Can be 0 through 4294, or
no_timeout. The default is no_timeout.
• no_timeout means that Serviceguard will wait indefinitely for the dependent
packages to halt.
• 0 means Serviceguard will not wait for the dependent packages to halt before
halting this package.
New as of A.11.18 (for both modular and legacy packages). See also “About Package
Dependencies” (page 126).
script_log_file
The full pathname of the package’s log file. The default is
$SGRUN/log/<package_name>.log. (See “Understanding the Location of
Serviceguard Files” (page 153) for more information about Serviceguard pathnames.)
See also log_level.
operation_sequence
Defines the order in which the scripts defined by the package’s component modules
will start up. See the package configuration file for details.
This parameter is not configurable; do not change the entries in the configuration file.
New for modular packages.
log_level
Determines the amount of information printed to stdout when the package is validated,
and to the script_log_file when the package is started and halted. Valid values are 0
through 5, but you should normally use only the first two (0 or 1); the remainder (2
through 5) are intended for use by HP Support.
• 0 - informative messages
• 1 - informative messages with slightly more detail
• 2 - messages showing logic flow
• 3 - messages showing detailed data structure information
• 4 - detailed debugging information
• 5 - function call flow
New for modular packages.
failback_policy
Specifies whether or not Serviceguard will automatically move a package that is not
running on its primary node (the first node on its node_name list) when the primary
node is once again available. Can be set to automatic or manual. The default is
manual.
• manual means the package will continue to run on the current node.
• automatic means Serviceguard will move the package to the primary node as
soon as that node becomes available, unless doing so would also force a package
with a higher priority to move.
This parameter can be set for failover packages only. If this package will depend on
another package or vice versa, see also “About Package Dependencies” (page 126).
priority
Assigns a priority to a failover package whose failover_policy is configured_node.
Valid values are 1 through 3000, or no_priority. The default is no_priority. See
also the dependency_ parameter descriptions (page 210).
priority can be used to satisfy dependencies when a package starts, or needs to fail over
or fail back: a package with a higher priority than the packages it depends on can force
those packages to start or restart on the node it chooses, so that its dependencies are
met.
If you assign a priority, it must be unique in this cluster. A lower number indicates a
higher priority, and a numerical priority is higher than no_priority. HP recommends
assigning values in increments of 20 so as to leave gaps in the sequence; otherwise you
may have to shuffle all the existing priorities when assigning priority to a new package.
dependency_name
A unique identifier for a particular dependency (see dependency_condition) that must
be met in order for this package to run (or keep running). It must be unique among
this package's dependency_names. The length and formal restrictions for the name are
the same as for package_name (page 204).
dependency_condition
The condition that must be met for this dependency to be satisfied. As of Serviceguard
A.11.18, the only condition that can be set is that another package must be running.
The syntax is: <package_name> = UP, where <package_name> is the name of the
package depended on. The type and characteristics of the current package (the one we
are configuring) impose restrictions on the type of package it can depend on; see
“About Package Dependencies” (page 126) for the details.
dependency_location
Specifies where the dependency_condition must be met. The only legal value is
same_node.
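For example, to specify that pkg2 must be up on the same node for this package to
run (the dependency name is illustrative):
dependency_name pkg2_dep
dependency_condition pkg2 = UP
dependency_location same_node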
weight_name, weight_value
These parameters specify a weight for a package; this weight is compared to a node's
available capacity (defined by the CAPACITY_NAME and CAPACITY_VALUE
parameters in the cluster configuration file) to determine whether the package can run
there.
Both parameters are optional, but if weight_value is specified, weight_name must also
be specified, and must come first. You can define up to four weights, corresponding
to four different capacities, per cluster. To specify more than one weight for this package,
repeat weight_name and weight_value.
NOTE: If weight_name is package_limit, you can use only that one weight and
capacity throughout the cluster. package_limit is a reserved value, which, if used,
must be entered exactly in that form. It provides the simplest way of managing weights
and capacities; see “Simple Method” (page 135) for more information.
The rules for forming weight_name are the same as those for forming package_name
(page 204). weight_name must exactly match the corresponding CAPACITY_NAME.
weight_value is an unsigned floating-point value between 0 and 1000000 with at most
three digits after the decimal point.
You can use these parameters to override the cluster-wide default package weight that
corresponds to a given node capacity. You can define that cluster-wide default package
weight by means of the WEIGHT_NAME and WEIGHT_DEFAULT parameters in the
cluster configuration file (explicit default). If you do not define an explicit default (that
is, if you define a CAPACITY_NAME in the cluster configuration file with no
corresponding WEIGHT_NAME and WEIGHT_DEFAULT), the default weight is
assumed to be zero (implicit default). Configuring weight_name and weight_value here
in the package configuration file overrides the cluster-wide default (implicit or explicit),
and assigns a particular weight to this package.
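For example, to assign this package a weight of 512 against a hypothetical
cluster-wide capacity named memory (weight_name must match a CAPACITY_NAME
defined in the cluster configuration file):
weight_name memory
weight_value 512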
monitored_subnet
The LAN subnet that is to be monitored for this package. Replaces legacy SUBNET
which is still supported in the package configuration file for legacy packages; see
“Configuring a Legacy Package” (page 262).
You can specify multiple subnets; use a separate line for each.
If you specify a subnet as a monitored_subnet the package will not run on any node not
reachable via that subnet. This normally means that if the subnet is not up, the package
will not run. (For cross-subnet configurations, in which a subnet may be configured
on some nodes and not on others, see monitored_subnet_access below, ip_subnet_node
(page 214), and “About Cross-Subnet Failover” (page 147).)
Typically you would monitor the ip_subnet, specifying it here as well as in the ip_subnet
parameter (page 213), but you may want to monitor other subnets as well; you can
specify any subnet that is configured into the cluster (via the STATIONARY_IP
parameter in the cluster configuration file). See “Stationary and Relocatable IP Addresses
and Monitored Subnets” (page 71) for more information.
If any monitored_subnet fails, Serviceguard will switch the package to any other node
specified by node_name (page 205) which can communicate on all the monitored_subnets
defined for this package. See the comments in the configuration file for more information
and examples.
monitored_subnet_access
In cross-subnet configurations, specifies whether each monitored_subnet is accessible
on all nodes in the package’s node_name list (page 205), or only some. Valid values are
PARTIAL, meaning that at least one of the nodes has access to the subnet, but not all;
and FULL, meaning that all nodes have access to the subnet. The default is FULL, and
it is in effect if monitored_subnet_access is not specified.
See also ip_subnet_node (page 214) and “About Cross-Subnet Failover” (page 147).
New for modular packages. For legacy packages, see “Configuring Cross-Subnet
Failover” (page 269).
ip_subnet
CAUTION: HP recommends that this subnet be configured into the cluster. You do
this in the cluster configuration file by specifying a HEARTBEAT_IP or STATIONARY_IP
under a NETWORK_INTERFACE on the same subnet, for each node in this package's
NODE_NAME list. For example, an entry such as the following in the cluster
configuration file configures subnet 192.10.25.0 (lan1) on node ftsys9:
NODE_NAME ftsys9
NETWORK_INTERFACE lan1
HEARTBEAT_IP 192.10.25.18
See “Cluster Configuration Parameters ” (page 105) for more information.
If the subnet is not configured into the cluster, Serviceguard cannot manage or monitor
it, and in fact cannot guarantee that it is available on all nodes in the package's node-name
list (page 205) . Such a subnet is referred to as an external subnet, and relocatable
addresses on that subnet are known as external addresses. If you use an external subnet,
you risk the following consequences:
• If the subnet fails, the package will not fail over to an alternate node.
• Even if the subnet remains intact, if the package needs to fail over because of some
other type of failure, it could fail to start on an adoptive node because the subnet
is not available on that node.
For these reasons, configure all ip_subnets into the cluster, unless you are using a
networking technology that does not support DLPI. In such cases, follow instructions
in the networking product's documentation to integrate the product with Serviceguard.
For each subnet used, specify the subnet address on one line and, on the following
lines, the relocatable IP addresses that the package uses on that subnet. These will be
configured when the package starts and unconfigured when it halts.
For example, if this package uses subnet 192.10.25.0 and the relocatable IP addresses
192.10.25.12 and 192.10.25.13, enter:
ip_subnet 192.10.25.0
ip_address 192.10.25.12
ip_address 192.10.25.13
If you want the subnet to be monitored, specify it in the monitored_subnet parameter
(page 212) as well.
In a cross-subnet configuration, you also need to specify which nodes the subnet is
configured on; see ip_subnet_node below. See also monitored_subnet_access (page 212)
and “About Cross-Subnet Failover” (page 147).
ip_subnet_node
In a cross-subnet configuration, specifies which nodes an ip_subnet is configured on.
If no ip_subnet_nodes are listed under an ip_subnet, it is assumed to be configured on
all nodes in this package’s node_name list (page 205).
Can be added or deleted while the package is running, with these restrictions:
• The package must not be running on the node that is being added or deleted.
• The node must not be the first to be added to, or the last deleted from, the list of
ip_subnet_nodes for this ip_subnet.
See also monitored_subnet_access (page 212) and “About Cross-Subnet Failover” (page 147).
New for modular packages. For legacy packages, see “Configuring Cross-Subnet
Failover” (page 269).
ip_address
A relocatable IP address on a specified ip_subnet. Replaces IP, which is still supported
in the package control script for legacy packages.
For more information about relocatable IP addresses, see “Stationary and Relocatable
IP Addresses and Monitored Subnets” (page 71).
This parameter can be set for failover packages only.
service_name
A service is a program or function which Serviceguard monitors as long as the package
is up. service_name identifies this function and is used by the cmrunserv and
cmhaltserv commands. You can configure a maximum of 30 services per package
and 900 services per cluster.
The length and formal restrictions for the name are the same as for package_name
(page 204). service_name must be unique among all packages in the cluster.
service_cmd
The command that runs the program or function for this service_name, for example,
/usr/bin/X11/xclock -display 15.244.58.208:0
An absolute pathname is required; neither the PATH variable nor any other environment
variable is passed to the command. The default shell is /bin/sh.
NOTE: Be careful when defining service run commands. Each run command is
executed in the following way:
• The cmrunserv command executes the run command.
• Serviceguard monitors the process ID (PID) of the process the run command
creates.
• When the command exits, Serviceguard determines that a failure has occurred
and takes appropriate action, which may include transferring the package to an
adoptive node.
• If a run command is a shell script that runs some other command and then exits,
Serviceguard will consider this normal exit as a failure.
Make sure that each run command is the name of an actual service and that its process
remains alive until the actual service stops. One way to manage this is to configure a
package such that the service is actually a monitoring program that checks the health
of the application that constitutes the main function of the package, and exits if it finds
the application has failed. The application itself can be started by an external_script
(page 220).
service_restart
The number of times Serviceguard will attempt to re-run the service_cmd. Valid values
are unlimited, none or any positive integer value. Default is none.
If the value is unlimited, the service will be restarted an infinite number of times. If
the value is none, the service will not be restarted.
This parameter is in the package control script for legacy packages.
service_halt_timeout
The length of time, in seconds, Serviceguard will wait for the service to halt before
forcing termination of the service’s process. The maximum value is 4294.
The value should be large enough to allow any cleanup required by the service to
complete.
If no value is specified, a zero timeout will be assumed, meaning that Serviceguard
will not wait any time before terminating the process.
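For example, a monitoring service like the monitor.sh example shown earlier might
be configured as follows (names, pathname, and values are illustrative):
service_name pkg1_monitor
service_cmd /etc/cmcluster/pkg1/monitor.sh
service_restart none
service_halt_timeout 30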
vgchange_cmd
Replaces VGCHANGE, which is still supported for legacy packages; see “Configuring
a Legacy Package” (page 262). Specifies the method of activation for each Logical Volume
Manager (LVM) volume group identified by a vg entry.
The default is vgchange -a y.
vg
Specifies an LVM volume group (one per vg, each on a new line) on which a file system
(other than Red Hat GFS; see fs_type) needs to be mounted. A corresponding
vgchange_cmd (see above) specifies how the volume group is to be activated. The package
script generates the necessary filesystem commands on the basis of the fs_ parameters
(see “File system parameters” ).
concurrent_fsck_operations
The number of concurrent fsck operations allowed on file systems being mounted
during package startup. Not used for Red Hat GFS (see fs_type).
Legal value is any number greater than zero. The default is 1.
If the package needs to run fsck on a large number of file systems, you can improve
performance by carefully tuning this parameter during testing (increase it a little at a
time and monitor performance each time).
concurrent_mount_and_umount_operations
The number of concurrent mounts and umounts to allow during package startup or
shutdown.
Legal value is any number greater than zero. The default is 1.
If the package needs to mount and unmount a large number of file systems, you can
improve performance by carefully tuning this parameter during testing (increase it a
little at a time and monitor performance each time).
fs_mount_retry_count
The number of mount retries for each file system. Legal value is zero or any greater
number. The default is zero. The only valid value for Red Hat GFS (see fs_type) is zero.
If the mount point is busy at package startup and fs_mount_retry_count is set to zero,
package startup will fail.
If the mount point is busy and fs_mount_retry_count is greater than zero, the startup
script will attempt to kill the user process responsible for the busy mount point (fuser
-ku) and then try to mount the file system again. It will do this the number of times
specified by fs_mount_retry_count.
If the mount still fails after the number of attempts specified by fs_mount_retry_count,
package startup will fail.
This parameter is in the package control script for legacy packages.
fs_name
This parameter, in conjunction with fs_directory, fs_type, fs_mount_opt, fs_umount_opt,
and fs_fsck_opt, specifies a filesystem that is to be mounted by the package. Replaces
LV, which is still supported in the package control script for legacy packages.
fs_name must specify the block device file for a logical volume.
File systems are mounted in the order you specify in the package configuration file,
and unmounted in the reverse order.
See “File system parameters” (page 216) and the comments in the FILESYSTEMS section
of the configuration file for more information and examples. See also “Volume Manager
Planning ” (page 99), and the mount manpage.
NOTE: For filesystem types other than Red Hat GFS (see fs_type), a volume group
must be defined in this file (using vg; see page 216) for each logical volume specified
by an fs_name entry.
fs_directory
The root of the file system specified by fs_name. Replaces FS, which is still supported
in the package control script for legacy packages; see “Configuring a Legacy Package”
(page 262).
See the mount manpage and the comments in the configuration file for more
information.
fs_type
The type of the file system specified by fs_name. This parameter is in the package control
script for legacy packages.
Supported types are ext2, ext3, reiserfs, and gfs.
See the comments in the package configuration file template for more information.
fs_mount_opt
The mount options for the file system specified by fs_name. See the comments in the
configuration file for more information. This parameter is in the package control script
for legacy packages.
fs_umount_opt
The umount options for the file system specified by fs_name. See the comments in the
configuration file for more information. This parameter is in the package control script
for legacy packages.
fs_fsck_opt
The fsck options for the file system specified by fs_name. Not used for Red Hat GFS
(see fs_type). This parameter is in the package control script for legacy packages.
See the fsck manpage, and the comments in the configuration file, for more information.
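For example, to activate a volume group and mount an ext3 file system on it
(names and values illustrative):
vg vgpkgA
fs_name /dev/vgpkgA/lvol1
fs_directory /extra
fs_type ext3
fs_mount_opt "-o rw"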
pv
Physical volume on which persistent reservations (PR) will be made if the device
supports it. New for 11.19.
IMPORTANT: This parameter is for use only by HP partners, who should follow the
instructions in the package configuration file.
For information about Serviceguard's implementation of PR, see “About Persistent
Reservations” (page 86).
pev_
Specifies a package environment variable that can be passed to external_pre_script,
external_script, or both, by means of the cmgetpkgenv command. New for modular
packages.
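For example, the monitoring interval read by the sample external script shown
earlier (as PEV_MONITORING_INTERVAL) could be defined as follows (the value is
illustrative):
PEV_MONITORING_INTERVAL 60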
external_pre_script
The full pathname of an external script to be executed before volume groups and disk
groups are activated during package startup, and after they have been deactivated
during package shutdown; that is, effectively the first step in package startup and last
step in package shutdown. New for modular packages.
If more than one external_pre_script is specified, the scripts will be executed on package
startup in the order they are entered into the package configuration file, and in the
reverse order during package shutdown.
See “About External Scripts” (page 143), as well as the comments in the configuration
file, for more information and examples.
external_script
The full pathname of an external script. This script is often the means of launching and
halting the application that constitutes the main function of the package. New for
modular packages.
The script is executed on package startup after volume groups and file systems are
activated and IP addresses are assigned, but before services are started; and during
package shutdown after services are halted but before IP addresses are removed and
volume groups and file systems deactivated.
If more than one external_script is specified, the scripts will be executed on package
startup in the order they are entered into this file, and in the reverse order during
package shutdown.
See “About External Scripts” (page 143), as well as the comments in the configuration
file, for more information and examples. See also service_cmd (page 215).
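For example (pathname illustrative):
external_script /etc/cmcluster/pkg1/start_app.sh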
user_host
The system from which a user specified by user_name (page 221) can execute
package-administration commands.
Legal values are any_serviceguard_node, or cluster_member_node, or a specific
cluster node. If you specify a specific node it must be the official hostname (the
hostname portion, and only the hostname portion, of the fully qualified domain
name). As with user_name, be careful to spell the keywords exactly as given.
user_role
Must be package_admin, allowing the user access to the cmrunpkg, cmhaltpkg,
and cmmodpkg commands (and the equivalent functions in Serviceguard Manager)
and to the monitor role for the cluster. See “Controlling Access to the Cluster”
(page 183) for more information.
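For example, to give user john the package_admin role from node bit:
user_name john
user_host bit
user_role package_admin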
IMPORTANT: The following parameters are used only by legacy packages. Do not
try to use them in modular packages. See “Creating the Legacy Package Configuration
” (page 262) for more information.
PATH Specifies the path to be used by the script.
SUBNET Specifies the IP subnets that are to be monitored
for the package.
RUN_SCRIPT and HALT_SCRIPT Use the full pathname of each script.
These two parameters allow you to separate
package run instructions and package halt
instructions for legacy packages into separate
scripts if you need to. In this case, make sure you
include identical configuration information (such
as node names, IP addresses, etc.) in both scripts.
In most cases, though, HP recommends that you
use the same script for both run and halt
instructions. (When the package starts, the script
is passed the parameter start; when it halts, it
is passed the parameter stop.)
cmmakepkg Examples
The cmmakepkg command generates a package configuration file. Some examples
follow; see the cmmakepkg (1m) manpage for complete information. All the examples
create an editable configuration file pkg1.conf in the $SGCONF/pkg1 directory.
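For example (all of these write the file to $SGCONF/pkg1/pkg1.conf):
• To generate a configuration file for a failover package that uses the base module
set:
cmmakepkg -m sg/failover $SGCONF/pkg1/pkg1.conf
• To generate a configuration file for a multi-node package:
cmmakepkg -m sg/multi_node $SGCONF/pkg1/pkg1.conf
• To generate a configuration file containing all the modules, for a complex package:
cmmakepkg -m sg/all $SGCONF/pkg1/pkg1.conf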
Next Step
The next step is to edit the configuration file you have generated; see “Editing the
Configuration File” (page 223).
NOTE: Optional parameters are commented out in the configuration file (with a # at
the beginning of the line). In some cases these parameters have default values that will
take effect unless you uncomment the parameter (remove the #) and enter a valid value
different from the default. Read the surrounding comments in the file, and the
explanations in this chapter, to make sure you understand the implications both of
accepting and of changing a given default.
In all cases, be careful to uncomment each parameter you intend to use and assign it
the value you want it to have.
• package_name. Enter a unique name for this package. Note that there are stricter
formal requirements for the name as of A.11.18.
• package_type. Enter failover or multi_node. (system_multi_node is
reserved for special-purpose packages supplied by HP.) Note
that there are restrictions if another package depends on this package; see “About
Package Dependencies” (page 126).
See “Types of Package: Failover, Multi-Node, System Multi-Node” (page 198) for
more information.
• node_name. Enter the name of each cluster node on which this package can run,
with a separate entry on a separate line for each node.
• auto_run. For failover packages, enter yes to allow Serviceguard to start the package
on the first available node specified by node_name, and to automatically restart it
on another eligible node if it fails; enter no if you want to start the package manually.
NOTE: The package(s) this package depends on must already be part of the
cluster configuration by the time you validate this package (via cmcheckconf;
see “Verifying and Applying the Package Configuration” (page 227)); otherwise
validation will fail.
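For illustration, the entries for a simple failover package, with the parameters discussed
above uncommented and filled in, might look like this (the package and node names
are hypothetical):
package_name pkg1
package_type failover
node_name ftsys9
node_name ftsys10
auto_run yes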
Cluster Status
The status of a cluster, as shown by cmviewcl, can be one of the following:
• up - At least one node has a running cluster daemon, and reconfiguration is not
taking place.
• down - No cluster daemons are running on any cluster node.
• starting - The cluster is in the process of determining its active membership.
At least one cluster daemon is running.
• unknown - The node on which the cmviewcl command is issued cannot
communicate with other nodes in the cluster.
Network Status
The network interfaces have only status, as follows:
• Up.
• Down.
• Unknown. Serviceguard cannot determine whether the interface is up or down.
Network_Parameters:
INTERFACE STATUS NAME
PRIMARY up eth0
Policy_Parameters:
POLICY_NAME CONFIGURED_VALUE
Failover configured_node
Failback manual
Script_Parameters:
ITEM STATUS MAX_RESTARTS RESTARTS NAME
Service up 0 0 service1
Subnet up 0 0 15.13.168.0
Node_Switching_Parameters:
NODE_TYPE STATUS SWITCHING NAME
Primary up enabled ftsys9 (current)
Alternate up enabled ftsys10
Network_Parameters:
INTERFACE STATUS NAME
PRIMARY up eth0
PRIMARY up eth1
Policy_Parameters:
POLICY_NAME CONFIGURED_VALUE
Failover configured_node
Failback manual
Script_Parameters:
ITEM STATUS MAX_RESTARTS RESTARTS NAME
Service up 0 0 service2
Subnet up 0 0 15.13.168.0
Node_Switching_Parameters:
NODE_TYPE STATUS SWITCHING NAME
Primary up enabled ftsys10 (current)
Alternate up enabled ftsys9
Network_Parameters:
INTERFACE STATUS NAME
PRIMARY up eth0
PRIMARY up eth1
Policy_Parameters:
Script_Parameters:
ITEM STATUS MAX_RESTARTS RESTARTS NAME
Service up 0 0 service1
Subnet up 0 0 15.13.168.0
Node_Switching_Parameters:
NODE_TYPE STATUS SWITCHING NAME
Primary up enabled ftsys9 (current)
Alternate up enabled ftsys10
Network_Parameters:
INTERFACE STATUS NAME
PRIMARY up eth0
PRIMARY up eth1
UNOWNED_PACKAGES
Policy_Parameters:
POLICY_NAME CONFIGURED_VALUE
Failover configured_node
Failback manual
Script_Parameters:
ITEM STATUS NODE_NAME NAME
Service down service2
Subnet up 15.13.168.0
Node_Switching_Parameters:
NODE_TYPE STATUS SWITCHING NAME
Primary up enabled ftsys10
Alternate up enabled ftsys9
pkg2 now has the status down, and it is shown as unowned, with package switching
disabled. Note that switching is enabled for both nodes, however. This means that once
global switching is re-enabled for the package, it will attempt to start up on the primary
node.
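For example, you could re-enable global switching for the package as follows:
cmmodpkg -e pkg2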
Network_Parameters:
INTERFACE STATUS NAME
PRIMARY up eth0
PRIMARY up eth1
Policy_Parameters:
POLICY_NAME CONFIGURED_VALUE
Failover configured_node
Failback manual
Script_Parameters:
ITEM STATUS MAX_RESTARTS RESTARTS NAME
Service up 0 0 service1
Subnet up 0 0 15.13.168.0
Node_Switching_Parameters:
NODE_TYPE STATUS SWITCHING NAME
Primary up enabled ftsys9 (current)
Alternate up enabled ftsys10
Policy_Parameters:
POLICY_NAME CONFIGURED_VALUE
Failover configured_node
Failback manual
Script_Parameters:
ITEM STATUS MAX_RESTARTS RESTARTS NAME
Service up 0 0 service2
Subnet up 0 0 15.13.168.0
Node_Switching_Parameters:
NODE_TYPE STATUS SWITCHING NAME
Primary up enabled ftsys10
Alternate up enabled ftsys9 (current)
Network_Parameters:
INTERFACE STATUS NAME
PRIMARY up eth0
PRIMARY up eth1
Script_Parameters:
ITEM STATUS NODE_NAME NAME
Subnet up manx 192.8.15.0
Subnet up burmese 192.8.15.0
Subnet up tabby 192.8.15.0
Subnet up persian 192.8.15.0
Node_Switching_Parameters:
NODE_TYPE STATUS SWITCHING NAME
Primary up enabled manx
Alternate up enabled burmese
Alternate up enabled tabby
Alternate up enabled persian
CAUTION: HP Serviceguard cannot guarantee data integrity if you try to start a cluster
with the cmruncl -n command while a subset of the cluster's nodes are already
running a cluster. If the network connection is down between nodes, using cmruncl
-n might result in a second cluster forming, and this second cluster might start up the
same applications that are already running on the other cluster. The result could be
two applications overwriting each other's data on the disks.
NOTE: HP recommends that you remove a node from participation in the cluster (by
running cmhaltnode as shown below, or Halt Node in Serviceguard Manager) before
running the Linux shutdown command, especially in cases in which a packaged
application might have trouble during shutdown and not halt cleanly.
Starting a Package
Ordinarily, a package configured as part of the cluster will start up on its primary node
when the cluster starts up. You may need to start a package manually after it has been
halted manually. You can do this either in Serviceguard Manager, or with Serviceguard
commands as described below.
The cluster must be running, and if the package is dependent on other packages, those
packages must be either already running, or started by the same command that starts
this package (see the subsection that follows, and “About Package Dependencies”
(page 126).)
You can use Serviceguard Manager to start a package, or Serviceguard commands as
shown below.
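For example, a typical sequence is to run the package on a selected node and then
re-enable the package switching that cmhaltpkg disabled:
cmrunpkg -n ftsys9 pkg1
cmmodpkg -e pkg1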
Halting a Package
You halt a package when you want to stop the package but leave the node running.
Halting a package has a different effect from halting the node. When you halt the node,
its packages may switch to adoptive nodes (assuming that switching is enabled for
them); when you halt the package, it is disabled from switching to another node, and
must be restarted manually on another node or on the same node.
System multi-node packages run on all cluster nodes simultaneously; halting these
packages stops them running on all nodes. A multi-node package can run on several
nodes simultaneously; you can halt it on all the nodes it is running on, or you can
specify individual nodes.
You can use Serviceguard Manager to halt a package, or cmhaltpkg; for example:
cmhaltpkg pkg1
This halts pkg1 and disables it from switching to another node.
NOTE: A failure in the package control script will cause the package to fail. The
package will also fail if an external script (or pre-script) cannot be executed or
does not exist.
IMPORTANT: See the latest Serviceguard release notes for important information
about version requirements for package maintenance.
• The package must have package switching disabled before you can put it in
maintenance mode.
• You can put a package in maintenance mode only on one node.
— The node must be active in the cluster and must be eligible to run the package
(on the package's node_name list).
— If the package is not running, you must specify the node name when you run
cmmodpkg (1m) to put the package in maintenance mode.
— If the package is running, you can put it into maintenance only on the node on
which it is running.
— While the package is in maintenance mode on a node, you can run the package
only on that node.
• You cannot put a package in maintenance mode, or take it out of maintenance mode,
if doing so would cause another running package to halt.
• Since package failures are ignored while in maintenance mode, you can take a
running package out of maintenance mode only if the package is healthy.
Serviceguard checks the state of the package’s services and subnets to determine
if the package is healthy. If it is not, you must halt the package before taking it out
of maintenance mode.
• You cannot do online configuration as described under “Reconfiguring a Package”
(page 271).
• You cannot configure new dependencies involving this package; that is, you cannot
make it dependent on another package, or make another package depend on it.
See also “Dependency Rules for a Package in Maintenance Mode or Partial-Startup
Maintenance Mode ” (page 248).
• You cannot use the -t option of any command that operates on a package that is
in maintenance mode; see “Previewing the Effect of Cluster Changes” (page 252)
for information about the -t option.
Procedure
Follow these steps to perform maintenance on a package's networking components.
In this example, we'll call the package pkg1 and assume it is running on node1.
1. Place the package in maintenance mode:
cmmodpkg -m on -n node1 pkg1
2. Perform maintenance on the networks or resources and test manually that they
are working correctly.
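3. When the maintenance and testing are complete, take the package out of
maintenance mode (a sketch, assuming the -m off form of cmmodpkg (1m)):
cmmodpkg -m off pkg1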
Procedure
Follow this procedure to perform maintenance on a package. In this example, we'll
assume a package pkg1 is running on node1, and that we want to do maintenance on
the package's services.
1. Halt the package:
cmhaltpkg pkg1
2. Place the package in maintenance mode:
cmmodpkg -m on -n node1 pkg1
NOTE: If you now run cmviewcl, you'll see that the STATUS of pkg1 is up and
its STATE is maintenance.
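You can then run the package on that node to test the services you are working on,
for example:
cmrunpkg -n node1 pkg1
When you have finished, a sketch of the remaining steps (again assuming the -m off
form of cmmodpkg) is to halt the package and take it out of maintenance mode:
cmhaltpkg pkg1
cmmodpkg -m off pkg1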
Reconfiguring a Cluster
You can reconfigure a cluster either when it is halted or while it is still running. Some
operations can only be done when the cluster is halted. The table that follows shows
the required cluster state for many kinds of changes.
Table 7-1 Types of Changes to the Cluster Configuration

Change Quorum Server Configuration
    Cluster can be running; see “What Happens when You Change the Quorum
    Configuration Online” (page 48).

Change Cluster Lock Configuration (lock LUN)
    Cluster can be running. See “Updating the Cluster Lock LUN Configuration
    Online” (page 261) and “What Happens when You Change the Quorum
    Configuration Online” (page 48).

Add NICs and their IP addresses to the cluster configuration
    Cluster can be running. See “Changing the Cluster Networking Configuration
    while the Cluster Is Running” (page 257).

Delete NICs and their IP addresses from the cluster configuration
    Cluster can be running. See “Changing the Cluster Networking Configuration
    while the Cluster Is Running” (page 257).

Change the designation of an existing interface from HEARTBEAT_IP to
STATIONARY_IP, or vice versa
    Cluster can be running. See “Changing the Cluster Networking Configuration
    while the Cluster Is Running” (page 257).

Change an interface from IPv4 to IPv6, or vice versa
    Cluster can be running. See “Changing the Cluster Networking Configuration
    while the Cluster Is Running” (page 257).

Reconfigure IP addresses for a NIC used by the cluster
    Must delete the interface from the cluster configuration, reconfigure it, then add
    it back into the cluster configuration. See “What You Must Keep in Mind”
    (page 258). Cluster can be running throughout.

Change IP Monitor parameters: SUBNET, IP_MONITOR, POLLING_TARGET
    Cluster can be running. See the entries for these parameters under “Cluster
    Configuration Parameters” (page 105) for more information.
NOTE: You cannot use the -t option with any command operating on a package in
maintenance mode; see “Maintaining a Package: Maintenance Mode” (page 245).
For more information about these commands, see their respective manpages. You can
also perform these preview functions in Serviceguard Manager: check the Preview
[...] box for the action in question.
When you use the -t option, the command, rather than executing as usual, predicts
the results that would occur, sending a summary to stdout. For example, assume
that pkg1 is a high-priority package whose primary node is node1, and which depends
on pkg2 and pkg3 to run on the same node. These are lower-priority packages which
are currently running on node2. pkg1 is down and disabled, and you want to see the
effect of enabling it:
cmmodpkg -e -t pkg1
You will see output something like this:
package:pkg3|node:node2|action:failing
package:pkg2|node:node2|action:failing
package:pkg2|node:node1|action:starting
package:pkg3|node:node1|action:starting
package:pkg1|node:node1|action:starting
cmmodpkg: Command preview completed successfully
This shows that pkg1, when enabled, will “drag” pkg2 and pkg3 to its primary node,
node1. It can do this because of its higher priority; see “Dragging Rules for Simple
Dependencies” (page 128). Running the preview confirms that all three packages will
successfully start on node1 (assuming conditions do not change between now and
when you actually enable pkg1, and there are no failures in the run scripts).
Using cmeval
You can use cmeval to evaluate the effect of cluster changes on Serviceguard packages.
You can also use it simply to preview changes you are considering making to the cluster
as a whole.
You can use cmeval safely in a production environment; it does not affect the state of
the cluster or packages. Unlike command preview mode (the -t discussed above)
cmeval does not require you to be logged in to the cluster being evaluated, and in fact
that cluster does not have to be running, though it must use the same Serviceguard
release and patch version as the system on which you run cmeval.
Use cmeval rather than command preview mode when you want to see more than
the effect of a single command, and especially when you want to see the results of
large-scale changes, or changes that may interact in complex ways, such as changes to
package priorities, node order, dependencies and so on.
Using cmeval involves three major steps:
1. Use cmviewcl -v -f line to write the current cluster configuration out to a
file.
2. Edit the file to include the events or changes you want to preview.
3. Using the file from Step 2 as input, run cmeval to preview the results of the
changes.
For example, assume that pkg1 is a high-priority package whose primary node is
node1, and which depends on pkg2 and pkg3 to be running on the same node. These
lower-priority-packages are currently running on node2. pkg1 is down and disabled,
and you want to see the effect of enabling it.
In the output of cmviewcl -v -f line, you would find the line
package:pkg1|autorun=disabled and change it to
package:pkg1|autorun=enabled. You should also make sure that the nodes the
package is configured to run on are shown as available; for example:
package:pkg1|node:node1|available=yes. Then save the file (for example as
newstate.in) and run cmeval:
cmeval -v newstate.in
You would see output something like this:
package:pkg3|node:node2|action:failing
package:pkg2|node:node2|action:failing
package:pkg2|node:node1|action:starting
package:pkg3|node:node1|action:starting
package:pkg1|node:node1|action:starting
IMPORTANT: For detailed information and examples, see the cmeval (1m) manpage.
NOTE: Before you start, make sure you have configured access to ftsys10 as
described under “Configuring Root-Level Access” (page 155).
1. Use the following command to store a current copy of the existing cluster
configuration in a temporary file in case you need to revert to it:
cmgetconf -C temp.conf
2. Specify a new set of nodes to be configured and generate a template of the new
configuration (all on one line):
cmquerycl -C clconfig.conf -c cluster1 -n ftsys8 -n ftsys9
-n ftsys10
3. Edit clconfig.conf to check the information about the new node.
4. Verify the new configuration:
cmcheckconf -C clconfig.conf
5. Apply the changes to the configuration and send the new binary configuration
file to all cluster nodes:
cmapplyconf -C clconfig.conf
Use cmrunnode to start the new node, and, if you so decide, set the
AUTOSTART_CMCLD parameter to 1 in the $SGAUTOSTART file (see “Understanding
the Location of Serviceguard Files” (page 153)) to enable the new node to join the cluster
automatically each time it reboots.
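For example:
cmrunnode ftsys10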
NOTE: If you want to remove a node from the cluster, run the cmapplyconf
command from another node in the same cluster. If you try to issue the command on
the node you want removed, you will get an error message.
1. Use the following command to store a current copy of the existing cluster
configuration in a temporary file:
cmgetconf -c cluster1 temp.conf
2. Specify the new set of nodes to be configured (omitting ftsys10) and generate
a template of the new configuration:
cmquerycl -C clconfig.conf -c cluster1 -n ftsys8 -n ftsys9
3. Edit the file clconfig.conf to check the information about the nodes that remain
in the cluster.
4. Halt the node you are going to remove (ftsys10 in this example):
cmhaltnode -f -v ftsys10
5. Verify the new configuration:
cmcheckconf -C clconfig.conf
6. From ftsys8 or ftsys9, apply the changes to the configuration and distribute
the new binary configuration file to all cluster nodes:
cmapplyconf -C clconfig.conf
NOTE: If you are trying to remove an unreachable node on which many packages
are configured to run, you may see the following message:
The configuration change is too large to process while the cluster is running.
Split the configuration change into multiple requests or halt the cluster.
In this situation, you must halt the cluster to remove the node.
CAUTION: Do not add IP addresses to network interfaces that are configured into
the Serviceguard cluster, unless those IP addresses themselves will be immediately
configured into the cluster as stationary IP addresses. If you configure any address
other than a stationary IP address on a Serviceguard network interface, it could collide
with a relocatable package address assigned by Serviceguard.
Some sample procedures follow.
IMPORTANT: See “What Happens when You Change the Quorum Configuration
Online” (page 48) for important information.
1. In the cluster configuration file, modify the value of CLUSTER_LOCK_LUN for
each node.
2. Run cmcheckconf to check the configuration.
3. Run cmapplyconf to apply the configuration.
If you need to replace the physical device, see “Replacing a Lock LUN” (page 283).
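For illustration, a sketch of an excerpt of the per-node entries in the cluster
configuration file (the device file name is hypothetical, and may differ from node to
node):
NODE_NAME ftsys9
CLUSTER_LOCK_LUN /dev/sdd1
NODE_NAME ftsys10
CLUSTER_LOCK_LUN /dev/sdd1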
Changing MAX_CONFIGURED_PACKAGES
As of Serviceguard A.11.18, you can change MAX_CONFIGURED_PACKAGES while
the cluster is running. The default for MAX_CONFIGURED_PACKAGES is the maximum
number allowed in the cluster. You can use Serviceguard Manager to change
MAX_CONFIGURED_PACKAGES, or Serviceguard commands as shown below.
Use the cmgetconf command to obtain a current copy of the cluster's existing
configuration, for example:
cmgetconf -c <cluster_name> clconfig.conf
Edit the clconfig.conf file to include the new value for
MAX_CONFIGURED_PACKAGES. Then use the cmcheckconf command to verify
the new configuration. Using the -k or -K option can significantly reduce the response
time.
Use the cmapplyconf command to apply the changes to the configuration and send
the new configuration file to all cluster nodes. Using -k or -K can significantly reduce
the response time.
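For example (assuming the edited file is clconfig.conf):
cmcheckconf -k -C clconfig.conf
cmapplyconf -k -C clconfig.conf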
IMPORTANT: You can still create a new legacy package. If you are using a Serviceguard
Toolkit such as Serviceguard NFS Toolkit, consult the documentation for that product.
Otherwise, use this section to maintain and re-work existing legacy packages rather
than to create new ones. The method described in Chapter 6: “Configuring Packages
and Their Services ” (page 197), is simpler and more efficient for creating new packages,
allowing packages to be built from smaller modules, and eliminating the separate
package control script and the need to distribute it manually.
If you decide to convert a legacy package to a modular package, see “Migrating a
Legacy Package to a Modular Package” (page 272). Do not attempt to convert
Serviceguard Toolkit packages.
IMPORTANT: Note that the rules for valid SERVICE_NAMEs are more restrictive
as of Serviceguard A.11.18.
CAUTION: If you are not using the XDC or CLX products, do not modify the
REMOTE DATA REPLICATION DEFINITION section. If you are using one of these
products, consult the product’s documentation.
• If you are using LVM, enter the names of volume groups to be activated using the
VG[] array parameters, and select the appropriate options for the storage activation
command, including options for mounting and unmounting file systems, if
necessary. Specify the file system type (ext2 is the default; ext3, reiserfs, or
gfs can also be used; see the fs_ parameter descriptions starting with
fs_mount_retry_count (page 217) for more information).
• Add the names of logical volumes and the file system that will be mounted on
them.
• Specify the filesystem mount and unmount retry options.
• If your package uses a large number of volume groups or disk groups or mounts
a large number of file systems, consider increasing the number of concurrent
vgchange, mount/umount, and fsck operations.
• Define IP subnet and IP address pairs for your package. IPv4 or IPv6 addresses
are allowed.
• Add service name(s).
• Add service command(s).
• Add a service restart parameter, if you so decide.
For more information about services, see the discussion of the service_ parameters
starting with service_name (page 214).
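For illustration, a sketch of these entries in a legacy package control script (the volume
group, mount point, address, and service values are hypothetical):
VG[0]=vg01
LV[0]=/dev/vg01/lvol1
FS[0]=/mnt1
IP[0]=192.10.25.12
SUBNET[0]=192.10.25.0
SERVICE_NAME[0]=service1
SERVICE_CMD[0]="/usr/local/bin/myapp"
SERVICE_RESTART[0]="-r 2"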
function customer_defined_run_cmds
{
# ADD customer defined run commands.
: # do nothing instruction, because a function must contain some command.
date >> /tmp/pkg1.datelog
echo 'Starting pkg1' >> /tmp/pkg1.datelog
test_return 51
}
function customer_defined_halt_cmds
{
# ADD customer defined halt commands.
: # do nothing instruction, because a function must contain some command.
date >> /tmp/pkg1.datelog
echo 'Halting pkg1' >> /tmp/pkg1.datelog
test_return 52
}
NOTE: You must use cmcheckconf and cmapplyconf again any time you make
changes to the cluster and package configuration files.
Configuring node_name
First you need to make sure that pkg1 will fail over to a node on another subnet only
if it has to. For example, if it is running on nodeA and needs to fail over, you want it
to try nodeB, on the same subnet, before incurring the cross-subnet overhead of failing
over to nodeC or nodeD.
Assuming nodeA is pkg1’s primary node (where it normally starts), create node_name
entries in the package configuration file as follows:
node_name nodeA
node_name nodeB
node_name nodeC
node_name nodeD
Configuring monitored_subnet_access
In order to monitor subnet 15.244.65.0 or 15.244.56.0, you would configure
monitored_subnet and monitored_subnet_access in pkg1’s package configuration file as
follows:
monitored_subnet 15.244.65.0
monitored_subnet_access PARTIAL
monitored_subnet 15.244.56.0
monitored_subnet_access PARTIAL
Reconfiguring a Package
You reconfigure a package in much the same way as you originally configured it; for
modular packages, see Chapter 6: “Configuring Packages and Their Services ” (page 197);
for older packages, see “Configuring a Legacy Package” (page 262).
The cluster can be either halted or running during package reconfiguration, and in
some cases the package itself can be running; the types of change you can make and
the times when they take effect depend on whether the package is running or not.
If you reconfigure a package while it is running, it is possible that the package could
fail later, even if the cmapplyconf succeeded.
For example, consider a package with two volume groups. When this package started,
it activated both volume groups. While the package is running, you could change its
configuration to list only one of the volume groups, and cmapplyconf would succeed.
If you then issued the cmhaltpkg command, however, the halt would fail: the modified
package would not deactivate both of the volume groups it had activated at startup,
because it would see only the one volume group in its current configuration file.
For more information, see “Allowable Package States During Reconfiguration ”
(page 274).
NOTE: The cmmigratepkg command requires Perl version 5.8.3 or higher on the
system on which you run the command.
NOTE: If neither legacy nor modular is called out under “Change to the Package”, the
“Required Package State” applies to both types of package. Changes that are allowed,
but which HP does not recommend, are labeled “should not be running”.
IMPORTANT: Actions not listed in the table can be performed for both types of package
while the package is running.
In all cases the cluster can be running, and packages other than the one being
reconfigured can be running. And remember too that you can make changes to package
configuration files at any time; but do not apply them (using cmapplyconf or
Serviceguard Manager) to a running package in the cases indicated in the table.
Change run script contents: legacy package
    Package can be running, but should not be starting. Timing problems may occur
    if the script is changed while the package is starting.

Change halt script contents: legacy package
    Package can be running, but should not be halting. Timing problems may occur
    if the script is changed while the package is halting.

Add or remove a SUBNET (in control script): legacy package
    Package must not be running. Subnet must already be configured into the cluster.
    (Also applies to cross-subnet configurations.)

Add or remove an IP (in control script): legacy package
    Package must not be running. (Also applies to cross-subnet configurations.)

Change a file system: modular package
    Package should not be running (unless you are only changing fs_umount_opt).
    Changing file-system options other than fs_umount_opt may cause problems
    because the file system must be unmounted (using the existing fs_umount_opt)
    and remounted with the new options; the CAUTION under “Remove a file system:
    modular package” applies in this case as well.
    If only fs_umount_opt is being changed, the file system will not be unmounted;
    the new option will take effect when the package is halted or the file system is
    unmounted for some other reason.
NOTE: You will not be able to cancel if you use cmapplyconf -f.
• Package nodes
• Package dependencies
• Package weights (and also node capacity, defined in the cluster configuration file)
• Package priority
• auto_run
• failback_policy
Single-Node Operation
In a multi-node cluster, you could have a situation in which all but one node has failed,
or you have shut down all but one node, leaving your cluster in single-node operation.
This remaining node will probably have applications running on it. As long as the
Serviceguard daemon cmcld is active, other nodes can rejoin the cluster.
If the Serviceguard daemon fails when the cluster is in single-node operation, it will
leave the single node up and your applications running.
CAUTION: Remove the node from the cluster first. If you run the rpm -e command
on a server that is still a member of a cluster, it will cause that cluster to halt and the
cluster configuration to be deleted.
To remove Serviceguard:
CAUTION: In testing the cluster in the following procedures, be aware that you are
causing various components of the cluster to fail, so that you can determine that the
cluster responds correctly to failure situations. As a result, the availability of nodes and
applications may be disrupted.
Monitoring Hardware
Good standard practice in handling a high availability system includes careful fault
monitoring so as to prevent failures if possible or at least to react to them swiftly when
they occur. For information about disk monitoring, see “Creating a Disk Monitor
Configuration” (page 228).
Replacing Disks
The procedure for replacing a faulty disk mechanism depends on the type of disk
configuration you are using. For issues related to a Smart Array, refer to your Smart
Array documentation.
CAUTION: Before you start, make sure that all nodes have logged a message such as
the following in syslog:
WARNING: Cluster lock LUN /dev/sda1 is corrupt: bad label. Until
this situation is corrected, a single failure could cause all
nodes in the cluster to crash.
Once all nodes have logged this message, use a command such as the following to
specify the new cluster lock LUN:
cmdisklock reset /dev/sda1
CAUTION: You are responsible for determining that the device is not being used by
LVM or any other subsystem on any node connected to the device before using
cmdisklock. If you use cmdisklock without taking this precaution, you could lose
data.
NOTE: cmdisklock is needed only when you are repairing or replacing a lock LUN;
see the cmdisklock (1m) manpage for more information.
Serviceguard checks the lock LUN every 75 seconds. After using the cmdisklock
command, watch the syslog file on an active cluster node; within 75 seconds you
should see a message showing that the lock disk is healthy again.
Examples
The following command clears all the PR reservations registered with the key abc12
on the set of LUNs listed in the file /tmp/pr_device_list:
pr_cleanup -k abc12 lun -f /tmp/pr_device_list
pr_device_list contains entries such as the following:
/dev/sdb1
/dev/sdb2
Alternatively you could enter the device-file names on the command line:
pr_cleanup -k abc12 lun /dev/sdb1 /dev/sdb2
The next command clears all the PR reservations registered with the PR key abcde on
the underlying LUNs of the volume group vg01:
pr_cleanup -k abcde vg01
NOTE: Because the keyword lun is not included, the device is assumed to be a volume
group.
2. Use the cmapplyconf command to apply the configuration and copy the new
binary file to all cluster nodes:
cmapplyconf -C config.conf
This procedure updates the binary file with the new MAC address and thus avoids
data inconsistency between the outputs of the cmviewconf and ifconfig commands.
NOTE: The quorum server reads the authorization file at startup. Whenever you
modify the file qs_authfile, run the following command to force a re-read of
the file. For example on a Red Hat distribution:
/usr/local/qs/bin/qs -update
On a SUSE distribution:
/opt/qs/bin/qs -update
CAUTION: Make sure that the old system does not rejoin the network with the old
IP address.
NOTE: While the old quorum server is down and the new one is being set up:
• The cmquerycl, cmcheckconf and cmapplyconf commands will not work
• The cmruncl, cmhaltcl, cmrunnode, and cmhaltnode commands will work
• If there is a node or network failure that creates a 50-50 membership split, the
quorum server will not be available as a tie-breaker, and the cluster will fail.
Troubleshooting Approaches
The following sections offer a few suggestions for troubleshooting by reviewing the
state of the running system and by examining cluster status data, log files, and
configuration files. Topics include:
• Reviewing Package IP Addresses
• Reviewing the System Log File
• Reviewing Configuration Files
• Reviewing the Package Control Script
• Using cmquerycl and cmcheckconf
• Using cmviewcl
• Reviewing the LAN Configuration
NOTE: Many other products running on Linux in addition to Serviceguard use the
syslog file to save messages. Refer to your Linux documentation for additional
information on using the system log.
Solving Problems
Problems with Serviceguard may be of several types. The following is a list of common
categories of problem:
• Serviceguard Command Hangs.
• Cluster Re-formations.
• System Administration Errors.
• Package Control Script Hangs.
• Package Movement Errors.
• Node and Network Failures.
• Quorum Server Messages.
The default Serviceguard control scripts are designed to take the straightforward steps
needed to get an application running or stopped. If the package administrator specifies
a time limit within which these steps need to occur and that limit is subsequently
exceeded for any reason, Serviceguard takes the conservative approach that the control
script logic must either be hung or defective in some way. At that point the control
script cannot be trusted to perform cleanup actions correctly, thus the script is terminated
and the package administrator is given the opportunity to assess what cleanup steps
must be taken.
If you want the package to switch automatically in the event of a control script timeout,
set the node_fail_fast_enabled parameter (page 206) to yes. In this case, Serviceguard will
cause a reboot on the node where the control script timed out. This effectively cleans
up any side effects of the package’s run or halt attempt. In this case the package will
be automatically restarted on any available alternate node for which it is configured.
NOTE: See the HP Serviceguard Quorum Server Version A.04.00 Release Notes for
information about configuring the Quorum Server. Do not proceed without reading
the Release Notes for your version.
Messages
The coordinator node in Serviceguard sometimes sends a request to the quorum server
to set the lock state. (This is different from a request to obtain the lock in tie-breaking.)
If the quorum server’s connection to one of the cluster nodes has not completed, the
request to set may fail with a two-line message like the following in the quorum server’s
log file:
Oct 08 16:10:05:0: There is no connection to the applicant
2 for lock /sg/lockTest1
Oct 08 16:10:05:0: Request for lock /sg/lockTest1 from
applicant 1 failed: not connected to all applicants.
This condition can be ignored. The request will be retried a few seconds later and will
succeed. The following message is logged:
Oct 08 16:10:06:0: Request for lock /sg/lockTest1
succeeded. New lock owners: 1,2.
Use Checkpoints
Design applications to checkpoint complex transactions. A single transaction from the
user's perspective may result in several actual database transactions. Although this
issue is related to restartable transactions, here it is advisable to record progress locally
on the client so that a transaction that was interrupted by a system failure can be
completed after the failover occurs.
For example, suppose the application being used is calculating PI. On the original
system, the application has gotten to the 1,000th decimal point, but the application has
not yet written anything to disk. At that moment in time, the node crashes. The
application is restarted on the second node, but the application is started up from
scratch. The application must recalculate those 1,000 decimal points. However, if the
application had written to disk the decimal points on a regular basis, the application
could have restarted from where it left off.
Use DNS
DNS provides an API which can be used to map hostnames to IP addresses and vice
versa. This is useful for BSD socket applications such as telnet which are first told the
target system name. The application must then map the name to an IP address in order
to establish a connection. However, some calls should be used with caution.
Applications should not reference official hostnames or IP addresses. The official
hostname and corresponding IP address for the hostname refer to the primary LAN
card and the stationary IP address for that card. Therefore, any application that refers
to, or requires the hostname or primary IP address may not work in an HA environment
where the network identity of the system that supports a given application moves from
one system to another, but the hostname does not move.
One way to look for problems in this area is to look for calls to gethostname(2) in
the application. HA services should use gethostname() with caution, since the
response may change over time if the application migrates. Applications that use
gethostname() to determine the name for a call to gethostbyname(3) should also
be avoided for the same reason. Also, the gethostbyaddr() call may return different
answers over time if called with a stationary IP address.
Instead, the application should always refer to the application name and relocatable
IP address rather than the hostname and stationary IP address. It is appropriate for the
application to call gethostbyname(3), specifying the application name rather than
the hostname. gethostbyname(3) will return the relocatable IP address associated with
the application name. This IP address will move with the application to the new node.
Hardware Worksheet
=============================================================================
SPU Information:
===============================================================================
=============================================================================
Bus Type ______ Slot Number ____ Address ____ Disk Device File _________
Bus Type ______ Slot Number ____ Address ____ Disk Device File _________
Bus Type ______ Slot Number ____ Address ____ Disk Device File _________
Bus Type ______ Slot Number ____ Address ____ Disk Device File _________
============================================================================
Disk Power:
============================================================================
Tape Backup Power:
============================================================================
Other Power:
OR
==============================================================================
=============================================================================
PATH______________________________________________________________
VGCHANGE_________________________________
VG[0]__________________LV[0]______________________FS[0]____________________
VG[1]__________________LV[1]______________________FS[1]____________________
VG[2]__________________LV[2]______________________FS[2]____________________
NOTE: MD, RAIDTAB, and RAIDSTART are deprecated and should not be used. See
“Multipath for Storage ” (page 96).
Anycast An address for a set of interfaces. In most cases these interfaces belong to different
nodes. A packet sent to an anycast address is delivered to one of these interfaces
identified by the address. Since the standards for using anycast addresses are still
evolving, they are not supported in Linux at present.
Multicast An address for a set of interfaces (typically belonging to different nodes). A packet
sent to a multicast address will be delivered to all interfaces identified by that address.
Unlike IPv4, IPv6 has no broadcast addresses; their functions are superseded by
multicast.
Unicast Addresses
IPv6 unicast addresses are classified into different types. They are: global aggregatable
unicast address, site-local address and link-local address. Typically a unicast address
is logically divided as follows:
Table D-2
n bits                      128-n bits
subnet prefix               interface ID
Interface identifiers in an IPv6 unicast address are used to identify the interfaces on a
link. Interface identifiers are required to be unique on that link. The link is generally
identified by the subnet prefix.
A unicast address is called the unspecified address if all the bits in the address are zero.
Textually it is represented as “::”.
The unicast address ::1 or 0:0:0:0:0:0:0:1 is called the loopback address. It is
used by a node to send packets to itself.
Example (IPv4-compatible IPv6 address):
::192.168.0.1
Example (IPv4-mapped IPv6 address):
::ffff:192.168.0.1
where
FP = Format prefix. Value of this is “001” for Aggregatable Global unicast addresses.
TLA ID = Top-level Aggregation Identifier.
RES = Reserved for future use.
NLA ID = Next-Level Aggregation Identifier.
Link-Local Addresses
Link-local addresses have the following format:
Table D-6
10 bits 54 bits 64 bits
1111111010 0 interface ID
Link-local addresses are intended for addressing nodes on a single link. Packets
originating from or destined to a link-local address will not be forwarded by a router.
Site-Local Addresses
Site-local addresses have the following format:
Table D-7
10 bits          38 bits    16 bits       64 bits
1111111011       0          subnet ID     interface ID
Site-local addresses are intended for use within a site. Routers will not forward any
packet with a site-local source or destination address outside the site.
Multicast Addresses
A multicast address is an identifier for a group of nodes. Multicast addresses have the
following format:
Table D-8
8 bits        4 bits    4 bits    112 bits
11111111      flags     scop      group ID
“FF” at the beginning of the address identifies the address as a multicast address.
The “flags” field is a set of 4 flags “000T”. The higher order 3 bits are reserved and
must be zero. The last bit ‘T’ indicates whether it is permanently assigned or not. A
value of zero indicates that it is permanently assigned otherwise it is a temporary
assignment.
The “scop” field is a 4-bit field which is used to limit the scope of the multicast group.
For example, a value of 1 indicates a node-local multicast group, a value of 2 indicates
that the scope is link-local, and a value of 5 indicates that the scope is site-local.
Configuring a Channel Bonding Interface with Persistent IPv6 Addresses on Red Hat
Linux
Configure the following parameters in
/etc/sysconfig/network-scripts/ifcfg-bond0:
DEVICE=bond0
IPADDR=12.12.12.12
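The IPv6-specific entries follow; a sketch, assuming the standard Red Hat ifcfg
keywords and a hypothetical address:
ONBOOT=yes
BOOTPROTO=none
IPV6INIT=yes
IPV6ADDR=3ffe:2000:0:1::1/64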
NOTE: If a cluster is not yet configured, you will not see the Serviceguard Cluster
section on this screen. To create a cluster, from the SMH Tools menu, click the
Serviceguard Manager link in the Serviceguard box first, then click Create Cluster.
The figure below shows a browser session at the HP Serviceguard Manager Main Page.
1 Cluster and overall status and alerts
    Displays information about the Cluster status, alerts and general information.
    NOTE: The System Tools menu item is not available in this version of
    Serviceguard Manager.
2 Menu tool bar
    The menu tool bar is available from the HP Serviceguard Manager Homepage,
    and from any cluster, node or package view-only property page. Menu option
    availability depends on which type of property page (cluster, node or package)
    you are currently viewing.
3 Tab bar
    The default Tab bar allows you to view additional cluster-related information.
    The Tab bar displays different content when you click on a specific node or
    package.
4 Node information
    Displays information about the Node status, alerts and general information.
5 Package information
    Displays information about the Package status, alerts and general information.
NOTE: If you click on a cluster running an earlier Serviceguard release, the page will
display a link that will launch Serviceguard Manager A.05.01 (if installed) via Java
Webstart.