Installation PDF
PRIMECLUSTER
Linux
J2UL-2264-02ENZ0(00)
August 2018
Preface
This manual serves as your starting point for using PRIMECLUSTER. It explains the workflow of operations from installation to operation
management of the PRIMECLUSTER system. Because the PRIMECLUSTER system comprises multiple features, there are several other
manuals besides this one, each covering a specific feature. By reading this manual first, however, you can carry out the entire series of
operations, because this manual refers you to the other manuals that contain the feature-specific information needed for each operation.
This manual also provides a functional overview of the products supported by the PRIMECLUSTER system and describes operation
procedures.
This manual only covers the basic operation of PRIMECLUSTER. For operations using different hardware and software configurations,
see "Related Documentation."
The table below shows the operation flow from PRIMECLUSTER installation to the start of operation management and indicates where
in this manual each operation is described.
For detailed procedural explanations, refer to the reference manuals indicated at the target location of each part.
Target Readers
This manual is intended for all users who use PRIMECLUSTER 4.5 and perform cluster system installation and operation management.
It is also intended for programmers who develop applications that operate on PRIMECLUSTER.
Appendix A PRIMECLUSTER Products
Audience: Users who operate PRIMECLUSTER products on PRIMECLUSTER systems
Contents: This appendix describes the list of products supported by PRIMECLUSTER systems.
Appendix B Manual Pages
Audience: All users who use PRIMECLUSTER systems
Contents: This appendix describes the online manual pages that are used by the individual features of the PRIMECLUSTER system.
Appendix C Troubleshooting
Audience: All users who use PRIMECLUSTER systems
Contents: This appendix describes corrective actions for problems that may occur in the PRIMECLUSTER system. It also explains how
to collect data when requesting a problem investigation.
Appendix D Registering, Changing, and Deleting State Transition Procedure Resources for PRIMECLUSTER Compatibility
Audience: All users who use PRIMECLUSTER-compatible resources
Contents: This appendix describes procedures for registering, changing, and deleting procedure resources when the cluster applications
use procedure resources.
Appendix E Configuration Update Service for SA
Audience: All users who use PRIMECLUSTER systems
Contents: This appendix describes Configuration Update Service for SA.
Appendix F Setting up Cmdline Resource to Control Guest OS from Cluster Application of Host OS in KVM Environment
Audience: All users who control the guest OS from the cluster application of host OS in a KVM environment
Contents: This appendix describes how to set up the Cmdline resource to control the guest OS from the cluster application of host OS
in a KVM environment.
Appendix G Using the Migration Function in KVM Environment
Audience: All users who use the migration function in a KVM Environment
Contents: This appendix describes the procedure for using the migration function in a KVM Environment.
Appendix H Using PRIMECLUSTER in a VMware Environment
Audience: All users who use PRIMECLUSTER systems in a VMware environment
Contents: This appendix describes the installation procedures for using the PRIMECLUSTER system in a VMware environment.
Appendix I Using PRIMECLUSTER in RHOSP Environment
Audience: All users who use PRIMECLUSTER systems in an RHOSP environment
Contents: This appendix describes the installation procedure for using PRIMECLUSTER systems in an RHOSP environment.
Appendix J Startup Scripts and Startup Daemons, and Port Numbers in PRIMECLUSTER
Audience: System administrators who build PRIMECLUSTER systems
Contents: This appendix provides explanations on scripts and daemons that are started by PRIMECLUSTER, and the port numbers
being used.
Appendix K Systemd Services and Startup Daemons, and Port Numbers in PRIMECLUSTER
Audience: System administrators who build PRIMECLUSTER systems
Contents: This appendix provides explanations on systemd services and daemons that are started by PRIMECLUSTER, and the port
numbers being used.
Appendix L Using Firewall
Audience: All users who use PRIMECLUSTER systems
Contents: This appendix describes the procedure when using Firewall in the PRIMECLUSTER system.
Appendix M Cloning the Cluster System Environment
Audience: System administrators who clone PRIMECLUSTER systems
Contents: This appendix describes the procedures for cloning the PRIMECLUSTER system.
Appendix N Changes in Each Version
Audience: All users who use PRIMECLUSTER 4.0A20, 4.1A20, 4.1A30, 4.2A00, 4.2A30, 4.3A00, 4.3A10, 4.3A20, 4.3A30, 4.3A40,
4.4A00, or 4.5A00.
Contents: This appendix describes the changes made to the specifications of PRIMECLUSTER 4.5A10.
Appendix O Release Information
Audience: All users who use PRIMECLUSTER systems
Contents: This appendix lists the main changes in this manual.
Glossary
Audience: All users who use PRIMECLUSTER systems
Contents: This section explains terms used to describe the PRIMECLUSTER system.
Related Documentation
Refer to the following manuals as necessary when setting up the cluster:
Note
The PRIMECLUSTER documentation includes the following documentation in addition to those listed above:
Manual Series
Manual Printing
If you want to print a manual, use the PDF file found on the DVD for the PRIMECLUSTER product. The correspondences between the PDF
file names and manuals are described in the Software Release Guide for PRIMECLUSTER that comes with the product.
Adobe Reader is required to read and print this PDF file. To get Adobe Reader, see Adobe Systems Incorporated's website.
Online Manuals
To allow users to view the online manuals, register each user name to one of the user groups (wvroot, clroot, cladmin, or clmon) on the
cluster management server.
For information on user group registration procedures and user group definitions, see "4.3.1 Assigning Users to Manage the Cluster."
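As a simple illustration, assuming the standard Linux usermod(8) command and a hypothetical user name user1 (see "4.3.1 Assigning Users
to Manage the Cluster" for the authoritative procedure), an existing user account can be added to the wvroot group as follows:
# usermod -aG wvroot user1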
Conventions
Notation
Prompts
Command line examples that require system administrator (or root) rights to execute are preceded by the system administrator
prompt, the hash sign (#). Entries that do not require system administrator rights are preceded by a dollar sign ($).
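For example, a command that must be executed with system administrator rights appears with the hash prompt as follows (the command
itself is shown only to illustrate the prompt):
# shutdown -r now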
Manual page section numbers
References to Linux(R) operating system commands are followed by their manual page section numbers in parentheses, for
example, cp(1).
The keyboard
Keystrokes that represent nonprintable characters are displayed as key icons such as [Enter] or [F1]. For example, [Enter] means
press the key labeled Enter; [Ctrl-b] means hold down the key labeled Ctrl or Control and then press the [B] key.
Typefaces
The following typefaces highlight specific elements in this manual.
Typeface         Usage
Constant Width   Computer output and program listings; commands, file names, manual page
                 names, and other literal programming elements in the main body of text.
Italic           Variables that you must replace with an actual value.
Bold             Items in a command line that you must type exactly as shown.
Example 1
Several entries from an /etc/passwd file are shown below:
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/bin/bash
daemon:x:2:2:daemon:/sbin:/bin/bash
lp:x:4:7:lp daemon:/var/spool/lpd:/bin/bash
Example 2
To use the cat(1) command to display the contents of a file, enter the following command line:
$ cat file
Notation symbols
Material of particular interest is preceded by the following symbols in this manual:
Point
Contains important information about the subject at hand.
Note
Describes an item to be noted.
Example
Describes operation using an example.
Information
Describes reference information.
See
Provides the names of manuals to be referenced.
Abbreviations
Export Controls
Exportation/release of this document may require necessary procedures in accordance with the regulations of your resident country and/or
US export control laws.
Trademarks
Red Hat is a trademark of Red Hat, Inc. in the U.S. and other countries.
Linux is a registered trademark of Linus Torvalds.
Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners.
Microsoft, Windows, and Internet Explorer are registered trademarks of Microsoft Corporation in the United States and other countries.
Dell EMC, PowerPath, and NetWorker are registered trademarks or trademarks of EMC Corporation in the United States and other
countries.
VMware is a registered trademark or trademark of VMware, Inc. in the United States and/or other jurisdictions.
Other product names are product names, trademarks, or registered trademarks of their respective companies.
Requests
- No part of this documentation may be reproduced or copied without permission of FUJITSU LIMITED.
- The contents of this documentation may be revised without prior notice.
Date of publication and edition
Copyright notice
All Rights Reserved, Copyright (C) FUJITSU LIMITED 2017-2018.
Contents
Part 1 Planning......................................................................................................................................................................... 1
Part 2 Installation....................................................................................................................................................................45
3.2.2.4 NTP setup (host OS and guest OS)......................................................................................................................................74
3.2.2.5 Installing PRIMECLUSTER on guest OSes....................................................................................................................... 74
3.2.2.6 Checking and setting the kernel parameters........................................................................................................................ 74
3.2.2.7 Installing and setting up applications...................................................................................................................................74
3.2.3 When building a cluster system between guest OSes on multiple host OSes using Host OS failover function........................ 75
3.2.3.1 Installation and Setup of Software (Host OS)..................................................................................................................... 75
3.2.3.1.1 Network setup............................................................................................................................................................... 75
3.2.3.1.2 NTP setup..................................................................................................................................................................... 75
3.2.3.1.3 Host OS setup (before installing the operating system on guest OS)........................................................................... 76
3.2.3.1.4 Host OS setup (after installing the operating system on guest OS)..............................................................................76
3.2.3.1.5 Installing PRIMECLUSTER on the host OS................................................................................................................81
3.2.3.1.6 Setting up the cluster high-speed failover function...................................................................................................... 81
3.2.3.1.7 Checking and setting the kernel parameters................................................................................................................. 82
3.2.3.2 Preparation prior to building a cluster (Host OS)................................................................................................................ 82
3.2.3.3 Building a cluster (Host OS)................................................................................................................................................82
3.2.3.4 Software installation and setup (Guest OS)......................................................................................................................... 82
3.2.3.4.1 Guest OS setup..............................................................................................................................................................82
3.2.3.4.2 NTP setup (Guest OS).................................................................................................................................................. 84
3.2.3.4.3 Installing PRIMECLUSTER on guest OSes................................................................................................................ 84
3.2.3.4.4 Checking and setting the kernel parameters................................................................................................................. 84
3.2.3.4.5 Installing and setting up applications............................................................................................................................84
3.2.3.5 Preparation prior to building a cluster (Guest OS).............................................................................................................. 84
3.2.3.6 Building a Cluster (Guest OS)............................................................................................................................................. 84
3.2.3.7 Building cluster applications (Guest OS)............................................................................................................................ 85
3.3 PRIMECLUSTER Installation.......................................................................................................................................................... 85
3.4 Installation and Environment Setup of Applications.........................................................................................................................87
5.1.2.3.1 Checking the Shutdown Agent Information............................................................................................................... 114
5.1.2.3.2 Setting up the Shutdown Daemon.............................................................................................................................. 115
5.1.2.3.3 Setting up IPMI Shutdown Agent...............................................................................................................................116
5.1.2.3.4 Setting up Blade Shutdown Agent..............................................................................................................................118
5.1.2.3.5 Setting up kdump Shutdown Agent............................................................................................................................ 120
5.1.2.3.6 Starting up the Shutdown Facility.............................................................................................................................. 121
5.1.2.3.7 Test for Forced Shutdown of Cluster Nodes.............................................................................................................. 122
5.1.2.4 Setup Procedure for Shutdown Facility in PRIMEQUEST 2000 Series........................................................................... 122
5.1.2.4.1 Checking the Shutdown Agent Information............................................................................................................... 122
5.1.2.4.2 Setting up the MMB Shutdown Agent....................................................................................................................... 123
5.1.2.4.3 Setting up the Shutdown Daemon.............................................................................................................................. 124
5.1.2.4.4 Starting the MMB Asynchronous Monitoring Daemon............................................................................................. 125
5.1.2.4.5 Setting I/O Completion Wait Time.............................................................................................................................125
5.1.2.4.6 Starting the Shutdown Facility................................................................................................................................... 125
5.1.2.4.7 Test for Forced Shutdown of Cluster Nodes.............................................................................................................. 128
5.1.2.5 Setup Procedure for Shutdown Facility in PRIMEQUEST 3000 Series........................................................................... 128
5.1.2.5.1 Checking the Shutdown Agent Information............................................................................................................... 128
5.1.2.5.2 Setting up the iRMC Shutdown Agent....................................................................................................................... 129
5.1.2.5.3 Setting up the Shutdown Daemon.............................................................................................................................. 131
5.1.2.5.4 Starting the iRMC Asynchronous Monitoring Daemon............................................................................................. 132
5.1.2.5.5 Setting I/O Completion Wait Time.............................................................................................................................132
5.1.2.5.6 Starting the Shutdown Facility................................................................................................................................... 133
5.1.2.5.7 Test for Forced Shutdown of Cluster Nodes.............................................................................................................. 133
5.1.2.6 Setup Procedure for Shutdown Facility in Virtual Machine Environment........................................................................134
5.1.2.6.1 Checking the Shutdown Agent Information............................................................................................................... 134
5.1.2.6.2 Setting up libvirt Shutdown Agent............................................................................................................................. 134
5.1.2.6.3 Setting Up vmchkhost Shutdown Agent.....................................................................................................................136
5.1.2.6.4 Setting up the Shutdown Daemon.............................................................................................................................. 137
5.1.2.6.5 Starting the Shutdown Facility................................................................................................................................... 138
5.1.2.6.6 Setting up the Host OS Failover Function to the Host OS (PRIMEQUEST only).................................................... 139
5.1.2.6.7 Test for Forced Shutdown of Cluster Nodes.............................................................................................................. 141
5.1.3 Initial Setup of the Cluster Resource Management Facility..................................................................................................... 141
5.1.3.1 Initial Configuration Setup................................................................................................................................................ 142
5.1.3.2 Registering Hardware Devices.......................................................................................................................................... 144
5.2 Setting up Fault Resource Identification and Operator Intervention Request................................................................................. 148
6.7.3.4 Setting Up Gds Resources................................................................................................................................................. 208
6.7.3.5 Setting Up Gls Resources.................................................................................................................................................. 209
6.7.3.6 Setting Up Takeover Network Resources..........................................................................................................................210
6.7.3.7 Setting Up Procedure Resources........................................................................................................................................213
6.7.4 Generate and Activate...............................................................................................................................................................215
6.7.5 Registering the Cluster Service of a PRIMECLUSTER-compatible product.......................................................................... 216
6.7.6 Attributes.................................................................................................................................................................................. 216
6.7.7 Exclusive Relationships Between Cluster Applications...........................................................................................................216
6.8 Setting Up the RMS Environment................................................................................................................................................... 224
6.9 Checking the Cluster Environment..................................................................................................................................................224
6.10 Setting Contents and Notes on Cluster Application...................................................................................................................... 224
6.10.1 Setting Contents of a Cluster Application.............................................................................................................................. 224
6.10.2 Notes on Configuration...........................................................................................................................................................233
6.11 Notes When Setting Cmdline Resources....................................................................................................................................... 234
6.11.1 Scripts and State Transition.................................................................................................................................................... 236
6.11.1.1 Scripts to be Executed in Each Resource State................................................................................................................238
6.11.1.2 Script States When Online ..............................................................................................................................................238
6.11.1.3 Script States When Standby ............................................................................................................................................239
6.11.1.4 Script States When Offline ............................................................................................................................................. 240
6.11.1.5 Flow of the Cmdline Resource Operation....................................................................................................................... 240
6.11.1.6 Operation for Each Exit Code of the Check Script..........................................................................................................244
6.11.2 Notes When Creating Scripts..................................................................................................................................................247
6.11.2.1 start and stop Scripts........................................................................................................................................................ 248
6.11.2.1.1 Examples of start and stop Scripts............................................................................................................................248
6.11.2.1.2 Environment Variables can be referred to within the Start and Stop Scripts........................................................... 250
6.11.2.1.3 Exit Code of Start and Stop Scripts.......................................................................................................................... 251
6.11.2.1.4 Notes When Setting the NULLDETECTOR Flag....................................................................................................252
6.11.2.1.5 Timeout of Scripts.................................................................................................................................................... 252
6.11.2.2 Check Script.....................................................................................................................................................................252
6.11.2.2.1 Example of the Check Script.................................................................................................................................... 252
6.11.2.2.2 Environment Variables that can be referred to within the Check Scripts.................................................................254
6.11.2.2.3 Check Script Exit Code.............................................................................................................................................254
6.11.2.2.4 Timeout of Check Script...........................................................................................................................................254
6.11.3 Notes on Scripts...................................................................................................................................................................... 255
6.12 Notes When Setting Fsystem Resource......................................................................................................................................... 256
6.12.1 Monitoring Fsystem ...............................................................................................................................................................256
6.12.2 Fsystem Resource Attribute....................................................................................................................................................256
6.12.3 File System on the Shared Disk Device..................................................................................................................................257
6.12.3.1 Corrective Actions for the Forced File System Check.................................................................................................... 257
6.12.3.2 Corrective Actions for delayed allocation....................................................................................................................... 258
6.12.4 Other Notes............................................................................................................................................................................. 258
6.12.5 Maintaining File Systems Controlled by the Fsystem Resource............................................................................................ 259
7.2 Operating the PRIMECLUSTER System........................................................................................................................................273
7.2.1 RMS Operation......................................................................................................................................................................... 273
7.2.1.1 Starting RMS..................................................................................................................................................................... 273
7.2.1.2 Stopping RMS....................................................................................................................................................................273
7.2.2 Cluster Application Operations................................................................................................................................................ 274
7.2.2.1 Starting a Cluster Application............................................................................................................................................274
7.2.2.2 Stopping a Cluster Application..........................................................................................................................................274
7.2.2.3 Switching a Cluster Application........................................................................................................................................ 274
7.2.2.4 Bringing Faulted Cluster Application to available state....................................................................................................275
7.2.2.5 Clearing the Wait State of a Node..................................................................................................................................... 275
7.2.2.6 Entering maintenance mode for Cluster Application........................................................................................................ 275
7.2.3 Resource Operation...................................................................................................................................................................276
7.2.3.1 Starting Resources............................................................................................................................................................. 277
7.2.3.2 Stopping Resources............................................................................................................................................................277
7.2.3.3 Clearing Fault Traces of Resources................................................................................................................................... 277
7.3 Monitoring the PRIMECLUSTER System......................................................................................................................................278
7.3.1 Monitoring the State of a Node.................................................................................................................................................278
7.3.2 Monitoring the State of a Cluster Application..........................................................................................................................279
7.3.3 Concurrent Viewing of Node and Cluster Application States.................................................................................................. 280
7.3.4 Viewing Logs Created by the PRIMECLUSTER System........................................................................................................281
7.3.4.1 Viewing switchlogs............................................................................................................................................................281
7.3.4.2 Viewing application logs................................................................................................................................................... 281
7.3.5 Viewing Detailed Resource Information.................................................................................................................................. 282
7.3.6 Displaying environment variables............................................................................................................................................ 283
7.3.7 Monitoring Cluster Control Messages......................................................................................................................................284
7.4 Corrective Actions for Resource Failures........................................................................................................................................284
7.4.1 Corrective Action in the event of a resource failure................................................................................................................. 284
7.4.1.1 Failure Detection and Cause Identification if a Failure Occurs.........................................................................................284
7.4.1.2 Corrective Action for Failed Resources.............................................................................................................................286
7.4.1.3 Recovery of Failed Cluster Interconnect........................................................................................................................... 287
7.4.2 Corrective Action in the event of the LEFTCLUSTER state when the virtual machine function is used............................... 288
7.4.2.1 When the host OS becomes the panic state....................................................................................................................... 288
7.4.2.2 When the host OS hangs up............................................................................................................................................... 288
7.5 Notes on Operation ......................................................................................................................................................................... 288
7.5.1 Notes on Switching a Cluster Application Forcibly ................................................................................................................ 290
7.6 CF and RMS Heartbeats.................................................................................................................................................................. 292
7.7 cron Processing................................................................................................................................................................................ 293
9.1.2 Changing the SF Node Weight................................................................................................................................................. 313
9.2 Changing the Network Environment............................................................................................................................................... 313
9.2.1 Changing the IP Address of the Public LAN............................................................................................................................313
9.2.2 Changing the IP Address of the Administrative LAN.............................................................................................................. 315
9.2.3 Changing the IP Address of CF over IP................................................................................................................................... 316
9.2.4 Changing a CIP Address...........................................................................................................................................................317
9.2.5 Changing the Subnet Mask of CIP........................................................................................................................................... 318
9.2.6 Changing the MTU Value of a Network Interface Used for Cluster Interconnects................................................................. 318
9.2.7 Changing the IP Address Used for the Mirroring among Servers............................................................................................ 319
9.3 Changing Option Hardware Settings............................................................................................................................................... 319
9.3.1 Changing MMB Settings.......................................................................................................................................................... 319
9.3.1.1 Changing the MMB IP Address.........................................................................................................................................319
9.3.1.1.1 PRIMEQUEST 2000 Series........................................................................................................................................319
9.3.1.1.2 PRIMEQUEST 3000 Series (Except B Model)..........................................................................................................320
9.3.1.2 Changing the User Name and Password for Controlling the MMB with RMCP..............................................................320
9.3.1.2.1 PRIMEQUEST 2000 Series........................................................................................................................................320
9.3.1.2.2 PRIMEQUEST 3000 Series (Except B Model)..........................................................................................................321
9.3.2 Changing iRMC Settings.......................................................................................................................................................... 322
9.3.2.1 Changing iRMC IP Address.............................................................................................................................................. 322
9.3.2.1.1 Using PRIMERGY RX/TX series and BX series with ServerView Resource Orchestrator Virtual Edition............ 322
9.3.2.1.2 PRIMEQUEST 3000 Series........................................................................................................................................323
9.3.2.2 Changing the User Name and Password for iRMC........................................................................................................... 323
9.3.2.2.1 Using PRIMERGY RX/TX series and BX series with ServerView Resource Orchestrator Virtual Edition............ 323
9.3.2.2.2 PRIMEQUEST 3000 Series........................................................................................................................................324
9.3.3 Changing Blade Settings...........................................................................................................................................................325
9.3.3.1 Changing the IP Address of the Management Blade......................................................................................................... 325
9.3.3.2 Changing the Slot Number of Server Blades.....................................................................................................................326
9.4 Changing Virtual Machine Settings.................................................................................................................................................326
9.4.1 Changing Host OS Settings (KVM environment).................................................................................................................... 327
9.4.1.1 Changing the IP address of the Host OS............................................................................................................................327
9.4.1.2 Changing the Password of the Host OS Account (Shutdown Facility)............................................................................. 327
9.4.1.3 Changing the Settings in /etc/sysconfig/libvirt-guests.......................................................................................................328
Chapter 12 Maintenance of the PRIMECLUSTER System.................................................................................................. 364
12.1 Maintenance Types........................................................................................................................................................................ 364
12.2 Maintenance Flow..........................................................................................................................................................................364
12.2.1 Detaching Resources from Operation.....................................................................................................................................364
12.2.2 Executing Standby Restoration for an Operating Job.............................................................................................................365
12.3 Software Maintenance................................................................................................................................................................... 365
12.3.1 Notes on Applying Corrections to the PRIMECLUSTER System.........................................................................................365
12.3.2 Overview of the Correction Application Procedure............................................................................................................... 365
12.3.2.1 Procedure for Applying Corrections by Stopping an Entire System............................................................................... 366
12.3.2.2 Procedure for Applying Correction by Rolling Update...................................................................................................367
Appendix D Registering, Changing, and Deleting State Transition Procedure Resources for PRIMECLUSTER Compatibility..................386
D.1 Registering a Procedure Resource.................................................................................................................................................. 386
D.2 Changing a Procedure Resource..................................................................................................................................................... 387
D.2.1 Changing a state transition procedure......................................................................................................................................387
D.2.2 Changing the Startup Priority of a State Transition Procedure................................................................................................387
D.2.3 Changing registration information of a procedure resource.................................................................................................... 388
D.3 Deleting a Procedure Resource....................................................................................................................................................... 389
E.3.1 Startup Configuration for the IPMI Service............................................................................................................................. 395
E.3.2 Activating Configuration Update Service for SA.................................................................................................................... 396
E.3.2.1 Startup Configuration for Update Service for SA............................................................................................................. 396
E.3.2.2 Checking the Configuration.............................................................................................................................................. 396
E.3.2.3 Checking the BMC or iRMC IP Address and the Configuration Information of the Shutdown Agent........................... 398
E.4 Operation Check..............................................................................................................................................................................399
E.4.1 Operation Check by Restarting the System..............................................................................................................................399
E.5 Cancellation.....................................................................................................................................................................................400
E.5.1 Deactivating Configuration Update Service for SA.................................................................................................................400
E.5.2 Restoring the Startup Configuration of the IPMI Service........................................................................................................ 400
E.6 Restoration.......................................................................................................................................................................................400
E.6.1 Restoration Method When Correct Information is not Distributed to All the Nodes.............................................................. 400
E.7 sfsacfgupdate................................................................................................................................................................................... 402
E.8 Output Message (syslog)................................................................................................................................................................. 403
Appendix F Setting up Cmdline Resource to Control Guest OS from Cluster Application of Host OS in KVM Environment..................406
F.1 Controlling and monitoring a guest OS by a cluster application on a host OS............................................................................... 406
H.2.3.3 Setting Up the Shutdown Facility (when using I/O fencing function)............................................................................. 434
H.2.3.4 Initial Setup of the Cluster Resource Management Facility............................................................................................. 436
H.2.3.5 Setting Up Fault Resource Identification and Operator Intervention Request................................................................. 437
H.2.4 Building Cluster Applications..................................................................................................................................................437
H.2.4.1 Setting Up I/O Fencing Function......................................................................................................................................437
H.3 Operations....................................................................................................................................................................................... 441
H.3.1 Actions When Virtual Machine is Migrated by VMware vSphere HA...................................................................................441
H.4 Changing the Configuration............................................................................................................................................................443
H.5 Maintenance.................................................................................................................................................................................... 443
Appendix J Startup Scripts and Startup Daemons, and Port Numbers in PRIMECLUSTER............................................... 466
J.1 Explanation Formats........................................................................................................................................................................ 466
J.2 Startup Script Lists........................................................................................................................................................................... 466
J.3 Necessary Daemons for PRIMECLUSTER to Operate................................................................................................................... 476
Appendix K Systemd Services and Startup Daemons, and Port Numbers in PRIMECLUSTER......................................... 477
K.1 Explanation Formats....................................................................................................................................................................... 477
K.2 systemd Service Lists......................................................................................................................................................................478
K.3 Necessary Services for PRIMECLUSTER to Operate................................................................................................................... 497
M.1.1 Backing up the GFS Configuration Information.....................................................................................................................502
M.1.2 Backing up the GDS Configuration Information.................................................................................................................... 503
M.1.3 Canceling System Disk Mirroring...........................................................................................................................................503
M.2 Copying System Image Using the Cloning Function.....................................................................................................................504
M.2.1 Copying Disk Data.................................................................................................................................................................. 504
M.2.2 Setting up System Disk Mirroring.......................................................................................................................................... 504
M.3 Changing Cluster System Settings................................................................................................................................................. 505
M.3.1 Deleting the Setup Information for System Disk Mirroring................................................................................................... 505
M.3.2 Setup in Single-User Mode..................................................................................................................................................... 505
M.3.3 Changing the Settings in Multi-User Mode ........................................................................................................................... 512
M.3.4 Restoring the GDS Configuration Information.......................................................................................................................517
M.3.5 Restoring the GFS Configuration Information........................................................................................................................518
M.3.6 Setting Up System Disk Mirroring..........................................................................................................................................520
M.3.7 Changing the Settings of Cluster Application Information.....................................................................................................520
M.3.7.1 When Using GLS............................................................................................................................................................. 520
M.3.7.2 When Using the Takeover Network.................................................................................................................................522
M.3.7.3 When Using neither GLS nor the Takeover Network......................................................................................................524
N.2.7 hvdump command....................................................................................................................................................................548
N.2.8 Posting Notification of a Resource Failure or Recovery......................................................................................................... 548
N.2.9 Operator Intervention Request................................................................................................................................................. 549
N.2.10 Node state ..............................................................................................................................................................................549
N.2.11 Operation Procedures and Displayed Items for Cluster Application Setup and Modification.............................................. 550
N.2.12 Setting Up Fsystem Resources...............................................................................................................................................555
N.2.13 Client Environment for Web-Based Admin View.................................................................................................................555
N.2.14 Changes of the Behavior of CF Startup................................................................................................................................. 556
N.2.15 HV_CONNECT_TIMEOUT................................................................................................................................................. 556
N.2.16 Changes of the ports used by RMS........................................................................................................................................556
N.2.17 Changes of the port number used by the shutdown facility...................................................................................................557
N.2.18 Changes of the target node to forcibly shut down when a heartbeat failure occurs.............................................................. 557
N.2.19 Display of the resource fault trace......................................................................................................................................... 557
N.2.20 Change of /etc/cip.cf file........................................................................................................................................................558
N.2.21 Changes in CF over IP setting window of CF Wizard...........................................................................................................558
N.2.22 Changes of the RMS message................................................................................................................................................558
N.2.23 Changes of the importance of the message in the RMS wizard............................................................................................ 559
N.2.24 Changes of RMS console message........................................................................................................................................ 559
N.2.25 Changes of the response message for the operator intervention request............................................................................... 560
N.2.25.1 Message: 1421................................................................................................................................................................ 560
N.2.25.2 Message: 1423................................................................................................................................................................ 560
N.3 Changes in PRIMECLUSTER 4.5A10 from 4.1A30..................................................................................................................... 561
N.3.1 ciptool command......................................................................................................................................................................561
N.3.2 sdtool command....................................................................................................................................................................... 562
N.3.3 hvshut command...................................................................................................................................................................... 562
N.3.4 hvswitch command.................................................................................................................................................................. 563
N.3.5 hvdump command....................................................................................................................................................................563
N.3.6 Posting Notification of a Resource Failure or Recovery......................................................................................................... 563
N.3.7 Operator Intervention Request................................................................................................................................................. 564
N.3.8 Operation Procedures and Displayed Items for Cluster Application Setup and Modification................................................ 565
N.3.9 Setting Up Fsystem Resources.................................................................................................................................................570
N.3.10 Client Environment for Web-Based Admin View.................................................................................................................571
N.3.11 Changes of the Behavior of CF Startup................................................................................................................................. 571
N.3.12 HV_CONNECT_TIMEOUT................................................................................................................................................. 571
N.3.13 Changes of the ports used by RMS........................................................................................................................................572
N.3.14 Changes of the port number used by the shutdown facility...................................................................................................572
N.3.15 Changes of the target node to forcibly shut down when a heartbeat failure occurs.............................................................. 572
N.3.16 Display of the resource fault trace......................................................................................................................................... 573
N.3.17 Change of /etc/cip.cf file........................................................................................................................................................573
N.3.18 Changes in CF over IP setting window of CF Wizard...........................................................................................................573
N.3.19 Changes of the RMS message................................................................................................................................................574
N.3.20 Changes of the importance of the message in the RMS wizard............................................................................................ 574
N.3.21 Changes of RMS console message........................................................................................................................................ 574
N.3.22 Changes of the response message for the operator intervention request............................................................................... 575
N.3.22.1 Message: 1421................................................................................................................................................................ 575
N.3.22.2 Message: 1423................................................................................................................................................................ 575
N.4 Changes in PRIMECLUSTER 4.5A10 from 4.1A40..................................................................................................................... 576
N.4.1 sdtool command....................................................................................................................................................................... 577
N.4.2 hvshut command...................................................................................................................................................................... 577
N.4.3 hvswitch command.................................................................................................................................................................. 578
N.4.4 hvdump command....................................................................................................................................................................578
N.4.5 Posting Notification of a Resource Failure or Recovery......................................................................................................... 578
N.4.6 Operator Intervention Request................................................................................................................................................. 579
N.4.7 Setting Up Fsystem Resources.................................................................................................................................................580
N.4.8 Client Environment for Web-Based Admin View...................................................................................................................580
N.4.9 Changes of the Behavior of CF Startup................................................................................................................................... 581
N.4.10 HV_CONNECT_TIMEOUT................................................................................................................................................. 581
N.4.11 Changes of the ports used by RMS........................................................................................................................................581
N.4.12 Changes of the port number used by the shutdown facility...................................................................................................582
N.4.13 Changes of the target node to forcibly shut down when a heartbeat failure occurs.............................................................. 582
N.4.14 Display of the resource fault trace......................................................................................................................................... 582
N.4.15 Change of /etc/cip.cf file........................................................................................................................................................583
N.4.16 Changes in CF over IP setting window of CF Wizard...........................................................................................................583
N.4.17 Changing "turnkey wizard "STANDBY"" of hvw command............................................................................................... 583
N.4.18 Changes of the RMS message................................................................................................................................................584
N.4.19 Changes of the importance of the message in the RMS wizard............................................................................................ 584
N.4.20 Changes of RMS console message........................................................................................................................................ 584
N.4.21 Changes of the response message for the operator intervention request............................................................................... 585
N.4.21.1 Message: 1421................................................................................................................................................................ 585
N.4.21.2 Message: 1423................................................................................................................................................................ 585
N.5 Changes in PRIMECLUSTER 4.5A10 from 4.2A00..................................................................................................................... 586
N.5.1 sdtool command....................................................................................................................................................................... 587
N.5.2 hvshut command...................................................................................................................................................................... 587
N.5.3 hvswitch command.................................................................................................................................................................. 588
N.5.4 hvdump command....................................................................................................................................................................588
N.5.5 Posting Notification of a Resource Failure or Recovery......................................................................................................... 588
N.5.6 Operator Intervention Request................................................................................................................................................. 589
N.5.7 Setting Up Fsystem Resources.................................................................................................................................................590
N.5.8 Client Environment for Web-Based Admin View...................................................................................................................590
N.5.9 Changes of the Behavior of CF Startup................................................................................................................................... 591
N.5.10 HV_CONNECT_TIMEOUT................................................................................................................................................. 591
N.5.11 Changes of the ports used by RMS........................................................................................................................................591
N.5.12 Configuring the IPMI Shutdown Agent.................................................................................................................................592
N.5.13 Changes of the port number used by the shutdown facility...................................................................................................592
N.5.14 Changes of the target node to forcibly shut down when a heartbeat failure occurs.............................................................. 593
N.5.15 Display of the resource fault trace......................................................................................................................................... 593
N.5.16 Change of /etc/cip.cf file........................................................................................................................................................593
N.5.17 Changes in CF over IP setting window of CF Wizard...........................................................................................................594
N.5.18 Changing "turnkey wizard "STANDBY"" of hvw command............................................................................................... 594
N.5.19 Changes of the RMS message................................................................................................................................................594
N.5.20 Changes of the importance of the message in the RMS wizard............................................................................................ 595
N.5.21 Changes of RMS console message........................................................................................................................................ 595
N.5.22 Changes of the response message for the operator intervention request............................................................................... 595
N.5.22.1 Message: 1421................................................................................................................................................................ 595
N.5.22.2 Message: 1423................................................................................................................................................................ 596
N.6 Changes in PRIMECLUSTER 4.5A10 from 4.2A30..................................................................................................................... 597
N.6.1 sdtool command....................................................................................................................................................................... 597
N.6.2 hvshut command...................................................................................................................................................................... 598
N.6.3 hvswitch command.................................................................................................................................................................. 598
N.6.4 hvdump command....................................................................................................................................................................599
N.6.5 Posting Notification of a Resource Failure or Recovery......................................................................................................... 599
N.6.6 Operator Intervention Request................................................................................................................................................. 600
N.6.7 Setting Up Fsystem Resources.................................................................................................................................................600
N.6.8 Client Environment for Web-Based Admin View...................................................................................................................601
N.6.9 Changes of the Behavior of CF Startup................................................................................................................................... 601
N.6.10 HV_CONNECT_TIMEOUT................................................................................................................................................. 601
N.6.11 Changes of the ports used by RMS........................................................................................................................................602
N.6.12 Configuring the IPMI Shutdown Agent.................................................................................................................................602
N.6.13 Changes of the port number used by the shutdown facility...................................................................................................602
N.6.14 Changes of the target node to forcibly shut down when a heartbeat failure occurs.............................................................. 603
N.6.15 Display of the resource fault trace......................................................................................................................................... 603
N.6.16 Change of /etc/cip.cf file........................................................................................................................................................603
N.6.17 Changes in CF over IP setting window of CF Wizard...........................................................................................................604
N.6.18 Changing "turnkey wizard "STANDBY"" of hvw command............................................................................................... 604
N.6.19 Changes of the RMS message................................................................................................................................................604
N.6.20 Changes of the importance of the message in the RMS wizard............................................................................................ 605
N.6.21 Changes of RMS console message........................................................................................................................................ 605
N.6.22 Changes of the response message for the operator intervention request............................................................................... 606
N.6.22.1 Message: 1421................................................................................................................................................................ 606
N.6.22.2 Message: 1423................................................................................................................................................................ 606
N.7 Changes in PRIMECLUSTER 4.5A10 from 4.3A00..................................................................................................................... 607
N.7.1 sdtool command....................................................................................................................................................................... 608
N.7.2 hvshut command...................................................................................................................................................................... 608
N.7.3 hvswitch command.................................................................................................................................................................. 609
N.7.4 hvdump command....................................................................................................................................................................609
N.7.5 Posting Notification of a Resource Failure or Recovery......................................................................................................... 609
N.7.6 Operator Intervention Request................................................................................................................................................. 610
N.7.7 Setting Up Fsystem Resources.................................................................................................................................................611
N.7.8 Client Environment for Web-Based Admin View...................................................................................................................611
N.7.9 Changes of the Behavior of CF Startup................................................................................................................................... 611
N.7.10 HV_CONNECT_TIMEOUT................................................................................................................................................. 612
N.7.11 Changes of the ports used by RMS........................................................................................................................................612
N.7.12 Configuring the IPMI Shutdown Agent.................................................................................................................................612
N.7.13 Changes of the port number used by the shutdown facility...................................................................................................613
N.7.14 Changes of the target node to forcibly shut down when a heartbeat failure occurs.............................................................. 613
N.7.15 Display of the resource fault trace......................................................................................................................................... 613
N.7.16 Change of /etc/cip.cf file........................................................................................................................................................614
N.7.17 Changes in CF over IP setting window of CF Wizard...........................................................................................................614
N.7.18 Changing "turnkey wizard "STANDBY"" of hvw command............................................................................................... 614
N.7.19 Changes of the RMS message................................................................................................................................................615
N.7.20 Changes of the importance of the message in the RMS wizard............................................................................................ 615
N.7.21 Changes of RMS console message........................................................................................................................................ 615
N.7.22 Changes of the response message for the operator intervention request............................................................................... 616
N.7.22.1 Message: 1421................................................................................................................................................................ 616
N.7.22.2 Message: 1423................................................................................................................................................................ 617
N.8 Changes in PRIMECLUSTER 4.5A10 from 4.3A10..................................................................................................................... 617
N.8.1 sdtool command....................................................................................................................................................................... 618
N.8.2 hvshut command...................................................................................................................................................................... 618
N.8.3 hvswitch command.................................................................................................................................................................. 619
N.8.4 hvdump command....................................................................................................................................................................619
N.8.5 Posting Notification of a Resource Failure or Recovery......................................................................................................... 619
N.8.6 Operator Intervention Request................................................................................................................................................. 620
N.8.7 Setting Up Fsystem Resources.................................................................................................................................................621
N.8.8 Changes of the ports used by RMS..........................................................................................................................................621
N.8.9 Configuring the IPMI Shutdown Agent...................................................................................................................................621
N.8.10 Changes of the port number used by the shutdown facility...................................................................................................622
N.8.11 Setting up the Host OS failover function used in the PRIMEQUEST KVM environment................................................... 622
N.8.12 Changes of the target node to forcibly shut down when a heartbeat failure occurs.............................................................. 623
N.8.13 Display of the resource fault trace......................................................................................................................................... 623
N.8.14 Change of /etc/cip.cf file........................................................................................................................................................623
N.8.15 Changes in CF over IP setting window of CF Wizard...........................................................................................................624
N.8.16 Changing "turnkey wizard "STANDBY"" of hvw command............................................................................................... 624
N.8.17 Changes of RMS console message........................................................................................................................................ 624
N.8.18 Changes of the response message for the operator intervention request............................................................................... 625
N.8.18.1 Message: 1421................................................................................................................................................................ 625
N.8.18.2 Message: 1423................................................................................................................................................................ 625
N.9 Changes in PRIMECLUSTER 4.5A10 from 4.3A20..................................................................................................................... 626
N.9.1 hvshut command...................................................................................................................................................................... 627
N.9.2 hvswitch command.................................................................................................................................................................. 627
N.9.3 hvdump command....................................................................................................................................................................628
N.9.4 Posting Notification of a Resource Failure or Recovery......................................................................................................... 628
N.9.5 Operator intervention request.................................................................................................................................................. 628
N.9.6 Setting Up Fsystem Resources.................................................................................................................................................629
N.9.7 Configuring the IPMI Shutdown Agent...................................................................................................................................629
N.9.8 Changes of the port number used by the shutdown facility.....................................................................................................630
N.9.9 Setting up the Host OS failover function used in the PRIMEQUEST KVM environment..................................................... 630
N.9.10 Changes of the target node to forcibly shut down when a heartbeat failure occurs.............................................................. 631
N.9.11 Display of the resource fault trace......................................................................................................................................... 631
N.9.12 Change of /etc/cip.cf file........................................................................................................................................................631
N.9.13 Changes in CF over IP setting window of CF Wizard...........................................................................................................632
N.9.14 Changing "turnkey wizard "STANDBY"" of hvw command............................................................................................... 632
N.9.15 Changes of RMS console message........................................................................................................................................ 632
N.9.16 Changes of the response message for the operator intervention request............................................................................... 633
N.9.16.1 Message: 1421................................................................................................................................................................ 633
N.9.16.2 Message: 1423................................................................................................................................................................ 634
N.10 Changes in PRIMECLUSTER 4.5A10 from 4.3A30................................................................................................................... 634
N.10.1 hvdump command..................................................................................................................................................................634
N.10.2 Posting Notification of a Resource Failure or Recovery....................................................................................................... 635
N.10.3 Operator intervention request................................................................................................................................................ 635
N.10.4 Setting Up Fsystem Resources...............................................................................................................................................636
N.10.5 Setting up the Host OS failover function when using it in KVM environment.....................................................................636
N.10.6 Display of the resource fault trace......................................................................................................................................... 637
N.10.7 Change of /etc/cip.cf file........................................................................................................................................................637
N.10.8 Changes in CF over IP setting window of CF Wizard...........................................................................................................637
N.10.9 Changing "turnkey wizard "STANDBY"" of hvw command............................................................................................... 638
N.11 Changes in PRIMECLUSTER 4.5A10 from 4.3A40................................................................................................................... 638
N.11.1 Setting up the Host OS failover function when using it in KVM environment.....................................................................638
N.11.2 Changes in CF over IP setting window of CF Wizard...........................................................................................................639
N.11.3 Setting up the migration function when using it in KVM environment................................................................................ 639
N.11.4 Changing "turnkey wizard "STANDBY"" of hvw command............................................................................................... 639
N.12 Changes in PRIMECLUSTER 4.5A10 from 4.4A00................................................................................................................... 640
N.12.1 Changing "turnkey wizard "STANDBY"" of hvw command............................................................................................... 640
N.13 Changes in PRIMECLUSTER 4.5A10 from 4.5A00................................................................................................................... 640
N.13.1 Changing "turnkey wizard "STANDBY"" of hvw command............................................................................................... 640
Glossary............................................................................................................................................................................... 644
Index.....................................................................................................................................................................................658
Part 1 Planning
Part 1 describes the workflow from PRIMECLUSTER design to installation and operation management. Users who are installing a
PRIMECLUSTER system for the first time need to read this part.
Chapter 1 Build Flow
This chapter describes the workflow for building a PRIMECLUSTER system. To build a PRIMECLUSTER system, follow the procedure
described below.
1.1 Planning
Before building a PRIMECLUSTER system, you must first design the system.
5. Determine the cluster applications.
Determine the number of cluster applications. Also determine which nodes are to be used for each application.
- Determine the switchover network type (IP address takeover) and the takeover address.
- Determine whether a user-defined RMS configuration script is to be used. Determine whether there are other items to be used
as resources.
- For a disk device, determine which nodes will be sharing the device, whether the device is to be used as a RAW device (database
system), whether the device is to be used as a file system (general files), and whether the device is to be grouped.
See
For details on designing the system, see "Chapter 2 Site Preparation."
1.2 Installation
After completing the design of the PRIMECLUSTER system and determining the configuration of the PRIMECLUSTER system to be
built, install the PRIMECLUSTER system.
Since the work is performed based on the PRIMECLUSTER Designsheets created during the design phase, check that all items on the
Designsheets have been entered.
Information
PRIMECLUSTER Designsheets are stored in documents/designsheet on the PRIMECLUSTER DVD.
Install the PRIMECLUSTER system by performing the following procedure in sequence from (1).
Perform the operations described in the dotted line sections if the system design matches the described conditions.
If you install applications after installing the PRIMECLUSTER system, return to the operations from the Application environment
setup through the Application installation.
The screens to be used differ according to the operation. The work procedures to be performed with GUI from Web-Based Admin View and
the work procedures to be performed with CLI and CUI from console screens are shown in separate boxes.
Information
In the flow of PRIMECLUSTER system installation described below, "Cluster building" and "Cluster application building" can be
performed with the PRIMECLUSTER Easy Design and Configuration Feature.
For details on PRIMECLUSTER Easy Design and Configuration Feature, refer to "PRIMECLUSTER Easy Design and Configuration
Guide."
Figure 1.2 Flow of PRIMECLUSTER system installation
The abbreviations in the flowchart for PRIMECLUSTER system installation are explained below.
CF: Cluster Foundation
RMS: Reliant Monitor Services
WT: Wizard Tools
GDS: Global Disk Services
GFS: Global File Services
GLS: Global Link Services
For detailed information on each item, refer as necessary to the corresponding manual reference section mentioned in the table below.
1.3 Development
To monitor a user application using PRIMECLUSTER, you need to create an RMS configuration script.
- Online script
This script executes a process that sets the resources to Online or Standby.
- Offline script
This script executes a process that sets the resources to Offline.
To check the state of a user application, the following RMS configuration script must be developed.
- Check script
This script checks the state of the resource.
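As a minimal sketch of such scripts (the paths, file names, and the application name "myapp" are illustrative only; the supported script interface is described in "6.6 Setting Up Online/Offline Scripts"), each script is typically a small shell script that reports its result through its exit code:

<Online script (illustrative)>
#!/bin/sh
# Start the user application; exit 0 on success.
/usr/local/myapp/bin/myapp start
exit $?

<Offline script (illustrative)>
#!/bin/sh
# Stop the user application; exit 0 on success.
/usr/local/myapp/bin/myapp stop
exit $?

<Check script (illustrative)>
#!/bin/sh
# Exit 0 while the application process is running (Online); non-zero otherwise.
ps -ef | grep -v grep | grep -q myapp
exit $?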
See
For details on the Online/Offline script and the Check script settings, see "6.6 Setting Up Online/Offline Scripts."
1.4 Test
Purpose
When you build a cluster system using PRIMECLUSTER, you need to confirm before starting production operations that the entire system
will operate normally and cluster applications will continue to run in the event of failures.
For 1:1 standby operation, the PRIMECLUSTER system takes an operation mode like the one shown in the figure below.
The PRIMECLUSTER system switches to different operation modes according to the state transitions shown in the figure below. To check
that the system operates normally, you must test all operation modes and each state transition that switches to an operation mode.
State                       Description
Dual instance operation     A cluster application is running, and it can switch to the other instance in
                            the event of a failure (failover). The two states of dual instance operation
                            are OPERATING and STANDBY.
                            Even if an error occurs while the system is operating, the standby system
                            takes over ongoing operations as the operating system. This operation ensures
                            the availability of the cluster application even after failover.
Single instance operation   A cluster application is running, but failover is disabled.
                            The two states of single instance operation are OPERATING and STOP. Since
                            there is no standby system in this operation, a cluster application cannot
                            switch to the other instance in the event of a failure, so ongoing operations
                            are disrupted.
Stopped state               A cluster application is stopped.
The above-mentioned "OPERATING", "STANDBY", and "STOP" are defined by the state of RMS and cluster application as follows:
* It is displayed when referring to the stopped (STOP) cluster application in the status icon of the RMS tab in the GUI (Cluster Admin).
- View the Cluster Admin screen of Web-Based Admin View, and check that the cluster system starts as designed when the startup
operation is executed.
- If an RMS configuration script was created, check that the commands written in the script are executed properly as follows.
- For a command that outputs a message when it is executed, check that a message indicating that the command was executed
properly is displayed on the console.
- Check that the command has been executed properly by executing the "ps(1)" command.
- A new cluster application is not started automatically during the PRIMECLUSTER system startup. To start the cluster application
automatically, you must set "AutoStartUp" for that cluster application. The AutoStartUp setting must be specified as a
userApplication attribute when the application is created. For details, see "6.7.2 Setting Up userApplication."
Clear fault
If a failure occurs in a cluster application, the state of that application changes to Faulted.
To build and run this application in a cluster system again, you need to execute "Clear Fault" and clear the Faulted state.
Conduct a clear-fault test and confirm the following:
- Check that the Faulted state of a failed application can be cleared without disrupting ongoing operations.
- If an RMS configuration script was created, check that the commands written in the script are executed properly as follows.
- For a command that outputs a message when it is executed, check that a message indicating that the command was executed
properly is displayed on the console.
- Check that the command has been executed properly by executing the "ps(1)" command.
Switchover
Conduct a failover or switchover test and confirm the following:
- Check that failover or switchover is normally done for the following:
- Disk switchover
Check that the disk can be accessed from the OPERATING node.
For a switchover disk, you need to check whether a file system is mounted on the disk by executing the "df(1)" command.
- If the Cmdline resources are to be used, check that the commands written in the Start and Stop scripts for the Cmdline resources
are executed properly.
- For a command that outputs a message when it is executed, check that a message indicating that the command was executed
properly is displayed on the console.
- Check that the command has been executed properly by executing the "ps(1)" command.
- If IP address takeover is set, check that the process takes place normally by executing the "ip(8)" command or the "ifconfig(8)"
command.
- Check that the switchover between the OPERATING and STANDBY instances of the business application takes place normally when
switchover of the cluster application is executed. Check the following:
- If disk switchover is to be used, check that the disk can be accessed from the OPERATING node but not from the STANDBY
node.
For a switchover disk, you need to check whether a file system is mounted on the disk by executing the "df(1)" command.
- If Cmdline resources are to be used, check that the commands written in the Start and Stop scripts for the Cmdline resources
are executed properly.
- For a command that outputs a message when it is executed, check that a message indicating that the command was executed
properly is displayed on the console.
- Check that the command has been executed properly by executing the "ps(1)" command.
- If IP address takeover is to be used, check that IP address takeover takes place normally.
Check that the application is switched to the other node.
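For example, the following commands can be used for these checks (the mount point, interface name, and process name are placeholders only):

# df -h /mnt/shared          <- check on which node the switchover file system is mounted
# ip -4 addr show dev eth0   <- check whether the takeover IP address is active on the node
# ps -ef | grep myapp        <- check that the processes started by the scripts are running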
Stop
Conduct a stop test and confirm the following:
- Check that an OPERATING work process can be stopped normally by the stop operation.
- Check that work processes can be started by restarting all the nodes simultaneously.
- If Cmdline resources are to be used, check that the commands written in the Start and Stop scripts for the Cmdline resources are
executed properly.
- For a command that outputs a message when it is executed, check that a message indicating that the command was executed
properly is displayed on the console.
- Check that the command has been executed properly by executing the "ps(1)" command.
Work process continuity
Conduct a work process continuity test and confirm the following:
- While generating state transitions in the cluster system, check that the application operates normally and that no inconsistencies
are triggered in the application data in the event of a failure.
- For systems in which work processes are built as server/client systems, check that while a state transition is generated in the cluster
system, work process services can continue to be used by clients, according to the specifications.
Test for forced shutdown of cluster nodes
Check that the settings of the shutdown facility work correctly.
Conduct a test from the following viewpoints, and check that every node in the cluster is forcibly shut down at least once:
- Induce an OS error to check that the cluster node in which a failure has occurred is forcibly shut down.
- Disconnect the cluster interconnect to check that the cluster node with the lowest priority is forcibly shut down.
Note
Disconnect the cluster interconnect in such a way that an NIC linkdown event is detected on both paths.
For example, if two nodes are connected through a switch instead of being connected directly, disconnect the two cluster
interconnects on the same node side. If you disconnect them in a way that does not allow an NIC linkdown event to be detected on
both paths, the error will be detected at different times on each route, and the node that detects the error first takes priority
and forcibly stops the peer node.
In addition, check that crash dumps for the cluster node which has been forcibly shut down are collected.
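The state of the shutdown facility on each node can be checked, for example, with the sdtool command (the output depends on the configured shutdown agents):

# sdtool -s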
See
- For information on the operation procedures for start, clear fault, failover, switchover, and stop, see "7.2 Operating the
PRIMECLUSTER System."
Note
The cluster system can continue work processes even if a failure occurs. However, work processes cannot be continued if another failure
occurs during single node operation before the first failure is corrected. To enhance reliability, you need to eliminate the cause of the failure
immediately and recover the dual node operation.
See
For details for collecting information required for an investigation, see "Appendix C Troubleshooting."
See
For details on changing the operation mode, see "Part 4 System Configuration Modification."
1.7 Notes When Building a System
See
For details on a parameter value, see "Setup (initial configuration)" of PRIMECLUSTER Designsheets.
Configure the required Shutdown Facility depending on a server to be used
The required Shutdown Facility varies depending on the server to be used. See "5.1.2 Setting up the Shutdown Facility" to check the
Shutdown Facility required for the server to be used, and then configure it.
See
For the method of setting the time to detect CF heartbeat timeout, see "11.3.1 Changing Time to Detect CF Heartbeat Timeout."
Make sure to set the environment variable: RELIANT_SHUT_MIN_WAIT specifying the RMS shutdown wait
time
The time required to stop RMS and cluster applications varies depending on the environment. Be sure to estimate a value appropriate
to the configuration, and then set it.
See
For details on RELIANT_SHUT_MIN_WAIT, see "E.2 Global environment variables" in "PRIMECLUSTER Reliant Monitor Services
(RMS) with Wizard Tools Configuration and Administration Guide."
For the method of referring to and changing RMS environment variables, see "E.1 Setting environment variables" in "PRIMECLUSTER
Reliant Monitor Services (RMS) with Wizard Tools Configuration and Administration Guide."
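For example, the variable can be set in the hvenv.local file as described in the guide above; the path and the value of 300 seconds below are illustrative only and must be replaced by the value estimated for your configuration:

<Example entry in /opt/SMAW/SMAWRrms/bin/hvenv.local (illustrative)>
export RELIANT_SHUT_MIN_WAIT=300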
Example
When the DHCP setting is being set
<Contents of /etc/sysconfig/network-scripts/ifcfg-ethX (DHCP setting)>
DEVICE=ethX
BOOTPROTO=dhcp
ONBOOT=yes
TYPE=Ethernet
DHCP_HOSTNAME=Node1

<Contents of /etc/sysconfig/network-scripts/ifcfg-ethX (static IP address setting)>
DEVICE=ethX
BOOTPROTO=static
ONBOOT=yes
IPADDR=xxx.xxx.xxx.xxx
NETMASK=xxx.xxx.xxx.x
TYPE=Ethernet
When using Global Link Services (hereinafter GLS), set up the configuration file (ifcfg-ethX) of the network
interface according to the redundant line control method.
Setting items differ for each redundant line control method of GLS. For details, refer to "PRIMECLUSTER Global Link Services
Configuration and Administration Guide: Redundant Line Control Function."
To use iptables or ip6tables as a firewall on a cluster node, see "Appendix L Using Firewall."
If the firewall is not set correctly, PRIMECLUSTER may not operate properly.
To use the IPMI shutdown agent or the BLADE shutdown agent, also set the kdump shutdown agent.
If the kdump shutdown agent is not set, a node is forcibly stopped without collecting panic dumps.
The kdump shutdown agent is set with the panicinfo_setup command.
When using the IPMI shutdown agent, assign the iRMC user to the Administrator permission group.
Without the administrator authority, the IPMI shutdown agent will not work correctly.
For PRIMEQUEST 3000 series, iRMC/MMB and the cluster node must belong to the same network segment.
If they do not belong to the same network segment, the iRMC asynchronous function does not operate properly.
When setting up redundant iRMC by using Shared LAN in the PRIMEQUEST 3000 B model, the administrative
LAN of the cluster node must be separated from the Shared LAN.
If the Shared LAN is set as the administrative LAN, the connection test status of the local node may become TestFailed.
When configuring the cluster system using the extended partitions in PRIMEQUEST 3000 series (except B
model), up to 4 nodes can be supported per cluster system.
If configuring 5 or more nodes in one cluster system using extended partitions, the iRMC asynchronous function cannot operate.
When configuring the cluster system using the extended partitions in PRIMEQUEST 3000 series (except B
model), VGA/USB/rKVMS of Home SB must be assigned to any one of the extended partitions.
In the cluster system using the extended partitions, VGA/USB/rKVMS of Home SB must be assigned to any of the extended partitions (it
can also be an extended partition not configuring the cluster system). If VGA/USB/rKVMS of Home SB is "Free" without an assignment,
the iRMC asynchronous function cannot operate correctly.
For how to assign VGA/USB/rKVMS to the extended partitions, refer to the following manual:
When configuring the cluster system using the extended partitions in PRIMEQUEST 3000 series (except B
model), the iRMC asynchronous function may not operate correctly if an assignment of VGA/USB/rKVMS of
Home SB is changed.
If an assignment of VGA/USB/rKVMS of Home SB is changed in the cluster system using the extended partitions, connection confirmation
of the iRMC asynchronous function or panic/reset forcible stop may fail until the change is completed.
To build multiple cluster systems, note the following points:
- For cluster interconnects, use a separate virtual bridge for each cluster system.
- Use a common virtual bridge for the administrative LAN.
For the virtual bridge used for the administrative LAN, determine whether to separate it for each cluster system based on the
communication volume expected in operation.
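As an illustration of defining such virtual bridges on a KVM host using the same ifcfg notation shown earlier in this section (the device names brX and ethX are placeholders only):

<Contents of /etc/sysconfig/network-scripts/ifcfg-brX (illustrative)>
DEVICE=brX
TYPE=Bridge
BOOTPROTO=static
ONBOOT=yes

<Contents of /etc/sysconfig/network-scripts/ifcfg-ethX (illustrative)>
DEVICE=ethX
BRIDGE=brX
ONBOOT=yes
TYPE=Ethernet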
Chapter 2 Site Preparation
You must plan the items listed below before building the PRIMECLUSTER system.
Planning items
Point
An overview of each PRIMECLUSTER product is described in "PRIMECLUSTER Concepts Guide." Be sure to read the guide before
designing the PRIMECLUSTER system.
The following table shows the components (modules) that are included in each product.
Components                                                                                       Products
Names                           Features                                                       EE  HA  CB  LP
PCLsnap                         Refers to the function that collects information on a          Y   Y   Y   Y
                                system or cluster that is needed to investigate failures.
Web-Based Admin View            Refers to the function for realizing PRIMECLUSTER              Y   Y   Y   Y
                                operations and monitoring with the GUI (management view).
Cluster Foundation (CF)         Refers to the basic function that is required for user         Y   Y   Y   Y
                                applications or other PRIMECLUSTER services to manage or
                                communicate within the cluster.
Reliant Monitor Services (RMS)  Refers to the software monitoring function that is used to     Y   Y   Y   Y
                                realize high availability (HA) of the application that is
                                to be executed within the cluster.
Wizard Tools                    Refers to the function that is used to create an               Y   Y   Y   Y
                                application that is to be controlled with RMS.
RAO                             Refers to the function that is used to manage resources        Y   Y   Y   Y
                                that run on PRIMECLUSTER.
SA                              Refers to the shutdown agent function for which BMC, iRMC,     Y   Y   Y   Y
                                Blade, and MMB are used.
Global Link Services (GLS)      Provides highly reliable transmission routes by setting up     Y   Y   -   -
                                a redundant network.
Global File Services            Refers to the function that is used to realize simultaneous    Y   Y   -   -
(hereinafter GFS)               access to the shared file system from multiple nodes to
                                which the shared disk device is connected.
Global Disk Services            Refers to the volume management function that is used to       Y   Y   -   Y
(hereinafter GDS)               improve the availability and manageability of the data
                                stored on the shared disk device.
Parallel Application            Refers to the function that enables high-performance and       Y   -   -   -
Services (PAS)                  high-speed communication with parallel databases.
See
For details on the operation environment, see "Chapter 2 Operation Environment" in the Installation Guide for PRIMECLUSTER.
Information
- When using the virtual machine function in a VMware environment, see "Appendix H Using PRIMECLUSTER in a VMware
Environment."
- When using PRIMECLUSTER in RHOSP environment, see "Appendix I Using PRIMECLUSTER in RHOSP Environment."
- When using PRIMECLUSTER on FUJITSU Cloud Service K5, see "PRIMECLUSTER Installation and Administration Guide
FUJITSU Cloud Service K5."
Note
Do not use the name cipX (where X is a number from 0 to 7) as the device name of any network device on the system. PRIMECLUSTER
creates and uses virtual network devices named cipX; if a network device with that name already exists, PRIMECLUSTER cannot be set
up or operated.
Note
- In a KVM environment, read "host OS" as "hypervisor"; in a VMware environment, read "host OS" as "ESXi host."
- When installing PRIMECLUSTER in a virtual machine environment, do not perform the following operations:
  - Temporarily stopping the guest OS
  - Restarting the guest OS from a temporarily stopped state
  - Restarting or stopping the host OS while the guest OS is not stopped
See
- For details on the virtual machine function in a KVM environment, see "Red Hat Enterprise Linux 6 Virtualization Administration
Guide" or "Red Hat Enterprise Linux 7 Virtualization Deployment and Administration Guide."
- For details on the virtual machine function in a VMware environment, see the documentation for VMware.
Cluster system in the virtual machine function
The virtual machine function provides the following methods to build a cluster system:
Note
When an error occurs in the guest OS in a VMware environment, the node state becomes LEFTCLUSTER.
For how to recover from LEFTCLUSTER, refer to "5.2 Recovering from LEFTCLUSTER" in "PRIMECLUSTER Cluster Foundation
Configuration and Administration." For the following operations, refer to "7.2 Operating the PRIMECLUSTER System."
When building a cluster system between guest OSes on multiple host OSes
This configuration allows you to continue work processes by a failover even if hardware such as a network or a disk fails.
Note
If the host OS cannot run in a KVM environment, the node may become the LEFTCLUSTER state. For details, see "7.4.2 Corrective
Action in the event of the LEFTCLUSTER state when the virtual machine function is used" or "7.2 Operating the PRIMECLUSTER
System."
When building a cluster system between guests on multiple host OSes in a KVM environment, you can use a function that automatically
performs a failover when the host OS fails (Host OS failover function).
Host OS failover function
When building a cluster between guests on different units on a virtual machine, if an error occurs in the host OS, nodes in the
cluster may go into the LEFTCLUSTER state. The Host OS failover function automatically switches the cluster applications on the
guest OSes when the following errors occur in a cluster system between guests on different units in a KVM environment.
- Panic of the host OS
- Hang-up of the host OS (slowdown)
This function is achieved by linking PRIMECLUSTER installed on the host OS with the guest OSes.
Note that this function has some operational restrictions; for example, the priority of RMS (the ShutdownPriority attribute) cannot
be set when it is used. Take these restrictions into consideration when designing the system.
Note
- When creating a cluster application for a guest OS, do not set the ShutdownPriority attribute of RMS.
- When a host OS failure is detected, the host OS is forcibly shut down. Then, all guest OSes on the failed host OS will stop,
regardless of whether they are cluster nodes or not.
- Do not register resources (except the following resources necessary on the guest OS) in the cluster application.
  - Gls resource which controls the network used on the guest OS
  - Cmdline resource to control the guest OS (see "Appendix F Setting up Cmdline Resource to Control Guest OS from Cluster
    Application of Host OS in KVM Environment"); an illustrative sketch of such a resource is shown after this note.
If operations are performed on the host OS and it becomes overloaded, the host OS is forcibly shut down, which affects the guest
OSes running on that host OS.
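As an illustrative sketch only (the supported scripts and settings are described in "Appendix F Setting up Cmdline Resource to Control Guest OS from Cluster Application of Host OS in KVM Environment"; the domain name guest1 is a placeholder), such a Cmdline resource on the host OS typically wraps libvirt operations:

<Start script (illustrative)>
#!/bin/sh
# Start the guest OS (libvirt domain).
virsh start guest1

<Stop script (illustrative)>
#!/bin/sh
# Shut down the guest OS.
virsh shutdown guest1

<Check script (illustrative)>
#!/bin/sh
# Exit 0 while the guest OS is running.
virsh domstate guest1 | grep -q running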
Figure 2.1 Cluster system using the Host OS failover function on the virtual machine
Figure 2.2 Failover image in the case of host OS failure
Moreover, you can replicate the cluster system by doing live migration of guest OSes in which PRIMECLUSTER is installed or by
copying the virtual machine image.
- Live Migration
Transferring an active guest OS.
- Offline Migration
Transferring a suspended guest OS.
- Migration by Export/Import
Exporting/Importing the XML setup files of stopped guest OSes.
The Migration function in a KVM environment can be used in the following cluster system configurations:
- When building a cluster system between guest OSes on multiple host OSes without using the Host OS failover function
- When building a cluster system between guest OSes on multiple host OSes using the Host OS failover function
- Live Migration
By migrating a guest OS while it is running (Live Migration), you can do server maintenance while maintaining the redundant
configuration for active and standby servers.
Figure 2.4 Live Migration to a spare server (before performing)
- Offline Migration
By migrating a suspended guest OS (Offline Migration), you can do standby server maintenance while maintaining the redundant
configuration for active and standby servers.
Figure 2.6 Offline Migration to a spare server (in performing)
Figure 2.7 Offline Migration to a spare server (after performing)
- Migration by Export/Import
By migrating a stopped guest OS by Export/Import, the guest OS can be started in a spare server, and you can do standby server
maintenance while maintaining the redundant configuration for active and standby servers.
Figure 2.9 Migration by Export/Import to a spare server (in performing)
Figure 2.10 Migration by Export/Import to a spare server (after performing)
There are prerequisites for using the Migration function of KVM in a cluster system. For details, see "Appendix G Using the Migration
Function in KVM Environment."
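As an illustration of the Export/Import method (these are standard libvirt commands; the domain name guest1 and the file path are placeholders, and the supported procedure is the one described in Appendix G):

# On the source host: export the XML definition of the stopped guest OS
virsh dumpxml guest1 > /tmp/guest1.xml
# On the destination (spare) host: import the definition and start the guest OS
virsh define /tmp/guest1.xml
virsh start guest1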
Note
Classification           Operation mode                     Number of cluster applications   Number of nodes
                         Priority transfer                  2 to (number of nodes - 1)       3 to (number of supported nodes)
Scalable operation       Scalable                           1 to (number of nodes)           1 to (number of supported nodes)
                         High-availability scalable         1 to (number of nodes)           2 to (number of supported nodes)
                         operation
Single-node cluster      -                                  1                                1
operation
Note
- If the operating node on one side is disconnected abruptly due to a power failure or other power supply problem, failover may not work.
Take corrective action as follows:
Information
The topologies for standby operation include hot-standby and cold-standby operation.
Hot-standby operation enables preliminary operation so that the operating state can be established immediately on the standby node. In hot-
standby operation, the state of the cluster application running on the operating node will be Online, while that of the cluster application on
the standby node will be Standby. To perform hot-standby operation, hot-standby must be supported by the PRIMECLUSTER product to
be used, the ISV application, and the user applications.
Cold-standby operation does not allow the preliminary operation needed to establish the operating state immediately on the standby node.
In cold-standby operation, the state of the cluster application on the operating node will be Online, while that of the standby node will be
Offline.
1:1 standby
Definition
- It is an operation mode in which a cluster system consists of 2 nodes. One is operating, and the other is standby. When a failure occurs
in the operating node, a cluster application switches to the standby node. This does not disrupt ongoing operation.
Advantage
- This operation mode ensures the availability of the cluster application even after failover.
Note
Failover image
Mutual standby
Definition
- It is an operation mode in which a cluster system consists of 2 or more nodes. Normally, 2 nodes are used in this operation mode.
Each node runs one operating cluster application and one standby cluster application; each operating cluster application has its
standby on the other node.
Advantage
- Since all the nodes run operating cluster applications, the nodes of the whole system can be used efficiently.
Note
- If failover occurs for any of the cluster applications, the performance of the cluster applications may drop because two or more
cluster applications will be operating in the failover node. For this operation mode, you need to estimate adequate resources.
Failover image
See
For information on how to set the cluster application priority, see Step 4 in "6.7.2.1 Creating Standby Cluster Applications."
N:1 standby
Definition
- It is an operation mode in which a cluster system consists of 3 or more nodes. One is standby, and the others are operating. When
a failure occurs in one of the operating nodes, a cluster application switches to the standby node. If a failure occurs in two or more
operating nodes at the same time, the cluster applications switch to the standby node.
Advantages
- This operation mode ensures the availability of the cluster application even after failover.
- Since one node serves as the STANDBY node for multiple cluster applications, the STANDBY cost can be reduced when the
number of cluster applications is large.
Note
- If failover occurs for multiple cluster applications, the performance of the cluster applications is reduced because multiple cluster
applications will be operating in one node.
Failover image
Cascade
Definition
- It is an operation mode in which a cluster system consists of 3 or more nodes: one is operating, and the others are standby. When
a failure occurs in the operating node, the cluster application switches to one of the standby nodes. If that failover also fails, the
cluster application switches to another standby node.
Advantages
- Even after one node is stopped, the redundant configuration of the cluster application can be maintained by using other nodes. The
availability is guaranteed during system maintenance.
- This operation mode ensures the availability of cluster applications even after failover.
Note
- As the system has a redundant configuration, the nodes of the whole system cannot normally be used efficiently.
Failover image
In this example, the nodes are defined in the sequence Node 1, Node 2, and Node 3 starting from the node with the highest cluster
application priority. These nodes are defined when the cluster application is set up.
Priority transfer (application of N:1 standby)
Definition
- One node functions as STANDBY for multiple cluster applications. On each of the other nodes, one cluster application functions
as OPERATING while the remaining cluster applications function as STOP.
- This topology uses the exclusivity function between cascaded cluster applications.
Advantages
- On that node on which one cluster application is OPERATING, the other cluster applications do not become either OPERATING
or STANDBY. Therefore, the throughput of that cluster application is guaranteed even after failover occurs.
- Because failback of the cluster application is not necessary during the restoration of a cluster application, a job can also be continued
during the restoration.
- Since one node is used as STANDBY exclusively for multiple cluster applications, the cost incurred for standby can be saved when
there are many cluster applications.
Notes
- Since one node is used as STANDBY of multiple cluster applications, availability decreases when there are many cluster
applications.
- If a failover occurs due to the occurrence of an error on one node, the availability decreases because no standby node is available
until the completion of the maintenance work.
Failover image
Scalable
Definition
- A cluster system consists of two or more operating nodes, and all the nodes are used for online cluster applications. This operation
mode is suitable for parallel jobs that use the I/O load balancing and load sharing on a parallel database.
Advantage
- If some of the cluster applications stop, the throughput of the cluster applications cannot be guaranteed because degenerate
operation is assumed.
Failover image
Note
Scalable operation can be used in combination with some PRIMECLUSTER-related products. For information on the related products, see
the manuals of PRIMECLUSTER-related products.
High-availability scalable operation
Definition
- Refers to the topology in which standby operation is configured for each cluster application that constitutes scalable operation.
It is suitable for a parallel database for which scalability and availability are required, as well as for parallel job execution that uses
load sharing and load balancing.
- The standby operation that constitutes scalable operation can be combined with 1:1 standby, N:1 standby, or priority transfer.
Advantages
- Even if failover occurs in one of the cluster applications that constitute scalable operation, the throughput of all the cluster
applications can be maintained by using a redundant configuration.
Note
- Nodes in the whole system cannot be used efficiently because of the redundant configuration.
Failover image
The following illustrates failover when two 1:1 standby operations are combined to enable scalable operation.
Note
High-availability scalable operation can be used in combination with some PRIMECLUSTER-related products. For information on the
related products, see the manuals of PRIMECLUSTER-related products.
Single-node cluster operation
Definition
- This operation mode enables monitoring and control jobs on the node in a single-node configuration.
Advantages
- If an error occurs in the resource to which the AUTORECOVER attribute is set, the availability can be improved by automatically
restarting the system for restoration.
- You can also use this mode as a development environment for creating and testing cluster applications.
Notes
- Jobs will be suspended in the case of a hardware failure because a single-node cluster has no hardware to switch to. Build a cluster
with multiple nodes if you need to switch hardware when a hardware failure occurs.
- If multiple cluster systems exist in an environment in which the virtual machine function is used, build a single-node cluster on the
node with the highest priority, as shown in the figure below.
- In an environment in which the virtual machine function is used, a guest OS on the single-node cluster is shut down under the
following conditions (see the figure below):
- The node is forcibly shut down (due to an inter-node communication failure or other causes).
Failover image
No failover occurs in the single-node cluster operation.
Note
Even in single-node cluster operation, you need at least one network interface card for the cluster interconnect used by
PRIMECLUSTER.
See
For information on the operation modes of Web-Based Admin View, see "1.2 Web-Based Admin View topology" in "PRIMECLUSTER
Web-Based Admin View Operation Guide."
Topology where separate LANs are used
In this topology, the public LAN and the LAN that is connected to the management client are separate. When using a management
client from a public network, this topology is recommended for security. After the PRIMECLUSTER installation is done, you will
need to modify the Web-Based Admin View configuration.
Specify IP addresses used for a cluster node and a client respectively. For details, see "5.1.1 Setting Up CF and CIP."
This model supports 2 types of topology, which are described below.
Topology where a network is shared
In this topology, the public LAN and the LAN that is connected to the management client are the same. You can adopt this topology
if the network users and network range are limited for security. This is the default Web-Based Admin View configuration after
PRIMECLUSTER installation.
2.5 Determining the Failover Timing of Cluster Application
Determine the failover timing of cluster application. You can choose from the following:
Multiple choices are possible from 2 to 4.
See
The failover timing is set in "6.7.2 Setting Up userApplication."
Part 2 Installation
This part describes procedures for installing the PRIMECLUSTER system and running Web-Based Admin View.
The operations include the procedures up to installing a new PRIMECLUSTER system.
For procedures on changing the PRIMECLUSTER system configuration after the system is installed, see "Chapter 8 Changing the Cluster
System Configuration."
Chapter 3 Software Installation and Setup
This chapter describes how to install and set up software products related to PRIMECLUSTER for the following cases:
Note
- For security, set "No Firewall" when Red Hat Enterprise Linux is installed or when the setup command is executed. If a firewall
has already been configured for security, change the setting to "No Firewall." If the "Firewall" setting is left as is, the clsetup (setting of the
resource database) command will operate abnormally.
- PRIMECLUSTER guarantees the performance of any required software when the umask value is set to 022. Do not modify the umask
value.
- For immediate cluster failover if an I/O device where the system volume is placed fails
If an I/O device where the system volume is placed fails, a cluster failover does not occur and the system operation may continue based
on the data stored on the memory.
If you want PRIMECLUSTER to trigger a cluster failover by panicking a node in the event that an I/O device where the system volume
is placed fails, set the ext3 or the ext4 file system to the system volume and perform the following setting.
Setting
Specify "errors=panic" to the mount option of each partition (the ext3 or the ext4 file system) included in the system volume.
Example: To set it in /etc/fstab (when /, /var, and /home exist in one system volume)
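The example itself is not reproduced in this extract; a minimal sketch of such /etc/fstab entries, assuming illustrative partition device names (adjust to your own devices and file system type, ext3 or ext4), might look like this:

/dev/sda1    /        ext4    defaults,errors=panic    1 1
/dev/sda2    /var     ext4    defaults,errors=panic    1 2
/dev/sda3    /home    ext4    defaults,errors=panic    1 2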
However, an immediate cluster failover may not occur, because it can take time for an I/O error to reach the file system. Writing to
the system volume regularly increases the frequency with which I/O errors are detected.
Figure 3.1 Flow of building the cluster system when not using the virtual machine function
See
For information on changing the public LAN and administrative LAN that the PRIMECLUSTER system uses, see "9.2 Changing the
Network Environment."
Information
Web-Based Admin View automatically sets up an interface that was assigned the IP address of the host name corresponding to the node on
which PRIMECLUSTER was installed. This interface will be used as a transmission path between cluster nodes and cluster management
server, and between cluster management servers and clients.
Installation and Setup of Related Software
Install and set up the software products (ETERNUS Multipath driver) required for using shared disk units. For details on the installation and
setup procedure, see "Software Information" for ETERNUS Multipath Driver.
Note
No failover will be triggered by PRIMECLUSTER even if the operating system hangs, as long as communication over the cluster
interconnect continues normally.
This state can be avoided by enabling watchdog timer monitoring.
See
For information about behavior setup, see the ServerView Operations Manager manual.
3.1.6.1 PRIMERGY
Overview
If heartbeat monitoring fails because of a node failure, PRIMECLUSTER shutdown facility removes the failed node. If this occurs during
crash dump collection, you might not be able to acquire information for troubleshooting.
The cluster high-speed failover function prevents node elimination during crash dump collection, and at the same time, enables the ongoing
operations on the failed node to be quickly moved to another node.
kdump
As shown in the above figure, the cluster high-speed failover function allows the panic status to be set and referenced through BMC (Baseboard
Management Controller) or iRMC when a heartbeat monitoring failure occurs. The node that detects the failure can consider the other
node stopped and take over ongoing operation without eliminating the node that is collecting the crash dump.
Note
- If you reset the node that is collecting crash dump, collection of the crash dump will fail.
- When the node finishes collecting the crash dump after it panics, the behavior of the node follows the kdump settings.
1. Configure kdump
When using kdump, it must be configured beforehand.
For details on the configuration procedure, see the manual of your OS.
Note
Reconfigure kdump if it was already configured during the installation of Red Hat Enterprise Linux.
2. Check kdump
[RHEL6]
Check if the kdump is available. If not, enable the kdump using the "runlevel(8)" and "chkconfig(8)" commands.
# /sbin/runlevel
N 3
- Check whether kdump is enabled using the "chkconfig(8)" command.
Example:
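The listing referred to here is not reproduced in this extract; typical output would resemble the following (illustrative only):

# /sbin/chkconfig --list kdump
kdump           0:off   1:off   2:off   3:off   4:off   5:off   6:off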
The above example shows that kdump is currently off at run level 3.
# /sbin/chkconfig kdump on
[RHEL7]
Check whether kdump is available. If not, enable kdump using the "runlevel(8)" and "systemctl(1)" commands.
# /sbin/runlevel
N 3
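The systemctl steps themselves are not reproduced in this extract; as a sketch using standard systemd commands (not taken verbatim from this manual), the check and enablement would be:

# /usr/bin/systemctl status kdump.service
# /usr/bin/systemctl enable kdump.service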
Information
The IPMI shutdown agent is used with the hardware device in which BMC or iRMC is installed.
- IP address
- User for the IPMI shutdown agent (*1)
For details, see "User Guide" provided with the hardware and the ServerView Operations Manager manual.
*1) Assign this user as the administrator. Set the user password with seven-bit ASCII characters except the following characters.
><"/\=!?;,&
Prerequisites for the Blade shutdown agent settings
Set the following for the BLADE server:
- Install ServerView
- Set SNMP community for the management blade (*2)
- Set an IP address of the management blade
For details, see the operation manual provided with the hardware and the ServerView Operations Manager manual.
*2) When configuring the cluster across multiple chassis, set the same SNMP community for all the management blades.
As shown in the above figure, if a panic occurs, the cluster control facility uses the MMB units to receive the panic notice. This allows the
system to detect the node panic status faster than waiting for a heartbeat failure to be detected.
See
PRIMEQUEST allows you to set the panic environment so that a crash dump is collected if a panic occurs.
For details about the PRIMEQUEST dump function, setup method, and confirmation method, see the following manuals:
To use asynchronous monitoring, you must install software that controls the MMB units and specify appropriate settings for the driver. This
section describes procedures for installing the MMB control software and setting up the driver, which are required for realizing high-speed
failover.
1. Installing the HBA blockage function and the SVmco
The HBA blockage function and the SVmco report node status changes through the MMB units to the shutdown facility. Install the
HBA blockage function and the SVmco before setting up the shutdown facility. For installation instructions, see the following
manuals:
For details about creating a user who uses RMCP to control the MMB units, see the following manual provided with the unit:
Note
The MMB units have two types of users:
Note
Be sure to carry out this setup when using shared disks.
If a panic occurs, the HBA units that are connected to the shared disks are closed, and I/O processing to the shared disk is terminated.
This operation maintains data consistency in the shared disk and enables high-speed failover.
On all the nodes, specify the device paths of the shared disks (GDS device paths if GDS is being used) in the HBA blockage function
command, and add the shared disks as targets for which the HBA function is to be stopped. If GDS is being used, perform this setup
after completing the GDS setup. For setup instructions, see the following manuals:
4. Setting the I/O completion wait time
To maintain consistent I/O processing to the shared disk if a node failure (panic, etc.) occurs and failover takes place, some shared
disk units require a fixed I/O completion wait time, which is the duration after a node failure occurs until the new operation node starts
operating.
The initial value of the I/O completion wait time is 0 seconds. However, change it to an appropriate value if you are using
shared disk units that require an I/O completion wait time.
Information
ETERNUS Disk storage systems do not require an I/O completion wait time. Therefore, this setting is not required.
Specify this setting after completing the CF setup. For setting instructions, see "5.1.2.4.5 Setting I/O Completion Wait Time."
Note
If an I/O completion wait time is set, the failover time when a node failure (panic, etc.) occurs increases by that amount of time.
As shown in the above figure, if a panic occurs, the cluster control facility uses the iRMC/MMB units to receive the panic notice. This allows
the system to detect the node panic status faster than waiting for a heartbeat failure to be detected.
See
PRIMEQUEST allows you to set the panic environment so that a crash dump is collected if a panic occurs.
For details about the PRIMEQUEST dump function, setup method, and confirmation method, see the following manuals:
To use the asynchronous monitoring, install the required software and set up the driver appropriately. This section describes how to install
the required software and set up the driver to enable the fast switching.
The created user name and the specified password are used when the shutdown facility is set up. Record the user name and the
password.
- "FUJITSU Server PRIMEQUEST 3000 Series Business Model iRMC S5 Web Interface"
3. Setting up MMB (except B model)
MMB must be set up so that the node status change is reported properly to the shutdown facility through MMB.
You must create the RMCP user so that PRIMECLUSTER can link with the MMB units. On all PRIMEQUEST 3000 instances that
make up the PRIMECLUSTER system, make sure to create a user to control the MMB units with RMCP. To create a user to control
MMB with RMCP, log in to MMB Web-UI, and create the user from "Remote Server Management" screen of "Network
Configuration" menu. Create the user as shown below:
- [Privilege]: "Admin"
- [Status]: "Enabled"
Set the user password with seven-bit ASCII characters except the following characters.
For details about creating a user who uses RMCP to control the MMB units, see the following manual provided with the unit:
Note
The MMB units have two types of users:
Note
Be sure to carry out this setup when using shared disks.
If a panic occurs, the HBA units that are connected to the shared disks are closed, and I/O processing to the shared disk is terminated.
This operation maintains data consistency in the shared disk and enables high-speed failover.
On all the nodes, specify the device paths of the shared disks (GDS device paths if GDS is being used) in the HBA blockage function
command, and add the shared disks as targets for which the HBA function is to be stopped. If GDS is being used, perform this setup
after completing the GDS setup. For setup instructions, see the following manuals:
Information
ETERNUS Disk storage systems do not require an I/O completion wait time. Therefore, this setting is not required.
Specify this setting after completing the CF setup. For setting instructions, see "5.1.2.5.5 Setting I/O Completion Wait Time."
Note
If an I/O completion wait time is set, the failover time when a node failure (panic, etc.) occurs increases by that amount of time.
Perform this setup before restarting the installed PRIMECLUSTER.
Target node:
All the nodes in which PRIMECLUSTER is to be installed
The kernel parameters differ according to the products and components to be used.
Check PRIMECLUSTER Designsheets and edit the value if necessary.
Note
To enable modifications, you need to restart the operating system.
Set each kernel parameter appropriately, as follows, based on the type shown under "Characteristics" in each table.
- Addition
Set the sum of the system default value and the values recommended or specified for each software product.
- Maximum value
Specify the largest of the values recommended or specified for each software product.
However, make sure to use the system default value if that maximum is smaller than the system default value.
The kernel parameter values differ depending upon:
- CF Configuration
(*1)
Estimate the value required for the resource database using the following equation:
Number of resources = Number of disks in shared system devices x (number of shared nodes +1) x 2
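For example, assuming 4 disks in the shared system devices shared by 2 nodes (an illustrative configuration, not one taken from this manual), the estimate is 4 x (2 + 1) x 2 = 24 resources.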
Note
For system expansion, if you increase the logical disks, you need to re-estimate the resources and restart each node in the cluster system.
If you add disks to the cluster after installation, you must then calculate the resources required for the total number of logical disks after
addition.
- RMS Configuration
In order to ensure that RMS runs normally, the following kernel parameters need to be set. Therefore, when RMS is installed, the
definitions of these parameters in /etc/sysctl.conf are automatically added or updated if they are not defined, or if they are defined
with values smaller than the following "Value".
Note
- Even if definitions of the kernel parameters in /etc/sysctl.conf are automatically added/updated, change the value as necessary in
consideration of the value required by other software and user applications.
- Using GFS
Note
The values used by products and user applications that operate in the PRIMECLUSTER system must also be included in the kernel
parameter values.
Described below is the procedure for changing the kernel parameters and setting new values. (Any other kernel parameters may be displayed
in addition to the examples below.)
2. Determine the kernel parameter values.
The kernel parameter values are determined by the current effective values that were checked in step 1 and the values in the above
table. If the example displayed in step 1 shows the current effective values of the kernel parameters, the edited line becomes the
following:
SEMMSL value: 20
SEMMNS value: 131
SEMOPM value: 10
SEMMNI value: 42
kernel.shmmni: 4345
kernel.shmmax: 4000000000
kernel.msgmnb: 4194304
kernel.msgmni: 16391
kernel.msgmax: 32768
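As a sketch, the corresponding entries in /etc/sysctl.conf would then look like the following (the kernel.sem line combines SEMMSL, SEMMNS, SEMOPM, and SEMMNI in that order; confirm each value against your own estimation in step 2):

kernel.sem = 20 131 10 42
kernel.shmmni = 4345
kernel.shmmax = 4000000000
kernel.msgmnb = 4194304
kernel.msgmni = 16391
kernel.msgmax = 32768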
# sysctl -p
Check that the displayed values are the values that were determined in step 2.
3.1.8 Installing and Setting Up Applications
Install software products to be operated on the PRIMECLUSTER system and configure the environment as necessary.
For details, see "3.4 Installation and Environment Setup of Applications."
- When building a cluster system between guest OSes on multiple host OSes
- Without using Host OS failover function
See "3.2.2 When building a cluster system between guest OSes on multiple host OSes without using Host OS failover function."
See
When using the virtual machine function in a VMware environment, see "Appendix H Using PRIMECLUSTER in a VMware
Environment."
3.2.1 When building a cluster system between guest OSes on one host OS
This section describes how to install and set up related software when building a cluster system between guest OSes on one host OS.
Perform the steps shown in the figure below as necessary.
Figure 3.2 Flow of building and using the cluster system between guest OSes on one host OS
3.2.1.1 Host OS setup (before installing the operating system on guest OS)
If you plan to operate a guest OS as part of a cluster, set up the required disk devices, virtual bridges, virtual disks, user IDs, and guest OS
initializations on the host OS.
Perform the following setup on the host OS after installing the operating system on the host OS and also before installing the operating
system on the guest OS.
Note
See
For details on settings, see "Red Hat Enterprise Linux 6 Virtualization Administration Guide" or "Red Hat Enterprise Linux 7
Virtualization Deployment and Administration Guide."
3.2.1.2 Host OS setup (after installing the operating system on guest OS)
Perform the following setup after installing the operating system on guest OS.
:
<disk type='block' device='disk'>
<driver name='qemu' type='raw'/>
<source dev='/dev/disk/by-id/scsi-36000b5d0006a0000006a1296001f0000'/>
<target dev='sdh' bus='scsi'/>
<address type='drive' controller='0' bus='0' target='0' unit='7'/>
</disk>
:
:
<disk type='block' device='lun'>
<driver name='qemu' type='raw' cache='none'/>
<source dev='/dev/disk/by-id/scsi-36000b5d0006a0000006a1296001f0000'/>
<target dev='sdh' bus='scsi'/>
<shareable/>
<address type='drive' controller='0' bus='0' target='0' unit='7'/>
</disk>
:
4. Select a virtual disk (VirtIO Disk) from the hardware list on the left.
5. In the [Virtual disk] window, perform the following settings and click [Apply].
- Select the Shareable check box.
- Select 'none' for the cache model.
6. Check the version of the libvirt package on the host OS by using the rpm(8) command.
7. If the version of the libvirt package is libvirt-0.9.4-23.el6_2.4 or later, change the device attribute from disk to lun, which
is set in the guest setting file (/etc/libvirt/qemu/guestname.xml) on the host OS.
:
<disk type='block' device='disk'>
<driver name='qemu' type='raw' cache='none'/>
<source dev='/dev/disk/by-id/scsi-1FUJITSU_30000085002B'/>
<target dev='vdb' bus='virtio'/>
<shareable/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
</disk>
:
:
<disk type='block' device='lun'>
<driver name='qemu' type='raw' cache='none'/>
<source dev='/dev/disk/by-id/scsi-1FUJITSU_30000085002B'/>
<target dev='vdb' bus='virtio'/>
<shareable/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
</disk>
:
:
<disk type='block' device='disk'>
<driver name='qemu' type='raw'/>
<source dev='/dev/disk/by-id/scsi-36000b5d0006a0000006a1296001f0000'/>
<target dev='sdh' bus='scsi'/>
<address type='drive' controller='0' bus='0' target='0' unit='7'/>
</disk>
:
:
<disk type='block' device='lun'>
<driver name='qemu' type='raw'/>
<source dev='/dev/disk/by-id/scsi-36000b5d0006a0000006a1296001f0000'/>
<target dev='sdh' bus='scsi'/>
<address type='drive' controller='0' bus='0' target='0' unit='7'/>
</disk>
:
4. Select a virtual disk (VirtIO Disk) from the hardware list on the left.
5. In the [Virtual disk] window, set the serial number in [Serial number] under [Advanced options], and click [Apply].
The serial number should be a character string of up to 10 characters that is not duplicated within the virtual machine.
6. Check the version of the libvirt package on the host OS by using the rpm(8) command.
7. If the version of the libvirt package is libvirt-0.9.4-23.el6_2.4 or later, change the device attribute from disk to lun, which
is set in the guest setting file (/etc/libvirt/qemu/guestname.xml) on the host OS.
:
<disk type='block' device='disk'>
<driver name='qemu' type='raw'/>
<source dev='/dev/disk/by-id/scsi-1FUJITSU_30000085002B'/>
<target dev='vdb' bus='virtio'/>
<serial>serial number</serial>
<address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
</disk>
:
:
<disk type='block' device='lun'>
<driver name='qemu' type='raw'/>
<source dev='/dev/disk/by-id/scsi-1FUJITSU_30000085002B'/>
<target dev='vdb' bus='virtio'/>
<serial>serial number</serial>
<address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
</disk>
:
# ls -l /dev/disk/by-id
:
lrwxrwxrwx 1 root root 9 Apr 18 08:44 virtio-disk001 -> ../../vdg
lrwxrwxrwx 1 root root 9 Apr 18 08:43 virtio-disk002 -> ../../vdh
(The character string following "virtio-" is the serial number set in step 5.)
2. Setting up the virtual bridge (administrative LAN/public LAN/cluster interconnect)
For the network interfaces, including the administrative LAN, public LAN and cluster interconnect, that are used by virtual domains,
you need to set up virtual bridges for the virtual networks beforehand.
(1) Setting up a virtual bridge for the administrative LAN
Edit the /etc/sysconfig/network-scripts/ifcfg-ethX file as follows:
DEVICE=ethX
BOOTPROTO=none
ONBOOT=yes
BRIDGE=brX
Create the interface setting file, /etc/sysconfig/network-scripts/ifcfg-brX, for the virtual bridge.
DEVICE=brX
TYPE=Bridge
BOOTPROTO=static
IPADDR=xxx.xxx.xxx.xxx
NETMASK=xxx.xxx.xxx.xxx
ONBOOT=yes
Note
For IPADDR and NETMASK, set IP addresses and netmasks to connect to the external network. When IPv6 addresses are required,
make the setting so that IPv6 addresses are assigned.
DEVICE=ethX
BOOTPROTO=none
ONBOOT=yes
BRIDGE=brX
Create the interface setting file, /etc/sysconfig/network-scripts/ifcfg-brX, for the virtual bridge.
DEVICE=brX
TYPE=Bridge
ONBOOT=yes
DEVICE=brX
TYPE=Bridge
BOOTPROTO=static
ONBOOT=yes
- ON_SHUTDOWN=shutdown
- SHUTDOWN_TIMEOUT=300
Specify the timeout duration (in seconds) for shutting down the guest OS in SHUTDOWN_TIMEOUT. Estimate the time needed to
shut down the guest OS and set the value. When multiple guest OSes are configured, set the largest of their estimated times. The above
is an example where the time is 300 seconds (5 minutes).
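For example, the relevant lines in /etc/sysconfig/libvirt-guests would then read as follows (values as above; keep each setting on its own line):

ON_SHUTDOWN=shutdown
SHUTDOWN_TIMEOUT=300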
Note
- When setting /etc/sysconfig/libvirt-guests, do not describe the setting values and comments on the same line.
- When changing the settings in /etc/sysconfig/libvirt-guests during operation, make sure to follow the procedure in "9.4.1.3
Changing the Settings in /etc/sysconfig/libvirt-guests."
4. Creating a user ID
Point
This user ID will be the one used by the shutdown facility to log in to the host OS to force shut down the nodes. This user ID and
password are used for configuring the shutdown facility.
You need to set up a user for the shutdown facility for the guest OS control by PRIMECLUSTER.
(1) Creating a general user ID (optional)
Create a general user ID (optional) for the shutdown facility in the host OS.
Moreover, in order to permit the "sudo" execution without "tty", add "#" to the beginning of the following line to comment it out.
Defaults requiretty
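As a sketch, the edit is made with the standard visudo command (not quoted from this manual), and after the change the line in the sudoers file reads:

# visudo
#Defaults    requiretty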
For details on settings, see "Red Hat Enterprise Linux 6 Virtualization Administration Guide" or "Red Hat Enterprise Linux 7 Virtualization
Deployment and Administration Guide."
See
For information on changing the public LAN and administrative LAN that the PRIMECLUSTER system uses, see "9.2 Changing the
Network Environment."
Information
Web-Based Admin View automatically sets up an interface that was assigned the IP address of the host name corresponding to the
node on which PRIMECLUSTER was installed. This interface will be used as a transmission path between cluster nodes and cluster
management server, and between cluster management servers and clients.
2. Installing the bundled software on the guest OS
Install the bundled software on the guest OS.
3. Initial setting
Initialize the guest OS.
See
For details on settings, see "Red Hat Enterprise Linux 6 Virtualization Administration Guide" or "Red Hat Enterprise Linux 7
Virtualization Deployment and Administration Guide."
See
For details on settings, see "Red Hat Enterprise Linux 6 Virtualization Administration Guide" or "Red Hat Enterprise Linux 7 Virtualization
Deployment and Administration Guide."
See
For information on the kernel parameters, see "3.1.7 Checking and Setting the Kernel Parameters."
Note
To enable modifications, you need to restart the operating system.
3.2.1.7 Installing and setting up applications
Install software products to be operated on the PRIMECLUSTER system and configure the environment as necessary.
For details, see "3.4 Installation and Environment Setup of Applications."
3.2.2 When building a cluster system between guest OSes on multiple host
OSes without using Host OS failover function
This section describes how to install and set up related software when building a cluster system between guest OSes on multiple host OSes
without using Host OS failover function.
Perform the steps shown in the figure below as necessary.
Figure 3.3 Flow of building the cluster system when not using the host OS failover function
3.2.2.1 Host OS setup (before installing the operating system on guest OS)
If you plan to operate a guest OS as part of a cluster, set up the required disk devices, virtual bridges, virtual disks, user IDs, and guest OS
initializations on the host OS.
Perform the following setup on the host OS after installing the operating system on the host OS and also before installing the operating
system on the guest OS.
Note
See
For details on settings, see "Red Hat Enterprise Linux 6 Virtualization Administration Guide" or "Red Hat Enterprise Linux 7
Virtualization Deployment and Administration Guide."
3.2.2.2 Host OS setup (after installing the operating system on guest OS)
Perform the following setup after installing the operating system on guest OS.
:
<disk type='block' device='disk'>
<driver name='qemu' type='raw'/>
<source dev='/dev/disk/by-id/scsi-36000b5d0006a0000006a1296001f0000'/>
<target dev='sdh' bus='scsi'/>
<address type='drive' controller='0' bus='0' target='0' unit='7'/>
</disk>
:
:
<disk type='block' device='lun'>
<driver name='qemu' type='raw' cache='none'/>
<source dev='/dev/disk/by-id/scsi-36000b5d0006a0000006a1296001f0000'/>
<target dev='sdh' bus='scsi'/>
<shareable/>
<address type='drive' controller='0' bus='0' target='0' unit='7'/>
</disk>
:
Using virtio block device as a shared disk
4. Select a virtual disk (VirtIO Disk) from the hardware list on the left.
5. In the [Virtual disk] window, perform the following settings and click [Apply].
- Select the Shareable check box.
- Select 'none' for the cache model.
6. Check the version of the libvirt package on the host OS by using the rpm(8) command.
7. If the version of the libvirt package is libvirt-0.9.4-23.el6_2.4 or later, change the device attribute from disk to lun, which
is set in the guest setting file (/etc/libvirt/qemu/guestname.xml) on the host OS.
:
<disk type='block' device='disk'>
<driver name='qemu' type='raw' cache='none'/>
<source dev='/dev/disk/by-id/scsi-1FUJITSU_30000085002B'/>
<target dev='vdb' bus='virtio'/>
<shareable/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
</disk>
:
:
<disk type='block' device='lun'>
<driver name='qemu' type='raw' cache='none'/>
<source dev='/dev/disk/by-id/scsi-1FUJITSU_30000085002B'/>
<target dev='vdb' bus='virtio'/>
<shareable/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
</disk>
:
:
<disk type='block' device='disk'>
<driver name='qemu' type='raw'/>
<source dev='/dev/disk/by-id/scsi-36000b5d0006a0000006a1296001f0000'/>
<target dev='sdh' bus='scsi'/>
<address type='drive' controller='0' bus='0' target='0' unit='7'/>
</disk>
:
:
<disk type='block' device='lun'>
<driver name='qemu' type='raw'/>
<source dev='/dev/disk/by-id/scsi-36000b5d0006a0000006a1296001f0000'/>
<target dev='sdh' bus='scsi'/>
<address type='drive' controller='0' bus='0' target='0' unit='7'/>
</disk>
:
4. Select a virtual disk (VirtIO Disk) from the hardware list on the left.
5. In the [Virtual disk] window, set the serial number in [Serial number] under [Advanced options], and click [Apply].
The serial number should be a character string of up to 10 characters that is not duplicated within the virtual machine.
6. Check the version of the libvirt package on the host OS by using the rpm(8) command.
7. If the version of the libvirt package is libvirt-0.9.4-23.el6_2.4 or later, change the device attribute from disk to lun, which
is set in the guest setting file (/etc/libvirt/qemu/guestname.xml) on the host OS.
:
<disk type='block' device='disk'>
<driver name='qemu' type='raw'/>
<source dev='/dev/disk/by-id/scsi-1FUJITSU_30000085002B'/>
<target dev='vdb' bus='virtio'/>
<serial>serial number</serial>
<address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
</disk>
:
:
<disk type='block' device='lun'>
<driver name='qemu' type='raw'/>
<source dev='/dev/disk/by-id/scsi-1FUJITSU_30000085002B'/>
<target dev='vdb' bus='virtio'/>
<serial>serial number</serial>
<address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
</disk>
:
9. On the guest OS, make sure that the by-id files of the virtual disks exist.
- Make sure that the by-id files exist for all virtio block devices used for mirroring among servers.
- Make sure that the serial number set in step 5 is included in the file name of each by-id file.
# ls -l /dev/disk/by-id
:
lrwxrwxrwx 1 root root 9 Apr 18 08:44 virtio-disk001 -> ../../vdg
lrwxrwxrwx 1 root root 9 Apr 18 08:43 virtio-disk002 -> ../../vdh
(The character string following "virtio-" is the serial number set in step 5.)
DEVICE=ethX
BOOTPROTO=none
ONBOOT=yes
BRIDGE=brX
Create the interface setting file, /etc/sysconfig/network-scripts/ifcfg-brX, for the virtual bridge.
DEVICE=brX
TYPE=Bridge
BOOTPROTO=static
IPADDR=xxx.xxx.xxx.xxx
NETMASK=xxx.xxx.xxx.xxx
ONBOOT=yes
Note
For IPADDR and NETMASK, set IP addresses and netmasks to connect to the external network. When IPv6 addresses are required,
make the setting so that IPv6 addresses are assigned.
(2) Setting up virtual bridges for the public LAN and cluster interconnect
Edit the /etc/sysconfig/network-scripts/ifcfg-ethX file as follows:
DEVICE=ethX
BOOTPROTO=none
ONBOOT=yes
BRIDGE=brX
Create the interface setting file, /etc/sysconfig/network-scripts/ifcfg-brX, for the virtual bridge.
DEVICE=brX
TYPE=Bridge
ONBOOT=yes
- ON_SHUTDOWN=shutdown
- SHUTDOWN_TIMEOUT=300
Specify the timeout duration (in seconds) for shutting down the guest OS in SHUTDOWN_TIMEOUT. Estimate the time needed to
shut down the guest OS and set the value. When multiple guest OSes are configured, set the largest of their estimated times. The above
is an example where the time is 300 seconds (5 minutes).
Note
- When setting /etc/sysconfig/libvirt-guests, do not describe the setting values and comments on the same line.
- When changing the settings in /etc/sysconfig/libvirt-guests during operation, make sure to follow the procedure in "9.4.1.3
Changing the Settings in /etc/sysconfig/libvirt-guests."
- RHEL6 environment
Execute the following command on all the nodes to check the startup status of the libvirt-guests service.
If any one of the run levels 2, 3, 4, 5 is "off", execute the following command.
If all of the run levels 2, 3, 4, 5 are "on", it is not necessary to execute the command.
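The commands referred to above are not reproduced in this extract; on RHEL6 the check and, if needed, the enablement would typically be:

# /sbin/chkconfig --list libvirt-guests
# /sbin/chkconfig libvirt-guests on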
- RHEL7 environment
Make sure that the current libvirt-guests service is enabled on all the nodes.
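One way to verify this with standard systemd tooling (shown as a sketch, not quoted from this manual); if the service is reported as disabled, enable it with the command that follows:

# /usr/bin/systemctl is-enabled libvirt-guests.service
enabled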
# /usr/bin/systemctl enable libvirt-guests.service
6. Creating a user ID
Point
This user ID will be the one used by the shutdown facility to log in to the host OS to force shut down the nodes. This user ID and
password are used for configuring the shutdown facility.
You need to set up a user for the shutdown facility for the guest OS control by PRIMECLUSTER.
(1) Creating a general user ID (optional)
Create a general user ID (optional) for the shutdown facility in the host OS.
Moreover, in order to permit the "sudo" execution without "tty", add "#" to the beginning of the following line to comment it out.
Defaults requiretty
See
For information on changing the public LAN and administrative LAN that the PRIMECLUSTER system uses, see "9.2 Changing the
Network Environment."
Information
Web-Based Admin View automatically sets up an interface that was assigned the IP address of the host name corresponding to the
node on which PRIMECLUSTER was installed. This interface will be used as a transmission path between cluster nodes and cluster
management server, and between cluster management servers and clients.
3. Initial setting
Initialize the guest OS.
See
For details on settings, see "Red Hat Enterprise Linux 6 Virtualization Administration Guide" or "Red Hat Enterprise Linux 7
Virtualization Deployment and Administration Guide."
See
For details on settings, see "Red Hat Enterprise Linux 6 Virtualization Administration Guide" or "Red Hat Enterprise Linux 7 Virtualization
Deployment and Administration Guide."
See
For information on the kernel parameters, see "3.1.7 Checking and Setting the Kernel Parameters."
Note
To enable modifications, you need to restart the operating system.
3.2.3 When building a cluster system between guest OSes on multiple host
OSes using Host OS failover function
This section describes how to install and set up related software when building a cluster system between guest OSes on multiple host OSes
using Host OS failover function.
Figure 3.4 Flow of building a cluster system when using Host OS failover function
3.2.3.1.3 Host OS setup (before installing the operating system on guest OS)
If you plan to operate a guest OS as part of a cluster, set up the required disk devices, virtual bridges, virtual disks, user IDs, and guest OS
initializations on the host OS.
Perform the following setup on the host OS after installing the operating system on the host OS and also before installing the operating
system on the guest OS.
Note
See
For details on settings, see "Red Hat Enterprise Linux 6 Virtualization Administration Guide" or "Red Hat Enterprise Linux 7
Virtualization Deployment and Administration Guide."
3.2.3.1.4 Host OS setup (after installing the operating system on guest OS)
Perform this setup on the host OS according to the following procedure after installing the operating system on the host OS and the guest
OSes.
:
<disk type='block' device='disk'>
<driver name='qemu' type='raw'/>
<source dev='/dev/disk/by-id/scsi-36000b5d0006a0000006a1296001f0000'/>
<target dev='sdh' bus='scsi'/>
<address type='drive' controller='0' bus='0' target='0' unit='7'/>
</disk>
:
:
<disk type='block' device='lun'>
<driver name='qemu' type='raw' cache='none'/>
<source dev='/dev/disk/by-id/scsi-36000b5d0006a0000006a1296001f0000'/>
<target dev='sdh' bus='scsi'/>
<shareable/>
<address type='drive' controller='0' bus='0' target='0' unit='7'/>
</disk>
:
4. Select a virtual disk (VirtIO Disk) from the hardware list on the left.
5. In the [Virtual disk] window, perform the following settings and click [Apply].
- Select the Shareable check box.
- Select 'none' for the cache model.
6. Check the version of the libvirt package on the host OS by using the rpm(8) command.
7. If the version of the libvirt package is libvirt-0.9.4-23.el6_2.4 or later, change the device attribute from disk to lun, which
is set in the guest setting file (/etc/libvirt/qemu/guestname.xml) on the host OS.
:
<disk type='block' device='disk'>
<driver name='qemu' type='raw' cache='none'/>
<source dev='/dev/disk/by-id/scsi-1FUJITSU_30000085002B'/>
<target dev='vdb' bus='virtio'/>
<shareable/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
</disk>
:
:
<disk type='block' device='lun'>
<driver name='qemu' type='raw' cache='none'/>
<source dev='/dev/disk/by-id/scsi-1FUJITSU_30000085002B'/>
<target dev='vdb' bus='virtio'/>
<shareable/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
</disk>
:
8. Start the guest OS.
Using virtio-SCSI device for mirroring among servers
:
<disk type='block' device='disk'>
<driver name='qemu' type='raw'/>
<source dev='/dev/disk/by-id/scsi-36000b5d0006a0000006a1296001f0000'/>
<target dev='sdh' bus='scsi'/>
<address type='drive' controller='0' bus='0' target='0' unit='7'/>
</disk>
:
:
<disk type='block' device='lun'>
<driver name='qemu' type='raw'/>
<source dev='/dev/disk/by-id/scsi-36000b5d0006a0000006a1296001f0000'/>
<target dev='sdh' bus='scsi'/>
<address type='drive' controller='0' bus='0' target='0' unit='7'/>
</disk>
:
4. Select a virtual disk (VirtIO Disk) from the hardware list on the left.
5. In the [Virtual disk] window, set the serial number in [Serial number] under [Advanced options], and click [Apply].
The serial number should be a character string of up to 10 characters that is not duplicated within the virtual machine.
6. Check the version of the libvirt package on the host OS by using the rpm(8) command.
7. If the version of the libvirt package is libvirt-0.9.4-23.el6_2.4 or later, change the device attribute from disk to lun, which
is set in the guest setting file (/etc/libvirt/qemu/guestname.xml) on the host OS.
:
<disk type='block' device='disk'>
<driver name='qemu' type='raw'/>
<source dev='/dev/disk/by-id/scsi-1FUJITSU_30000085002B'/>
<target dev='vdb' bus='virtio'/>
<serial>serial number</serial>
<address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
</disk>
:
:
<disk type='block' device='lun'>
<driver name='qemu' type='raw'/>
<source dev='/dev/disk/by-id/scsi-1FUJITSU_30000085002B'/>
<target dev='vdb' bus='virtio'/>
<serial>serial number</serial>
<address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
</disk>
:
# ls -l /dev/disk/by-id
:
lrwxrwxrwx 1 root root 9 Apr 18 08:44 virtio-disk001 -> ../../vdg
lrwxrwxrwx 1 root root 9 Apr 18 08:43 virtio-disk002 -> ../../vdh
(The character string following "virtio-" is the serial number set in step 5.)
DEVICE=ethX
BOOTPROTO=none
ONBOOT=yes
BRIDGE=brX
Create the interface setting file, /etc/sysconfig/network-scripts/ifcfg-brX, for the virtual bridge.
DEVICE=brX
TYPE=Bridge
BOOTPROTO=static
IPADDR=xxx.xxx.xxx.xxx
NETMASK=xxx.xxx.xxx.xxx
ONBOOT=yes
Note
For IPADDR and NETMASK, set IP addresses and netmasks to connect to the external network. When IPv6 addresses are required,
make the setting so that IPv6 addresses are assigned.
(2) Setting up virtual bridges for the public LAN and cluster interconnect
Edit the /etc/sysconfig/network-scripts/ifcfg-ethX file as follows:
DEVICE=ethX
BOOTPROTO=none
ONBOOT=yes
BRIDGE=brX
Create the interface setting file, /etc/sysconfig/network-scripts/ifcfg-brX, for the virtual bridge.
DEVICE=brX
TYPE=Bridge
ONBOOT=yes
- ON_SHUTDOWN=shutdown
- SHUTDOWN_TIMEOUT=300
Specify the timeout duration (in seconds) for shutting down the guest OS in SHUTDOWN_TIMEOUT. Estimate the time needed to
shut down the guest OS and set the value. When multiple guest OSes are configured, set the largest of their estimated times. The above
is an example where the time is 300 seconds (5 minutes).
Note
- When setting /etc/sysconfig/libvirt-guests, do not describe the setting values and comments on the same line.
- When changing the settings in /etc/sysconfig/libvirt-guests during operation, make sure to follow the procedure in "9.4.1.3
Changing the Settings in /etc/sysconfig/libvirt-guests."
- RHEL6 environment
Execute the following command on all the nodes to check the startup status of the libvirt-guests service.
# /sbin/chkconfig --list libvirt-guests
libvirt-guests 0:off 1:off 2:off 3:off 4:off 5:off 6:off
If any one of the run levels 2, 3, 4, 5 is "off", execute the following command.
If all of the run levels 2, 3, 4, 5 are "on", it is not necessary to execute the command.
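The enabling command itself is not reproduced in this extract; on RHEL6 it would typically be:

# /sbin/chkconfig libvirt-guests on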
- RHEL7 environment
Make sure that the current libvirt-guests service is enabled on all the nodes.
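A sketch of the check and, if needed, the enablement using standard systemd commands (not quoted from this manual):

# /usr/bin/systemctl is-enabled libvirt-guests.service
# /usr/bin/systemctl enable libvirt-guests.service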
6. Creating a user ID
Point
This user ID will be the one used by the shutdown facility to log in to the host OS to force shut down the nodes. This user ID and
password are used for configuring the shutdown facility.
KVM environment
You need to set up a user for the shutdown facility for the guest OS control by PRIMECLUSTER.
(1) Creating a general user ID (optional)
Create a general user ID (optional) for the shutdown facility in the host OS.
Moreover, in order to permit the "sudo" execution without "tty", add "#" to the beginning of the following line to comment it out.
Defaults requiretty
3.2.3.1.7 Checking and setting the kernel parameters
To operate the PRIMECLUSTER-related software, you need to edit the values of the kernel parameters based on the environment.
Perform this setup before restarting the installed PRIMECLUSTER.
Target node:
All the nodes on which PRIMECLUSTER is to be installed
The kernel parameters differ according to the products and components to be used.
Check "Setup (initial configuration)" of PRIMECLUSTER Designsheets and edit the value if necessary.
See
For information on the kernel parameters, see "3.1.7 Checking and Setting the Kernel Parameters."
Note
To enable modifications, you need to restart the operating system.
Note
- After setting CF, set the timeout value of the cluster system on the host OS to 20 seconds. For details on the setup, refer to "11.3.1
Changing Time to Detect CF Heartbeat Timeout."
- Share the cluster interconnect LAN of the host OS with other guest OSes, and separate networks for each cluster system with Virtual
LAN.
See
For information on changing the public LAN and administrative LAN that the PRIMECLUSTER system uses, see "9.2 Changing the
Network Environment."
Information
Web-Based Admin View automatically sets up an interface that was assigned the IP address of the host name corresponding to the
node on which PRIMECLUSTER was installed. This interface will be used as a transmission path between cluster nodes and cluster
management server, and between cluster management servers and clients.
3. Initial setting
Initialize the guest OS.
See
For details on settings, see "Red Hat Enterprise Linux 6 Virtualization Administration Guide" or "Red Hat Enterprise Linux 7
Virtualization Deployment and Administration Guide."
Point
This user ID is used by the host OS failover function to log in to the guest OS. This user ID and password are used for setting the host
OS failover function.
Moreover, in order to permit the "sudo" execution without "tty", add "#" to the beginning of the following line to comment it
out.
Defaults requiretty
3.2.3.4.2 NTP setup (Guest OS)
Before building the cluster, make sure to set up NTP that synchronizes the time of each node in the cluster system.
This setup should be performed on the guest OS before installing PRIMECLUSTER.
See
For details on settings, see "Red Hat Enterprise Linux 6 Virtualization Administration Guide" or "Red Hat Enterprise Linux 7 Virtualization
Deployment and Administration Guide."
See
For information on the kernel parameters, see "3.1.7 Checking and Setting the Kernel Parameters."
Note
To enable modifications, you need to restart the operating system.
Note
- Share the cluster interconnect LAN of the guest OS with other guest OSes and the host OS, and separate networks for each cluster system
with Virtual LAN.
- Do not change a timeout value of the guest OS from 10 seconds at the CF setting.
- For setup policy for survival priority, see "Survival scenarios" in "5.1.2 Setting up the Shutdown Facility."
Note
When creating a cluster application for a guest OS, do not set the ShutdownPriority attribute of RMS.
See
For details on the installation procedures, see the Installation Guide for PRIMECLUSTER.
After PRIMECLUSTER was installed, perform the following settings so that the CF modules and the GDS modules are not incorporated
to an initial RAM disk (initramfs) for kdump:
- RHEL6 environment
1. Add CF modules (cf, symsrv) and GDS modules (sfdsk, sfdsk_lib, sfdsklog, sfdsksys) to the setting of blacklist for /etc/
kdump.conf.
Example
blacklist kvm-intel
blacklist cf symsrv
If GDS is installed:
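The corresponding line is not reproduced in this extract; based on the module names listed in step 1, it would presumably be:

blacklist cf symsrv sfdsk sfdsk_lib sfdsklog sfdsksys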
Note
In a physical PRIMERGY environment, PRIMECLUSTER sets kdump_post in /etc/kdump.conf. Do not set kdump_post for anything other
than PRIMECLUSTER, because only one kdump_post entry is effective in /etc/kdump.conf.
PRIMECLUSTER adds the following settings to /etc/kdump.conf when the OS is started.
- RHEL7 environment
1. Add the following description to the KDUMP_COMMANDLINE_APPEND line in /etc/sysconfig/kdump, on the same line.
If GDS is not installed:
rd.driver.blacklist=cf,symsrv
If GDS is installed:
rd.driver.blacklist=cf,symsrv,sfdsk,sfdsksys,sfdsklog,sfdsk_lib
Example
Before change:
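(The original example lines are not reproduced in this extract; the following is an illustrative sketch in which "<existing options>" stands for whatever is already set in your file.)
KDUMP_COMMANDLINE_APPEND="<existing options>"
After change:
KDUMP_COMMANDLINE_APPEND="<existing options> rd.driver.blacklist=cf,symsrv"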
See
For details on kdump, see the Linux documentation.
Note
When using the ntpdate service to adjust the time at OS startup in RHEL7, rapid time adjustment may be performed by the ntpdate service
after each PRIMECLUSTER service is started. Therefore, considering the startup order of systemd, set the time adjustment by the ntpdate
service to be completed before each PRIMECLUSTER service below is started.
- fjsvwvbs.service
- smawcf.service
- fjsvsdx.service (if using GDS)
The operation procedure is as follows.
You can skip these steps when not using the ntpdate service.
Operation Procedure:
Perform the following procedure on all the nodes.
# mkdir /etc/systemd/system/fjsvwvbs.service.d
# chmod 755 /etc/systemd/system/fjsvwvbs.service.d
# mkdir /etc/systemd/system/smawcf.service.d
# chmod 755 /etc/systemd/system/smawcf.service.d
# mkdir /etc/systemd/system/fjsvsdx.service.d
# chmod 755 /etc/systemd/system/fjsvsdx.service.d
# touch /etc/systemd/system/fjsvwvbs.service.d/ntp.conf
# chmod 644 /etc/systemd/system/fjsvwvbs.service.d/ntp.conf
# touch /etc/systemd/system/smawcf.service.d/ntp.conf
# chmod 644 /etc/systemd/system/smawcf.service.d/ntp.conf
# touch /etc/systemd/system/fjsvsdx.service.d/ntp.conf
# chmod 644 /etc/systemd/system/fjsvsdx.service.d/ntp.conf
3. Add the following setting to each configuration file (ntp.conf) created in step 2.
[Unit]
After=time-sync.target
# systemctl daemon-reload
5. Check the start/stop order settings of the PRIMECLUSTER services and make sure that time-sync.target is included.
If time-sync.target is not included, make sure that steps 1 to 4 have been performed correctly.
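One way to check the ordering, shown as a sketch with a standard systemd command (repeat for each service listed above):

# /usr/bin/systemctl show smawcf.service --property=After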
See
- For information on products supported by the PRIMECLUSTER system, see "Appendix A PRIMECLUSTER Products."
- For details on installing applications, see the manuals, Software Release Guides and installation guides for the individual applications.
Chapter 4 Preparation Prior to Building a Cluster
This chapter explains the preparation work that is required prior to building a cluster, such as starting up the Web-Based Admin View screen.
See
As preparation for building the cluster, check the operation environment. See "Chapter 2 Operation Environment" in the Installation Guide
for PRIMECLUSTER.
Table 4.1 Operation procedure and manual reference location for starting the Web-Based Admin View screen

    Work item                                                      Execution Node   Required/Optional   Manual reference location
(1) 4.1 Checking PRIMECLUSTER Designsheets                         -                Required            -
(2) 4.2 Activating the Cluster Interconnect                        All nodes        Required            -
(3) 4.3 Preparations for Starting the Web-Based Admin View Screen
    4.3.1 Assigning Users to Manage the Cluster                    Cluster node     Required            -
    4.3.2 Preparing the Client Environment                         Client           Required            WEB "3.1.2 Prerequisite client environment"
    4.3.3 Initial Setup of Web-Based Admin View                    Cluster node     Required            -
    4.3.4 Setting Up the Browser                                   Client           Required            WEB "3.1.3.1 Preparing the Web browser"
    4.3.5 Setting Up Java                                          Client           Required            WEB "3.1.3.2 Required for the Web Browser Environment"
(4) 4.4 Starting the Web-Based Admin View Screen                   Client           Required            WEB "3.2 Screen startup"
ONBOOT=yes
Set up the IP address when using CF over IP (CF over IP is necessary if the cluster nodes are located in different network
segments).
Note
- ethX indicates a network interface that is used for the cluster interconnect.
A number is specified in X
If the state flag shown by the above command is not "UP", execute the following command and confirm that "UP" is set.
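The command itself is not reproduced in this extract; on RHEL6 a typical way to bring up the interface and re-check its state (ethX is a placeholder) is:

# /sbin/ifconfig ethX up
# /sbin/ifconfig ethX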
[RHEL7]
If the state flag shown by the above command is not "UP", execute the following command and confirm that "UP" is set.
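Likewise for RHEL7, as a sketch using standard commands (ethX is a placeholder):

# /usr/sbin/ip link set ethX up
# /usr/sbin/ip addr show ethX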
- Startup from the command (recommended)
A mode that starts the screen using Java Web Start, without using the Web browser.
Because no Web browser configuration is required and the screen can be started independently of the Web browser, this startup
mode is recommended.
1. Environment setup
Set up the environment for starting the GUI screen of Web-Based Admin View:
You can set up the following in any order:
Table 4.2 Operation management GUIs of Web-Based Admin View and authorized user groups

GUI name                      User group name   Privileges
All GUIs                      wvroot            Root authority. This group can execute all operations.
Cluster Admin                 clroot            Root authority. This group can specify settings, execute management commands, and display information.
Cluster Admin                 cladmin           Administrator authority. This group cannot specify settings. It can execute management commands and display information.
Cluster Admin                 clmon             User authority. This group cannot specify settings and cannot execute management commands. It can only display information.
GDS (Global Disk Services)    sdxroot           Root authority. This group can use the GDS management view.
The groups for the operation management GUIs are defined as shown in the above table.
wvroot is a special user group, and is used for Web-Based Admin View and GUIs. Users belonging to this group are granted the highest
access privileges for Web-Based Admin View and all kinds of operation management GUIs.
The system administrator can allow different access privileges to users according to the products that the users need to use.
For example, a user who belongs to the "clroot" group but not to "sdxroot" is granted all access privileges when opening the Cluster Admin
screen but no access privileges when opening the Global Disk Services (GDS) GUIs.
The user groups wvroot, clroot, cladmin, and clmon are automatically created when PRIMECLUSTER is installed. Since the
sdxroot user group is not created automatically, create it on each of the primary and secondary management servers if you want to
grant users the privileges to operate the GDS management view. The users must also be assigned to these groups. The Web-Based Admin View
group membership must be consistent among all management servers associated with a specific cluster system.
To register the above group to a user, you should register the group as a Supplemental Group. To register a group as a Supplemental Group,
use the usermod(8) or useradd(8) command.
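For example, assuming a hypothetical user name "pcluser", the following commands create a new user with wvroot and clroot as supplemental groups, or add those groups to an existing user (adjust the group list to the privileges to be granted):
# useradd -G wvroot,clroot pcluser
# usermod -a -G wvroot,clroot pcluser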
Note
When you register a new user, use the passwd(8) command to set a password.
# passwd username
The root user is granted the highest access privilege regardless of which group the root user belongs to.
For details about user groups, see "3.1.1 User group determination" in "PRIMECLUSTER Web-Based Admin View Operation Guide."
When creating the wvroot user group automatically at installation of PRIMECLUSTER, GID (ID number of the group) is not specified.
Even if GID is not changed, it does not affect the behavior of the operation management products running on Web-Based Admin View;
however, if you want to specify the same GID between the primary management server and the secondary management server, execute the
groupadd(8) command or the groupmod(8) command:
- When specifying GID before installing PRIMECLUSTER and then creating the wvroot user group
- When changing GID of the wvroot user group after installing PRIMECLUSTER
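For example, assuming an arbitrary illustrative GID of 8000:
- When creating the wvroot user group with the specified GID before installing PRIMECLUSTER:
# groupadd -g 8000 wvroot
- When changing the GID of the wvroot user group after installing PRIMECLUSTER:
# groupmod -g 8000 wvroot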
See
For details, see "3.1.2 Prerequisite client environment" in "PRIMECLUSTER Web-Based Admin View Operation Guide."
4.3.3 Initial Setup of Web-Based Admin View
Operation Procedure:
1. Stop Web-Based Admin View on all the management servers and nodes.
# /etc/init.d/fjsvwvcnf stop
# /etc/init.d/fjsvwvbs stop
2. Set the IP addresses of the primary management server and the secondary management server.
# /etc/opt/FJSVwvbs/etc/bin/wvSetparam primary-server <primary-management-server-IP-address>
# /etc/opt/FJSVwvbs/etc/bin/wvSetparam secondary-server <secondary-management-server-IP-address>
In addition, no value is displayed in Web-Based Admin View on the secondary management server.
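For example, with the two-node configuration used later in this section (node1 = 10.20.30.40 as the primary management server and node2 = 10.20.30.41 as the secondary management server), the commands are:
# /etc/opt/FJSVwvbs/etc/bin/wvSetparam primary-server 10.20.30.40
# /etc/opt/FJSVwvbs/etc/bin/wvSetparam secondary-server 10.20.30.41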
3. Restart Web-Based Admin View on all the management servers and nodes.
- For RHEL6
# /etc/opt/FJSVwvbs/etc/bin/wvCntl restart
# /etc/init.d/fjsvwvcnf restart
- For RHEL7
# /etc/init.d/fjsvwvbs restart
# /etc/init.d/fjsvwvcnf restart
See
Web-Based Admin View has some different operation management modes. For further details, see "1.2.2 System topology" and "Chapter
7 Web-Based Admin View setup modification" in "PRIMECLUSTER Web-Based Admin View Operation Guide."
Note
- For making entries to /etc/hosts in Japanese, use EUC encoding and set "ja" for the system requirements variable "lang" for Web-Based
Admin View. For further details on the Web-Based Admin View language settings, refer to "4.3.3.3 Setting the Web-Based Admin
View Language."
- Only IP addresses can be set for the primary management server and the secondary management server.
- After restarting Web-Based Admin View, it may fail to start, and the message below may be displayed.
wvcheckconf Error: [sys:group-addr] invalid IpAddress[Host name]
wvcheckconf: 'webview.cnf' has not been modified by some Errors.
FJSVwvbs: 'webview.cnf' abnormal
This message is displayed when group-addr, which is the environment variable of Web-Based Admin View, is not correctly specified.
Refer to the group address setting in "7.4 Secondary management server automatic migration" in "PRIMECLUSTER Web-Based
Admin View Operation Guide" and set the group-addr value correctly.
- If the information of both primary and secondary management servers is not set in /etc/hosts, refer to "Appendix B Troubleshooting"
in "PRIMECLUSTER Web-Based Admin View Operation Guide" and set the information.
Confirmation Procedure
Check that all node information is output by executing the "wvstat" command on the connected management server.
(Example)
In a two-node configuration consisting of node1(10.20.30.40) and node2(10.20.30.41), node1 is the primary management server and node2
is the secondary management server.
# /etc/opt/FJSVwvbs/etc/bin/wvstat
primaryServer 10.20.30.40 node1 http=10.20.30.40 Run 3m41s
primaryServer Sessions: 0
primaryServer Nodes: 2
10.20.30.40 node1 Linux-2.4.9-e.8enterprise 3m36s
10.20.30.41 node2 Linux-2.4.9-e.8enterprise 2m58s
secondaryServer 10.20.30.41 node2 http=10.20.30.41 Run 2m46s
secondaryServer Sessions: 0
secondaryServer Nodes: 2
10.20.30.40 node1 Linux-2.4.9-e.8enterprise 2m41s
10.20.30.41 node2 Linux-2.4.9-e.8enterprise 2m23s
Make sure that the information of the nodes connected to each management server is properly displayed. If the information is not properly
displayed, check the following points:
- Web-Based Admin View may not have been started, or there may be an error in the Web-Based Admin View settings. Restart
Web-Based Admin View and execute the operation again. If node information is still not displayed, refer
to "2.4 Initial Setup of Web-Based Admin View" in "PRIMECLUSTER Web-Based Admin View Operation Guide" and check the
parameter settings.
- Communication with the management servers may be blocked by a firewall. When using firewalld, iptables, or ip6tables as the firewall,
permit the communication with the port numbers used by Web-Based Admin View. For details, see "Appendix L Using Firewall."
If you want to display the messages in Japanese, take the following steps to set up environment variables of Web-Based Admin View.
This operation must be executed with the system administrator authority on all the nodes and the cluster management servers that make
up the cluster system.
Table 4.3 Environment variable for the operation language of Web-Based Admin View

Attribute   Variable   Possible values   Meaning
sys         lang       C, ja             Language environment in which Web-Based Admin View operates.
                                         C: Operates in English.
                                         ja: Operates in Japanese.
                                         If this variable is not set, Web-Based Admin View operates in the English environment.
Operation Procedure:
1. Stop Web-Based Admin View on all the management servers and nodes.
# /etc/init.d/fjsvwvcnf stop
# /etc/init.d/fjsvwvbs stop
2. Add the environment variable to the definition file (/etc/opt/FJSVwvbs/etc/webview.cnf) of Web-Based Admin View, and set the
language.
Execute the following command on all the management servers and nodes, referring to the example.
Example: Add the environment variable and set the operation language to Japanese.
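The command itself is described in "PRIMECLUSTER Web-Based Admin View Operation Guide"; a typical invocation (treat the exact syntax as an assumption and confirm it against that guide) is:
# /etc/opt/FJSVwvbs/etc/bin/wvSetparam -add sys lang ja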
3. Restart Web-Based Admin View on all the management servers and nodes.
- For RHEL6
# /etc/opt/FJSVwvbs/etc/bin/wvCntl restart
# /etc/init.d/fjsvwvcnf restart
- For RHEL7
# /etc/init.d/fjsvwvbs restart
# /etc/init.d/fjsvwvcnf restart
Note
- For Web-Based Admin View to display messages in Japanese, the language environment of the personal computers that are being used
as clients must be set to Japanese. If a client has an English environment, the message contents will be garbled by the above
setting change.
- To change the environment variable again after it is added by the above procedure, execute the following command:
# /etc/opt/FJSVwvbs/etc/bin/wvSetparam lang <setting_value>
For details on the command, see "4.5.3 Environment variable modification" in "PRIMECLUSTER Web-Based Admin View Operation
Guide."
4.3.4 Setting Up the Browser
Set up a Web browser on the clients.
See
See "3.1.3.1 Preparing the Web browser" in "PRIMECLUSTER Web-Based Admin View Operation Guide."
See
For details on the supported Java versions, see "4.3.2 Preparing the Client Environment." For instructions on setting up Java, see "3.1.3.2
Conducting Java settings" in "PRIMECLUSTER Web-Based Admin View Operation Guide."
Operation Procedure:
- Startup from the command (recommended)
1. Press the [Win] + [R] keys on the client to open the [Run] dialog box.
2. Enter the javaws command in the following format to access the cluster management server (see the example after this procedure).
javaws http://<host-name>:<port-number>/
- Startup from the Web browser (If using Java Web Start)
1. Start the Web browser on the client.
2. Specify the URL in the following format to access the cluster management server.
http://<host-name>:<port-number>/
3. When using Microsoft Edge browser, click [Open] at the notification bar of file download completion which is displayed at the
lower part of the browser.
When the notification bar of file download confirmation is displayed, click [Save] to save the file and then click [Open].
<host-name>
The IP address or the host name (httpip) that clients use to access the primary or secondary management server.
The default value of httpip is the IP address that is assigned to the node name that is output when "uname -n" is executed.
<port-number>
Specify "8081."
If the port number has been changed, specify the up-to-date number.
For instructions on changing the http port number, see "7.2.1 http port number" in "PRIMECLUSTER Web-Based Admin
View Operation Guide."
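For example, assuming the primary management server uses the httpip address 10.20.30.40 (the address used in the examples in this chapter) and the default port 8081, the screen is started with:
javaws http://10.20.30.40:8081/
When starting from the Web browser instead, specify http://10.20.30.40:8081/ as the URL.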
Note
- If the Web-Based Admin View screen cannot be started when the host name is specified in <host-name>, specify the IP
address directly that corresponds to the host name.
- When specifying the IPv6 address for <host-name>, enclose it in brackets "[ ]".
(Example: http://[1080:2090:30a0:40b0:50c0:60d0:70e0:80f0]:8081/Plugin.cgi)
- Note that the access method may be different depending on the operation management product. To use operation
management products that have different access methods at the same time, see "3.3.4 Concurrent use of operation
management products with different access methods" in "PRIMECLUSTER Web-Based Admin View Operation Guide."
- For information on the IP address or host name (httpip) used by clients, see "PRIMECLUSTER Web-Based Admin View
Operation Guide."
3. When the Web-Based Admin View is started, the following window appears.
Enter a user name and password that have been registered to the management server, and click OK.
Note
You cannot close this window by clicking "x" at the top right corner.
Note
- After starting the Web-Based Admin View screen, do not change the page by pressing the Forward/Next, Back, or Reread/Refresh
buttons.
- If "Welcome to Web-Based Admin View" does not appear after you read the URL of the Java Plug-in with Internet Explorer, an
appropriate Java Plug-in may not be installed. Confirm that an appropriate Java Plug-in is installed by using "Add/Remove Programs"
in the control panel. If the Java Plug-in is not installed or if an older Java Plug-in version that is not supported is installed, see
- 96 -
"PRIMECLUSTER Web-Based Admin View Operation Guide" and install the Java Plug-in. Also, if the "security warning" dialog box
appears, and prompts you to specify whether the "Java Plug-in" is to be installed and executed, select No.
- If the secondary cluster management server is set to operate dynamically, there is a function that connects automatically to the primary
or secondary cluster management server that is operating at that time even if the URL of a specific monitoring node is specified. For
details, see "7.4 Secondary management server automatic migration" in "PRIMECLUSTER Web-Based Admin View Operation
Guide."
- If repeated errors occur during the authentication of Step 3, the message 0016 may be displayed and you may not be able to log in. For
the action to take if this happens, see "Symptom 16" in "B.1 Corrective action" of "PRIMECLUSTER Web-Based Admin View
Operation Guide."
- If some problems occur while you are using Web-Based Admin View, see "Appendix A Message" and "Appendix B Troubleshooting"
in "PRIMECLUSTER Web-Based Admin View Operation Guide."
- When starting the screen using Java Web Start from the Web browser, a downloaded file name may be something other than
WebStart.jnlp.
- When starting the screen using Java Web Start from the Web browser, a tab remains in the Web browser after starting Web-Based
Admin View screen and the user name input screen. Closing this tab will not cause any problems because it does not operate with the
Web-Based Admin View after starting the screen.
- When starting the Java Web Start screen from the Web browser in an environment where the extended screen provided by the multi-
display function of Windows is used, the screen may not start or the screen size may be reduced or expanded.
In this case, change the screen settings with the following procedure:
4.5.1 Operation Menu Functions
Web-Based Admin View screen supports the functions shown below.
See "Menu Outline."
Menu Outline
The operation menus are categorized into the following two types:
a. Management screens and manuals of operation management products that are presented by PRIMECLUSTER
b. Management screens and manuals of operation management products that are provided by non-PRIMECLUSTER products
The following operations are possible for the menus in category a:
- Manual
The PRIMECLUSTER online manual is displayed.
- Common
You can refer to manuals that are available as online manuals.
For details, see "PRIMECLUSTER Web-Based Admin View Operation Guide."
At the Cluster Admin screen, you can switch the window by clicking the following tabs:
Note
SIS cannot be used with this version.
Figure 4.2 Web-Based Admin View screen (Global Cluster Services menu)
- Cluster Admin
This function allows you to monitor the status of the PRIMECLUSTER system and operate the system.
Figure 4.3 Web-Based Admin View screen (Cluster Admin)
- msg (Message)
Cluster control messages are displayed.
Reference location: "Chapter 7 Operations"
Logging out of the screen
1. Close all screens if the management screen of the operation management product is displayed.
2. When only the Web-Based Admin View screen is displayed, select Logout.
Exiting the screen
To exit the Web-Based Admin View screen, follow the procedure below.
1. Log out from the Web-Based Admin View screen according to "Logging out of the screen" described above.
2. The login screen will be displayed. To exit the Web-Based Admin View screen, execute one of the following operations while the
login screen is still displayed:
Note
- To terminate the Web browser, select Close in the File menu, or click the "x" at the top right corner of the screen.
- At the login screen, clicking the "x" at the top right corner of the screen will not terminate the screen.
- The login screen will remain temporarily after exiting the browser.
Chapter 5 Building a Cluster
The procedure for building a PRIMECLUSTER cluster is shown below:
Note
When Firewall is enabled, disable it before the initial cluster setup.
When enabling Firewall after completing the installation of the cluster, see "Appendix L Using Firewall."
Note
- Node names of the cluster nodes are automatically input to "CF node names." The CF node name must be within 11 characters.
- When constructing multiple clusters, if any of the NICs used in different clusters exist on the same network, specify a different name
for each cluster, for example by including the node name in the cluster name.
- If you enable any one of the CF remote services, do not connect the following systems in the same cluster interconnect:
- Systems that have a security problem
- Systems in which cluster interconnects are not secured
- The CF remote services (CFCP and CFSH) must be enabled for the procedures described hereinafter. To enable this function after configuring CF, add the following
definition to the /etc/default/cluster.config file and execute cfset -r (see the example after these notes).
CFCP "cfcp"
CFSH "cfsh"
- To share a NIC with the administrative LAN and the cluster interconnect, see "1.1 CF, CIP, and CIM configuration" in
"PRIMECLUSTER Cluster Foundation (CF) Configuration and Administration Guide."
- When a bonding device is used for the cluster interconnect, only mode=1 (active-backup) can be used.
- For the cluster interconnect, it is recommended to use the physically independent and dedicated network. If the network is shared with
other communications, a heartbeat failure may be detected due to the temporary network overload. Before the actual operation, test the
communication status under the actual network overload and make sure that a heartbeat failure is not detected. If the failure is detected,
refer to "11.3.1 Changing Time to Detect CF Heartbeat Timeout" and tune the cluster timeout value.
- When configuring the cluster system using the extended partitions in PRIMEQUEST 3000 series (except B model), up to 4 nodes can
be supported per cluster system.
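As an illustration of the CF remote services setting described above (a sketch; the two definition lines can also be added to /etc/default/cluster.config with any editor):
# cat >> /etc/default/cluster.config << EOF
CFCP "cfcp"
CFSH "cfsh"
EOF
# cfset -r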
Note
In the case of the single-node cluster operation
- The following messages of the shutdown facility and RMS are output; however, this is not a problem because the shutdown facility
is not set up.
- Messages of RMS:
(SCR,26): ERROR The sdtool notification script has failed with status 1 after dynamic
modification.
See
For information on the corrective action to be applied when the setting of the cluster interconnect fails, see "Chapter 8 Diagnostics and
troubleshooting" in "PRIMECLUSTER Cluster Foundation (CF) Configuration and Administration Guide."
Table 5.4 Shutdown agent necessary if the host OS failover function is not used in the virtual machine environment (KVM) (guest OS only)

                           Shutdown agent: libvirt
Server model               Panic (SA_libvirtgp)   Reset (SA_libvirtgr)
PRIMERGY                   Y                      Y
PRIMEQUEST 2000 series     Y                      Y
PRIMEQUEST 3000 series     Y                      Y

Y: Necessary
When using the host OS failover function in the virtual machine environment (KVM environment), set the following shutdown agents. The
shutdown agents that are set on the guest OS are the same as those used in the virtual machine function.
Table 5.5 Shutdown agent necessary if the host OS failover function is used in the virtual machine environment (KVM)

Server model                                         Cluster node   Necessary shutdown agents
PRIMERGY RX series, TX series                        Host OS        IPMI (SA_ipmi), kdump (SA_lkcd)
PRIMERGY BX series (used with ServerView             Host OS        IPMI (SA_ipmi) (*1), kdump (SA_lkcd)
  Resource Orchestrator Virtual Edition)
PRIMERGY BX series (not used with ServerView         Host OS        Blade (SA_blade), kdump (SA_lkcd)
  Resource Orchestrator Virtual Edition)
PRIMERGY (all models)                                Guest OS       libvirt Panic (SA_libvirtgp), libvirt Reset (SA_libvirtgr), vmchkhost
PRIMEQUEST 3000 series                               Host OS        iRMC Panic (SA_irmcp), iRMC Reset (SA_irmcr), iRMC Poweroff (SA_irmcf)
PRIMEQUEST 3000 series (all models)                  Guest OS       libvirt Panic (SA_libvirtgp), libvirt Reset (SA_libvirtgr), vmchkhost
See
For details on the shutdown facility, see the following manuals:
Note
When SF calculates the survival priority, each node sends its survival priority to the remote nodes via the administrative LAN. If any
communication problem occurs on the administrative LAN, the survival priority cannot be delivered. In this case, the survival priority
is calculated only from the SF node weight.
See
For details on the ShutdownPriority attribute of userApplication, see "D.1 Attributes available to the user" in "PRIMECLUSTER
Reliant Monitor Services (RMS) with Wizard Tools Configuration and Administration Guide."
Survival scenarios
The typical scenarios that are implemented are shown below:
[Largest node group survival]
- Set the "weight" of all the nodes to the same value.
- Set the ShutdownPriority attribute of every userApplication to 0 (default).
[Specific node survival]
- Set the "weight" of the node to survive to a value more than double the total weight of the other nodes.
- Set the ShutdownPriority attribute of every userApplication to 0 (default).
In the following example, node1 is to survive:
[Node survival in a specific order of node]
- Set the "weight" of the node to survive to a value more than double the total weight of the other nodes which have lower priority.
- Set the ShutdownPriority attribute of every userApplication to 0 (default).
In the following example, node1, node2, node3, and node4 are to survive in this order:
- Calculate the minimum value to be set to the ShutdownPriority attribute using the following formula. The value to set must be a power
of 2 (1, 2, 4, 8, 16, ...) that is equal to or larger than the calculated value.
(number of nodes in the configuration) - 1
Example: In a 2-node configuration, (2 - 1) = 1. The minimum settable value for the ShutdownPriority attribute is 1.
Example: In a 3-node configuration, (3 - 1) = 2. The minimum settable value for the ShutdownPriority attribute is 2.
Example: In a 4-node configuration, (4 - 1) = 3. The minimum settable value for the ShutdownPriority attribute is 4.
The following example shows the survival priority of the nodes on which each userApplication runs. app1, app2, and app3 are
prioritized in that order.
[Host OS failover function]
- Set the "weight" of nodes to a power-of-two value (1,2,4,8,16,...) in ascending order of survival priority in each cluster system.
- The "weight" set to a guest OS should have the same order relation with a corresponding host OS.
For example, when setting a higher survival priority to host1 than host2 between host OSes, set a higher survival priority to
node1 (corresponding to host1) than node2-4 (corresponding to host2) between guest OSes.
5.1.2.2 Setup Flow for Shutdown Facility
When using in combination with ServerView Resource Orchestrator Virtual Edition
When using in combination with ServerView Resource Orchestrator Virtual Edition, for the setup flow for the shutdown facility in
PRIMERGY BX series, take the following steps.
When not using in combination with ServerView Resource Orchestrator Virtual Edition
When not using in combination with ServerView Resource Orchestrator Virtual Edition, for the setup flow for the shutdown facility in
PRIMERGY BX series, take the following steps.
7. Test for forced shutdown of cluster nodes
For the detail setup procedure, refer to "5.1.2.5 Setup Procedure for Shutdown Facility in PRIMEQUEST 3000 Series."
Note
When creating a redundant administrative LAN used in the shutdown facility by using GLS, set as below.
RX/TX series
Check the following settings in BMC(Baseboard Management Controller) or iRMC(integrated Remote Management Controller) necessary
for setting IPMI shutdown agent.
BX series (When using in combination with ServerView Resource Orchestrator Virtual Edition)
Necessary settings are the same as the settings of the RX/TX series. Refer to "RX/TX series."
BX series (When not using in combination with ServerView Resource Orchestrator Virtual Edition)
Check the following settings for the management blade necessary for setting Blade shutdown agent.
RX/TX series, BX series (When using in combination with ServerView Resource Orchestrator Virtual
Edition)
CFNameX,weight=weight,admIP=myadmIP:agent=SA_ipmi,timeout=timeout
CFNameX,weight=weight,admIP=myadmIP:agent=SA_ipmi,timeout=timeout
Example:
node1,weight=1,admIP=10.20.30.100:agent=SA_ipmi,timeout=25
node2,weight=1,admIP=10.20.30.101:agent=SA_ipmi,timeout=25
BX series (When not using in combination with ServerView Resource Orchestrator Virtual Edition)
CFNameX,weight=weight,admIP=myadmIP:agent=SA_blade,timeout=timeout
CFNameX,weight=weight,admIP=myadmIP:agent=SA_blade,timeout=timeout
Example:
node1,weight=1,admIP=10.20.30.100:agent=SA_blade,timeout=20
node2,weight=1,admIP=10.20.30.101:agent=SA_blade,timeout=20
Note
- When using STP (Spanning Tree Protocol) in PRIMERGY, it is necessary to set the SF timeout value to the current value plus 50
(seconds), taking into account the time that STP needs to create the tree and an extra cushion. This setting also causes delays in failover
times.
- The contents of the rcsd.cfg file must be the same on all the nodes. If they differ, the shutdown facility does not work correctly.
Information
When the "/etc/opt/SMAW/SMAWsf/rcsd.cfg" file is to be created, the "/etc/opt/SMAW/SMAWsf/rcsd.cfg.template" file can be used as
a prototype.
# /sbin/service ipmi start
Starting ipmi drivers: [ OK ]
[RHEL7]
Execute the following command on all the nodes to check the startup status of the IPMI service.
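A typical check, consistent with the systemctl usage shown later in this chapter, is:
# /usr/bin/systemctl status ipmi.service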
[RHEL7]
Make sure that the current IPMI service is enabled on all the nodes.
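Typical commands to check whether the service is enabled and, if necessary, to enable it (standard systemd commands, shown as an illustration) are:
# /usr/bin/systemctl is-enabled ipmi.service
# /usr/bin/systemctl enable ipmi.service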
# sfcipher -c
Enter User's Password: <- enter bmcpwd$
Re-enter User's Password: <- enter bmcpwd$
/t1hXYb/Wno=
Note: It is not necessary to insert '\' in front of the special characters specified as the password.
For information on how to use the sfcipher command, see the "sfcipher" manual page.
Note
For the passwords specified when setting IPMI (BMC and iRMC), seven-bit ASCII characters are available.
Among them, do not use the following characters as they may cause a problem.
- For IPv4 address
CFName1 ip-address:user:passwd {cycle | leave-off}
CFName2 ip-address:user:passwd {cycle | leave-off}
Example 1:
When the IP address of iRMC of node1 is 10.20.30.50, the IP address of iRMC of node2 is 10.20.30.51.
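A sketch of the corresponding entries, assuming the IPMI user name is "root" (hypothetical) and reusing the encrypted password string from the sfcipher example above (the choice of cycle or leave-off depends on your operation):
node1 10.20.30.50:root:/t1hXYb/Wno= cycle
node2 10.20.30.51:root:/t1hXYb/Wno= cycle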
Example 2:
When the IP address of iRMC of node1 is 1080:2090:30a0:40b0:50c0:60d0:70e0:80f0, the IP address of iRMC of node2 is
1080:2090:30a0:40b0:50c0:60d0:70e0:80f1.
Information
When the "/etc/opt/SMAW/SMAWsf/SA_ipmi.cfg" file is to be created, the "/etc/opt/SMAW/SMAWsf/SA_ipmi.cfg.template" file
can be used as a prototype.
Note
- Check if the setting contents of the /etc/opt/SMAW/SMAWsf/SA_ipmi.cfg file are correct. If there is an error in the setting
contents, the shutdown facility cannot be performed normally.
- Check if the IP address (ip-address) of IPMI (BMC or iRMC) corresponding to the cluster host's CF node name (CFNameX) of
the /etc/opt/SMAW/SMAWsf/SA_ipmi.cfg file are set. If there is an error in the setting, a different node may be forcibly stopped.
- The contents of SA_ipmi.cfg file must be same on all the nodes. If different, it does not work.
Cluster configuration within a single chassis
management-blade-ip IPaddress
community-string SNMPcommunity
CFName1 slot-no {cycle | leave-off}
CFName2 slot-no {cycle | leave-off}
Example :
When the IP address of the management blade of node1 and node2 is 10.20.30.50, the slot number of node1 is 1 and the slot number of
node2 is 2.
management-blade-ip 10.20.30.50
community-string public
node1 1 cycle
node2 2 cycle
Cluster configuration across multiple chassis
community-string SNMPcommunity
management-blade-ip IPaddress1
CFName1 slot-no {cycle | leave-off}
management-blade-ip IPaddress2
CFName2 slot-no {cycle | leave-off}
Note
SNMP community name of the management blade must be the same in all the chassis.
Example:
When the IP address of the management blade of node1 is 10.20.30.50, and the slot number of node1 is 1.
Moreover, when the IP address of the management blade of node2 is 10.20.30.51, and the slot number of node2 is 2.
community-string public
management-blade-ip 10.20.30.50
node1 1 cycle
management-blade-ip 10.20.30.51
node2 2 cycle
Information
When the "/etc/opt/SMAW/SMAWsf/SA_blade.cfg" file is to be created, the "/etc/opt/SMAW/SMAWsf/SA_blade.cfg.template" file can
be used as a prototype.
Note
- Check if the setting contents of the /etc/opt/SMAW/SMAWsf/SA_blade.cfg file are correct. If there is an error in the setting contents,
the shutdown facility cannot be performed normally.
- Check if the IP address (IPaddress) of the management blade and the slot number (slot-no) of the server blade corresponding to the
cluster host's CF node name (CFNameX) of the /etc/opt/SMAW/SMAWsf/SA_blade.cfg file are set. If there is an error in the setting,
a different node may be forcibly stopped.
- The contents of SA_blade.cfg file must be same on all the nodes. If different, it does not work.
# /etc/opt/FJSVcllkcd/bin/panicinfo_setup
If the following message is output, the setting file (rcsd.cfg) of the shutdown daemon has an error. Correct the file.
If the following message is output, the setting file (SA_ipmi.cfg or SA_blade.cfg) of the shutdown agent has an error. Correct the file.
In the environment where panicinfo_setup has already been executed, the following message is output.
Note
To execute the command, CF and CF services (CFSH and CFCP) must be activated. For details, see "5.1.1 Setting Up CF and CIP."
Before change
PANICINFO_TIMEOUT 5
RSB_PANIC 0
After change
PANICINFO_TIMEOUT 10
RSB_PANIC 3
2. Change the timeout value of SA_lkcd in the /etc/opt/SMAW/SMAWsf/rcsd.cfg file as follows on all the nodes.
Before change
agent=SA_lkcd,timeout=20
After change
agent=SA_lkcd,timeout=25
- When not using in combination with ServerView Resource Orchestrator Virtual Edition in BX series
Change RSB_PANIC of /etc/opt/FJSVcllkcd/etc/SA_lkcd.tout as follows on all the nodes.
Before change
RSB_PANIC 0
After change
RSB_PANIC 2
# sdtool -s
If the shutdown facility has already been started, execute the following command to restart the shutdown facility on all the nodes.
# sdtool -e
# sdtool -b
If the shutdown facility has not been started, execute the following command to start the shutdown facility on all the nodes.
# sdtool -b
# sdtool -s
Information
Display results of the sdtool -s command
- If "InitFailed" is displayed as the initial status, it means that a problem occurred during initialization of that shutdown agent.
- If "TestFailed" is displayed as the test status, it means that a problem occurred while the agent was testing whether or not the node
displayed in the cluster host field could be stopped. Some sort of problem probably occurred in the software, hardware, or network
resources being used by that agent.
- If "Unknown" is displayed as the stop or initial status, it means that the SF has still not executed node stop, path testing, or SA
initialization. "Unknown" is displayed temporarily until the actual status can be confirmed.
- If "TestFailed" or "InitFailed" is displayed, check /var/log/messages. After the failure-causing problem is resolved and SF is restarted,
the status display changes to InitWorked or TestWorked.
Note
If "TestFailed" is displayed as the test status when "sdtool -s" is executed after the shutdown facility was started, it may be due
to the following reasons:
1. Execute the following command on all the nodes to stop the shutdown facility.
# sdtool -e
# /etc/opt/FJSVcllkcd/bin/panicinfo_setup
4. Execute the following command on all the nodes to start the shutdown facility.
# sdtool -b
5. Execute the following command on all the nodes and make sure that the shutdown facility operates normally.
# sdtool -s
Note
When creating a redundant administrative LAN used in the shutdown facility by using GLS, set as below.
MMB check items
Check the following settings for MMB blade necessary for setting Blade shutdown agent.
- The "Privilege" setting of the user is set to "Admin" so that the user can control the MMB with RMCP.
- The "Status" setting of the user is set to "Enabled" so that the user can control the MMB with RMCP.
- The passwords for controlling the MMB with RMCP must be specified using seven-bit ASCII characters. Among them, do not use the
following characters as they may cause a problem.
> < " / \ = ! ? ; , &
Check the settings for the user who uses RMCP to control the MMB. Log in to MMB Web-UI, and check the settings from the "Remote
Server Management" window of the "Network Configuration" menu.
If the above settings have not been set, set up the MMB so that the above settings are set.
Note
The MMB units have two types of users:
See
For how to set up and check MMB, refer to the following manual:
Checking the time to wait until I/O to the shared disk is completed (when using other than the ETERNUS disk
array as the shared disk)
When using any disks other than the ETERNUS disk array as the shared disk, to prevent the data error when the node is down due to a panic
or other causes, set the time until I/O to the shared disk is completed.
To set the wait time described in "5.1.2.4.5 Setting I/O Completion Wait Time", panic the node during I/O to the shared disk. After that,
check the time until I/O to the shared disk is completed.
1. Execute the "clmmbsetup -a" command on all the nodes, and register the MMB information.
For instructions on using the "clmmbsetup" command, see the "clmmbsetup" manual page.
# /etc/opt/FJSVcluster/bin/clmmbsetup -a mmb-user
Enter User's Password:
Re-enter User's Password:
For mmb-user and User's Password, enter the following values that were checked in "5.1.2.4.1 Checking the Shutdown Agent
Information."
mmb-user
User's name for controlling the MMB with RMCP
User's Password
User's password for controlling the MMB with RMCP.
Note
For the passwords specified when setting MMB, seven-bit ASCII characters are available.
Among them, do not use the following characters as they may cause a problem.
> < " / \ = ! ? ; , &
2. Execute the "clmmbsetup -l" command on all the nodes, and check the registered MMB information.
If the registered MMB information was not output on all the nodes in Step 1, start over from Step 1.
# /etc/opt/FJSVcluster/bin/clmmbsetup -l
cluster-host-name user-name
-----------------------------------
node1 mmb-user
node2 mmb-user
CFNameX,weight=weight,admIP=myadmIP:agent=SA_mmbp,timeout=timeout:agent=SA_mmbr,timeout=timeout
CFNameX,weight=weight,admIP=myadmIP:agent=SA_mmbp,timeout=timeout:agent=SA_mmbr,timeout=timeout
Example:
node1,weight=2,admIP=fuji2:agent=SA_mmbp,timeout=20:agent=SA_mmbr,timeout=20
node2,weight=2,admIP=fuji3:agent=SA_mmbp,timeout=20:agent=SA_mmbr,timeout=20
Note
- For the shutdown agents to be specified in the rcsd.cfg file, set both the SA_mmbp and SA_mmbr shutdown agents in that order.
- Set the same contents in the rcsd.cfg file on all the nodes. Otherwise, a malfunction may occur.
Information
When creating the /etc/opt/SMAW/SMAWsf/rcsd.cfg file, you can use the /etc/opt/SMAW/SMAWsf/rcsd.cfg.mmb.template file as a
template.
# /etc/opt/FJSVcluster/bin/clmmbmonctl
If "The devmmbd daemon exists." is displayed, the MMB asynchronous monitoring daemon has been started.
If "The devmmbd daemon does not exist." is displayed, the MMB asynchronous monitoring daemon has not been started. Execute the
following command to start the MMB asynchronous monitoring daemon.
# /etc/opt/FJSVcluster/bin/clmmbmonctl start
After setting the wait time, execute the following command to check if the specified value is set.
# /etc/opt/FJSVcluster/bin/cldevparam -p WaitForIOComp
value
Note
- When specifying an I/O completion wait time longer than the time to detect CF heartbeat timeout (default 10 seconds), the time to detect
CF heartbeat timeout must be changed to the current value + I/O completion wait time + 3 seconds or more. This prevents
timeout of the CF heartbeat during the I/O completion wait time.
For how to change the time to detect CF heartbeat timeout, refer to "11.3.1 Changing Time to Detect CF Heartbeat Timeout."
- If an I/O completion wait time is set, the failover time when a node failure (panic, etc.) occurs increases by that amount of time.
# sdtool -s
If the shutdown facility has already been started, execute the following command on all the nodes to restart the shutdown facility.
# sdtool -e
# sdtool -b
If the shutdown facility has not been started, execute the following command on all the nodes to start the shutdown facility
# sdtool -b
# sdtool -s
Information
Display results of the sdtool -s command
- If "InitFailed" is displayed as the initial status, it means that a problem occurred during initialization of that shutdown agent.
- If "TestFailed" is displayed as the test status, it means that a problem occurred while the agent was testing whether or not the node
displayed in the cluster host field could be stopped. Some sort of problem probably occurred in the software, hardware, or network
resources being used by that agent.
- If "Unknown" is displayed as the stop or initial status, it means that the SF has still not executed node stop, path testing, or SA
initialization. "Unknown" is displayed temporarily until the actual status can be confirmed.
- If "TestFailed" or "InitFailed" is displayed, check /var/log/messages. After the failure-causing problem is resolved and SF is restarted,
the status display changes to InitWorked or TestWorked.
Note
- If "TestFailed" is displayed and the message 7210 is output to /var/log/messages at the same time when "sdtool -s" is executed after the
shutdown facility was started, there may be an error in the settings as described below.
Make sure each setting is correctly set.
- If "sdtool -s" is executed immediately after the OS is started, "TestFailed" may be displayed in the Test State for the local node.
However, this state is displayed because the snmptrapd daemon is still being activated and does not indicate a malfunction. If "sdtool
-s" is executed 10 minutes after the shutdown facility is started, TestWorked is displayed in the Test State.
In the following example, "TestFailed" is displayed in the Test State for the local node (node1).
# sdtool -s
Cluster Host Agent SA State Shut State Test State Init State
------------ ----- -------- ---------- ---------- ----------
node1 SA_mmbp.so Idle Unknown TestFailed InitWorked
node1 SA_mmbr.so Idle Unknown TestFailed InitWorked
node2 SA_mmbp.so Idle Unknown TestWorked InitWorked
node2 SA_mmbr.so Idle Unknown TestWorked InitWorked
The following messages may be displayed in the syslog right after the OS is started, for the same reason as previously described.
These messages are also displayed because the snmptrapd daemon is being activated, and they do not indicate a malfunction. The following
message is displayed in the syslog 10 minutes after the shutdown facility is started.
- If "sdtool -s" is executed when MMB asynchronous monitoring daemon is started for the first time, "TestFailed" may be displayed. This
is a normal behavior because the settings are synchronizing between node. If "sdtool -s" is executed 10 minutes after the shutdown
facility is started, "TestWorked "is displayed in Test State field.
- If nodes are forcibly stopped by the SA_mmbr shutdown agent, the following messages may be displayed in the syslog. These are
displayed because it takes time to stop the nodes and do not indicate a malfunction.
If "sdtool -s" is executed after the messages above were displayed, KillWorked is displayed in the Shut State for the SA_mmbp.so. Then,
KillFailed is displayed in the Shut State for the SA_mmbr.so.
The following is the example of "sdtool -s" when the nodes (from node1 to node2) were forcibly stopped and the messages above were
displayed.
# sdtool -s
Cluster Host Agent SA State Shut State Test State Init State
------------ ----- -------- ---------- ---------- ----------
node1 SA_mmbp.so Idle Unknown TestWorked InitWorked
node1 SA_mmbr.so Idle Unknown TestWorked InitWorked
node2 SA_mmbp.so Idle KillWorked TestWorked InitWorked
node2 SA_mmbr.so Idle KillFailed TestWorked InitWorked
# sdtool -e
# sdtool -b
# sdtool -s
Cluster Host Agent SA State Shut State Test State Init State
------------ ----- -------- ---------- ---------- ----------
node1 SA_mmbp.so Idle Unknown TestWorked InitWorked
node1 SA_mmbr.so Idle Unknown TestWorked InitWorked
node2 SA_mmbp.so Idle Unknown TestWorked InitWorked
node2 SA_mmbr.so Idle Unknown TestWorked InitWorked
Note
- Note the following points when configuring the cluster system using the extended partitions (except B model).
- Up to 4 nodes can be supported per cluster system.
- VGA/USB/rKVMS of Home SB must be assigned to one of the extended partitions (it may also be an extended partition that does not
configure the cluster system). If VGA/USB/rKVMS of Home SB is "Free" without an assignment, the iRMC asynchronous
function cannot operate correctly.
For how to assign VGA/USB/rKVMS to the extended partitions, refer to the following manual:
- "FUJITSU Server PRIMEQUEST 3000 Series Business Model iRMC S5 Web Interface"
- The "Privilege" setting of the user is set to "Admin" so that the user can control MMB with RMCP.
- The "Status" setting of the user is set to "Enabled" so that the user can control MMB with RMCP.
- The passwords for controlling MMB with RMCP must be specified using seven-bit ASCII characters. Among them, do not use the following characters as they may cause a problem.
> < " / \ = ! ? ; , &
To check the settings of the user who uses RMCP to control MMB, log in to MMB Web-UI, and check the settings from "Remote Server
Management" window of "Network Configuration" menu.
If the above settings have not been set, set up MMB so that the above settings are set.
Note
The MMB units have two types of users:
See
For how to set up and check MMB, refer to the following manual:
Checking the time to wait until I/O to the shared disk is completed (when using other than the ETERNUS disk
array as the shared disk)
When using any disks other than the ETERNUS disk array as the shared disk, to prevent the data error when the node is down due to a panic
or other causes, set the time until I/O to the shared disk is completed.
To set the wait time described in "5.1.2.5.5 Setting I/O Completion Wait Time", panic the node during I/O to the shared disk. After that,
check the time until I/O to the shared disk is completed.
Note
PRIMERGY is equipped with an iRMC device; however, the iRMC shutdown agent cannot be used with PRIMERGY.
# /usr/bin/systemctl status ipmi.service
ipmi.service - IPMI Driver
Loaded: loaded (/usr/lib/systemd/system/ipmi.service; disabled)
Active: inactive (dead)
3. Execute clirmcsetup -a command on all the nodes, and register the iRMC information.
For instructions on using clirmcsetup command, see the clirmcsetup manual page.
For irmc-user and User's Password, enter the following values that were checked in "5.1.2.5.1 Checking the Shutdown Agent
Information."
irmc-user
User to control iRMC
User's Password
Password of the user to control iRMC
Note
For the passwords specified when setting iRMC, seven-bit ASCII characters are available.
Among them, do not use the following characters as they may cause a problem.
For mmb-user and User's Password, enter the following values that were checked in "5.1.2.5.1 Checking the Shutdown Agent
Information."
mmb-user
User to control MMB with RMCP
User's Password
Password of the user to control MMB with RMCP
Note
For the passwords specified when setting MMB, seven-bit ASCII characters are available.
Among them, do not use the following characters as they may cause a problem.
> < " / \ = ! ? ; , &
5. Execute clirmcsetup -l command on all the nodes, and check the registered MMB/iRMC information.
If the MMB/iRMC information registered in step 3 and 4 is not output on all the nodes, retry from step 1.
- PRIMEQUEST 3000 B model
# /etc/opt/FJSVcluster/bin/clirmcsetup -l
cluster-host-name irmc-user mmb-user
------------------------------------------------
node1 irmc-user *none*
node2 irmc-user *none*
# /etc/opt/FJSVcluster/bin/clirmcsetup -l
cluster-host-name irmc-user mmb-user
------------------------------------------------
node1 irmc-user mmb-user
node2 irmc-user mmb-user
CFNameX,weight=weight,admIP=myadmIP:agent=SA_irmcp,timeout=timeout:agent=SA_irmcr,timeout=timeout:agent=SA_irmcf,timeout=timeout
CFNameX,weight=weight,admIP=myadmIP:agent=SA_irmcp,timeout=timeout:agent=SA_irmcr,timeout=timeout:agent=SA_irmcf,timeout=timeout
Example (PRIMEQUEST 3000 B model):
node1,weight=2,admIP=fuji2:agent=SA_irmcp,timeout=20:agent=SA_irmcr,timeout=20
node2,weight=2,admIP=fuji3:agent=SA_irmcp,timeout=20:agent=SA_irmcr,timeout=20
Example (PRIMEQUEST 3000 series except B model):
node1,weight=2,admIP=fuji2:agent=SA_irmcp,timeout=20:agent=SA_irmcr,timeout=20:agent=SA_irmcf,timeout=20
node2,weight=2,admIP=fuji3:agent=SA_irmcp,timeout=20:agent=SA_irmcr,timeout=20:agent=SA_irmcf,timeout=20
Note
- For the shutdown agents to be specified in the rcsd.cfg file, set all of SA_irmcp, SA_irmcr, and SA_irmcf shutdown agents in that order.
- Set the same contents in the rcsd.cfg file on all the nodes. Otherwise, a malfunction may occur.
Information
When creating the /etc/opt/SMAW/SMAWsf/rcsd.cfg file, you can use the /etc/opt/SMAW/SMAWsf/rcsd.cfg.irmc.template file as a
template.
# /etc/opt/FJSVcluster/bin/clirmcmonctl
If "The devirmcd daemon exists." is displayed, the iRMC asynchronous monitoring daemon has been started.
If "The devirmcd daemon does not exist." is displayed, the iRMC asynchronous monitoring daemon has not been started. Execute the
following command to start the iRMC asynchronous monitoring daemon:
# /etc/opt/FJSVcluster/bin/clirmcmonctl start
After setting the wait time, execute the following command to make sure that the specified value is set.
# /etc/opt/FJSVcluster/bin/cldevparam -p WaitForIOComp
value
Note
- When specifying an I/O completion wait time longer than the time to detect CF heartbeat timeout (default 10 seconds), the time to detect
CF heartbeat timeout must be changed to the current value + I/O completion wait time + 3 seconds or more. This prevents
timeout of the CF heartbeat during the I/O completion wait time.
For how to change the time to detect CF heartbeat timeout, refer to "11.3.1 Changing Time to Detect CF Heartbeat Timeout."
- If an I/O completion wait time is set, the failover time when a node failure (panic, etc.) occurs increases by that amount of time.
# sdtool -s
If the shutdown facility has already been started, execute the following commands on all the nodes to restart the shutdown facility.
# sdtool -e
# sdtool -b
If the shutdown facility has not been started, execute the following command on all the nodes to start the shutdown facility.
# sdtool -b
# sdtool -s
Information
Display results of the sdtool -s command
- If "InitFailed" is displayed as the initial status, it means that a problem occurred during initialization of that shutdown agent.
- If "TestFailed" is displayed as the test status, it means that a problem occurred while the agent was testing whether or not the node
displayed in the cluster host field could be stopped. Some sort of problem probably occurred in the software, hardware, or network
resources being used by that agent.
- If "Unknown" is displayed as the stop or initial status, it means that the shutdown facility has still not executed node stop, path testing,
or the shutdown agent initialization. "Unknown" is displayed temporarily until the actual status can be confirmed.
- If "TestFailed" or "InitFailed" is displayed, check /var/log/messages. After the failure-causing problem is resolved and the shutdown
facility is restarted, the status display changes to InitWorked or TestWorked.
INFO: 3124 The node status is received. (node: nodename from: irmc/mmb_ipaddress)
If the message is not displayed, the firewall settings of the node may be incorrect. Check the settings again.
5.1.2.6 Setup Procedure for Shutdown Facility in Virtual Machine Environment
This section describes the setup procedure of the shutdown facility in the virtual machine environment.
Note
When creating a redundant administrative LAN used in the shutdown facility by using GLS, set as below.
- When building a cluster system between guest OSes on one host OS, see "3.2.1.2 Host OS setup (after installing the operating system
on guest OS)."
- When building a cluster system between guest OSes on multiple host OSes without using the host OS failover function, see "3.2.2.2
Host OS setup (after installing the operating system on guest OS)."
- When building a cluster system between guest OSes on multiple host OSes using the host OS failover function, see "3.2.3.1.4 Host OS
setup (after installing the operating system on guest OS)."
Also take the following steps to check that the setting of the sudo command is already completed.
This is necessary for the confirmed user to execute the command as the root user.
Note
Be sure to perform the following operations from 1. to 3. on all guest OSes (nodes).
1. Encrypt the password.
Execute the sfcipher command to encrypt the password that was checked in "5.1.2.6.1 Checking the Shutdown Agent Information."
For details on how to use the sfcipher command, see the manual page of "sfcipher."
# sfcipher -c
Enter User's Password:
Re-enter User's Password:
D0860AB04E1B8FA3
2. Set up the panicky shutdown agent (SA_libvirtgp) and the reset shutdown agent (SA_libvirtgr).
Create the /etc/opt/SMAW/SMAWsf/SA_libvirtgp.cfg and the /etc/opt/SMAW/SMAWsf/SA_libvirtgr.cfg files as below, using the root user
privilege, and change the permission of each file to 600.
Example:
When the guest OS domain name of node1 is domain1, and the IP address of the host OS on which node1 operates is 10.20.30.50.
Moreover, when the guest OS domain name of node2 is domain2, and the IP address of the host OS on which node2 operates is
10.20.30.51.
- /etc/opt/SMAW/SMAWsf/SA_libvirtgp.cfg
node1 domain1 10.20.30.50 user D0860AB04E1B8FA3
node2 domain2 10.20.30.51 user D0860AB04E1B8FA3
- /etc/opt/SMAW/SMAWsf/SA_libvirtgr.cfg
node1 domain1 10.20.30.50 user D0860AB04E1B8FA3
node2 domain2 10.20.30.51 user D0860AB04E1B8FA3
Note
- Check if the setting contents of the /etc/opt/SMAW/SMAWsf/SA_libvirtgp.cfg file and the /etc/opt/SMAW/SMAWsf/
SA_libvirtgr.cfg file are correct. If there is an error in the setting contents, the shutdown facility cannot be performed normally.
- Check if the domain name (domainX) of the guest OS and the IP address (ip-address) of the host OS corresponding to the cluster
host's CF node name (CFNameX) of the /etc/opt/SMAW/SMAWsf/SA_libvirtgp.cfg file and the /etc/opt/SMAW/SMAWsf/
SA_libvirtgr.cfg file are set. If there is an error in the setting, a different node may be forcibly stopped.
- The contents of the SA_libvirtgp.cfg, SA_libvirtgr.cfg, and rcsd.cfg files of all guest OSes (nodes) should be identical. If not, a
malfunction will occur.
3. Log in to the host OS.
The shutdown facility accesses the host OS with SSH. Therefore, you need to authenticate yourself (create the RSA key) in advance,
which is required when using SSH for the first time.
On all guest OSes (nodes), log in to each host OS IP address (ip-address) set in step 2 using each set user.
Execute the command with the root user access privilege.
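A first login typically looks like the following (an illustration using the host OS IP address and user from the example above; the fingerprint is a placeholder):
# ssh -l user 10.20.30.50
The authenticity of host '10.20.30.50 (10.20.30.50)' can't be established.
RSA key fingerprint is xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx.
Are you sure you want to continue connecting (yes/no)? yes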
Note
Be sure to perform the following operations from 2. to 3. on all guest OSes (nodes).
1. Set up the libvirt shutdown agent and check the information of the host OS.
Check the following information that is set in the libvirt shutdown agent:
- CF node name
2. Set up the vmchkhost shutdown agent.
Create /etc/opt/SMAW/SMAWsf/SA_vmchkhost.cfg as described in the following.
Create the SA_vmchkhost.cfg using the root user access privilege and change the permission of the file to 600.
Example:
When the CF node name of the host OS on which node1 (CF node name of the guest OS) operates is hostos1, the IP address of the
host OS is 10.20.30.50, the CF node name of the host OS on which node2 (CF node name of the guest OS) operates is hostos2, and
the IP address of the host OS is 10.20.30.51.
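Based on the values described above, the entries would look roughly like the following (the exact field layout should be confirmed against the host OS setup sections referenced earlier; the user name and encrypted password reuse the earlier sfcipher example and are assumptions):
node1 hostos1 10.20.30.50 user D0860AB04E1B8FA3
node2 hostos2 10.20.30.51 user D0860AB04E1B8FA3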
Note
- Check if the setting contents of the /etc/opt/SMAW/SMAWsf/SA_vmchkhost.cfg file are correct. If there is an error in the setting
contents, the shutdown facility cannot be performed normally.
- Check if the CF node name of the host OS (host-cfnameX) and the IP address of the host OS (ip-address) corresponding to the
CF node name (guest-cfnameX) of the guest OS (clutser host) of the /etc/opt/SMAW/SMAWsf/SA_vmchkhost.cfg file are set.
If there is an error in the setting, the shutdown facility cannot be performed normally.
- The contents of the SA_vmchkhost.cfg file of all guest OSes (nodes) should be identical. If not, a malfunction will occur.
node1,weight=2,admIP=fuji2:agent=SA_libvirtgp,timeout=35:agent=SA_libvirtgr,timeout=35:agent=SA_vmchkhost,timeout=35
node2,weight=1,admIP=fuji3:agent=SA_libvirtgp,timeout=35:agent=SA_libvirtgr,timeout=35:agent=SA_vmchkhost,timeout=35
node1,weight=2,admIP=fuji2:agent=SA_libvirtgp,timeout=35:agent=SA_libvirtgr,timeout=35
node2,weight=1,admIP=fuji3:agent=SA_libvirtgp,timeout=35:agent=SA_libvirtgr,timeout=35
Note
- SA_libvirtgp shutdown agent must be set first followed by SA_libvirtgr, and then set SA_vmchkhost as the last of all in the rcsd.cfg
file.
- Set the same contents in the rcsd.cfg file on all the nodes. Otherwise, a malfunction may occur.
Information
When creating the /etc/opt/SMAW/SMAWsf/rcsd.cfg file, you can use the /etc/opt/SMAW/SMAWsf/rcsd.cfg.mmb.template file as a
template.
# sdtool -s
If the shutdown facility has already been started, execute the following command to restart the shutdown facility on all the nodes.
# sdtool -e
# sdtool -b
If the shutdown facility has not been started, execute the following command to start the shutdown facility on all the nodes.
# sdtool -b
# sdtool -s
Information
About the displayed results
- If "InitFailed" is displayed as the initial status, it means that a problem occurred during initialization of that shutdown agent.
- If "TestFailed" is displayed as the test status, it means that a problem occurred while the agent was testing whether or not the node
displayed in the cluster host field could be stopped. Some sort of problem probably occurred in the software, hardware, network
resources, or the host OS being used by that agent.
- If "Unknown" is displayed as the stop or initial status, it means that the SF has still not executed node stop, path testing, or SA
initialization. "Unknown" will be displayed temporarily until the actual status can be confirmed.
- When building a cluster system between guest OSes on one host OS, see "3.2.1.2 Host OS setup (after installing the operating
system on guest OS)."
- When building a cluster system between guest OSes on multiple host OSes without using the host OS failover function, see "3.2.2.2
Host OS setup (after installing the operating system on guest OS)."
- When building a cluster system between guest OSes on multiple host OSes using the host OS failover function, see "3.2.3.1.4 Host
OS setup (after installing the operating system on guest OS)."
After the failure-causing problem is resolved and SF is restarted, the status display changes to InitWorked or TestWorked.
5.1.2.6.6 Setting up the Host OS Failover Function to the Host OS (PRIMEQUEST only)
When using the host OS failover function in PRIMEQUEST, for linking with MMB asynchronous monitoring function or iRMC
asynchronous monitoring function, configure the host OS failover function to the host OS.
Set up this setting after setting libvirt shutdown agent and vmchkhost shutdown agent.
Note
Be sure to perform the following operations from 3 to 7 on all the host OSes (nodes).
(2) Set the sudo command so that the created user can execute commands as the root user.
Execute the visudo command with the root user privilege, and describe the following setting in the displayed setting file.
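A typical entry, assuming the user created in (1) is named "user" (hypothetical), that allows the user to execute any command as root without a password is:
user ALL=(root) NOPASSWD: ALL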
# sfcipher -c
Enter User's Password:
Re-enter User's Password:
D0860AB04E1B8FA3
4. Create /etc/opt/FJSVcluster/etc/kvmguests.conf.
Create /etc/opt/FJSVcluster/etc/kvmguests.conf with the following contents.
Create the kvmguests.conf file using the root user access privilege and change the permission of the file to 600.
When multiple guest OSes (the cluster nodes) are operating on a host OS that configures the cluster, describe all the guest OSes
configured the host OS failover function in this file.
Example: In a two-node configuration between guest OSes, two cluster systems are configured
# /opt/SMAW/SMAWsf/bin/sfkvmtool -c
NOTICE: The check of configuration file succeeded.
7. Start the shutdown facility
Check that the shutdown facility has already been started on all the nodes.
# sdtool -s
If the shutdown facility has already been started, execute the following on all the nodes to restart it.
# sdtool -e
# sdtool -b
If the shutdown facility has not been started, execute the following on all the nodes to start it.
# sdtool -b
Note
After a node (a guest OS) is forcibly shut down by SA_libvirtgp, the guest OS may be in a temporarily stopped state (for example, when there
is no space in /var/crash on the host OS). In that case, forcibly shut down the guest OS with the virsh destroy command.
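For example, for the guest OS domain name domain1 used earlier in this section, execute the following on the host OS:
# virsh destroy domain1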
1. Initial setup
Set up the resource database that CRM manages.
Set up the CRM resource database from the CRM main window. Use the CRM main window as follows:
Operation Procedure:
1. Select PRIMECLUSTER -> Global Cluster Services -> Cluster Admin in the Web-Based Admin View operation menu.
2. When the "Cluster Admin" screen is displayed, select the crm tab.
Menu bar
This area displays the menu. See "7.1.2.1.3 Operations."
CRM tree view
This area displays the resources registered to CRM. The resources are displayed in a tree structure.
For details on the colors and status of the icons displayed in the tree, see "7.1.2.1 Displayed Resource Types."
CRM table view
This area displays attribute information for the resource selected in the CRM tree view. For details on the displayed information,
see "7.1.2.2 Detailed Resource Information."
Operation Procedure:
1. Select the Initial setup in the Tool menu.
Note
The Initial setup can be selected only if the resource database has not been set.
Cluster name
This area displays the names of the clusters that make up the resource database. The cluster names displayed here were defined
during CF configuration.
Node List
This area displays the list of the nodes that make up the resource database.
Note
Check that the nodes that were configured in the cluster built with CF and the nodes displayed here are the same.
If the nodes do not match, check the following:
- Whether all the nodes displayed by selecting the cf tab in the Cluster Admin screen are Up.
- Whether Web-Based Admin View is operating in all the nodes.
For instructions on checking this, see "4.3.3.2 Confirming Web-Based Admin View Startup."
Continue button
Click this button to set up the resource database for the displayed cluster.
Initial setup is executed on all the nodes displayed in the Node list.
Cancel button
Click this button to cancel processing and exit the screen.
3. Check the displayed contents, and click Continue to start the initial setup.
4. The screen below is displayed during execution of initial setup.
Note
- If a message appears during operation at the CRM main window, or if a message dialog box entitled "Cluster resource management
facility" appears, see "3.2 CRM View Messages" and "Chapter 4 FJSVcluster Format Messages" in "PRIMECLUSTER Messages."
- If you want to add, delete, or rename a disk class from the Global Disk Services screen after executing Initial Setup from the CRM main
window, close the Cluster Admin screen.
Note
When using Dell EMC PowerPath, complete the settings according to "Settings to Use Dell EMC PowerPath" in "PRIMECLUSTER Global
Disk Services Configuration and Administration Guide" before taking the following steps.
Operation Procedure:
1. Registering the network interface card
1. Confirm that all the nodes have been started in multi-user mode.
2. Perform the following procedure on any node in the cluster system.
1. Log in to the node using system administrator access privileges.
2. Execute the "clautoconfig" command.
# /etc/opt/FJSVcluster/bin/clautoconfig -r -n
Note
- While the "clautoconfig" command is being executed, do not execute the "clautoconfig" command again, either on the same node or on any
other node. If you do so, a shared disk device cannot be registered correctly.
If you have executed it, execute the following operation on all the nodes that constitute the cluster system to re-execute "5.1.3
Initial Setup of the Cluster Resource Management Facility" described in this chapter:
1. Reset the resource database using the "clinitreset" command. For details on this command, see the manual pages of
"clinitreset".
# ifconfig eth1 up
[RHEL7 or later]
1. Log in any one of the nodes of the cluster system using system administrator access privileges.
2. Set the disk for performing the mirroring among servers.
For performing the mirroring among servers, set the local disk device to be accessed from each node as an iSCSI device.
For details, see "Disk Setting for Performing Mirroring among Servers" in "PRIMECLUSTER Global Disk Services Configuration
and Administration Guide."
With this setting, the target disk device can be used from each node in the same way as a shared disk device. In the procedure below, describe
the iSCSI device in the shared disk definition file.
3. Create a shared disk configuration file in the following format.
The configuration file defines settings of a shared disk connected to all the nodes of the cluster system.
Create a shared disk definition file with an arbitrary name.
- Define "resource key name device name node identifier" for each shared disk in one row.
- "resource key name", "device name", and "node identifier" are delimited by a single space.
- Set up the resource key name, device name, and node identifier as follows:
Resource key name
Specify a resource key name that indicates the sharing relationship for each shared disk. You need to specify the same name
for each disk shared between nodes. The resource key name should be specified in the "shdnumber" format. "shd" is a fixed
string. For "number", you can specify any four-digit number. If multiple shared disks are used, specify a unique number for
each shared disk.
Device name
Specify a device name by the full device path of the shared disk.
(Example) /dev/sdb
Note
Node identifier
Specify a node identifier for which a shared disk device is available. Confirm the node identifier by executing the "clgettree"
command. For details on this command, see the manual pages of "clgettree".
(Example) node1 and node2 are node identifiers in the following case:
# /etc/opt/FJSVcluster/bin/clgettree
Cluster 1 cluster
Domain 2 PRIME
Shared 7 SHD_PRIME
Node 3 node1 ON
Node 5 node2 ON
The following example shows the configuration file of the shared disk when shared disks /dev/sdb and /dev/sdc are shared
between node1 and node2.
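Based on the format described in step 3, the entries for this example would look like the following sketch (values follow the example description above):
shd0001 /dev/sdb node1
shd0001 /dev/sdb node2
shd0002 /dev/sdc node1
shd0002 /dev/sdc node2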
- When adding a shared disk device and registering the added shared disk device on the resource database, define only the
information of the added shared disk device.
Example: When registering the added disk device /dev/sdd (*1) on the resource database after
shd0001 and shd0002 are already registered on the resource database:
(*1) Note
The device name of the added shared disk device may not follow the device name of the registered device in alphabetical
order. Make sure to check the device name of the added shared disk device before defining the information of the added disk
device.
4. Execute the "clautoconfig" command to register the settings of the shared disk device that is stored in the configuration file in the
resource database.
Specify the "clautoconfig" command in the following format:
(Format)
(Example)
# /etc/opt/FJSVcluster/bin/clautoconfig -f /var/tmp/diskfile
Note
- If the "clautoconfig" command ends abnormally, take corrective action according to the error message. For details on the
messages of this command, see "PRIMECLUSTER Messages."
- This command does not check whether the shared disk defined in the configuration file is physically connected.
- If the device name of the shared disk device varies depending on a node, execute the "clautoconfig" command on the nodes in
which all the device files written in the shared disk configuration file exist. If a device file written in the shared disk configuration
file does not exist on the node in which the "clautoconfig" command is executed, the resource registration fails and the following
message is displayed.
For details, see "PRIMECLUSTER Global Disk Services Configuration and Administration Guide."
- If you find an error in the shared disk configuration file after executing the "clautoconfig" command, reset the resource database
by executing the "clinitreset" command and restart the node.
3. Registration completed
When the initial setup and automatic configure are completed, the following screen appears.
4. Checking the registered resources
When automatic configuration is completed, go to the CRM main window and confirm that the resources have been registered. If some
resources are not registered correctly, the following are probable causes:
- There is a connection path failure between a host device and a disk array unit.
- A disk array unit is not ready.
- A network adapter failed.
- A network adapter driver failed.
If the resources are not registered correctly, first review the above causes.
Note
- If a message appears during operation at the CRM main window, or if a message dialog box entitled "Cluster resource management
facility" appears, see "3.2 CRM View Messages" and "Chapter 4 FJSVcluster Format Messages" in "PRIMECLUSTER Messages."
- If you want to add, delete, or rename a disk class from the Global Disk Services screen after executing Initial Setup from the CRM main
window, close the Cluster Admin screen.
6750 A resource failure occurred. SysNode:node1RMS userApplication:app0 Resource:apl1
The operator intervention request function displays a query-format message to the operator if a failed resource or a node in which RMS has
not been started is found when a cluster application is started. The messages for operator intervention requests are displayed to syslogd(8)
and Cluster Admin.
1421 The userApplication "userApplication" did not start automatically because not all of the nodes
where it can run are online.
Forcing the userApplication online on the SysNode "SysNode" is possible.
Warning: When performing a forced online, confirm that RMS is started on all nodes in the cluster,
manually shutdown any nodes where it is not started and then perform it. For a forced online,
there is a risk of data corruption due to simultaneous access from several nodes.
In order to reduce the risk, nodes where RMS is not started may be forcibly stopped.
Are you sure wish to force online? (no/yes)
Message No.: number
See
For details on the messages displayed by the fault resource identification function and the messages displayed by the operator intervention
request function, see "3.2 CRM View Messages" and "4.2 Operator Intervention Messages" in "PRIMECLUSTER Messages."
Note
To view the manual pages of each command, add "/etc/opt/FJSVcluster/man" to the MANPATH variable.
Preparation prior to displaying fault resource identification and operator intervention request messages
The fault resource identification and operator intervention request messages are displayed by using syslogd(8) / rsyslogd(8). daemon.err is
specified to determine the priority (facility.level) of the fault resource identification and operator intervention request messages.
For details on the priority, see the manual page describing syslog.conf(5) / rsyslog.conf(5).
If the fault resource identification and operator intervention request messages need to be output to the console, execute the following
procedure on all the nodes.
Procedure:
- RHEL6
1. Check the setting of rsyslogd in /etc/rsyslog.conf to see that daemon.err is set to be displayed on the console.
(Example) daemon.err is set to be displayed on the console.
daemon.err /dev/console
2. If daemon.err is not set to be displayed on the console, change the setting of rsyslogd in /etc/rsyslog.conf.
To enable this change, restart the system log daemon by executing the following command.
# /etc/init.d/rsyslog restart
- RHEL7
1. Check the setting of rsyslogd in /etc/rsyslog.conf to see that daemon.err is set to be displayed on the console.
(Example) daemon.err is set to be displayed on the console.
daemon.err /dev/console
2. If daemon.err is not set to be displayed on the console, change the setting of rsyslogd in /etc/rsyslog.conf.
To enable this change, restart the system log daemon by executing the following command.
# xterm -C
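On RHEL7, the system log daemon is managed by systemd; a typical restart of rsyslogd (a standard RHEL7 command stated here as general knowledge, not quoted from this manual) would be:
# systemctl restart rsyslog.service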
Identifying the fault resource and changing the operation setting of operator intervention request
Use the clsetparam(1M) command to change the setting. For details, see the manual page of clsetparam(1M).
Chapter 6 Building Cluster Applications
The procedure for building a cluster application is shown below.
Note
When using RMS, make sure to configure the cluster application.
Work item | Execution Nodes | Required/optional | Manual reference location*
... | ... | Optional (required when the GLS redundant line control function is used) | ...
(3) 6.3 GDS Configuration Setup | All nodes | Optional (required when GDS is used) | GDSG "Chapter 5 Operation"
(4) 6.4 Initial GFS Setup | All nodes | Optional (required when GFS is used) | GFSG
(5) 6.5 Setting Up the Application Environment | All nodes | Required | Manuals for each application
(6) 6.6 Setting Up Online/Offline Scripts | All nodes | Optional | RMS "2.9 Environment variables," "12 Appendix - Environment variables"
(7) 6.7.1 Starting RMS Wizard, 6.7.2 Setting Up userApplication, 6.7.3 Setting Up Resources, 6.7.4 Generate and Activate | All nodes | Required | -
6.7.5 Registering the Cluster Service of a PRIMECLUSTER-compatible product | All nodes | Optional (required when a PRIMECLUSTER-compatible product is used) | -
(8) 6.8 Setting Up the RMS Environment | All nodes | Required | RMS "2.9 Environment variables," "12 Appendix - Environment variables"
(9) 6.9 Checking the Cluster Environment | All nodes | Required | -
- RMS: PRIMECLUSTER Reliant Monitor Services (RMS) with Wizard Tools Configuration and Administration Guide
- GDSG: PRIMECLUSTER Global Disk Services Configuration and Administration Guide
- GFSG: PRIMECLUSTER Global File Services Configuration and Administration Guide
- GLSR: PRIMECLUSTER Global Link Services Configuration and Administration Guide: Redundant Line Control Function
6.1 Initial RMS Setup
When RMS is to be used, you must first check "Setup (initial configuration)" of PRIMECLUSTER Designsheets and change the following
environment variable as required:
See
For information on how to check and change the environment variables of RMS automatic startup, see "7.1.2 Starting RMS automatically
at boot time" in "PRIMECLUSTER Reliant Monitor Services (RMS) with Wizard Tools Configuration and Administration Guide."
Note
Although it is possible to have "takeover network" for PRIMECLUSTER and "IP address takeover" for GLS on the same cluster system,
you must not configure them on the same interface. If you do so, the communication through "takeover IP address" will be disabled.
For example, if you select 'eth1' as the interface when setting up "takeover network" for PRIMECLUSTER, do not use 'eth1' in the GLS
environment settings (do not specify 'eth1' with the '-t' option of the "hanetconfig create" command).
When you need to duplex the interface for a takeover network, use "IP address takeover" for GLS. You cannot set "takeover network" for
the bonding interface.
The setup values correspond to the values in "Setup (GLS_Monitoring Parameter)", "Setup (GLS_Virtual Interface)", "Setup (GLS_GS
Linkage Mode Monitoring Destination Information)", and "Setup (GLS_Common Parameter)" of PRIMECLUSTER Designsheets.
Operation Procedure:
If the OPERATING node is [HOST-primecl01]
2. Specify the IP address specified in step 1-1 above to the /etc/sysconfig/network-scripts/ifcfg-ethX (X is either 0 or 1) file.
- Contents of /etc/sysconfig/network-scripts/ifcfg-eth0
DEVICE=eth0
BOOTPROTO=static
HOTPLUG=no
IPADDR=10.34.214.181
NETMASK=255.255.255.0
ONBOOT=yes
TYPE=Ethernet
- Contents of /etc/sysconfig/network-scripts/ifcfg-eth1
DEVICE=eth1
BOOTPROTO=static
HOTPLUG=no
ONBOOT=yes
TYPE=Ethernet
Note
Add "HOTPLUG=no" to the settings for the physical interfaces bundled by GLS (/etc/sysconfig/network-scripts/ifcfg-ethX
file). This setting is not necessary when bundling the tagged VLAN interface.
Information
Setting of "HOTPLUG=no" does not disable the PCI hot plug function.
You can still perform hot maintenance of the NIC (PCI card) on physical interfaces that are set with "HOTPLUG=no."
2. Restarting
Run the following command to restart the OS. After the OS restarts, verify that eth0 is enabled using the "ip(8)" or "ifconfig(8)"
command.
# /sbin/shutdown -r now
# /opt/FJSVhanet/usr/sbin/hanetmask print
Note
For details on the subnet mask value, see "hanetmask command" in "PRIMECLUSTER Global Link Services Configuration and
Administration Guide: Redundant Line Control Function."
# /opt/FJSVhanet/usr/sbin/hanetconfig print
# /opt/FJSVhanet/usr/sbin/hanetpoll print
# /opt/FJSVhanet/usr/sbin/hanetconfig print
7. Creating of the takeover IP address (takeover virtual Interface)
# /opt/FJSVhanet/usr/sbin/hanethvrsc create -n sha0
# /opt/FJSVhanet/usr/sbin/hanethvrsc print
- Contents of /etc/sysconfig/network-scripts/ifcfg-eth0
DEVICE=eth0
BOOTPROTO=static
HOTPLUG=no
IPADDR=10.34.214.182
NETMASK=255.255.255.0
ONBOOT=yes
TYPE=Ethernet
- Contents of /etc/sysconfig/network-scripts/ifcfg-eth1
DEVICE=eth1
BOOTPROTO=static
HOTPLUG=no
ONBOOT=yes
TYPE=Ethernet
2. Restarting
Run the following command to restart the OS. After the OS restarts, verify that eth0 is enabled using the "ip(8)" or "ifconfig(8)"
command.
# /sbin/shutdown -r now
# /opt/FJSVhanet/usr/sbin/hanetmask print
Note
For details on the subnet mask value, see "hanetmask command" in "PRIMECLUSTER Global Link Services Configuration and
Administration Guide: Redundant Line Control Function."
# /opt/FJSVhanet/usr/sbin/hanetconfig print
# /opt/FJSVhanet/usr/sbin/hanetpoll print
# /opt/FJSVhanet/usr/sbin/hanetconfig print
# /opt/FJSVhanet/usr/sbin/hanethvrsc print
Post-setup processing
After the OPERATING and STANDBY node setup is done, create the Gls resources and register them to the cluster application.
For details, see "6.7.3.5 Setting Up Gls Resources" and "6.7 Setting Up Cluster Applications."
Then, start RMS and check the RMS tree to confirm whether the Gls resources are displayed correctly. For details, see "7.1.3.1 RMS Tree."
The Gls resource name is displayed as GlsX (X is integer).
See
For information on GLS (redundant line control function) and other operation modes, see "PRIMECLUSTER Global Link Services
Configuration and Administration Guide: Redundant Line Control Function."
See
For setup details, see "2.3 Setup with GLS" in "PRIMECLUSTER Web-Based Admin View Operation Guide."
- "6.3.2 Setting Up Shared Disks"
When using the shared disk, set up the shared disk volumes.
Add this setting also when performing the mirroring among servers.
Note
- If you are using a shared disk unit, you must use GDS to manage that unit.
- Execute the configuration setting of GDS after initializing the cluster.
- To use EC or REC function of the ETERNUS Disk storage systems without using PRIMECLUSTER GD Snapshot, do not add a GDS
class that includes a copy destination disk of EC or REC to a cluster application.
When EC or REC is either the synchronous processing in process or equivalency maintain status, a program running on the server may
fail to access the destination disk with error. Therefore, if the class that includes the copy destination disk is added to a cluster
application, the program running on the server may fail to access the destination disk. This may lead to a failover of the cluster
application.
See
For setup details, see "System Disk Mirroring Settings [EFI]" in "PRIMECLUSTER Global Disk Services Configuration and
Administration Guide."
Note
To mirror the system disk of a guest OS by using GDS in KVM environment, you need to configure a mirror volume of a local class or a
shared class, which is created on the host OS, for the guest OS. For information on how to set up the host OS, see the following:
- When building a cluster system between guest OSes on one host OS, see "1. Setting up disks and related devices" in "3.2.1.1 Host OS
setup (before installing the operating system on guest OS)."
- When building a cluster system between guest OSes on multiple host OSes without using Host OS failover function, see "1. Setting up
disks and related devices" in "3.2.2.1 Host OS setup (before installing the operating system on guest OS)."
- When building a cluster system between guest OSes on multiple host OSes using Host OS failover function, see "1. Setting up disks
and related devices" in "3.2.3.1.3 Host OS setup (before installing the operating system on guest OS)."
For details on settings, see "PRIMECLUSTER Global Disk Services Configuration and Administration Guide."
Operation Procedure:
1. Select Global Disk Services at the Web-Based Admin View top screen.
The GDS Management screen (hereinafter main screen) is displayed.
2. From the GDS configuration tree, select the node in which the system disk mirror is to be set, click the Settings menu, and select
System Disk Settings.
A list of disks that can be used for mirrored disks for the selected node is displayed.
Select the system disk ("Physical disk name" on the designsheet), and click Next.
Note
Specify the class name so that the class names of the root class are not duplicated among cluster nodes.
5. Select a Spare Disk ("Spare disk name" on the designsheet) from the "Physical Disk List," and click Add.
Check that the spare disk that was selected is registered to "Spare Disk," and then click Next.
If a spare disk is unnecessary, go to Step 6.
6. Check the system disk configuration.
Check the physical disk name and the mirror disk name, and then click Create.
After creation of the system disk is completed, the following screen is displayed.
Check the screen contents, and then click OK.
Set up mirroring for the system disk of primecl02 on each node, and then, restart all the nodes.
- When the ext3 file system is to be used
1. Execute "Volume setup."
2. Execute "File system setup."
3. Create a Gds resource and register it to a cluster application.
For details, see "6.7.3.4 Setting Up Gds Resources" and "6.7 Setting Up Cluster Applications."
Note
- "When the GFS Shared File System is to be used" and "When the file system is not to be used," "File system setup" is not necessary.
- The setup procedures for "When the ext3 file system is to be used" and "When the file system is not to be used" must be carried out before
the Gds resources are set up. For details, see "6.7.3.3 Preliminary Setup for Gds Resources."
- "When the GFS Shared File System is to be used," "6.7.3.4 Setting Up Gds Resources" must not be carried out.
- The local class disks or shared class disks used by GDS on the guest OS should be configured as the following virtual disks if they are
used in the virtual machine environment.
- KVM environment
virtio-SCSI devices or virtio block devices
Volume setup
There are five types of volumes:
a. Single volume
b. Mirror volume
c. Stripe volume
d. Volume created in a concatenation group
e. Netmirror volume
This section separately describes the volume setup procedures for a single volume (a) and for other volumes (b, c, d, e). For details, see
"Settings of Class, Group and Volume" in "PRIMECLUSTER Global Disk Services Configuration and Administration Guide."
The values to be set for the individual items correspond to the values in "Setup (GDS Local Class)" and "Setup (GDS Shared Class)" of
PRIMECLUSTER Designsheets.
Note
- If you plan to add, delete, or rename a disk class from the GDS Management screen (hereinafter main screen), close the Cluster Admin
screen before starting the operation.
- When neither the system nor the GDS Management screen has been reactivated after "2. Registering a shared disk" of "5.1.3.2 Registering
Hardware Devices," the registered shared disk might not be correctly recognized by GDS. In this case, set up the volume after updating
the physical disk information. The physical disk information can be updated by selecting Update Physical Disk Information from the Operation
menu of the main screen.
Single volume setup
If you are not using a single volume, this setup is unnecessary.
Operation Procedure:
At the above screen, select the physical disk to be registered from the Physical Disk list, and then click Add. When Add is clicked,
the Class Attributes Definition screen opens. Enter the Class Name but do not change the Type value (leave the value as "shared").
Set Disk Type to "single," and then click OK.
4. Volume creation
Select Settings -> Volume Configuration, and then select the disk that was registered in Step 2 from the Group and Disk List.
Select "Unused" in the volume diagram, and enter the Volume Name, the Volume Size, and the volume attributes.
Click Add to enable the settings.
Check the settings, and then click Exit.
At the above screen, select the physical disks to be registered from "Physical Disk" list, and then click "Add". When "Add" is
clicked, the Class Attributes Definition screen opens. Enter "Class Name" but do not change "Type" value (leave the value as
"shared"). Then click "Exit".
At the above screen, select the disks to be added to the group from "Class Configuration Disk/Group" list, and then click "Add".
Enter "Group Name", "Type", and "Stripe width" in the Group Attributes Definition screen, and then click "OK".
For the mirroring among servers, select "netmirror" for "Type".
Enter "Stripe width" only when selecting "stripe" for "Type".
3. Creating a volume
Click the Volume Configuration tab, and select the group that was created in Step 2 from the Group and Disk List. Select Unused
in the volume diagram, and enter the Volume Name, the Volume Size, and the volume attributes.
Click Add to enable the settings.
Check the setup information, and then click Exit.
4. Checking the configuration
The disk configuration is displayed as shown below.
File system setup
Create a file system for each created volume.
Example: class name = Class1, volume name = Volume1, and file system type = ext3
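The exact command depends on the file system; assuming the GDS volume device path convention used elsewhere in this chapter (/dev/sfdsk/<class>/dsk/<volume>), creating an ext3 file system for this example would look like:
# mkfs -t ext3 /dev/sfdsk/Class1/dsk/Volume1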
See
For how to create file system, see the file system manual.
See
The volume is started by the [Start Volume] of [Operation] menu of GDS management view or the "sdxvolume -N" command.
For details, see "PRIMECLUSTER Global Disk Services Configuration and Administration Guide."
To use the GFS shared file system in RMS cluster operation, you need to set up GFS according to the flow below:
The device name and mount points that are specified here correspond to the values in "Setup (GFS Shared File System)" and "Setup (GFS
Shared File System 2)"of PRIMECLUSTER Designsheets.
Note
- You need to prepare a management partition that is exclusive to the GFS shared file system. The GDS volume disk class is used for a
switching file system and non-switching file system. For the management partition, non-switching file system must be allocated.
- If you are using a GFS shared file system, you must not carry out "6.7.3.4 Setting Up Gds Resources."
Operation Procedure:
1. Create a management partition for the GFS shared file system on any one of the nodes.
# sfcsetup -c /dev/sfdsk/class0001/dsk/GFSctl
primecl01# sfcfrmstart
primecl02# sfcfrmstart
Note
If sfcfrmstart ends abnormally, confirm that sfcprmd is started with the "ps" command. If sfcprmd has not been started, execute the
following command on the node on which sfcprmd is not started:
- For RHEL6
- For RHEL7
5. Add the mount information of the GFS shared file system to /etc/fstab on each node. Specify "noauto" in the "mount options" field
of the mount information. Do not specify "noatrc" in the same field.
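As an illustrative sketch only (the device name, mount point, and the "sfcfs" file system type are assumptions, not values taken from this manual), such an /etc/fstab entry might look like:
/dev/sfdsk/class0001/dsk/volume0001 /sfcfs sfcfs rw,noauto 0 0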
See
The operations described in procedures 4, 5, and 6 can be set up by using the GUI management view. For details, see "6.4.1 File System
Creation."
Operation Procedure:
1. Start the GFS management view.
Choose Global File Services on the Web-Based Admin screen, select a node from the node list, and then display the main screen of
Global File Services.
Selecting "Node name"
Select the node names to be shared with "Node Names." You must select two nodes.
Note that the selection of the local node (displayed node) cannot be canceled.
Selecting a "Host name"
To select a host name other than that which is currently displayed, click the Select button and specify the host name of the LAN
to be used on each node. Note that two or more host names cannot be specified.
Setting the "Primary MDS" and "Secondary MDS"
Specify the nodes that boot the management server of the shared file system in "Primary MDS" and "Secondary MDS."
Setting the "Mount point" and "Make directory"
Specify the full path for the "Mount point." If you select "yes" for "Make directory," a directory is created with the following
attributes:
- Owner: root
- Group: sys
- Access authority: 775
After setting or changing this information, click the Next button to open the "Create File System Wizard (2)."
To return each setup item to its default value, click the Reset button.
To stop the processing of the file system creation, click the Cancel button.
Select the partition to be used from the [Candidate partitions] list and then click the Add button.
Only one partition can be selected at a time. A partition that is already being used as a file system or as a management partition cannot
be selected.
After the partition has been selected, click the Next button to open the "Create File System Wizard (3)."
To return to the "Create File System Wizard (1)," click the Back button.
To abandon file system creation, click the Cancel button.
After setting the above information, click the Next button to open the "Create File System Wizard (4)."
No information can be set with the "Create File System Wizard (4)." Go to the "Create File System Wizard (5)."
To return each setup item to its default value, click the Reset button.
To return to "Create File System Wizard (2)," click the Back button.
To abandon file system creation, click the Cancel button.
To create the file system while leaving the default settings of the extended, detailed, and mount information as is, click the Create
button.
After setting the above information, click the Next button to open the "Create File System Wizard (6)."
To return each setup item to its default value, click the Reset button.
To return to the "Create File System Wizard (4)," click the Back button.
To abandon file system creation, click the Cancel button.
To create the file system while leaving the default setting of the mount information as is, click the Create button.
After setting the above information, click the Create button to create the file system. To return each setup item to its default value,
click the Reset button.
To return to the "Create File System Wizard (5)," click the Back button.
To abandon file system creation, click the Cancel button.
See
See the manuals for the individual applications.
Note
Environment variables set in each server ("/etc/profile" or "/etc/bashrc", for example) are not guaranteed to be inherited by Online, Offline,
and Check scripts. Therefore, make sure to define the environment variables used with these scripts in each script.
Sample scripts
This section shows samples of the Online and Offline scripts, which are set as Cmdline resources.
Start script/Stop script
#!/bin/sh
#
# Script.sample
# Sample of Online/Offline Script
#
# Copyright(c) 2003 FUJITSU LIMITED.
# All rights reserved.
#
# $1 -c : OnlineScript
# -u : OfflineScript
The above script sample covers both the Start script and the Stop script.
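The processing body of the sample is omitted above; as a minimal runnable sketch under the interface shown in the header comments (the start and stop commands are hypothetical placeholders, not part of this manual), such a script could look like:
#!/bin/sh
# Minimal sketch of an Online/Offline script; start/stop commands are placeholders.
case "$1" in
-c)
    # OnlineScript: start the user application here
    /usr/local/bin/start_my_app    # hypothetical start command
    ;;
-u)
    # OfflineScript: stop the user application here
    /usr/local/bin/stop_my_app     # hypothetical stop command
    ;;
esac
exit 0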
An example of Check script is shown below:
Check script
#!/bin/sh
#
# Script.sample.check
# Sample of Check script
#
# Copyright(c) 2003 FUJITSU LIMITED.
# All rights reserved.
#
# Check the current state of target resource.
# If status is Online:
exit 0
Notes on script creation
Hot-standby operation
To enable hot-standby operation of the Cmdline resources, the following must be prepared:
#!/bin/sh
#
# Script.sample
# Sample of Online/Offline Script
#
# Copyright(c) 2003 FUJITSU LIMITED.
# All rights reserved.
#
# $1 -c : OnlineScript
# -u : OfflineScript
The following example shows Check script that supports hot-standby operation.
Check script (hot-standby operation)
#!/bin/sh
#
# Script.sample.check
# Sample of Check script
#
# Copyright(c) 2003 FUJITSU LIMITED.
# All rights reserved.
#
# If status is Online:
exit 0
# If status is Standby:
exit 4
# If status is Faulted:
exit 2
# If status is Offline:
exit 1
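Putting those exit codes together, a minimal runnable sketch of a hot-standby Check script (the state-detection command is a hypothetical placeholder, not part of this manual) could look like:
#!/bin/sh
# Minimal sketch; check_my_app_state is a hypothetical command printing the current state.
STATE=`check_my_app_state`
case "$STATE" in
online)  exit 0 ;;   # Online
standby) exit 4 ;;   # Standby
faulted) exit 2 ;;   # Faulted
*)       exit 1 ;;   # Offline
esac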
Environment variable Outline
HV_APPLICATION This variable sets the userApplication name that the resource belongs to.
Example: app1
HV_AUTORECOVER The value of this variable indicates whether the script is triggered by
AutoRecover or not (1 or 0). For details on AutoRecover, see "Appendix D
Attributes" in "PRIMECLUSTER Reliant Monitor Services (RMS) with
Wizard Tools Configuration and Administration Guide."
0: Not triggered by AutoRecover
1: Triggered by AutoRecover
HV_FORCED_REQUEST This variable sets a value that indicates whether or not forced failover was
requested by operator intervention.
0: Forced failover was not requested.
1: Forced failover was requested.
HV_NODENAME This variable contains the resource name.
Example) ManageProgram000_Cmd_APP1,
RunScriptsAlways000_Cmd_APP1
HV_OFFLINE_REASON This variable sets the trigger for bringing the resource Offline.
SWITCH: The resource was set to Offline because of a userApplication
switchover request (hvswitch).
STOP: The resource was set to Offline because of a userApplication stop
request (hvutil -f, hvutil -c)
FAULT: The resource was set to Offline because of a resource fault.
DEACT: The resource was set to Offline because of a userApplication
deactivate request (hvutil -d)
SHUT: The resource was set to Offline because of an RMS stop request
(hvshut)
HV_SCRIPT_TYPE This variable sets the type of script that was executed.
Online: Online script
Offline: Offline script
HV_LAST_DET_REPORT This variable sets the state of the current resources.
Online: Online state
Offline: Offline state
Standby: Standby state
Faulted: Faulted state
Warning: Warning state
HV_INTENDED_STATE This variable sets the resource state that is expected after state transition is
completed.
Online: Online state
Offline: Offline state
Standby: Standby state
Faulted: Faulted state
Warning: Warning state
NODE_SCRIPTS_TIME_OUT This variable sets the timeout duration (seconds) of the script.
Example: 300
See
- For details on hvenv.local, see "1.9 Environment variables" in "PRIMECLUSTER Reliant Monitor Services (RMS) with Wizard Tools
Configuration and Administration Guide."
- For details on the RMS environment variables, see "Appendix E Environment variables" in "PRIMECLUSTER Reliant Monitor
Services (RMS) with Wizard Tools Configuration and Administration Guide."
- To create two cluster applications, repeat steps 2. to 3.
5) Priority transfer of standby operation
Set up the priority transfer of standby operation as follows.
6) Scalable operation
Set up a scalable operation as follows.
- Before you create cluster applications as part of scalable operation, create cluster applications in standby operation that act as the
constituent factors of the cluster applications in scalable operation. To create cluster applications in standby operation, repeat steps 2.
to 3.
Example 1) For scalable operation with three nodes, repeat steps 2. and 3. three times to create three cluster applications of standby
operation.
Example 2) For high-availability scalable 1:1 standby (standby operation), repeat steps 2. and 3. once to create 1:1 standby cluster
applications.
See
- After you finish setting up the cluster application, start the cluster applications. For instructions on starting the application, see "7.2.2.1
Starting a Cluster Application."
- For instructions on changing a cluster application, see "10.3 Changing the Cluster Configuration." For instructions on deleting a cluster
application, see "10.2 Deleting a Cluster Application."
- For the setting contents of a cluster application depending on the operation, and notes on its setting, see "6.10 Setting Contents and Notes
on Cluster Application."
Note
- Set up the cluster application and resources based on the cluster application and resource information in "Setup (cluster application)" of
PRIMECLUSTER Designsheets that was created in the design stage, and the sheet corresponding to each resource. If you need to
change the cluster application after it is created, the designsheets are helpful. Make sure to create the designsheets before performing
the necessary operations.
- Set up "remote file copy" and "remote command execution" for the RMS Wizard. See the notes on "5.1.1 Setting Up CF and CIP."
If the cluster interconnect is not protected by security, cancel the "remote file copy" and "remote command execution" settings on all
the cluster nodes after setting up the cluster applications.
# /opt/SMAW/SMAWRrms/bin/hvw -n testconf
Note
About the name of userApplication
The character string set by ApplicationName menu of the hvw command is converted to lower case, and used for the cluster application
name.
ApplicationName must satisfy all the conditions below:
Operation Procedure:
1. Select "Application-Create" from the "Main configuration menu."
2. Select "STANDBY" from the "Application type selection menu."
Note
When configuring with the following PRIMECLUSTER Wizard products, refer to the manual of each product.
3. Next, "turnkey wizard "STANDBY"" will be output. Select "Machines+Basics" and then set up userApplication.
4. The userApplication setup page will appear. Set up the following for the userApplication:
- Nodes that constitute the userApplication
- Attributes of the userApplication
Set up the nodes that constitute userApplication by selecting "Machines[number]" and then a SysNode name on the subsequent screen
that is displayed.
The procedures for setting up the nodes that constitute a userApplication and cluster application priority are explained for each
topology, below.
Topology | How to set up userApplication configuration nodes and cluster application priority
(For the second userApplication)
In "Machines[0]," specify a SysNode that is Online when the userApplication
first starts up. For this SysNode, specify the SysNode specified for "Machines[1]"
when the first userApplication was set up.
In "Machines[1]," specify a SysNode that is in standby status or Offline when the
userApplication first starts up. Specify the SysNode specified in "Machines[0]"
when the first userApplication was set up.
N:1 standby
(For the first userApplication)
In "Machines[0]," specify a SysNode that is Online when the userApplication
first starts up.
In "Machines[1]," specify a SysNode that is in standby status or Offline when the
userApplication first starts up.
(For the second or subsequent userApplication)
In "Machines[0]," specify a SysNode that is Online when the userApplication
first starts up. For this, specify a SysNode other than that previously specified for
"Machines[0]" or "Machines[1]" when the userApplication was set up.
In "Machines[1]," specify a SysNode that is in standby status or Offline when the
userApplication first starts up.
For this, specify the same SysNode as that previously specified in "Machines[1]"
when the userApplication was set up.
Cascaded In "Machines[0]," specify a SysNode that is Online when the userApplication
first starts up.
For "Machines[1]" or later, specify a SysNode that is in standby status or Offline
when the userApplication first starts up.
State transition occurs in ascending order of the numbers specified for
"Machines[number]."
Example) When there are four nodes, state transition occurs in the order shown
below:
"Machines[0]" -> "Machines[1]" -> "Machines[2]" -> "Machines[3]"
Priority transferring
(For the first userApplication)
In "Machines[0]," specify a SysNode that is Online when the userApplication
first starts up.
For "Machines[1]" or later, specify a SysNode that is in standby status or Offline
when the userApplication first starts up.
(For the second or subsequent userApplication)
In "Machines[0]," specify a SysNode that is Online when the userApplication
first starts up. For this, specify a SysNode other than that previously specified in
"Machines[0]" when the userApplication was set up.
For "Machines[1]" or later, specify a SysNode that is in standby status or Offline
when the userApplication first starts up.
State transition occurs in ascending order of the numbers specified in
"Machines[number]."
Example) When there are four nodes, state transition occurs in the order shown
below:
"Machines[0]" -> "Machines[1]" -> "Machines[2]" -> "Machines[3]"
Set up the attributes of the userApplication as follows:
Attribute | Setup value | Contents | Remark
... Online on the highest priority node regardless of on which node the cluster application was Online before restarting RMS.
Information
For more information and the list of attributes settable to userApplication, refer to "D.1 Attributes available to the user" in
"PRIMECLUSTER Reliant Monitor Services (RMS) with Wizard Tools Configuration and Administration Guide."
Note
In the case of the single-node cluster operation
To set up exclusive relationships between cluster applications, you must set up the following.
For details on exclusive relationships between applications, see "6.7.7 Exclusive Relationships Between Cluster Applications."
Create multiple cluster application groups between which an exclusive relationship can be established. Exclusive control is
established between the cluster applications within a single group.
Up to 52 groups of A to Z or a to z can be specified. "20X" and "10X" are fixed values. Therefore, you must always specify either
"20X" or "10X" after the group.
- Example) When the cluster application is included in group A and the job priority is high
A20X
- Example) When the cluster application is included in group A and the job priority is low
A10X
Note
Exclusive relationships between cluster applications can be established only when the operation is being performed with two or more
cluster applications. When the operation is to be performed with one cluster application, do not set up any relationships between
cluster applications.
Group 20X
Group 10X
"LicenseToKill" : "no"
"AutoBreak" : "yes"
Note
Operator intervention requests and error resource messages are displayed only when the AutoStartUp and PersistentFault attributes
are set to yes(1). When the operator intervention and error resource messages are to be displayed, set yes(1) for the AutoStartUp and
PersistentFault attributes. For information on the operator intervention and error resource messages, see "4.2 Operator Intervention
Messages" in "PRIMECLUSTER Messages."
Information
The following scripts can be registered to userApplication. For more information on each script, refer to "Appendix D Attributes" in
"PRIMECLUSTER Reliant Monitor Services (RMS) with Wizard Tools Configuration and Administration Guide."
- FaultScript
- PreCheckScript
- PreOnlineScript
- PostOnlineScript
- PreOfflineScript
- OfflineDoneScript
Do not use a tilde (~) for the command path or the argument set to each script.
5. Confirm if the setup information is correct, and then select "SAVE+EXIT."
6. "turnkey wizard "STANDBY"" is output. Specify the settings for each resource.
Example 1) Preparing for scalable operation
When you create a cluster application in a scalable operation, you must first create a cluster application in a standby operation, which is a
prerequisite for scalable operation.
If the cluster application of scalable operation is to run on three nodes, create a cluster application of standby operation on each of those
nodes (the node is for operation only and has no standby).
When you create a cluster application for standby operation, which is a prerequisite for scalable operation, set up only "Machines[0]."
Example 2) Preparing for high-availability scalable operation
To create a high-availability scalable cluster application, you must first create a cluster application for standby operation, which is a
prerequisite for high-availability scalable operation.
If the cluster application for high-availability scalable operation is 1:1 standby, create a cluster application for 1:1 standby.
Note
To create a cluster application in standby operation that constitutes scalable operation, set "AutoStartUp" to "no." To start the cluster
applications automatically when you start RMS, set the value of "AutoStartUp" to "yes" when you create a cluster application as part of
scalable operation.
The procedure for setting up the node of a cluster application in a standby operation, which is a prerequisite for scalable operation, is as
shown below.
Topology How to set up userApplication configuration nodes
operation. For information on making this setting, see how to set up the topology
of each standby operation.
For information on how to create standby cluster applications, see "6.7.2.1 Creating Standby Cluster Applications."
After you complete the setup of standby operation, which is a prerequisite for scalable operation, you must create the cluster application of
scalable operation as explained below.
3. "turnkey wizard "SCALABLE"" is output. Select "Machines+Basics" and set up the userApplication.
4. The userApplication setup screen is output. Specify the following settings for the userApplication:
- Nodes where the userApplication is configured
- userApplication attributes
Set up the nodes where the userApplication is configured as follows:
- Specify all SysNode names where the cluster application is configured (standby operation) in "Machines[number]".
Refer to the following when setting the userApplication attributes:
7. "Settings of application type" is output. Select "AdditionalAppToControl."
Information
The names of cluster applications in standby operation are all displayed in lowercase characters.
10. To allow scalable operation with multiple cluster applications (standby operation), repeat steps 7. to 9.
11. Set up the order in which the cluster applications (standby operation) are started up. The cluster applications are started from
the one with the smallest startup sequence number and stopped from the one with the largest startup sequence number. Cluster
applications with the same startup sequence number start up or stop in parallel.
Note
If you do not need to set up a startup sequence number, you do not have to perform the procedure described below.
1. Select "(ApplicationSequence=)" from "Settings of application type."
2. Select "FREECHOICE."
3. Enter the startup sequence number, and then press the return key.
- Enter the cluster application with the highest startup sequence number first.
- If the startup sequence numbers are different, input a single colon (:) between the cluster applications.
- If the startup sequence numbers are the same, input a single space between the cluster applications.
Note
The cluster application for standby operation must be entered entirely in lowercase characters.
The following is an example in which the startup sequence of app1 is the first, followed by app2 and then app3 (app2 and app3
have the same startup sequence number).
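Following the colon and space rules in step 3, that ordering would be entered as shown in this illustrative sketch:
app1:app2 app3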
When two or more cluster applications for scalable operation are to be created, repeat steps 1. to 12.
6.7.3 Setting Up Resources
This section explains how to register resources to the userApplication that was set up in the previous section.
You can register the following resources:
- Cmdline resources
You can use Cmdline resources to set up script files or commands as resources. Cmdline resources are required to trigger a state
transition of the userApplication when a user application stops, and conversely, to start or stop ISV applications or user
applications along with state transitions of the userApplication.
- Fsystem resources
Used when you mount a file system along with userApplication startup.
Note
To use a file system in a class created by GDS as an Fsystem resource, you must register the Gds resource to the same userApplication.
- Gds resources
Used when you start and stop a disk class to be defined by GDS by linking it with the userApplication.
- Gls resources
Used when you set up a takeover IP address that is to be defined in a userApplication with the redundant line control function of GLS,
or when you set a takeover IP address in a userApplication with the single line control function.
- Procedure resources
Used when you register a state transition procedure in the userApplication.
Resource setup flow
Operation Procedure:
1. Select "CommandLines" from "turnkey wizard "STANDBY"".
2. "CommandLines" will appear. Select "AdditionalStartCommand."
3. Select "FREECHOICE" and then enter the full path of the StartCommand. If you need to specify arguments, delimit them with blanks.
StartCommand is executed during Online processing to start user applications.
[StartCommand exit codes]
StartCommand has the following exit codes:
0: Normal exit. The Online processing is successfully done.
Other than 0: Abnormal exit. The Online processing fails. When the script exits with a code other than 0, the resource will enter
the Faulted state.
Note
The following characters cannot be used in the script path or in the arguments set for StartCommand, StopCommand, and
CheckCommand (described later).
= \ ~ % @ &
If you need to use those characters, describe them within the script that sets to Cmdline resources.
4. "CommandLines" will appear. If you need to stop the user programs, select "StopCommands."
StopCommand is executed during Offline processing to stop user applications.
You do not always have to set up the StopCommand.
[StopCommand exit codes]
StopCommand has the following exit codes:
0: Normal exit. The Offline processing is successfully done.
Other than 0: Abnormal exit. The Offline processing fails. When the script exits with a code other than 0, the resource will enter
the Faulted state.
If you do not use StopCommand, start from step 6.
Note
If "none" is set to StopCommands, regardless of the settings of Flags, LIEOFFLINE attribute is enabled and CLUSTEREXCLUSIVE
is disabled. In this status, the Cmdline resource is started and monitored.
5. Select "FREECHOICE" and then enter the full path of StopCommand. If you need to specify arguments, delimit them with blanks.
7. Select "FREECHOICE" and then enter the full path of the CheckCommand. If you need to specify arguments, delimit them with
blanks.
Note
If you enable the "NULLDETECTOR" attribute, CheckCommand is not started from RMS. For hot-standby operation, enable the
following two attributes;
- STANDBYCAPABLE
RMS executes Standby processing of the resources on all the nodes where the userApplication is Offline.
- ALLEXITCODES
Check script provides the detailed state of the resource with the exit code.
For further details about the hot-standby operation settings, see "6.6 Setting Up Online/Offline Scripts."
Note
A file system on an LVM (Logical Volume Manager) volume cannot be controlled by an Fsystem resource.
If you plan to use GDS volumes, you need to define the /etc/fstab.pcl file as follows.
Example: /etc/fstab.pcl file
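As a minimal sketch (the class, volume, and mount point names are assumptions; the "#RMS#" line prefix is the convention recalled for PRIMECLUSTER-managed entries, so confirm it against the Fsystem resource description), the entries might look like:
#RMS#/dev/sfdsk/class0001/dsk/volume0001 /mnt/swdsk1 ext4 noauto 0 0
#RMS#/dev/sfdsk/class0001/dsk/volume0002 /mnt/swdsk2 ext4 noauto 0 0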
Note
- If you have defined the same device or mount point in the /etc/fstab file, remove those definitions by turning them into
comment lines. If those definitions remain, userApplication may fail to start normally.
- ext4 and xfs allocate disk areas more efficiently and improve write performance by using their "delayed allocation" feature.
Because delayed allocation lets data that should be stored on the disk stay in memory for a longer time, part of the data may be
lost if the OS panics or the server power is interrupted.
When a program must guarantee that data is on the disk immediately after writing to the file system, the application that writes
the file should issue the fsync() call. Refer to the Storage Administration Guide of Red Hat, Inc. for details on "Delayed allocation."
- For the directory paths that are specified as the mount points, specify any paths that do not include symbolic links.
# /sbin/mkfs.ext4 /dev/sdd2
# /sbin/mkfs.xfs /dev/sdd3
# /bin/mount -t xfs /dev/sdd3 /mnt/swdsk3
# /bin/umount /mnt/swdsk3
- Forcible file system check prevention (recommended for ext3 and ext4)
If ext3 or ext4 is used for a file system, the file system might forcibly be checked during Online processing of a switching file
system. It is part of the ext3 and ext4 specification that file systems are checked when a certain number of mounting has been
executed since the last file system check, or a certain period of time has passed.
If the file systems are forcibly checked along with startup or failover of the cluster application, timeout occurs due to file system
Online processing, and PRIMECLUSTER startup or failover might fail.
It is necessary to prevent the file systems from being checked by executing the following command for all the ext3 and ext4
switching file systems.
Example: Configuring and confirming the prevention of file systems from being checked
After executing the above command, check that "Maximum mount count: -1" and "Check interval: 0" are displayed using the following
command:
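As a sketch of how this is commonly done on ext3/ext4 (standard tune2fs usage; the device name follows the earlier example and is an assumption, not a value quoted from this manual):
# tune2fs -c0 -i0 /dev/sdd2
# tune2fs -l /dev/sdd2 | grep -e "Maximum mount count" -e "Check interval"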
If the forcible file system check is prevented, file systems might corrupt due to failures such as disk errors and kernel bug. These
failures cannot be detected through file system logging and journaling. The file system corruption might cause data corruption.
To prevent this, execute the "fsck -f" command to enable the forcible file system check during periodic maintenance.
5. Stopping the GDS volume (Only when Step 2 has already been implemented)
Stop the GDS volume started in Step 2.
Example: Stopping the volume volume0001 of the disk class class with a command
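Assuming the standard GDS sdxvolume interface (the -F option stops a volume, as -N starts one per the reference earlier in this chapter; confirm against the GDS guide), the stop would look like:
# sdxvolume -F -c class -v volume0001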
2. Select "AdditionalMountPoint."
File systems (Lfs_APP1:not yet consistent)
1) HELP 4) REMOVE+EXIT 7) (Timeout=180)
2) - 5) AdditionalMountPoint
3) SAVE+EXIT 6) (Filter=)
Choose the setting to process: 5
3. The mount point, which is defined in /etc/fstab.pcl, will appear. Select mount points for monitoring-only disks.
1) HELP 6) /mnt/swdsk2
2) RETURN 7) /mnt/swdsk3
3) FREECHOICE
4) ALL
5) /mnt/swdsk1
Choose a mount point: 5
4. Select "SAVE+RETURN."
Set flags for mount point: /mnt/swdsk1 Currently set: LOCAL,AUTORECOVER (LA)
1) HELP 4) DEFAULT 7) SHARE(S)
2) - 5) SYNC(Y) 8) MONITORONLY(M)
3) SAVE+RETURN 6) NOT:AUTORECOVER(A)
Choose one of the flags: 3
5. If you register multiple mount points, repeat steps 2 to 4 for each mount point. After you have registered all necessary mount
points, Select "SAVE+EXIT."
# /opt/SMAW/SMAWRrms/bin/hvgdsetup -a [class-name]
...
Do you want to continue with these processes ? [yes/no] y
Information
To check the setup status of a shared volume, execute the following command:
# /opt/SMAW/SMAWRrms/bin/hvgdsetup -l
Note
- If the preliminary setup is not performed, the cluster application is set to Inconsistent status. For details, see "Cluster applications
become "Inconsistent" in "Cluster System Related Error" of "PRIMECLUSTER Global Disk Services Configuration and
Administration Guide."
- This operation must not be performed when a GFS shared file system is used.
Operation Procedure:
1. Select "Gds:Global-Disk-Services" from "turnkey wizard "STANDBY"".
4. Select "SAVE+EXIT."
Operation Procedure:
1. Select "Gls:Global-Link-Services" from "turnkey wizard "STANDBY"".
5. To save the Gls resource settings and then exit, select "SAVE+EXIT."
You can change the timeout value of the Gls resource by selecting "(Timeout=60)" and setting any value (seconds).
See
By setting up the value in the StandbyTransitions attribute when the cluster application is created, Gls resources on the standby node can
be switched to the "Standby" state and the state of the Gls resources on the standby node can be monitored. For information on how to make
this setting, see "6.7.2.1 Creating Standby Cluster Applications."
<node name> : CF node name of the node which uses the takeover IP address
<takeover> : Host name of the takeover IP address
<interface> : Network interface name on which the takeover IP address will be activated
<netmask/prefix> : Netmask for the takeover IP address (for IPv4), or network prefix length (for
IPv6)
Example
When an IPv4 address for the host "takeover" (netmask 255.255.255.0) is taken over between two nodes (node0 and node1) on the
network interface eth2, define as follows (specify the 8-digit netmask in hexadecimal).
When an IPv6 address for the host "takeover6" (network prefix length: 64) is taken over on the network interface eth3, define as
follows.
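Based on the field order listed above (node name, takeover host name, interface, netmask/prefix) and the values in the example descriptions, the definitions would look like the following sketch (the node assignment for the IPv6 entries is an assumption):
node0 takeover eth2 0xffffff00
node1 takeover eth2 0xffffff00
node0 takeover6 eth3 64
node1 takeover6 eth3 64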
Note
- An IPv6 link local address cannot be used as a takeover network resource. Moreover, it cannot be used as a communication
destination of reachability monitoring.
- When defining a host name in the /etc/hosts file, do not assign the same host name to the IPv4 address and the IPv6 address.
Operation Procedure:
2. When you have previously specified the target host to monitor its network reachability using ICMP, select "AdditionalPingHost" and
specify that target host.
The target host name registered in the process of prerequisites will be shown as an option. Select the host name you have previously
specified.
1) HELP
2) RETURN
3) FREECHOICE
4) router
5) l3hub
6) takeover
Choose another trusted host to ping:4
When you finish specifying the target host, you will be brought back to the previous screen. Since you are required to specify more
than one target host, you need to select "AdditionalPingHost" again to add another target host on the previous screen.
3. Select "AdditionalInterface" to set up the takeover IP address.
When you have more than one IP address, you need to repeat this process for each IP address.
1) HELP
2) RETURN
3) FREECHOICE
4) router
5) l3hub
6) takeover
Choose an interface name:6
- DEFAULT
If you choose "DEFAULT", all values will revert back to their default values.
- BASE, VIRTUAL
This attribute is effective only when using an IPv4 address. When using an IPv6 address, do not change this attribute. The
default value is "VIRTUAL".
- BASE
If you specify "BASE", activation/deactivation of the takeover IPv4 address and activation/deactivation of the physical
interface (for example, eth2) are performed at the same time. "BASE" will be shown on "Currently set" and "5) VIRTUAL"
is shown on the menu page.
- VIRTUAL
If you specify "VIRTUAL", activation/deactivation of the takeover IPv4 address and activation/deactivation of the logical interface (for example, eth2:1) are performed at the same time. "VIRTUAL" will be shown on "Currently set" and "5) BASE" is shown on the menu page.
You must activate the IPv4 address on the physical interface (for example, eth2) where the logical interface will be created
beforehand because the takeover IPv4 address with this attribute specifies the IPv4 address for the logical interface. To
activate the IPv4 address on the physical interface beforehand, make settings so that the IPv4 address is activated on the
physical interface at startup of the operating system, or register the takeover IPv4 address with "BASE" attribute with the
same takeover network resource.
- AUTORECOVER, NOT:AUTORECOVER
Specify this attribute to control whether the takeover IP address is reactivated automatically. The default value is "AUTORECOVER".
- AUTORECOVER
If you specify "AUTORECOVER" and the network interface goes down or becomes unreachable due to an error, it will try to activate the takeover IP address only once. "AUTORECOVER" will be shown on "Currently set" and "6) NOT:AUTORECOVER" is shown on the menu page. When the activation of the takeover IP address fails, it will be notified to the cluster.
- NOT:AUTORECOVER
If you specify "NOT:AUTORECOVER", the "AUTORECOVER" setting will be disabled. "NOT:AUTORECOVER" will be shown on "Currently set" and "AUTORECOVER" is shown on the menu page.
- BASIC-IF
You cannot use this attribute. Do not change.
- MONITORONLY, NOT:MONITORONLY
- MONITORONLY
If you specify "MONITORONLY" and the network interface goes down or becomes unreachable due to an error, the error
will not be notified to the cluster. "MONITORONLY" will be shown on "Currently set" and "7) NOT:MONITORONLY"
is shown on the menu page. If you specify this attribute, a switchover due to a takeover IP address failure will not occur.
- NOT:MONITORONLY
If you specify "NOT:MONITORONLY", the "MONITORONLY" setting will be disabled. "NOT:MONITORONLY"
will be shown on "Currently set" and "7) MONITORONLY" is shown on the menu page.
Note
At least one out of all the takeover IP addresses you have registered to the takeover network resources should be set to
"NOT:MONITORONLY".
- PING
By setting this attribute, you can specify the previously configured target host for the takeover IP address. Select the target
host name to be monitored which you have set in the process of prerequisites.
1) HELP
2) RETURN
3) router(000)
4) l3hub
Choose a ping host of the pool ():3
Note
- NeedAll, InterfaceFilter
You cannot use these attributes. Do not change.
Operation Procedure:
1. Select "Procedure:XXXXXXXXXX" from "turnkey wizard "STANDBY"".
Example of registering cluster resources of the BasicApplication class to a userApplication:
Note
If a cluster resource does not appear on this screen, it indicates that the cluster resource has not been registered in the resource
database. Confirm whether the cluster resource has been registered on each node of the userApplication, which is designed with "6.7.2
Setting Up userApplication." Register cluster resources if they are not registered. For details on the "clgettree" command, see the
manual pages of this command. For details on registering the cluster resource in the resource database, see "D.1 Registering a
Procedure Resource."
4. You can change the following on this screen. If necessary, select "SAVE+RETURN" from "Application detail Resource wizard" after
that.
6.7.4 Generate and Activate
This section explains how to execute Generate and Activate. You need to confirm first that the cluster application has been correctly created.
Operation Procedure:
1. Select "Configuration-Generate" from the "Main configuration menu."
Note
Do not execute "Configuration-Activate" simultaneously on multiple nodes which constitute the cluster.
6.7.5 Registering the Cluster Service of a PRIMECLUSTER-compatible
product
If the resources registered to a userApplication are for a PRIMECLUSTER-compatible product, register the resources to the cluster service
according to the procedure described below.
Operation Procedure
# /etc/opt/FJSVcluster/bin/clrwzconfig
Note
- If the cluster service for the PRIMECLUSTER-compatible product is not registered, the PRIMECLUSTER-compatible product will
not operate correctly. Therefore be sure to register the cluster service and the resources.
6.7.6 Attributes
See
For information on the attributes, see "Appendix D Attributes" in "PRIMECLUSTER Reliant Monitor Services (RMS) with Wizard Tools
Configuration and Administration Guide."
Information
- To set up an exclusive relationship, create a group of cluster applications between which an exclusive relationship is to be set. Up to
52 groups can be created.
- For information on setting up an exclusive relationship, see "6.7.2.1 Creating Standby Cluster Applications."
- The cluster application in which the exclusive relationship is set transits to Standby state according to the StandbyTransitions attribute.
Note
When the cluster application state is Faulted on a node, cluster applications in exclusive relationships on that node cannot be made
operational by newly starting the cluster applications. Cluster applications started later will be stopped regardless of job priority.
The reason for this is that possibly not all resources under the control of the cluster application in the Faulted state could be stopped.
In such a case, clear the Faulted state of the cluster application to bring it to the Offline state, and then start the cluster applications that are
in exclusive relationships.
For information on how to clear the Faulted state of cluster application, see "7.2.2.4 Bringing Faulted Cluster Application to available
state."
The operation of cluster applications, between which an exclusive relationship is set up, during failover can be explained in the following
two cases:
When the job priorities of the cluster applications with an exclusive relationship are different
Cluster applications with the highest job priority take the top priority for startup on the nodes on which the cluster applications with high
job priority are running or on the nodes to which the cluster applications with high job priority are failed over. Therefore, cluster applications
running with low priorities will be forcibly exited.
The states indicated in the following figure are as follows:
Failover of the cluster application with a low job priority
Failover occurs for a cluster application with a low job priority only when there is no cluster application with a high job priority included
on the node to which the cluster application with a low job priority is to be failed over.
When the job priorities of cluster applications with an exclusive relationship are the same
The operation of the cluster applications that are already running will be continued. On the node on which cluster applications are already
running, cluster applications that subsequently start up will be stopped.
6.8 Setting Up the RMS Environment
When using RMS, you need to check "Setup (initial configuration)" of PRIMECLUSTER Designsheets and change the following
environment variable to the value corresponding to the configuration setup.
See
For information on how to check and change the RMS environment variables, see "1.9 Environment variables" and "Appendix E
Environment variables" in "PRIMECLUSTER Reliant Monitor Services (RMS) with Wizard Tools Configuration and Administration
Guide."
# /etc/opt/FJSVcluster/bin/clchkcluster
If an error occurs, deal with the error according to the error message output on the console.
See
For details on the clchkcluster command, see the manual pages of the clchkcluster command.
Note
Even though the AutoSwitchOver attribute has been set, a failover is not performed unless HaltFlag has been set in the event of a double
fault.
How to failover a userApplication in the event of a node failure, resource failure, and RMS stop
Perform the following operation:
-> AutoSwitchOver = HostFailure | ResourceFailure | Shutdown
Note
1. In the event of a double fault, a failover is not performed even though this attribute value has been set.
Set the HaltFlag attribute for performing a failover even in the event of a double fault.
2. When the status of the userApplication to be switched is Fault, it cannot be switched even though AutoSwitchOver has been set.
When performing a failover, clear the Faulted state.
How to switch userApplication to Standby automatically when RMS is started, userApplication is switched,
or when clearing a fault state of userApplication
Perform the following operation:
-> StandbyTransitions = Startup | SwitchRequest | ClearFaultRequest
Note
- If "yes" has been set to AutoStartUp attribute, the status of the standby userApplication is transited to Standby when RMS is started
regardless of the setting value of StandbyTransitions.
The relationship between AutoStartUp and StandbyTransitions is as follows.
- If no resource whose StandbyCapable attribute is set to "yes" (1) exists in the userApplication, the userApplication does not enter the Standby state regardless of the set value of the StandbyTransitions attribute.
How to set scalable cluster applications for preventing timeout of Controller resource during a state
transition
When it takes time to start up and stop a cluster application that constitutes a scalable configuration, a timeout error of the Controller resource (the resource that indicates the scalability) may occur during a state transition, and the state transition is then stopped forcibly.
In this case, the setting of the Controller resource needs to be changed according to the startup and stop times of each cluster application that constitutes the scalable configuration.
Calculate the Timeout value of a scalable cluster application, and then change its setting with the following procedure:
Procedure
1. Calculating the maximum state transition time for a cluster application
The status of the Controller resource transitions to Online when the statuses of all userApplications under the Controller resource are Online. Therefore, calculate the total of the ScriptTimeout values of the resources that make up a cluster application.
For example, if a Cmdline resource, an Fsystem resource, a GDS resource, and a Gls resource each exist under the cluster application, the calculation is as follows (the timeout value for each resource is the default value):
Cmdline resource 300 (sec) + Fsystem resource 180 (sec) + GDS resource 1800 (sec) + Gls resource 60 (sec) = 2340 (sec)
Since this value is larger than the default value of 180 (sec) for the scalable cluster application, set the value to 2340 (sec).
Information
Default script timeout values for each resource
Cmdline : 300
Fsystem : 180
GDS : 1800
Gls : 60
The maximum state transition time of a cluster application between multiple nodes
Example
For example, assume a userApplication in a three-node configuration whose status is Online on Node1, and assume that each Online or Offline processing of the userApplication finishes just before it times out. After the state transition starts on the first node, it takes 4 times the maximum state transition time (2 x (the number of SysNodes - 1)) for the userApplication to become Online on the final node, as follows:
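As a rough sketch of this elapsed-time calculation (the 2340-second figure is the per-application state transition time from the example in step 1; the value actually set for the Controller TIMEOUT is entered in the procedure below):

Maximum state transition time of one cluster application : 2340 (sec)
Number of SysNodes                                        : 3
Worst-case factor                                         : 2 x (3 - 1) = 4
Time until the userApplication is Online on the final node : 2340 (sec) x 4 = 9360 (sec)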
2. Select "Application-Create" from "Main configuration menu."
5. Select "SELECTED."
6. Select "TIMEOUT(T)" from "Set *global* flags for all scalable (sub) applications."
7. Select "FREECHOICE" and enter the setting value (when entering 2340).
8. Select "SAVE+RETURN" from "Set *global* flags for all scalable (sub) applications."
See
For detailed operation on how to change RMS Wizard and attributes, see "10.3 Changing the Cluster Configuration" or
"PRIMECLUSTER Reliant Monitor Services (RMS) with Wizard Tools Configuration and Administration Guide."
How to stop a standby operational system preferentially in the event of a heartbeat error
When a heartbeat error is detected, set the survival priority of the node to be forcibly stopped so that the operational and standby systems do not forcibly stop each other and both systems fail. The following describes how to stop the operational system preferentially and collect the information for investigation.
Note
- The weight of each node set in the Shutdown Facility is defined per node.
If the operational and standby systems are switched due to a failover or switchover, the intended behavior cannot be obtained even if the setting is changed. As before, the operational system is forcibly stopped after the given time has elapsed on the standby system.
When the cluster is switched, be sure to perform a failback.
- If a system panic, continuous CPU load, or continuous I/O load occurs, it may look as if the heartbeat has an error. In this case, the cluster node with the error is forcibly stopped regardless of the survival priority.
- A standby system with a low survival priority waits until the operational system has been forcibly stopped completely. If the heartbeat recovers during this waiting time, some of the information for investigating the heartbeat error may not be collected.
This case may occur when the CPU load or I/O load is high on the operational system.
Procedure
Below indicates an example when the operational system is node1, and the standby system is node2.
Note
Perform Steps 1 to 4 in both the operational and standby systems.
1. Modify the SF configuration (/etc/opt/SMAW/SMAWsf/rcsd.cfg) for the standby system (node2) with the vi editor, and so on to give
a higher weight value to the standby system. Change the weight attribute value of node2 from "1" to "2."
node2# vi /etc/opt/SMAW/SMAWsf/rcsd.cfg
[Before edit]
node1,weight=1,admIP=x.x.x.x:agent=SA_xx,timeout=20:agent=SA_yy:timeout=20
node2,weight=1,admIP=x.x.x.x:agent=SA_xx,timeout=20:agent=SA_yy:timeout=20
[After edit]
node1,weight=1,admIP=x.x.x.x:agent=SA_xx,timeout=20:agent=SA_yy:timeout=20
node2,weight=2,admIP=x.x.x.x:agent=SA_xx,timeout=20:agent=SA_yy:timeout=20
Note
- Describe the setting of one node with one line in the rcsd.cfg file.
- admIP may not be described depending on the version of PRIMECLUSTER.
2. Execute the sdtool -r command to apply the changed SF configuration.
node2# sdtool -r
3. Use the sdtool -C command to check that the changed SF configuration has been reflected.
Check that the weight attribute value of node2 has become "2."
node2# sdtool -C
Note
"Type" may not be displayed depending on the version of PRIMECLUSTER.
4. Use the sdtool -s command to check that all the SAs defined to the SF operate properly. Moreover, check that "Test State" and "Init State" have been changed to "TestWorked" and "InitWorked" respectively.
node2# sdtool -s
Cluster Host Agent SA State Shut State Test State Init State
------------ ----- -------- ---------- ---------- ----------
node1 SA_xx Idle Unknown TestWorked InitWorked
node1 SA_yy Idle Unknown TestWorked InitWorked
node2 SA_xx Idle Unknown TestWorked InitWorked
node2 SA_yy Idle Unknown TestWorked InitWorked
Note
Perform the following Steps 5 to 8 either in the operational or standby system.
5. Check the ShutdownPriority attribute value of a cluster application (userApplication) with hvutil -W command.
When the ShutdownPriority attribute value is other than "0," perform Steps 6 to 8.
When it is "0," no more setting is required.
node1# hvutil -W
4
6. Stop RMS on all the nodes.
Note
Note that if you stop PRIMECLUSTER (RMS), the operation is also stopped.
node1# hvshut -a
7. Change the ShutdownPriority attribute value of a cluster application (userApplication) to "0." First, start the RMS Wizard.
Note
Change testconf based on your environment.
For details, see "11.1 Changing the Operation Attributes of a userApplication."
1. Select "Application-Edit" from "Main configuration menu."
2. Select the appropriate cluster application (userApplication) to change its configuration in "Application selection menu."
3. Select "Machines+Basics" in "turnkey wizard."
4. Select "ShutdownPriority."
5. Select "FREECHOICE" to enter 0.
6. Select "SAVE+EXIT" in "Machines+Basics."
7. Select "SAVE+EXIT" in "turnkey wizard."
8. Select "RETURN" on "Application selection menu."
9. Select "Configuration-Generate."
10. Select "Configuration-Activate."
8. Start RMS on all the nodes.
node1# hvcm -a
Note
When a cluster is switched, be sure to perform a failback.
How to stop the operational node forcibly in the event of a subsystem hang
The following event is called a subsystem hang: the cluster does not detect that the operation is stopped (the operation seems normal from
the cluster monitoring) because only some I/Os within the operational node have errors and other I/Os operate normally.
In this case, switching to a standby node may allow the operation to be restarted. In the event of a subsystem hang, ping may respond properly and you may be able to log in to the node.
When a subsystem hang is detected, stop the operational node with the following method and switch the operation.
If you can log in to a standby node
Stop the operational node from the standby node with the sdtool command.
# sdtool -k node-name
Note
A subsystem hang can also be determined from application failures, and this determination can be used to control the forcible stop mentioned above. In that case, it needs to be determined from multiple clients; even if an error is observed from one client, the error may be in that client or on the network. You need to take such cases into account when controlling a forcible stop.
How to use SNMP manager to monitor cluster system
If any error occurs in the resources registered in the userApplication of a cluster, an SNMP trap is sent to the server on which the SNMP manager runs, so that the cluster system can be monitored.
See
For details of this function, see "F.11 SNMP Notification of Resource Failure" in "PRIMECLUSTER Reliant Monitor Services (RMS) with
Wizard Tools Configuration and Administration Guide."
Set the FaultScript attribute of userApplication so that the hvsnmptrapsend command is specified, as follows.
Prechecking
Check if the net-snmp-utils package provided by the OS has been installed on all the nodes of the cluster which uses this function. If it
has not been installed, you need to install it.
Example
# rpm -q net-snmp-utils
net-snmp-utils-5.5-41.el6.i686
Confirm that the SNMP manager supports version 2c of SNMP in the SNMP Trap destination. Moreover, check the community names
that the SNMP manager can receive beforehand.
Setup procedure
See
For information on how to set up userApplication with the RMS Wizard, see "6.7.2.1 Creating Standby Cluster Applications" and
"10.3 Changing the Cluster Configuration."
1) HELP
2) RETURN
3) NONE
4) FREECHOICE
Enter the command line to start upon fault processing: 4
>> /opt/SMAW/bin/hvsnmptrapsend community snmprvhost
Note
When a Fault script has already been registered, create a new script that executes both the existing Fault script command and the hvsnmptrapsend command, and register this new script as the Fault script.
5. See "6.7.4 Generate and Activate" and execute the "Configuration-Generate" and "Configuration-Activate" processes.
Do not use reserved words for userApplication names and Resource names
If you use a reserved word for a userApplication or Resource name, RMS cannot be configured properly.
Do not use the following reserved words in addition to numbers and types of characters limited in PRIMECLUSTER Installation and
Administration Guide.
<List of reserved words>
Reserved words written in C
auto|break|case|char|const|continue|
default|do|double|else|enum|extern|float|
for|goto|if|int|long|main|register|return|short|
signed|sizeof|static|struct|switch|typedef|
union|unsigned|void|volatile|while
and|and_eq|bitand|bitor|compl|not|or|or_eq|xor|xor_eq|
asm|catch|class|delete|friend|inline|new|operator|private|
protected|public|template|try|this|virtual|throw
ADMIN|ADMIN_MODIFY|CONTRACT_MODIFY|ENV|ENVL|INIT_NODE|Offline|
Faulted|Online|Standby|Warning|SysNode|andOp|
assert|commdNode|contractMod|controller|env|envl|gResource|node|
object|orOp|userApp|userApplication|ScalableCtrl
abstract|attach|attribute|begin|class|consume|copy|cpp|declare|
delay|delete|error|extends|extern|hidden|implements|include|
interface|java|left|lookahead|lr|message|modify|nonassoc|node|
nosplit|notree|package|prec|private|public|reductor|repeat|right|
select|show|simple|skip|state|tree|trigger|type|used|virtual|wait|link
- Start script
This script is started when the status of the userApplication transitions to Online or Standby. It starts the user applications.
- Stop script
This script is started when the status of the userApplication transitions to Offline. It stops the user applications.
- Check script
This script monitors the status of the resources (user applications) that are started or stopped with the Start or Stop script. It is executed at regular intervals (*) after RMS is started and reports the status of the user applications.
(*) If the processing time of the Check script (the time from the start to the end of the Check script) is within about 0.25 seconds, it is started at about 10-second intervals. If the processing time exceeds 0.25 seconds, it is started at about 20-second intervals.
The Start script and Stop script are also called the Online script and Offline script respectively.
The following table indicates the attributes that can be set for Cmdline resources.
Attribute Outline
MONITORONLY This attribute controls whether to switch userApplication to the Faulted state when the resource becomes Faulted. If this attribute is set to "Yes," userApplication does not transition to Faulted even if the resource becomes Faulted.
Set "No" to at least one Cmdline resource that is registered in userApplication.
The default value is "No."
STANDBYCAPABLE If the attribute is set to "Yes," RMS sets the StandbyCapable attribute to "1" for this
resource.
For detailed information regarding this attribute, see "Appendix D Attributes" of
"PRIMECLUSTER Reliant Monitor Services (RMS) with Wizard Tools Configuration
and Administration Guide."
The default value is "No."
REALTIME If the attribute is set to "No," the Check script is started in the TS class.
If the attribute is set to "Yes," the Check script is started in the RT class.
Note that the operating system assigns the highest priority to processes started in the RT class. Thus, bugs in the script or commands may have a large effect on system performance.
The default value is "No."
TIMEOUT This attribute sets a timeout interval (seconds) to start and stop programs.
The default value is "300."
Note
When PRIMECLUSTER products are not specified, do not change ReturnCodes of the Cmdline resource.
The following shows, for each state transition of the Cmdline resource, the script that is executed and the values of the environment variables HV_LAST_DET_REPORT (*1) and HV_INTENDED_STATE.

At switchover (resource failure)
- Operational system: Online->Faulted / Script: - / HV_LAST_DET_REPORT: - / HV_INTENDED_STATE: -
- Operational system: Faulted->Offline / Script: Stop script / HV_LAST_DET_REPORT: Offline / HV_INTENDED_STATE: Offline
- Standby system: Offline->Online / Script: Start script / HV_LAST_DET_REPORT: Offline / HV_INTENDED_STATE: Online

At cutting off the standby system (resource failure in the standby system)
- Operational system: Offline / Script: - / HV_LAST_DET_REPORT: - / HV_INTENDED_STATE: -
- Standby system (*2): Offline->Offline / Script: Stop script / HV_LAST_DET_REPORT: Offline (*3) / HV_INTENDED_STATE: Offline

At exit of maintenance mode
- Operational system (*4): Online->Online / Script: Start script / HV_LAST_DET_REPORT: Online / HV_INTENDED_STATE: Online
- Standby system: Offline->Offline / Script: - / HV_LAST_DET_REPORT: - / HV_INTENDED_STATE: -

*1: The value of HV_LAST_DET_REPORT is the current resource status just before the "Script for execution" is executed.
*2: This script is executed only when the following conditions exist:
State transitions of the Cmdline resource (continued) - script for execution and values of the environment variables HV_LAST_DET_REPORT (*1) and HV_INTENDED_STATE:

- Standby system: Standby->Faulted / Script: - / HV_LAST_DET_REPORT: - / HV_INTENDED_STATE: -
- Standby system: Faulted->Offline / Script: Stop script / HV_LAST_DET_REPORT: Offline or Faulted (*3) / HV_INTENDED_STATE: Offline

At exit of maintenance mode
- Operational system: Online->Online / Script: - / HV_LAST_DET_REPORT: - / HV_INTENDED_STATE: -
- Standby system: Standby->Standby / Script: - / HV_LAST_DET_REPORT: - / HV_INTENDED_STATE: -

*1: The value of HV_LAST_DET_REPORT is the current resource status just before the "Script for execution" is executed.
*2: When the StandbyTransitions attribute is "Startup."
*3: When the Check script returns 1 (Offline) at failure detection, the value of HV_LAST_DET_REPORT is "Offline." When the Check script returns 2 (Faulted) at failure detection, the value of HV_LAST_DET_REPORT is "Faulted."
See
For the environment variable that can be referred to within a script, see "6.11.2.1.2 Environment Variables can be referred to within the Start
and Stop Scripts."
The Start script and Check script are switched based on the exit code. The states are as follows.
For details on the exit codes, see "6.11.2.2.3 Check Script Exit Code."
(*) For a timeout, see "6.11.2.1.5 Timeout of Scripts."
- At RMS startup
- At RMS stop
- At switchover
In addition to the Cmdline resource, the Gls resource is also described in the following figures as an example.
Note
The Check script runs before the Start script. If the Check script returns Online before the Start script is executed, the Start script is not executed.
- Gls resource operation
At the same time a resource becomes Online after RMS is started, GLS activates a virtual IP address. In addition, to notify the location of the activated IP address, GLS sends a system down notification.
Note
The Check script runs before the Start script. If the Check script returns Online before the Start script is executed, the Start script is not executed.
- Gls resource operation
In Standby state, GLS monitors a network route with the host monitoring function (ping monitoring) without activating a virtual
IP address.
- At RMS stop Standby system (Standby->Offline)
- The Cmdline resource operation
The Stop script is executed, and then the Check script is executed without waiting for the regular interval. After the Check script returns Offline, the corresponding Cmdline resource becomes Offline.
- Gls resource operation
At the same time a resource becomes Online, GLS activates a virtual IP address. In addition, to notify the location of the activated IP address, GLS sends a system down notification.
- For standby systems of the Cmdline resource other than Hot-standby operation, the Start script is not executed at RMS startup. Thus,
the phases 2 and 3 do not exist.
- For standby systems of the Cmdline resource other than Hot-standby operation, the Stop script is not executed at RMS stop. Thus, the
phases 5 and 6 do not exist.
- The Cmdline resource with Hot-standby operation
- The Cmdline resource other than Hot-standby operation
6.11.2.1 start and stop Scripts
[Setting]
- STANDBYCAPABLE: No
- AUTORECOVER: No
- CLUSTEREXCLUSIVE: Yes
- NULLDETECTOR: No
- MONITORONLY: No
[Operation]
Below is an example that assumes the same operation for the standby and operational systems, following "Table 6.3 The Cmdline resource in other than Hot-standby operation." The same processing is executed on the lines where the Start script is described, and likewise on the lines where the Stop script is described.
When assuming operations other than the above, refer to the environment variables and attributes and change the script accordingly.
Figure 6.5 Start script and Stop script other than Hot-standby operation
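The figure itself is not reproduced here; the following is only a minimal sketch of such a script under the settings listed above. The application start/stop commands and the argument convention ($1 set to "start" or "stop") are placeholders, not part of the original figure.

#!/bin/sh
#
# Sample Start/Stop script (other than Hot-standby operation)
# $1 is assumed to be "start" on the StartCommands line and "stop" on the
# StopCommands line of the Cmdline resource.
#
case "$1" in
start)
    # Start the user application (placeholder command)
    /usr/local/bin/start_myapp
    ;;
stop)
    # Stop the user application (placeholder command)
    /usr/local/bin/stop_myapp
    ;;
esac
# Exit code 0 reports normal completion to RMS.
exit 0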
The setting method varies depending on the operating system. See the respective "PRIMECLUSTER Installation and Administration Guide" manuals.
Moreover, below is an example that assumes that the operation of the following sample $FULL_PATH/Script corresponds to Hot-standby operation.
[Setting]
- STANDBYCAPABLE: Yes
- AUTORECOVER: No
- CLUSTEREXCLUSIVE: Yes
- ALLEXITCODES: Yes
- NULLDETECTOR: No
- MONITORONLY: No
[Operation]
Below is an example of a Start script that distinguishes the transition from Offline to Standby from the transition from Offline to Online, following "Table 6.4 The Cmdline resource in Hot-standby operation."
In addition, the Stop script in the example distinguishes the transition from Standby to Offline from the transition from Online to Offline.
When assuming operations other than the above, refer to the environment variables and attributes and change the script accordingly.
Figure 6.6 Start script and Stop scripts with Hot-standby operation
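The figure itself is not reproduced here; the following is only a minimal sketch showing how such a script can distinguish the transitions by using the environment variables of Table 6.5. The application commands are placeholders.

#!/bin/sh
#
# Sample Start/Stop script for Hot-standby operation.
# HV_SCRIPT_TYPE, HV_LAST_DET_REPORT, and HV_INTENDED_STATE are set by RMS
# when the script is executed (see Table 6.5).
#
if [ "$HV_SCRIPT_TYPE" = "Online" ]; then
    if [ "$HV_INTENDED_STATE" = "Standby" ]; then
        # Offline -> Standby: start the application in standby mode (placeholder)
        /usr/local/bin/start_myapp --standby
    else
        # Offline or Standby -> Online: start or promote the application (placeholder)
        /usr/local/bin/start_myapp --online
    fi
else
    if [ "$HV_LAST_DET_REPORT" = "Standby" ]; then
        # Standby -> Offline: stop the standby instance (placeholder)
        /usr/local/bin/stop_myapp --standby
    else
        # Online -> Offline: stop the active instance (placeholder)
        /usr/local/bin/stop_myapp --online
    fi
fi
exit 0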
6.11.2.1.2 Environment Variables can be referred to within the Start and Stop Scripts
When executing the Start script and Stop script, the following environment variables are set. You can refer to those environment variables
within the scripts.
Table 6.5 indicates the environment variables set in the scripts.
Table 6.5 Environment variables can be referred to within the Start and Stop scripts
Environment variables Outline
HV_APPLICATION This variable sets the userApplication name that the resource belongs to.
Example) app1
HV_AUTORECOVER The value of this variable indicates whether the script is triggered by
AutoRecover or not.
0: Not triggered by AutoRecover that is executed with the Online processing
1: Triggered by AutoRecover
HV_FORCED_REQUEST This variable sets a value that indicates whether or not forced failover was requested
by operator intervention.
0: Forced failover was not requested.
1: Forced failover was requested.
HV_NODENAME This variable sets the resource name.
Example) ManageProgram000_Cmd_APP1, RunScriptsAlways000_Cmd_APP1
HV_OFFLINE_REASON This variable sets the trigger for bringing the resource Offline.
SWITCH: The resource was set to Offline because of a userApplication switchover
request (hvswitch)
STOP: The resource was set to Offline because of a userApplication stop request
(hvutil -f)
FAULT: The resource was set to Offline because of a resource fault.
DEACT: The resource was set to Offline because of a userApplication deactivate
request (hvutil -d)
SHUT: The resource was set to Offline because of an RMS stop request (hvshut)
HV_SCRIPT_TYPE This variable sets the type of script that was executed.
Online: Online script
Offline: Offline script
HV_LAST_DET_REPORT This variable sets the state of the current resources just before execution of the Start/
Stop script.
Online: Online state
Offline: Offline state
Standby: Standby state
Faulted: Faulted state
HV_INTENDED_STATE This variable sets the resource state that is expected after state transition is
completed.
Online: Online state
Offline: Offline state
Standby: Standby state
Faulted: Faulted state
Warning: Warning state
NODE_SCRIPTS_TIME_OUT This variable sets the timeout duration (seconds) of the script.
Example) 300
See
- For details on the RMS environment variables, see "Appendix E Environment variables" in "PRIMECLUSTER Reliant Monitor
Services (RMS) with Wizard Tools Configuration and Administration Guide."
Other than 0: Abnormal exit
The system assumes that an error occurred during the state transition of the Cmdline resources and interrupts state transition processing
of the userApplication.
- Online script
Check whether the target program has already been started before starting it within the Online script. If it has already been started, exit the Online script immediately.
- Offline script
Check whether the target program has already been stopped before stopping it within the Offline script. If it has already been stopped, exit the Offline script immediately.
Note
If the userApplication state before the maintenance mode is started is Online, the Online script of Cmdline resource where the
NULLDETECTOR flag is set is executed.
Note
The processing time of each script needs to be shorter than the TIMEOUT attribute value that users have set.
If the processing time of a script exceeds the TIMEOUT attribute value, PRIMECLUSTER determines that a resource error has occurred and stops the startup or stop processing.
Figure 6.7 The Check script other than Hot-standby operation
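The figure itself is not reproduced here; the following is only a minimal sketch of a Check script for other than Hot-standby operation. The process name myapp is a placeholder, and exit code 0 is assumed to report Online and any other value Offline (see "6.11.2.2.3 Check Script Exit Code" for the exact codes).

#!/bin/sh
#
# Sample Check script (other than Hot-standby operation).
# Report Online while the target process is running, otherwise Offline.
#
if /bin/ps -ef | /bin/grep "[m]yapp" > /dev/null 2>&1; then
    exit 0    # Online
else
    exit 1    # Offline
fi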
If performing Hot-standby operation with the Cmdline resource, describe a Check script that, like the start and stop scripts, corresponds to Hot-standby operation.
Below is an example of the Check script corresponding to Hot-standby operation.
The following example assumes that the settings described in "6.11.2.1.1 Examples of start and stop Scripts" have already been made.
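The example itself is not reproduced here; the following is only a minimal sketch. The status commands are placeholders, and the exit code used for each state must follow "6.11.2.2.3 Check Script Exit Code."

#!/bin/sh
#
# Sample Check script for Hot-standby operation.
# Report Online, Standby, or Offline depending on the application state.
#
if /usr/local/bin/myapp_status --online > /dev/null 2>&1; then
    exit 0    # Online
elif /usr/local/bin/myapp_status --standby > /dev/null 2>&1; then
    exit 4    # Standby (confirm the code in "6.11.2.2.3 Check Script Exit Code")
else
    exit 1    # Offline
fi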
6.11.2.2.2 Environment Variables that can be referred to within the Check Scripts
The following environment variables are set when executing the Check script. These environment variables can be referred to within the
script.
- HV_APPLICATION
- HV_NODENAME
See
For outlines on these environment variables, see "Table 6.5 Environment variables that can be referred to within the Start and Stop scripts."
And, for details on the RMS environment variables, see "Appendix E Environment variables" in "PRIMECLUSTER Reliant Monitor
Services (RMS) with Wizard Tools Configuration and Administration Guide."
Note
Since the exit codes other than the above indicate the specific status, use these codes only when applicable products are specified in the
environment that uses PRIMECLUSTER products.
6.11.3 Notes on Scripts
- The execute permission for each script is user: root and group: root.
- Environment variables set on each server (in "/etc/profile" or "/etc/bashrc", for example) are not guaranteed to be inherited by the Start, Stop, and Check scripts. Therefore, make sure to define the environment variables used by these scripts within each script.
- The Check script is called in regular intervals (10-second intervals) after starting RMS. It does not synchronize with the Start or Stop
script.
Therefore, at the time the Check script is started, the processing of the Start script may not have completed, or the Stop script may still be in progress.
If the Check script starts before the Start script has completed, create the script so that it returns the Offline exit code.
- When multiple Cmdline resources are registered in a userApplication, they are processed in the order of registration when the userApplication starts, and in the reverse order of registration when it stops. An example is as follows.
The resource registered first is Command[0], and the resource registered next is Command[1].
These resources are started and stopped in the following order.
At startup
StartCommands[0]
StartCommands[1]
At stop
StopCommands[1]
StopCommands[0]
- The Cmdline resource is managed by its creator. Thus, in the case of an operation error, the creator needs to investigate the cause, correct the error, and check the operation.
To investigate the cause of an error promptly, take measures such as outputting a log.
/var/opt/SMAWRrms/log/"user_application_name".log
"user application name" is the name of the userApplication in which the Cmdline resource is registered. If the Start or Stop script does not operate properly, you can investigate the cause from the messages output to this file.
- When a resident process is started from the Start script registered in the Cmdline resource, the file descriptors of the Start script are passed to the resident process. If the resident process outputs messages to the standard output or standard error output, the messages are stored in the "user application name".log file. However, the purpose of this file is to record the messages that the Start and Stop scripts of a resource output; messages output continuously from the resident process are not assumed. If the resident process keeps outputting messages, the "user application name".log file may consume a large amount of disk space.
To start an operational application that has a resident process from the Cmdline resource, take any one of the following measures:
- Change the setting of the operational application so that the resident process does not output messages to the standard output or standard error output.
- Immediately after starting the resident process, modify the processing of the resident process so that the file descriptors of the standard output and standard error output passed from the Start script are closed.
Point
The resident process also inherits file descriptors other than the standard output and standard error output. There is no problem with closing all of these file descriptors.
- Redirect the messages output from the resident process within the Start script to /dev/null or other files.
Example
If a resident process is started with the Start command StartCommand.sh, register the Start command as follows:
- When the output messages are unnecessary for the operation (the messages are discarded to /dev/null):
/usr/local/bin/StartCommand.sh > /dev/null 2>&1
- When the messages are necessary for the operation and are output to the log file /var/tmp/logfile:
/usr/local/bin/StartCommand.sh > /var/tmp/logfile 2>&1
Note
To redirect the messages output from the resident process to another log file, you need to truncate the log file periodically so that it does not consume excessive disk space. Since you cannot delete the log file while the resident process is running, copy /dev/null over the log file so that its size becomes 0.
cp /dev/null /var/tmp/logfile
Periodically setting the size of the log file to 0 from cron allows operation with sufficient disk space.
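For example, a crontab entry along the following lines (the schedule and the log file path are only examples) sets the log file size to 0 once a day:

# Truncate the resident process log at 0:00 every day (example)
0 0 * * * /bin/cp /dev/null /var/tmp/logfile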
- The mount state of the file system matches the definition in /etc/fstab.pcl.
- I/O to the file system is performed properly while it is mounted.
- AUTORECOVER
If "Yes" is set, hvdet_gmount tries to recover the failure by re-mounting when it detects a failure. If this attempt fails, the Fault
processing is executed.
The default value is "Yes."
Note
Setting AUTORECOVER to "No" is recommended.
Setting "Yes" is effective as a measure against an operator unmounting the file system by mistake. However, when the Fsystem resource times out due to an I/O error or the like, the switchover takes longer because the I/O is retried.
When an error is detected, hvdet_gmount repeats the recovery processing the number of times specified by HV_GMOUNTMAXLOOP, as described below. If the error still cannot be recovered, the recovery processing is repeated the number of times specified by HV_GMOUNTMAXRETRY.
The default values for HV_GMOUNTMAXLOOP and HV_GMOUNTMAXRETRY are four and seven times respectively. The recovery processing for HV_GMOUNTMAXLOOP is executed at 0.5-second intervals, while the recovery processing for HV_GMOUNTMAXRETRY is executed at 10-second intervals. Therefore, when an unrecoverable disk or path error occurs, the retry processing runs for about 84 seconds, and then the switchover is performed.
Note
HV_GMOUNTMAXLOOP and HV_GMOUNTMAXRETRY are RMS environment variables. To change these values, set "export HV_GMOUNTMAXLOOP=value" and "export HV_GMOUNTMAXRETRY=value" in hvenv.local.
See
The type of file system that can be used on the shared disk device varies depending on the OS. For details on the file system and notes on
use, see "Linux user guide" of each OS.
# tune2fs -c0 -i0 <device_name>
Example
After executing the above command, check that "Maximum mount count: -1" and "Check interval: 0" are displayed by using the following command:
# tune2fs -l /dev/sdi1
[snip]
Mount count: 10
Maximum mount count: -1
[snip]
Check interval: 0 (<none>)
[snip]
Note
If the forcible file system check is disabled, file systems might be corrupted by failures such as disk errors and kernel bugs that cannot be detected through file system logging and journaling. The file system corruption might cause data corruption. To prevent this, execute the "fsck -f" command to perform a forcible file system check during periodic maintenance.
See
Ext4 and xfs use the "Delayed Allocation" feature to allocate the disk area more efficiently and to improve write performance. As a result of Delayed Allocation, data that should be stored on the disk stays in memory longer, so part of the data may be lost by an OS panic or a power supply interruption of the server.
For the details of delayed allocation, see the Storage Administration Guide of Red Hat, Inc.
- ext4
Delayed allocation can be disabled by specifying the nodelalloc mount option for ext4. Specify the mount option in the /etc/fstab.pcl file, as in the example shown after this list.
- xfs
Delayed allocation cannot be disabled when xfs is used. Therefore, in order to prevent part of the data from being lost by an OS panic or a power supply interruption of the server, the application should issue the fsync() call immediately after writing, to guarantee that the data is written to the file system.
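As an example of the ext4 entry mentioned above (the device name and mount point are placeholders taken from the /etc/fstab.pcl example that appears later in this section):

#RMS#/dev/sfdsk/class0001/dsk/volume0001 /mnt/swdsk1 ext4 nodelalloc 0 0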
- Do not access the mount point specified in the Fsystem resource from anything other than the userApplication.
If another process is accessing the mount point specified in the Fsystem resource during Offline processing, the Offline processing may fail and the switchover may not be performed.
- Do not change the mount point name of the Fsystem resource with a command such as mv while the userApplication is Online.
If the mount point name is changed while Online, hvdet_gmount detects an error and the userApplication is switched over. To change the mount point name temporarily, stop RMS first.
- If 31 or more mount points are registered in a single Fsystem resource, you need to change the default timeout value (180 seconds).
Set the Timeout value of the Fsystem resource to "the number of mount points registered in the single Fsystem resource x 6 seconds" or more.
For example, if 31 mount points are registered in a single Fsystem resource, set "31 x 6 seconds = 186 seconds" or more to the Timeout attribute of the Fsystem resource.
- The timeout value set in each Fsystem resource is the time until all processing completes for the mountpoints registered in the Fsystem
resource.
For example, if three mountpoints; /mnt1, /mnt2, and /mnt3 are registered in the Fsystem resource, and also 100 seconds is set to the
timeout value, the processing times out unless the processing of all three mountpoints completes within 100 seconds.
- The disk partition used by the Fsystem resource must be created beforehand.
If it has not been created, Online processing fails.
Note
To mount a file system on a shared disk manually, mount it from any one of nodes configuring a cluster system.
If you mount file systems on shared disks from multiple cluster nodes at the same time, these file systems are destroyed. Perform the
operation with careful attention.
Stop RMS:
# /opt/SMAW/SMAWRrms/bin/hvshut -a
Check that the file system controlled by the Fsystem resource is not mounted:
# /bin/df -k
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/sda2 20315844 7474340 11792864 39% /
/dev/sda1 256666 25466 217948 11% /boot
tmpfs 971664 0 971664 0% /dev/shm
If the file system has already been mounted, a cluster application may be in operation or the file system may have been mounted manually. In this case, stop the cluster application and RMS, or unmount the target file system with the umount command.
The following procedure is performed on any one of the nodes configuring the cluster.
See
For how to restore the file system with the fsck command or e2fsck command, see the Online manual page for Linux (man fsck
or man e2fsck).
# /bin/cat /etc/fstab.pcl
#RMS#/dev/sfdsk/class0001/dsk/volume0001 /mnt/swdsk1 ext3 noauto 0 0
Example: Mounting the file system of the mountpoint /mnt/swdsk1 controlled by the Fsystem resource
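For instance, based on the /etc/fstab.pcl entry shown above, the file system could be mounted as follows (the device name, file system type, and mount point are taken from that entry):

# /bin/mount -t ext3 /dev/sfdsk/class0001/dsk/volume0001 /mnt/swdsk1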
After the work on the file system is completed, unmount it:
# /bin/umount /mnt/swdsk1
Start RMS:
# /opt/SMAW/SMAWRrms/bin/hvcm -a
Part 3 Operations
Chapter 7 Operations................................................................................................................................... 262
Chapter 7 Operations
This chapter describes the functions that manage PRIMECLUSTER system operations: monitoring the operation status of the PRIMECLUSTER system and operating the PRIMECLUSTER system according to that status. Notes on operating the PRIMECLUSTER system are also described.
The following user groups are allowed to perform each specific operation:
Operation / Target user group
- Referring the operation management screens / All user groups
- Operations / wvroot, clroot, cladmin
- Monitoring / All user groups
- Corrective actions for resource failures / wvroot, clroot, cladmin
- CF main window
Use this screen to set up the configuration of the nodes that make up the cluster, manage the nodes, and display the node state.
See
For instructions on displaying each screen, see "4.5.3 Cluster Admin Functions."
See
For details, see "Chapter 4 GUI administration" in "PRIMECLUSTER Cluster Foundation (CF) Configuration and Administration Guide."
Note
The node states may be displayed as Unknown. In this case, exit the Web-Based Admin View screen and restart. If the node states in the
CF main window of Cluster Admin are still displayed as Unknown, check the node states by using cftool -n.
See
The CRM main window is a screen of the cluster resource monitoring facility. See "crm" in "4.5.3 Cluster Admin Functions."
Items that are related to resources under shared resources are displayed with overlapping icons.
Icons are displayed for the following resources: shared resource, local disk, IP address, network interface, takeover network, and node.
Cluster states
The following cluster states are displayed.
Red (OFF-FAIL): One of the nodes is in a state other than the ON state, or a shared resource is in the OFF-FAIL state.
Node states
The following node states are displayed.
Green with vertical red lines (ON-FAILOVER): One of the resources under the node is in the Faulted state.
Note
Green with vertical red lines (ON-FAILOVER): The resource is operating normally, but some devices or resources that are multiplexed and managed internally are in the Faulted state.
7.1.2.1.3 Operations
You can perform the operations described below on the CRM main window.
In the table below, "Selection resource" is the resource class name of the selectable resource. For details on resource class names, see
"7.1.2.2 Detailed Resource Information."
Table 7.1 Operations of the CRM main window
Feature / Operation method (Menu) / Selection resource / Target group
- Build CRM resource database / Tool - Initial setup / None (*1) / wvroot, clroot
- Request Resource activation / Tool - Start / SDX_DC (*2) / wvroot, clroot, cladmin
- Request Resource deactivation / Tool - Stop / SDX_DC (*2) / wvroot, clroot, cladmin
- Exit Cluster Admin screen / File - Exit / All (no selection) / All
- View Help / Help - Content (*3) / All (no selection) / All
- View version / Help - About / All (no selection) / All
*1 Set Initial Configuration menu can be selected only if the resource database has not been set up. This menu item is not displayed in the
pop-up menu.
*2 Only the disk resources that are registered to Global Disk Services are enabled.
*3 Help for the CRM main window is displayed with a separate browser from the browser that displays Help for CF, RMS, and SIS.
Note
- For information about user groups, see "4.3.1 Assigning Users to Manage the Cluster."
Initial setup
Select this item to set up the resource database to be managed by the cluster resource management facility. Select Tool -> Initial setup
to display the Initial Configuration Setup screen. The initial configuration setup cannot be operated simultaneously from multiple
clients. See "5.1.3.1 Initial Configuration Setup."
Start
This menu item activates the selected resource. The start operation is executed during maintenance work. If the selected resource is
registered to a cluster application, the start operation can be executed only when that cluster application is in the Deact state. Use the
RMS main window to check the cluster application state.
Note
- After completing the maintenance work, be sure to return the resource that you worked on to its state prior to the maintenance.
- If the resource that was maintained is registered to a cluster application, be sure to stop the resource before clearing the Deact state
of the application.
- Yes button
Executes resource start processing.
- No button
Does not execute resource start processing.
Stop
This menu item deactivates the selected resource. The stop operation is executed during maintenance work. If the selected resource is registered to a cluster application, the stop operation can be executed only when that cluster application is in the Deact state. Use the RMS main window to check the cluster application state.
Note
- After completing the maintenance work, be sure to return the resource that you worked on to its state prior to the maintenance.
- If the resource that was maintained is registered to a cluster application, be sure to stop the resource before clearing the Deact state
of the application.
- Yes button
Executes resource stop processing.
- No button
Does not execute resource stop processing.
Note
If a message is displayed during operating at the CRM main window and the frame title of the message dialog box is "Cluster resource
management facility," then see "3.2 CRM View Messages" and "Chapter 4 FJSVcluster Format Messages" in "PRIMECLUSTER
Messages."
Icon / resource class name, Attributes, and Meaning / attribute value (Top: Meaning, Bottom: Attribute values)

SHD_SWITCH: Switching disk that is used exclusively between two nodes

SDX_DC, SDX_SHDDC
Disk_Attr: This class indicates the physical connection mode and usage mode of a GDS-managed disk class that can be used from the cluster system.
- SHD_DISK: The disk is physically shared, but the usage mode (shared disk or switchover disk) is not specified.
- SHD_SHARE: Shared disk class that allows access from multiple nodes
- SHD_SWITCH: Switching disk class for exclusive use between two nodes

Ethernet
node_name: This item indicates the name of the node in which this LAN board is set.
- The node name is set.
WebView: This item indicates the network interface to be used by Web-Based Admin View.
- If Web-Based Admin View is being used, USE is set. If not, UNUSE is set.

SHD_Host
ip_addr: This item indicates the takeover IP address.
- If the takeover IP address information is IPv4, this item is set in the format XXX.XXX.XXX.XXX. If IP address takeover has not been set, this item is blank.
- If the takeover IP address information is IPv6, the icon or the resource is not displayed.
- RMS tree
- Configuration information or object attributes
- Switchlogs and application logs
Figure 7.1 RMS main window
The RMS tree displays an icon that represents the cluster and an icon that represents each node.
Information
State display icons are not displayed in cluster icons. Instead, the RMS cluster table can be displayed. For details, see "7.3.3 Concurrent
Viewing of Node and Cluster Application States."
Icon Icon color Outline Details
Note
The node states in the RMS main window of Cluster Admin may be displayed as Unknown. In this case, exit the Web-Based Admin View
screen and restart. If the node states in the RMS main window of Cluster Admin are still displayed as Unknown, check the node states by
using hvdisp -a.
Green with vertical blue lines (Stand By): Object is in such a state that it can be quickly brought Online when needed.
Blue with vertical red lines (OfflineFault): Object is Offline, but a fault has occurred before and is not cleared yet.
Orange in the left and green in the right (Maintenance-Online): Object is in maintenance mode and must be Online when exiting maintenance mode.
Orange in the left and blue in the right (Maintenance-Offline): Object is in maintenance mode and must be Offline when exiting maintenance mode.
Orange in the left and green in the right with vertical blue lines (Maintenance-Stand By): Object is in maintenance mode and must be Stand By when exiting maintenance mode.
Pop-up menu
If you right-click an object in the RMS tree, a pop-up menu lists the operations for that object. You can also use this menu for monitoring
the state.
Note
- The following icons may be displayed in the userApplication object or the gResource object.
: This icon is displayed at the right side of the userApplication object state icon. It means that only some resources under the
userApplication are started. For details, see "7.2.3 Resource Operation."
: This icon is displayed at the right side of the gResource object. It means that a resource fault occurred in the past. For details, see
"7.3.5 Fault Traces of Resources" in "PRIMECLUSTER Reliant Monitor Services (RMS) with Wizard Tools Configuration and
Administration Guide."
: This icon is displayed at the right side of the userApplication object state icon. It means that status of some resources in the
userApplication has changed from the status just before the start of maintenance mode. To exit the maintenance mode, all the resource
status in userApplication must be changed back to the original status just before the start of maintenance mode. For more information,
refer to "7.2.2.6 Entering maintenance mode for Cluster Application."
- : Though this icon indicates that the resource fault occurred in the past, it has nothing to do with the current state of the resource.
For this reason, this icon is subsequently shown as "Fault Traces of Resources."
If you want to check the current state of the resource, check the resource object state.
This icon is hidden in any of the following cases:
(*) When the cluster application is in the Faulted state, you need to clear the Faulted state if you specify the cluster application for
switchover again.
- In the RMS tree, only the status of the second level userApplication object of some system nodes is displayed while the status of the
third and fourth level objects is not displayed. This event occurs when OS of the system node is restarted or Web-Based Admin View
is restarted while Cluster Admin is running. To recover from such an event, select and right-click the object of the target system node
on the RMS tree, then select "Connect" from the pop-up menu. The RMS tree is updated to the latest state, and the status of third and
fourth level objects is displayed.
7.2 Operating the PRIMECLUSTER System
Note
To stop two or more nodes at the same time, it is necessary to first stop RMS.
Note that the user application is also stopped when you stop RMS. For instructions on stopping RMS, see "7.2.1.2 Stopping RMS."
Operation Procedure:
From the top screen of Web-Based Admin View, open Cluster Admin according to the following procedure:
1. Choose the node you want to start from the cluster tree in the RMS main window.
2. Right-click on the node and select [Start RMS] from the pop-up menu.
See
See "7.1.1 Starting RMS" in "PRIMECLUSTER Reliant Monitor Services (RMS) with Wizard Tools Configuration and Administration
Guide."
Operation Procedure:
1. Use the Tool pull-down menu on the RMS main window or right-click the system node, and then select the shutdown mode on the
screen that appears next.
See
See "7.1.3 Stopping RMS" in "PRIMECLUSTER Reliant Monitor Services (RMS) with Wizard Tools Configuration and Administration
Guide."
Operation Procedure:
1. On the RMS tree in the RMS main window, right-click the cluster application to be started, and select Online from the pop-up menu
that is displayed.
The cluster application will start.
Information
You can also display the pop-up menu by right-clicking the target icon in an RMS graph or the RMS cluster table. For details on RMS graphs
and the RMS cluster table, see "7.3.5 Viewing Detailed Resource Information" and "7.3.3 Concurrent Viewing of Node and Cluster
Application States."
Note
To start a cluster application manually, check that the cluster application and the resources under it are stopped on all the nodes other than the node on which the cluster application is to be started. You can check that they are stopped by confirming that their state is Offline or Standby. If the state is other than Offline or Standby, they may be running. In this case, stop them and then start the cluster application on the target node.
Operation Procedure:
1. On the RMS tree in the RMS main window, right-click the cluster application to be stopped, and select Offline from the displayed
pop-up menu.
The cluster application will stop.
Information
You can also display the pop-up menu by right-clicking the target icon in an RMS graph or the RMS cluster table. For details on RMS graphs
and the RMS cluster table, see "7.3.5 Viewing Detailed Resource Information" and "7.3.3 Concurrent Viewing of Node and Cluster
Application States."
Operation Procedure:
1. Right-click on the application object and select the Switch menu option.
A pull-down menu appears listing the available nodes for switchover.
2. Select the target node from the pull-down menu to switch the application to that node.
Information
You can also display the pop-up menu by right-clicking the target icon in an RMS graph or the RMS cluster table. For details on RMS graphs
and the RMS cluster table, see "7.3.5 Viewing Detailed Resource Information" and "7.3.3 Concurrent Viewing of Node and Cluster
Application States."
Operation Procedure:
1. Right-click on the cluster application object in the RMS tree, and select Clear Fault.
Information
You can also display the pop-up menu by right-clicking the target icon in an RMS graph or the RMS cluster table. For details on RMS graphs
and the RMS cluster table, see "7.3.5 Viewing Detailed Resource Information" and "7.3.3 Concurrent Viewing of Node and Cluster
Application States."
Operation Procedure:
1. Check that the node in the Wait state has been stopped. If not, stop the node manually.
2. Check that the CF state is DOWN in the CF main window. If the CF state is LEFTCLUSTER, clear LEFTCLUSTER in the CF main
window and make sure the node state is changed from LEFTCLUSTER to DOWN.
3. If the Wait state of the node has not been cleared after performing step 2, right-click the system node in the RMS graph and select
"Clear Wait & shutdown (hvutil -u)" from the menu.
Note
If you clear the Wait state of a system node manually, RMS and CF assume that you have already checked that the target node is stopped.
Therefore, if you clear the Wait state when the node has not been stopped, this may lead to data corruption.
Information
You can also display the pop-up menu by right-clicking the target icon in an RMS graph or the RMS cluster table. For details on RMS graphs
and the RMS cluster table, see "7.3.5 Viewing Detailed Resource Information" and "7.3.3 Concurrent Viewing of Node and Cluster
Application States."
If a cluster application is placed in maintenance mode, it cannot be switched over.
Note that cluster nodes and resources are still monitored during maintenance mode. When a resource state changes, the resource
state of the cluster application that is viewed on the RMS tree also changes.
If the state of a cluster application resource changes while in maintenance mode, no switchover is carried out, so the resource states may
become inconsistent with the resources registered in the cluster application (for example, some resources are in the Offline state
while others are in the Online state). Therefore, before exiting maintenance mode, you must return the resource states of the cluster
application to the same states as before maintenance mode was started.
For using maintenance mode, see "7.4 Using maintenance mode" in "PRIMECLUSTER Reliant Monitor Services (RMS) with Wizard
Tools Configuration and Administration Guide."
Note
Please note the following for using maintenance mode.
- Apply maintenance mode to the cluster application of the standby operation that contains the resources for which maintenance is
necessary.
- Because the cluster application of the scalable operation does not contain the resources for which maintenance is necessary during the
operation, it is not necessary to place the cluster application of the scalable operation into maintenance mode.
- To start maintenance mode, a cluster application must be in the Online, Standby, or Offline state.
- To exit maintenance mode, a cluster application and each resource must be returned to the same states they were in before maintenance mode was started.
- Do not stop RMS or the system with cluster applications in maintenance mode. Be sure to exit maintenance mode of all cluster
applications before stopping RMS or the system.
- Use maintenance mode only when applicable products are specified in the environment that uses PRIMECLUSTER products.
- When the cluster application that includes Cmdline resource that sets the NULLDETECTOR flag is in maintenance mode, the script
that was set to the Cmdline resource must correspond to the maintenance mode. For details, see "6.11.2.1.4 Notes When Setting the
NULLDETECTOR Flag."
For details, see "7.4.2 Maintenance mode operating notes" or "2.1.7.1 Restrictions during maintenance mode" in "PRIMECLUSTER
Reliant Monitor Services (RMS) with Wizard Tools Configuration and Administration Guide."
Note
- It is assumed that this function is used when you check the behavior of resources during cluster application configuration. Do not
perform any business operations while cluster applications are partially Online.
If you want to carry out business operations without starting a resource, delete that resource from the cluster application. For instructions
on deleting a resource, see "10.5 Deleting a Resource."
After using this function, restart the application by the following procedure before starting any business operation, and make sure that
all resources become Online.
1. Stop userApplication.
# hvutil -f userApplication
# hvdisp -a
3. Start userApplication.
# hvdisp -a
- Whenever you start or stop a resource of a scalable configuration individually, first stop the cluster applications in scalable operation. After that,
execute the operation on the cluster applications in standby operation that constitute the cluster applications in scalable operation.
- For details, see "7.3 Managing resources" in "PRIMECLUSTER Reliant Monitor Services (RMS) with Wizard Tools Configuration
and Administration Guide."
Operation Procedure:
1. On the RMS tree in the RMS main window, right-click the resource to be started, and select [Resource Online] from the pop-up menu.
The resource will start.
Information
Also, the pop-up menu can be displayed by right-clicking on the icon of the RMS graph. For instructions on the RMS graph, see "7.3.5
Viewing Detailed Resource Information."
Operation Procedure:
1. On the RMS tree in the RMS main window, right-click the resource to be stopped, and select [Resource Offline] from the pop-up
menu.
The resource will stop.
Information
Also, the pop-up menu can be displayed by right-clicking on the icon of the RMS graph. For instructions on the RMS graph, see "7.3.5
Viewing Detailed Resource Information."
Check the state of the failed resource first, and then clear the fault trace according to the procedure below.
Operation procedure:
1. Right-click the failed resource in the RMS tree of the RMS main window, and then select [Clear fault trace (hvutil -c)] from the pop-
up menu.
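As the menu label indicates, this operation corresponds to the hvutil command. A command-line sketch (Resource0 is only an example resource name):
# hvutil -c Resource0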
Point
In addition to being cleared with the hvutil -c command, the fault trace is also cleared automatically the next time the resource becomes
Online.
Information
For details on the fault trace icon of a resource, see "7.1.3.1 RMS Tree."
For the method of displaying fault traces of resources, see "7.3.5 Fault Traces of Resources" in "PRIMECLUSTER Reliant Monitor Services
(RMS) with Wizard Tools Configuration and Administration Guide."
The pop-up context menu can be displayed by right-clicking the icon of the RMS graph. For details on the RMS graph, see "7.3.5 Viewing
Detailed Resource Information."
CF state    Description
Red (LEFTCLUSTER / INVALID)    The node has left the cluster unexpectedly, probably from a crash. To ensure cluster integrity, it will not be allowed to rejoin until marked DOWN.
Green with vertical blue lines (Route Missing)    Some cluster interconnects have not been recognized on startup.
White (UNKNOWN)    The reporting node has no opinion on the reported node.
Green with vertical blue lines (Route Down)    Some cluster interconnects are not available.
Gray (UNCONFIGURED / UNLOADED / LOADED)    This icon shows any of the following statuses:
- CF has not been set.
- The CF driver has not been loaded.
- The CF driver has been loaded but CF is not started.
- Online
- Wait
- Offline
- Deact
- Faulted
- Unknown
- Inconsistent
- Stand By
- Warning
- OfflineFault
- Maintenance
- Maintenance-Online
- Maintenance-Offline
- Maintenance-Stand By
See
See "State display of other objects" in "7.1.3.1 RMS Tree."
The first line shows the names of the nodes that RMS is managing (fuji2 and fuji3 in the example above). To the left of each node name is
a state display icon that shows the state of that node.
The second and subsequent lines show the names of all cluster applications that RMS is managing and the states of those applications.
The RMS cluster table enables you to display the states of nodes and cluster applications in one table.
Viewing the RMS Cluster Table
If the background color of the cluster application name is the same as that of the background of the window
It indicates that the cluster application is online.
If the background of the cluster application name is pink
This condition indicates that the cluster application is in the Faulted state and a failure has occurred in one or more SysNode.
If the background of the cluster application name is sky blue
This condition indicates that the cluster application is in the Offline state.
If the state display icon of a cluster application is enclosed in a rectangle
This condition indicates that the node has the highest priority among the nodes that make up the cluster application. If the cluster
application is started after it is created, the node enclosed in a rectangle will be in the Online state.
Displaying/hiding state names
Select the Show State Names checkbox to display state names to the right of the state display icons.
See
For details on the RMS cluster table, see "6.1 Using the RMS clusterwide table" in "PRIMECLUSTER Reliant Monitor Services (RMS)
with Wizard Tools Configuration and Administration Guide."
7.3.4 Viewing Logs Created by the PRIMECLUSTER System
There are two types of logs that can be viewed in the PRIMECLUSTER system:
- Switchlog
The switchover requests or failures that occur in nodes are displayed.
- Application log
The operation log of the cluster application is displayed.
Information
The following display formats are enabled for the log. For details, see "6.4 Viewing RMS log messages" in "PRIMECLUSTER Reliant
Monitor Services (RMS) with Wizard Tools Configuration and Administration Guide."
- Full graph
Displays the configuration of the entire cluster system in which RMS is running.
- Application graph
Shows all objects used by the specified application. You can check the details of the specific object using this graph.
- Sub-application graph
Lists all sub-applications used by a given application and shows the connections between the sub-applications.
RMS graphs
If you left-click the target object, the attributes of the object will be displayed on a pop-up screen.
See
See "6.2 Using RMS graphs" in "PRIMECLUSTER Reliant Monitor Services (RMS) with Wizard Tools Configuration and Administration
Guide."
Right-click a node in the RMS tree, and select View Environment. The local variables are displayed.
7.3.7 Monitoring Cluster Control Messages
Select the msg tab, which is found at the bottom of the tree panel. If a new message was added to the text area since the last time the area
was displayed, this tab is displayed in red.
You can clear the message text area or isolate it from the main panel.
Failure detection
Normally, the RMS main window (a) is used to monitor the cluster applications.
- If a failure occurs in a resource or the system
Failover of the userApplication or node panic will occur.
In such a case, you can detect the failure by observing the following conditions:
- The color of the icons in the RMS main window (a) changes.
- A message is output to the msg main window (c), syslog(f), and the console (g).
- By executing the "clreply" command, you can confirm an operator intervention request to which no response has been entered and
start up the userApplication by responding to it. For information on the "clreply" command, see the manual pages.
- The operator intervention request message will be output to syslog(f) and the console (g). By responding to the operator intervention
request message, you can start the userApplication.
For further details, see "4.2 Operator Intervention Messages" in "PRIMECLUSTER Messages."
Note
If there are multiple operator intervention request messages for which no response has yet been entered, you need to respond to each
of them.
In addition, you can use the features described in "Failure confirmation features list" to detect the failure.
Cause identification
You can also use the function that detected the failure and the features listed in "Failure confirmation features list" below to identify the
faulted resource that caused the failure.
Failure confirmation features list
Failure confirmation features    Manual reference
(g) Console *    PRIMECLUSTER Messages
    Messages that are displayed on the console or syslog can be checked. Viewing the "console problem" information on the console can help you identify the fault cause.
(h) GDS GUI    PRIMECLUSTER Global Disk Services Configuration and Administration Guide
Note
Console
- The operator intervention request messages (message numbers: 1421, 1423), incurred when RMS is not started on all the nodes, are
displayed only when yes(1) is set for the AutoStartUp attribute of the userApplication. For information on the userApplication attribute,
see "Appendix D Attributes" in "PRIMECLUSTER Reliant Monitor Services (RMS) with Wizard Tools Configuration and
Administration Guide."
- The operator intervention request messages (message numbers: 1422, 1423) and the error resource messages incurred after a resource
or system error occurs are displayed only when yes(1) is set for the PersistentFault attribute of the userApplication. For information on
the userApplication attribute, see "Appendix D Attributes" in "PRIMECLUSTER Reliant Monitor Services (RMS) with Wizard Tools
Configuration and Administration Guide."
- The operator intervention request and error resource messages are displayed by using the "clwatchlogd" daemon to monitor switchlog.
You need to send the SIGHUP signal to clwatchlogd when you change the value of RELIANT_LOG_PATH that is defined in the
"hvenv.local" file. When clwatchlogd receives this signal, clwatchlogd acquires the latest value of RELIANT_LOG_PATH. After you
change RELIANT_LOG_PATH, you must start RMS.
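For example, the SIGHUP signal can be sent to clwatchlogd with standard Linux commands, as in the following sketch (confirm the process name on your system):
# kill -HUP `pgrep clwatchlogd`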
Note
When you check the message of a resource failure, a resource with the "MONITORONLY" attribute may be in the fault state even if the
cluster application is in the Offline state. Check whether there are any resources in the fault state. Especially, check that Fsystem resources
are not in the fault state.
Note
If you are using an operation management product other than a PRIMECLUSTER product, you may need to take corrective actions
prescribed for that product.
For details, see the manual provided with each operation management product.
(Example) Symfoware
3. Clear the fault trace of the failure resource
Clear the fault trace of the failure resource. For more information, refer to "7.2.3.3 Clearing Fault Traces of Resources."
- Hardware error
- Error on LAN card, hub, or cable
- Connection error
- Network configuration error
- Configuration error on IP address, netmask, or routing information, etc.
For a network configuration error, contact your system administrator. The following section describes how to fix hardware-related errors.
If any heartbeat error on the cluster interconnect is detected, either of the following messages will be output to the /var/log/messages file.
"CF: Problem detected on cluster interconnect NIC_NAME to node NODE_NAME: missing heartbeat replies.
(CODE)"
"CF: Problem detected on cluster interconnect NIC_NAME to node NODE_NAME: ICF route marked down.
(CODE)"
"NIC_NAME" indicates the network interface card on which the error is detected.
"NODE_NAME" indicates the CF node name on which the error is detected.
"CODE" indicates the necessary information to determine the cause.
When either of the above messages is output to the file, follow the steps below.
Corrective action
1. Determining the failed node
Confirm that each device is working properly. You can also use the ping command to determine the failed node and its location.
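For example, pinging the interconnect address of each node from another node can help you narrow down the failed device (the address 192.168.1.2 is only an example of a cluster interconnect IP address):
# ping -c 3 192.168.1.2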
Note
When an error occurs on the entire cluster interconnect (all interconnects for every node), the cluster system forcibly shuts down all
the nodes except the one that has the highest survival priority.
For details on survival priority, see "5.1.2 Setting up the Shutdown Facility."
If an error occurs on an active node (for example, a LAN card error on a node where an active cluster application resides), you must stop the
node before fixing it. To minimize the downtime, make sure to follow the steps below before performing "Step 2. Performing
maintenance tasks."
Note
For a LAN card error, the failed node must be stopped to perform the maintenance task.
For an error on cables or hubs, you can perform the maintenance task with the node being active.
3. Recovery
To recover the partial failure of the cluster interconnect, skip to "Step 2. Cluster interconnect recovery" below.
After confirming that the cluster interconnect is recovered successfully, clear the "Faulted" state of the cluster application as
necessary. For details on the operation, see "7.2.2.4 Bringing Faulted Cluster Application to available state."
7.4.2 Corrective Action in the event of the LEFTCLUSTER state when the
virtual machine function is used
If the host OS panics or hangs up when the virtual machine function is used, the LEFTCLUSTER state may occur. This section
describes the corrective actions in this case.
Do not stop RMS while RMS is being started
Heartbeats between nodes are interrupted and the node where RMS is stopped may be forcibly shut down.
Stop RMS after completing its startup processing (completing the state transition processing of a cluster application).
If operating systems hang up or slow down on a node in a cluster, a healthy node may be forcibly stopped.
If operating systems hang up or slow down on a node in a cluster due to system load and so on, CF or RMS detects LEFTCLUSTER, and
the Shutdown Facility forcibly stops the node.
The Shutdown Facility forcibly stops a node according to the survival priority. Therefore, when the hang-up and slowdown of operating
systems on the failed node are recovered before a healthy node forcibly stops the failed node, the healthy node may be forcibly stopped first.
When a system volume on a disk device cannot be accessed because all paths have failed in a SAN boot / iSCSI
boot configuration, the PRIMECLUSTER failure detection function may not operate, depending on the
status of the system.
Because a node that cannot access the system volume is unstable, force the node to panic by the following method.
When you can log in cluster nodes other than the relevant node
Stop the relevant node using the sdtool command.
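As a sketch (node1 is only an example CF node name), the relevant node can be forcibly stopped with the sdtool command executed on one of the other cluster nodes:
# sdtool -k node1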
When you start cluster applications manually or confirm the message of a resource failure, check whether
a resource with the "MONITORONLY" attribute has been in the fault state.
If you start or switch over cluster applications before the failure of the resource with the "MONITORONLY" attribute is solved, cluster
inconsistencies or data corruption may occur.
When you set Firewall and use the state module in Firewall, do not restart the iptables service or the
ip6tables service during PRIMECLUSTER operation.
When using the state module in Firewall, restarting the iptables service or the ip6tables service initializes the information about the
communication status, and subsequent communication may not work correctly. As a result, neither applications nor PRIMECLUSTER can work
correctly. When you change the setting of Firewall, perform one of the following operations:
The following error messages may be output to the console and syslog during system startup in RHEL7
environment
The following messages may be output to the console and syslog during system startup in RHEL7 environment. This does not disrupt
ongoing operation.
kernel: Request for unknown module key 'FUJITSU Software: Fujitsu BIOS DB FJMW Certificate:
Hexadecimal, forty-digit' err -11
kernel: Disabling lock debugging due to kernel taint
kernel: clonltrc: module license 'Proprietary' taints kernel.
kernel: clonltrc: module verification failed: signature and/or required key missing - tainting kernel
kernel: sfdsk_lib: module verification failed: signature and/or required key missing - tainting kernel
kernel: sha: module license 'Proprietary' taints kernel.
kernel: sha: module verification failed: signature and/or required key missing - tainting kernel
kernel: symsrv: module license 'Proprietary' taints kernel.
kernel: symsrv: applying kernel_stack fix up
kernel: symsrv: module verification failed: signature and/or required key missing - tainting kernel
kernel: cf: applying kernel_stack fix up
kernel: poffinhibit_ipdv: module verification failed: signature and/or required key missing -
tainting kernel
Note
A node where RMS is not running could be forcibly killed before the cluster application or the resource is forcibly started on another node
to reduce the risk of data corruption.
To perform forced startup of a cluster application or a resource safely, check whether RMS is running on all the nodes in the cluster before
starting forced startup according to the following procedure, and if there are the nodes on which RMS is not running, then shut down the
nodes.
- Check the CF tree of the Cluster Admin.
2. Check the following contents for the node states, and take corrective actions if necessary:
- Check the node states are all UP.
- If a LEFTCLUSTER node exists, recover CF from the LEFTCLUSTER state.
For details, see "PRIMECLUSTER Cluster Foundation Configuration and Administration."
- If a node with DOWN or UNKNOWN exists, or if a node for which the state is not displayed exists, check whether the operating
system of the node has stopped. If the operating system is running, shut down the operating system or restart OS in single-user
mode.
3. Check whether some nodes on which RMS is not running exist among the nodes on which the cluster application or the resource will
be forcibly started by one of the following methods:
- Execute the hvdisp -a command on nodes where the cluster application or the resource will be started and check that the state of
objects whose Type is SysNode is Online.
fuji2# hvdisp -a
- Check that the states of all SysNode displayed in the RMS tree of the Cluster Admin are Online.
4. If nodes which satisfy the following conditions exist, shut down the operating system of the nodes, or restart OS in single-user mode.
- The node state is UP, and
- The state of SysNode is not Online.
5. Execute the Forced switch (hvswitch -f) to forcibly start the cluster application or the resource.
Table 7.2 Failures detected with a heartbeat and the detection time of heartbeat timeout (CF and RMS)
Failure type detected with a heartbeat    Detection time of heartbeat timeout (default)
CF - System hangs on the kernel layer level    10 seconds
(*1): When using the monitoring agent of PRIMECLUSTER, the monitoring agent detects it immediately
(*2): In the environment where the ELM heartbeat (RMS heartbeat) is available, the ELM heartbeat detects it immediately (the ELM
heartbeat is available in 4.2A00 or later as default).
(*3): As an example, there is a double fault.
Note
An error detected by a CF heartbeat has a greater effect on the operation. Therefore, the CF detection time of heartbeat timeout (detection time) is set
shorter than the RMS detection time.
If you set the detection time of CF shorter than that of RMS, the following warning message is output during RMS startup.
(BM, 4) The CF cluster timeout <cftimeout> exceeds the RMS timeout <rmstimeout>. This may result in
RMS node elimination request before CF timeout is exceeded. Please check the CF timeout specified in /
etc/default/cluster.config and the RMS heartbeat miss time specified by hvcm '-h' option.
cron entry name    Execution interval (default setting value)    Contents
Change the cron configuration so that sflogcontrol midnight is executed once a day.
hvcleanupnfs    Once a day (at night)    Executes the recovery processing required for the RFS (NFS file system) resource. Use this cron entry in the Wizard for NAS (RFS) environment.
Note
Do not delete the entries that PRIMECLUSTER registered to the root user's cron, and do not move them to another user's cron either.
Part 4 System Configuration Modification
Chapter 8 Changing the Cluster System Configuration
This chapter explains how to add, delete, and change the hardware that constitutes the PRIMECLUSTER system.
Before adding the cluster application or the resource, check "Design (the number of resources)" of PRIMECLUSTER Designsheets to verify
that the number of resource objects and the number of detectors that can be set in the whole PRIMECLUSTER system do not exceed their
maximum values.
After changing the cluster system configuration, use the PRIMECLUSTER environment checking tool to check the PRIMECLUSTER
environment.
For details on checking the PRIMECLUSTER environment, see "6.9 Checking the Cluster Environment."
Note
- When you change a system board, reconfigure BMC or iRMC used by the shutdown facility.
- When you change a system board or a network interface card, do not restart the network.
Information
You must stop RMS while performing "5. Change the cluster configuration."
However, you do not need to stop RMS if all the following conditions are met, because performing "5. Change the cluster configuration"
is not necessary in that case:
- The added shared disk device is registered with an existing class of GDS.
- The added shared disk device is not used as an Fsystem resource.
Operation Procedure:
2. Change the device names set in resources of the shared disk device.
Update the device names set in the resources of the existing shared disk device to the current device names.
Execute the following command. For filepath, specify an empty file with absolute path.
# /etc/opt/FJSVcluster/bin/clautoconfig -f filepath
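For example (the path /var/tmp/empty is only an example; any empty file specified with an absolute path can be used):
# touch /var/tmp/empty
# /etc/opt/FJSVcluster/bin/clautoconfig -f /var/tmp/empty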
Note
When SDX_UDEV_USE=off is described in the GDS configuration file /etc/opt/FJSVsdx/sdx.cf, do not execute the clautoconfig
command.
See
To register resources, see "5.1.3.2 Registering Hardware Devices."
See
For information on how to set up GDS and create Gds resources, see "6.3 GDS Configuration Setup," "6.7.3.3 Preliminary Setup for
Gds Resources," and "6.7.3.4 Setting Up Gds Resources."
- Fsystem resource
- Gds resource
See
For information on how to change the cluster configuration, see "10.3 Changing the Cluster Configuration."
8.1.1.2 Adding a Network Interface Card Used for the Public LAN and the Administrative
LAN
This section describes how to add a network interface card used for the public LAN and the Administrative LAN.
Operation Procedure:
See
To register resources, see "5.1.3.2 Registering Hardware Devices."
See
For information on how to change the cluster configuration, see "10.3 Changing the Cluster Configuration."
1. Execute the "hvshut" command on each node to stop PRIMECLUSTER RMS as follows. Answer "yes," then only RMS will stop.
The cluster application will remain running.
# hvshut -L
WARNING
-------
The '-L' option of the hvshut command will shut down the RMS
software without bringing down any of the applications.
In this situation, it would be possible to bring up the same
application on another node in the cluster which *may* cause
data corruption.
Do you wish to proceed ? (yes = shut down RMS / no = leave RMS running).
yes
NOTICE: User has been warned of 'hvshut -L' and has elected to proceed.
Add the following line to the end of the "/opt/SMAW/SMAWRrms/bin/hvenv.local" file on each node.
export HV_RCSTART=0
It is necessary to perform the procedure above so that RMS will not automatically start immediately after OS startup.
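For example, the line can be appended with a standard shell command, as in the following sketch:
# echo 'export HV_RCSTART=0' >> /opt/SMAW/SMAWRrms/bin/hvenv.local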
2. Execute the "sdtool" command on each node to stop the PRIMECLUSTER shutdown facility as follows.
# sdtool -e
LOG3.013806902801080028 11 6 30 4.5A00 SMAWsf : RCSD returned a
successful exit code for this command
3. Perform the following operation on each node to change the timeout value of PRIMECLUSTER CF:
- Add the following line to the "/etc/default/cluster.config" file.
CLUSTER_TIMEOUT "600"
# cfset -r
# cfset -g CLUSTER_TIMEOUT
From cfset configuration in CF module:
Value for key: CLUSTER_TIMEOUT --->600
#
4. Use DR.
See
For DR operation, refer to the related hardware manual.
5. Perform the following operation on each node to return the timeout value of PRIMECLUSTER CF to the default value:
- Change the value of CLUSTER_TIMEOUT defined in "/etc/default/cluster.config" file earlier to 10.
Before change
CLUSTER_TIMEOUT "600"
After change
CLUSTER_TIMEOUT "10"
# cfset -r
# cfset -g CLUSTER_TIMEOUT
From cfset configuration in CF module:
Value for key: CLUSTER_TIMEOUT --->10
#
6. Execute the "sdtool" command on each node to start the PRIMECLUSTER SF.
# sdtool -b
7. Check if PRIMECLUSTER SF is running. (The following indicates an output example of a two-node configuration)
# sdtool -s
Cluster Host Agent SA State Shut State Test State Init State
------------ ----- -------- ---------- ---------- ----------
node0 SA_mmbp.so Idle Unknown TestWorked InitWorked
node0 SA_mmbr.so Idle Unknown TestWorked InitWorked
node1 SA_mmbp.so Idle Unknown TestWorked InitWorked
node1 SA_mmbr.so Idle Unknown TestWorked InitWorked
8. Execute the "hvcm" command on each node to start RMS.
# hvcm
Starting Reliant Monitor Services now
9. RMS must be running on all the nodes. Check if each icon indicating the node state is green (Online) in the RMS main window of
Cluster Admin.
Finally, remove the following line from the "/opt/SMAW/SMAWRrms/bin/hvenv.local" file on each node.
export HV_RCSTART=0
Note
- If you plan to use DR, be sure to verify a cluster system during cluster configuration using the above steps.
- If a node failure (such as a node panic or reset) or a hang-up occurs due to hardware failure and so on during steps 1 through 7, you need
to follow the procedure below to start the cluster application, which was running on the node where DR is used, on a standby node.
1. If a hang-up occurs, stop the failed node forcibly, and then check that the node is stopped.
2. Mark the node DOWN by executing the "cftool" command on any of the nodes where a failure has not occurred and
specifying the node number and CF node name for failed nodes. However, if the state of the failed node is not LEFTCLUSTER,
wait until the node becomes LEFTCLUSTER, and then execute the "cftool -k" command.
# cftool -n
Node Number State Os Cpu
node0 1 UP Linux EM64T
node1 2 LEFTCLUSTER Linux EM64T
# cftool -k
This option will declare a node down. Declaring an operational
node down can result in catastrophic consequences, including
loss of data in the worst case.
If you do not wish to declare a node down, quit this program now.
3. Perform Steps 5 through 9 on all the nodes where no failure occurred, and then start RMS. If the cluster application is in an active
standby configuration, execute the "hvswitch -f " command to force the cluster application to go Online. For details on the
"hvswitch" command, see the description of the -f option of the online manual page for the command.
# hvswitch -f userApplication
The use of the -f (force) flag could cause your data to be corrupted and could cause your node
to be killed. Do not continue if the result
of this forced command is not clear.
The use of force flag of hvswitch overrides the RMS internal security mechanism. In particular
RMS does no longer prevent resources,
which have been marked as "ClusterExclusive", from coming Online on more than one host in the
cluster. It is recommended to double
check the state of all affected resources before continuing.
IMPORTANT: This command may kill nodes on which RMS is not running in order to reduce the risk
of data corruption!
Ensure that RMS is running on all other nodes. Or shut down OS of the node on which RMS is not
running.
Do you wish to proceed ? (default: no) [yes, no]:yes
#
4. After restoring the failed node, perform steps 5 through 9 on the appropriate node to start RMS.
Figure 8.3 Procedure to delete a shared disk device
Operation Procedure:
- Fsystem resource
- Gds resource
See
To change the configuration of a cluster application and delete resources, see "10.3 Changing the Cluster Configuration" and "10.5
Deleting a Resource."
See
For deleting a GDS object, see "Removing Configuration" of "PRIMECLUSTER Global Disk Services Configuration and
Administration Guide."
3. Change the device names set in resources of the shared disk device.
Before deleting resources, update the device names set in the resources to the current device names. Execute the following command.
For filepath, specify an empty file with absolute path.
# /etc/opt/FJSVcluster/bin/clautoconfig -f filepath
After executing the "cldelrsc" command, execute the following command to notify GDS that the resources have been deleted.
Specify the full path of an empty file for filepath.
# /etc/opt/FJSVcluster/bin/clautoconfig -f filepath
Note
- When the shared disk device, from which resources are to be deleted, is registered to a GDS class, delete the shared disk device
from the GDS class first, and then delete resources of the shared disk device. To delete the shared disk device from a GDS class,
see "PRIMECLUSTER Global Disk Services Configuration and Administration Guide."
- When SDX_UDEV_USE=off is described in the GDS configuration file /etc/opt/FJSVsdx/sdx.cf, do not execute the
clautoconfig command.
6. Change the device names set in resources of the shared disk device.
Deleting the shared disk device may change the device names of the shared disk devices that have not been deleted. To update the
device names set in the resources of those shared disk devices to the correct device names, execute the following command.
Specify the full path of an empty file for filepath.
# /etc/opt/FJSVcluster/bin/clautoconfig -f filepath
Note
When SDX_UDEV_USE=off is described in the GDS configuration file /etc/opt/FJSVsdx/sdx.cf, do not perform Step 6.
8.1.2.2 Deleting a network interface card used for the public LAN and the administrative
LAN
To delete a network interface card used for the public LAN and the administrative LAN, you need to change the cluster configuration which
includes resources of the network interface card to be deleted beforehand.
Operation Procedure:
See
To change the configuration of a cluster application and delete resources, see "10.3 Changing the Cluster Configuration" and "10.5
Deleting a Resource."
Note
A system board equipped with I/O cannot be removed by DR. Before removing a system board, make sure that the ongoing
operation can be continued even after the amount of CPU and memory is decreased.
1. Execute the "hvshut" command on each node to stop PRIMECLUSTER RMS as follows. Answer "yes," then only RMS will stop.
The cluster application will remain running.
# hvshut -L
WARNING
-------
The '-L' option of the hvshut command will shut down the RMS
software without bringing down any of the applications.
In this situation, it would be possible to bring up the same
application on another node in the cluster which *may* cause
data corruption.
Do you wish to proceed ? (yes = shut down RMS / no = leave RMS running).
yes
NOTICE: User has been warned of 'hvshut -L' and has elected to proceed.
Add the following line to the end of the "/opt/SMAW/SMAWRrms/bin/hvenv.local" file on each node.
export HV_RCSTART=0
It is necessary to perform the procedure above so that RMS will not automatically start immediately after OS startup.
2. Execute the "sdtool" command on each node to stop the PRIMECLUSTER shutdown facility as follows.
# sdtool -e
LOG3.013806902801080028 11 6 30 4.5A00 SMAWsf : RCSD returned a successful
exit code for this command
3. Perform the following operation on each node to change the timeout value of PRIMECLUSTER CF:
- Add the following line to the "/etc/default/cluster.config" file.
CLUSTER_TIMEOUT "600"
- Execute the following command.
# cfset -r
# cfset -g CLUSTER_TIMEOUT
From cfset configuration in CF module:
Value for key: CLUSTER_TIMEOUT --->600
#
4. Use DR.
See
For DR operation, refer to the related hardware manual.
5. Perform the following operation on each node to return the timeout value of PRIMECLUSTER CF to the default value.
- First, change the value of CLUSTER_TIMEOUT defined in "/etc/default/cluster.config" file earlier to 10.
Before change:
CLUSTER_TIMEOUT "600"
After change:
CLUSTER_TIMEOUT "10"
# cfset -r
# cfset -g CLUSTER_TIMEOUT
From cfset configuration in CF module:
Value for key: CLUSTER_TIMEOUT --->10
#
6. Execute the "sdtool" command on each node to start the PRIMECLUSTER shutdown facility.
# sdtool -b
7. Check if the PRIMECLUSTER shutdown facility is running. (The following indicates an output example of a two-node
configuration.)
# sdtool -s
Cluster Host Agent SA State Shut State Test State Init State
------------ ----- -------- ---------- ---------- ----------
node0 SA_mmbp.so Idle Unknown TestWorked InitWorked
node0 SA_mmbr.so Idle Unknown TestWorked InitWorked
node1 SA_mmbp.so Idle Unknown TestWorked InitWorked
node1 SA_mmbr.so Idle Unknown TestWorked InitWorked
8. Execute the "hvcm" command on each node to start RMS.
# hvcm
Starting Reliant Monitor Services now
9. RMS must be running on all the nodes. Check if each icon indicating the node state is green (Online) in the RMS main window of
Cluster Admin.
Finally, remove the following line from the "/opt/SMAW/SMAWRrms/bin/hvenv.local" file on each node.
export HV_RCSTART=0
Note
- If you plan to use DR, be sure to verify a cluster system during cluster configuration using the above steps.
- If a node failure (such as a node panic or reset) or a hang-up occurs due to hardware failure and so on during steps 1 through 7, you need
to follow the procedure below to start the cluster application, which was running on the node where DR is used, on a standby node.
1. If a hang-up occurs, stop the failed node forcibly, and then check that the node is stopped.
2. Mark the node DOWN by executing the "cftool" command on any of the nodes where a failure has not occurred and specifying the
node number and CF node name for failed nodes. However, if the state of the failed node is not LEFTCLUSTER, wait until the
node becomes LEFTCLUSTER, and then execute the "cftool -k" command.
# cftool -n
Node Number State Os Cpu
node0 1 UP Linux EM64T
node1 2 LEFTCLUSTER Linux EM64T
# cftool -k
This option will declare a node down. Declaring an operational
node down can result in catastrophic consequences, including
loss of data in the worst case.
If you do not wish to declare a node down, quit this program now.
3. Perform Steps 5 through 9 on all the nodes where no failure occurred, and then start RMS. If the cluster application is in an active
standby configuration, execute the "hvswitch -f " command to force the cluster application to go Online. For details on the
"hvswitch" command, see the description of the -f option of the online manual page for the command.
# hvswitch -f userApplication
The use of the -f (force) flag could cause your data to be corrupted and could cause your node
to be killed. Do not continue if the result
of this forced command is not clear.
The use of force flag of hvswitch overrides the RMS internal security mechanism. In particular
RMS does no longer prevent resources,
which have been marked as "ClusterExclusive", from coming Online on more than one host in the
cluster. It is recommended to double
check the state of all affected resources before continuing.
IMPORTANT: This command may kill nodes on which RMS is not running in order to reduce the risk
of data corruption!
Ensure that RMS is running on all other nodes. Or shut down OS of the node on which RMS is not
running.
Do you wish to proceed ? (default: no) [yes, no]:yes
#
4. After restoring the failed node, perform steps 5 through 9 on the appropriate node to start RMS.
8.1.3 Changing Hardware
This section describes how to change hardware.
Operation Procedure:
- Fsystem resource
- Gds resource
See
For details on how to change the cluster application configuration and delete resources, see "10.3 Changing the Cluster
Configuration" and "10.5 Deleting a Resource."
2. Delete a GDS object.
Delete a GDS object related to the shared disk device to be changed.
See
For deleting a GDS object, see "Removing Configuration" of "PRIMECLUSTER Global Disk Services Configuration and
Administration Guide."
3. Change the device names set in resources of the shared disk device.
Before deleting resources, update the device names set in the resources to the current device names. Execute the following command.
For filepath, specify an empty file with absolute path.
# /etc/opt/FJSVcluster/bin/clautoconfig -f filepath
# /etc/opt/FJSVcluster/bin/clautoconfig -f filepath
Note
- When resources of the shared disk device to be deleted are registered to a GDS class, delete the shared disk device from the GDS
class first, and then delete resources of the shared disk device. To delete the shared disk device from a GDS class, see
"PRIMECLUSTER Global Disk Services Configuration and Administration Guide."
- When SDX_UDEV_USE=off is described in the GDS configuration file /etc/opt/FJSVsdx/sdx.cf, do not execute the
clautoconfig command.
6. Change the device names set in resources of the shared disk device.
Before adding resources to the changed shared disk device, update the device names set in the resources to the new device names.
Execute the following command. For filepath, specify an empty file with absolute path.
# /etc/opt/FJSVcluster/bin/clautoconfig -f filepath
Note
When SDX_UDEV_USE=off is described in the GDS configuration file /etc/opt/FJSVsdx/sdx.cf, do not execute the clautoconfig
command.
See
For information on how to register the resource database, see "5.1.3.2 Registering Hardware Devices."
8. Set up Gds resources.
To use GDS, set up GDS and create Gds resources.
See
For information on how to set up GDS and create Gds resources, see "6.3 GDS Configuration Setup" and "6.7.3.4 Setting Up Gds
Resources."
9. Add resources.
If you have deleted Fsystem resources in Step 1, add Fsystem resources.
See
To add resources, see "6.7.3 Setting Up Resources."
8.1.3.2 Changing a network interface card used for the public LAN and the administrative
LAN
To change a network interface card used for the public LAN and the administrative LAN, you need to delete resources of the target network
interface card beforehand. After the change, you need to add resources of the network interface card.
Operation Procedure:
See
For details on how to change the cluster application configuration and delete resources, see "10.3 Changing the Cluster
Configuration" and "10.5 Deleting a Resource."
2. Delete resources of the network interface card to be changed.
Delete resources of the registered network interface card by using the "cldelrsc" command.
For details on the "cldelrsc" command, see the manual page.
See
For information on how to register the resource database, see "5.1.3.2 Registering Hardware Devices."
5. Add resources.
If you have deleted takeover network resources and Gls resources in Step 1, add takeover network resources and Gls resources.
See
To add resources, see "6.7.3 Setting Up Resources."
Note
A network interface card used for cluster interconnects cannot be replaced using PCI Hot Plug. Stop the node and then replace the network
interface card.
2. Check interfaces currently used by executing the following command on all the nodes.
# cfconfig -g
The own node name the cluster name eth3
# cfconfig -d
5. Make sure that the interfaces currently used have been changed by executing the following command on all the nodes.
# cfconfig -g
The own node name the cluster name eth4 (Check that eth4 has been displayed.)
6. In the environment where the shutdown agent SA_icmp for VMware environment is used, if the cluster interconnect is used to
check whether the node is alive or not, modify /etc/opt/SMAW/SMAWsf/SA_icmp.cfg on each node.
See
For details, see "H.2.3.3 Setting Up the Shutdown Facility (when using I/O fencing function)."
7. Make sure that all the nodes are Online on cf in Cluster Admin. In addition, make sure that each connector is UP.
8. Finish Cluster Admin.
9. Log out from Web-Based-Admin View.
If CF over IP is used
2. If the IP address is not set to the changed interface, edit the /etc/sysconfig/network-scripts/ifcfg-ethX file to set the IP address.
3. When using different IP addresses before and after changing the network interface card, change the IP address of CF over IP.
For details, refer to "9.2.3 Changing the IP Address of CF over IP."
Skip this step when changing the network interface card only and keeping the same IP address.
4. In the VMware environment using the SA_icmp shutdown agent, if the cluster interconnect is used to check whether the node is
alive or not, modify /etc/opt/SMAW/SMAWsf/SA_icmp.cfg on each node.
See
For details, see "H.2.3.3 Setting Up the Shutdown Facility (when using I/O fencing function)."
7. Make sure that all the nodes are Online on cf in Cluster Admin. In addition, make sure that each connector is UP.
8. Finish Cluster Admin.
9. Log out from Web-Based-Admin View.
Chapter 9 Changing the Cluster System Environment
This chapter describes how to change the configuration information and environmental settings of PRIMECLUSTER system.
Before adding the cluster application or the resource, check "Design (the number of resources)" of PRIMECLUSTER Designsheets to verify
that the number of resource objects and the number of detectors that can be set in the whole PRIMECLUSTER system do not exceed their
maximum values.
After changing the cluster system environment, use the PRIMECLUSTER environment checking tool to check the PRIMECLUSTER
environment.
For details on checking the PRIMECLUSTER environment, see "6.9 Checking the Cluster Environment."
Note
Changing a node name may have a serious impact on the system. Therefore, make this change only when it is absolutely necessary.
Operation Procedure:
1. Stop the CF on the node whose node name is to be changed.
For information on how to stop CF, see "4.6 Starting and stopping CF" in "PRIMECLUSTER Cluster Foundation (CF) Configuration
and Administration Guide."
2. On the node whose node name is changed, change the old host name in the /etc/hosts file to the new host name.
(Example)
[Before change]
10.20.30.40 node1
[After change]
10.20.30.40 nodeA
3. On the node whose node name is changed, change the old host name in the /etc/sysconfig/network file (for RHEL6) and the /etc/
hostname file (for RHEL7) to the new host name.
(Example) for RHEL6
[Before change]
HOSTNAME=node1
[After change]
HOSTNAME=nodeA
(Example) for RHEL7
[Before change]
node1
[After change]
nodeA
After restarting OS, execute the following procedure for the other node.
5. After restarting the system, change the old host name in the /etc/hosts file on the other node to the new host name.
Note
If the host name is set in the shutdown facility, correct the "/etc/opt/SMAW/SMAWsf/rcsd.cfg" file on each node. For details, see
"5.1.2 Setting up the Shutdown Facility."
See
For information on how to restart Web-Based Admin View, see "PRIMECLUSTER Web-Based Admin View Operation Guide."
Operation Procedure:
1. Execute the following command on all the nodes to stop the shutdown facility.
# sdtool -e
CFNameX,weight=weight,admIP=myadmIP: agent=SA_xxx,timeout=timeout
Since the node weight affects the survival priority, see "5.1.2.1 Survival Priority" to determine the value to be set.
3. Execute the following command on all the nodes to start the shutdown facility.
# sdtool -b
Note
If you use the virtual machine function, this section explains the Public / administrative LAN of the guest OS.
Operation Procedure:
1. Execute the following command on one of the cluster nodes to stop RMS operation.
# hvshut -a
2. Execute the following commands on all the nodes to start the system in single-user mode.
[For RHEL6]
# /sbin/shutdown now
[For RHEL7]
# /bin/mount -a -t ext3
4. Edit the "/etc/hosts" file, and change the IP address on each node.
5. Change the IP address of the public LAN.
For details on how to change the IP address, see the Linux documentation.
6. If the IP address of CF over IP must be changed as the IP address of public LAN is changed, change /etc/default/cluster on each node.
See
For details, refer to "1.1.7 Example of CF configuration by CLI" in "PRIMECLUSTER Cluster Foundation Configuration and
Administration Guide."
7. If the administrative LAN is shared with the public LAN, the IP address of the shutdown facility or the IP address of the shutdown
agent needs to be changed. In this case, change "/etc/opt/SMAW/SMAWsf/rcsd.cfg" and "/etc/opt/SMAW/SMAWsf/SA_xxx.cfg"
on each node.
SA_xxx.cfg indicates the configuration file for the Shutdown Agent.
See
For details, see "5.1.2 Setting up the Shutdown Facility."
8. If an IP address used by Web-Based Admin View also needs to be changed along with the IP address of the public LAN changes,
change it on each node.
See
For details, see "7.1 Network address," "7.3 Management server," and "7.5 Multi-network between server and client by classified use"
in "PRIMECLUSTER Web-Based Admin View Operation Guide."
9. If a takeover IP address must be changed (when the takeover IP address is changed after installation, or when the takeover IP address
is changed due to transfer of the node), correct the IP address being used as the takeover IP address in the "/etc/hosts" file of each node.
When you have created takeover network resources, and change the subnet mask value due to the change of the public LAN, you also
need to edit the /usr/opt/reliant/etc/hvipalias file.
See
For information on how to edit the /usr/opt/reliant/etc/hvipalias file, see "6.7.3.6 Setting Up Takeover Network Resources."
10. If GLS is used with the public LAN, refer to "PRIMECLUSTER Global Link Services Configuration and Administration Guide:
Redundant Line Control Function" and change the IP address of GLS.
11. If the public LAN is shared with the network used for the mirroring among servers, refer to "Changing IP Addresses Used for
Mirroring among Servers" in "PRIMECLUSTER Global Disk Services Configuration and Administration Guide" and change the
settings of each node.
# /sbin/shutdown -r now
[For RHEL7]
Note
If the administrative LAN is shared with the public LAN, do not perform the following procedure, but change the IP address according to
the procedure described in "9.2.1 Changing the IP Address of the Public LAN."
Operation Procedure:
1. Execute the following command on one of the cluster nodes to stop RMS operation.
# hvshut -a
2. Execute the following commands on all the nodes to start the system in single-user mode.
[For RHEL6]
# /sbin/shutdown now
[For RHEL7]
# /bin/mount -a -t ext3
4. Edit the "/etc/hosts" file, and change the IP address on each node.
5. Change the IP address of the administrative LAN.
For details on how to change the IP address, see the Linux documentation.
6. If the IP address of CF over IP must be changed as the IP address of administrative LAN is changed, change /etc/default/cluster on
each node.
See
For details, refer to "1.1.7 Example of CF configuration by CLI" in "PRIMECLUSTER Cluster Foundation Configuration and
Administration Guide."
7. As the IP address of the administrative LAN is changed, the IP address of the shutdown facility or the IP address of the shutdown agent
needs to be changed. In this case, change "/etc/opt/SMAW/SMAWsf/rcsd.cfg" and "/etc/opt/SMAW/SMAWsf/SA_xxx.cfg" on each
node.
SA_xxx.cfg indicates the configuration file for the Shutdown Agent.
See
For details, see "5.1.2 Setting up the Shutdown Facility."
8. If an IP address used by Web-Based Admin View also needs to be changed along with the IP address of the administrative LAN
changes, change it on each node.
9. If the administrative LAN is shared with the network used for the mirroring among servers, refer to "Changing IP Addresses Used
for Mirroring among Servers" in "PRIMECLUSTER Global Disk Services Configuration and Administration Guide" and change the
settings of each node.
# /sbin/shutdown -r now
[For RHEL7]
Note
If the administrative LAN is shared with the public LAN, do not perform the following procedure, but change the IP address according to
the procedure described in "9.2.1 Changing the IP Address of the Public LAN."
Operation Procedure
1. Edit the /etc/default/cluster file on all the nodes in the cluster to change the IP address and the broadcast address.
Edit the file appropriately depending on if the cluster nodes are located in the same network segment or they are located in different
network segments.
If the cluster nodes are located in one of the following environments:
- Different network segments
- K5 environment
- RHOSP environment
If the cluster nodes are located in one of the following environments:
- Physical environment
- KVM environment
- VMware environment
- Make sure that all the nodes have joined the cluster.
Execute the following command on any one node in the cluster system and make sure that all the CF node names are displayed
in "Node" field. Also make sure that UP is displayed in "State" field.
# cftool -n
Example
# cftool -n
Node Number State Os Cpu
node1 1 UP Linux EM64T
node2 2 UP Linux EM64T
Make sure that all the CF node names are displayed in "Node" field, and UP is displayed in "State" field.
# cftool -d
# cftool -d
Number Device Type Speed Mtu State Configured Address
4 /dev/ip0 6 n/a 1392 UP YES 0a.00.00.c9.00.00
5 /dev/ip1 6 n/a 1392 UP YES 0a.00.00.ca.00.00
Make sure that only /dev/ipX is displayed in "Device" field (X indicates the number of cluster interconnects, ranging from 0 to 3).
Operation Procedure:
1. Start all the nodes that constitute the cluster system.
If the nodes are already operating, you do not need to restart them.
3. While referring to the cip.cf file, confirm the CIP name to change the IP address.
For details on the cip.cf file, see "1.2 CIP configuration file" in "PRIMECLUSTER Cluster Foundation (CF) Configuration and
Administration Guide" and the manual page describing cip.cf.
4. For the IPv6 address, edit the cip.cf file and change the IP address corresponding to the CIP name.
When the original address and the modified address are both IPv4, you do not need to change it.
Perform this procedure on all the nodes constituting the cluster system.
5. Change the IP address of the CIP name that is defined in the hosts(5) file.
Perform this procedure on all the nodes constituting the cluster system.
6. In the environment where the shutdown agent SA_icmp for VMware environment is used, if the cluster interconnect is used to check
whether the node is alive or not, modify /etc/opt/SMAW/SMAWsf/SA_icmp.cfg on each node.
See
For details, see "H.2.3.3 Setting Up the Shutdown Facility (when using I/O fencing function)."
8. Use the ciptool command to confirm that the IP address of CIP was changed.
# /opt/SMAW/SMAWcf/bin/ciptool -a
See
For details on the "ciptool" command, see the manual page describing "ciptool".
Note
Do not change anything other than a subnet mask for this file.
9.2.6 Changing the MTU Value of a Network Interface Used for Cluster
Interconnects
This section describes how to change the MTU value of a network interface used for cluster interconnects.
1. Stop CF on all the nodes that constitute the cluster.
For information on how to stop CF, see "4.6 Starting and stopping CF" in "PRIMECLUSTER Cluster Foundation (CF) Configuration
and Administration Guide."
2. Change the MTU value of a network interface used for cluster interconnects (see the example after this procedure).
3. Start CF on all the nodes that constitute the cluster.
For information on how to start CF, see "4.6 Starting and stopping CF" in "PRIMECLUSTER Cluster Foundation (CF) Configuration
and Administration Guide."
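The following is a sketch of changing the MTU value on RHEL (the interface name eth2 and the value 9000 are only examples; use the interface and value appropriate to your interconnects). To change the value temporarily:
# ip link set dev eth2 mtu 9000
To make the change persistent across reboots, add or modify the MTU line in the interface configuration file, for example /etc/sysconfig/network-scripts/ifcfg-eth2:
MTU=9000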
Note
The MTU value of a network interface used for cluster interconnects must be the same on all the nodes. If there is a different value on a node,
the node cannot join the cluster.
9.2.7 Changing the IP Address Used for the Mirroring among Servers
To change the IP address used for the mirroring among servers, refer to "Changing IP Addresses Used for Mirroring among Servers" in
"PRIMECLUSTER Global Disk Services Configuration and Administration Guide."
However, if the public LAN or the administrative LAN is shared with the network used for the mirroring among servers, refer to "9.2.1
Changing the IP Address of the Public LAN" or "9.2.2 Changing the IP Address of the Administrative LAN", not the above GDS manual.
Note
Operation Procedure:
1. Execute the following command on all the nodes to stop the shutdown facility.
# sdtool -e
2. Execute the following command on the node in which the IP address is changed to stop the MMB asynchronous monitoring daemons.
# /etc/opt/FJSVcluster/bin/clmmbmonctl stop
# /etc/opt/FJSVcluster/bin/clmmbmonctl start
# sdtool -b
5. After the shutdown facility started in Step 4, start the shutdown facility on the remaining nodes.
# sdtool -b
6. Execute the following command on all the nodes and check that the shutdown facility operates normally.
# sdtool -s
Note
If the following is displayed even though the change of the shutdown facility settings is completed, there may be a mistake in the
hardware configuration settings.
Note
Operation Procedure:
1. Execute the following command on all the nodes to stop the shutdown facility.
# sdtool -e
# sdtool -b
4. After the shutdown facility has started in Step 3, start the shutdown facility on the remaining nodes.
# sdtool -b
5. Execute the following command on all the nodes and check that the shutdown facility operates normally.
# sdtool -s
Note
If the following is displayed even though the change to the shutdown facility settings has been completed, the hardware configuration settings may contain an error.
9.3.1.2 Changing the User Name and Password for Controlling the MMB with RMCP
- 320 -
Operation Procedure
1. Execute the following command on all the nodes to stop the shutdown facility.
# sdtool -e
2. Following the procedures for the MMB, change the user name and password used to control the MMB via RMCP. If you change the user name
and password for multiple nodes, change them on all of those nodes.
3. By executing the following command, change the user name and password of the MMB information used by the MMB asynchronous monitoring
function. If the user name and the password are to be changed on multiple nodes, change the values on all the nodes to be changed.
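An illustrative invocation is shown below (the option and user name are assumptions; check the clmmbsetup manual page for the exact syntax). The command prompts for the new password.
# /etc/opt/FJSVcluster/bin/clmmbsetup -m mmb-user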
4. Execute the following command on all the nodes to start the shutdown facility.
# sdtool -b
5. Execute the following command on all the nodes and check that the shutdown facility operates normally.
# sdtool -s
Note
If the following is displayed even though the change to the shutdown facility settings has been completed, the hardware configuration settings may contain an error.
1. Execute the following command on all the nodes to stop the shutdown facility.
# sdtool -e
2. Following the procedures for the MMB, change the user name and password used to control the MMB via RMCP. If you change the user name
and password for multiple nodes, change them on all of those nodes.
3. By executing the following command, change the user name and password of the MMB information used by the iRMC asynchronous monitoring
function. If the user name and the password are to be changed on multiple nodes, change the values on all the nodes to be changed.
Example 2: Changing both user name and password, or changing only password
4. Execute the following command on all the nodes to start the shutdown facility.
# sdtool -b
- 321 -
5. Execute the following command on all the nodes and check that the shutdown facility operates normally.
# sdtool -s
Note
If the following is displayed even though the change to the shutdown facility settings has been completed, the hardware configuration settings may contain an error.
9.3.2.1.1 Using PRIMERGY RX/TX series and BX series with ServerView Resource
Orchestrator Virtual Edition
This section explains how to change the iRMC IP address when using PRIMERGY RX/TX series or BX series with ServerView Resource
Orchestrator Virtual Edition.
Operation Procedure:
1. Execute the following command on all the nodes to stop the shutdown facility.
# sdtool -e
See
For details on how to define the configuration file, see "5.1.2.3.3 Setting up IPMI Shutdown Agent."
4. Execute the following command on any node to apply changes of the configuration file.
# /etc/opt/FJSVcllkcd/bin/panicinfo_setup
5. Execute the following command on all the nodes to start the shutdown facility.
# sdtool -b
6. Execute the following command on all the nodes and check that the shutdown facility operates normally.
# sdtool -s
- 322 -
Note
If the following is displayed even though the change to the shutdown facility settings has been completed, the configuration settings of the agent or hardware may contain an error.
1. Execute the following command on all the nodes to stop the shutdown facility.
# sdtool -e
# sdtool -b
4. After the shutdown facility has started in Step 3, start the shutdown facility on the remaining nodes.
# sdtool -b
5. Execute the following command on all the nodes and check that the shutdown facility operates normally.
# sdtool -s
Note
If the following is displayed even though the change to the shutdown facility settings has been completed, the hardware configuration settings may contain an error.
9.3.2.2.1 Using PRIMERGY RX/TX series and BX series with ServerView Resource
Orchestrator Virtual Edition
This section explains how to change the user name and password for iRMC when using PRIMERGY RX/TX series or BX series with
ServerView Resource Orchestrator Virtual Edition.
Operation Procedure:
1. Execute the following command on all the nodes to stop the shutdown facility.
# sdtool -e
2. Change the user name and password according to the procedure for iRMC.
- 323 -
3. Encrypt the password.
# /opt/SMAW/SMAWsf/bin/sfcipher -c
Enter Password:
Re-Enter Password:
D0860AB04E1B8FA3
4. Define the changed user name and the encrypted password for iRMC in the Shutdown Agent configuration file.
See
For details on how to define the configuration file, see "5.1.2.3.3 Setting up IPMI Shutdown Agent."
5. Execute the following command on any node to apply changes of the configuration file.
# /etc/opt/FJSVcllkcd/bin/panicinfo_setup
6. Execute the following command on all the nodes to start the shutdown facility.
# sdtool -b
7. Execute the following command on all the nodes and check that the shutdown facility operates normally.
# sdtool -s
Note
If the following is displayed even though the change to the shutdown facility settings has been completed, the configuration settings of the agent or hardware may contain an error.
1. Execute the following command on all the nodes to stop the shutdown facility.
# sdtool -e
2. According to the procedures of iRMC, change the user name and password.
If you change the user name and password for multiple nodes, change them for all the nodes.
3. By executing the following command, change the user name and password of the iRMC information used by the iRMC asynchronous monitoring
function.
If the user name and the password are to be changed on multiple nodes, change the values on all the nodes to be changed.
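An illustrative invocation is shown below (the options and user name are assumptions only; check the clirmcsetup manual page for the exact syntax). The command prompts for the new password.
# /etc/opt/FJSVcluster/bin/clirmcsetup -m irmc irmc-user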
- 324 -
Example 2: Changing both user name and password, or changing only password
4. Execute the following command on all the nodes to start the shutdown facility.
# sdtool -b
5. Execute the following command on all the nodes and check that the shutdown facility operates normally.
# sdtool -s
Note
If the following is displayed even though the change to the shutdown facility settings has been completed, the hardware configuration settings may contain an error.
1. Execute the following command on all the nodes to stop the shutdown facility.
# sdtool -e
See
For details on how to define the configuration file, see "5.1.2.3.4 Setting up Blade Shutdown Agent."
4. Execute the following command on any node to apply changes of the configuration file.
# /etc/opt/FJSVcllkcd/bin/panicinfo_setup
5. Execute the following command on all the nodes to start the shutdown facility.
# sdtool -b
6. Execute the following command on all the nodes and check that the shutdown facility operates normally.
# sdtool -s
- 325 -
Note
If the following is displayed even though the change to the shutdown facility settings has been completed, the configuration settings of the agent or hardware may contain an error.
1. Execute the following command on all the nodes to stop the shutdown facility.
# sdtool -e
2. Change the slot position according to the procedure for the server blade.
3. Define the changed slot number of the server blade in the Shutdown Agent configuration file.
See
For details on how to define the configuration file, see "5.1.2.3.4 Setting up Blade Shutdown Agent."
4. Execute the following command on any node to apply changes of the configuration file.
# /etc/opt/FJSVcllkcd/bin/panicinfo_setup
5. Execute the following command on all the nodes to start the shutdown facility.
# sdtool -b
6. Execute the following command on all the nodes and check that the shutdown facility operates normally.
# sdtool -s
Note
If the following is displayed even though the change to the shutdown facility settings has been completed, the configuration settings of the agent or hardware may contain an error.
- 326 -
9.4.1 Changing Host OS Settings (KVM environment)
This section describes how to change the settings of the shutdown facility when changing the settings of the host OS in the environment
where the KVM virtual machine function is used.
1. Execute the following command on all the nodes to stop the shutdown facility.
# sdtool -e
See
For details on how to define the configuration file, see "5.1.2.6.2 Setting up libvirt Shutdown Agent."
3. For the host OS IP addresses (ip-address) you want to change, log in in advance from all guest OSes (nodes) as the user for the shutdown
facility, because authentication (creation of the RSA key) is required when SSH is used for the first time.
4. Execute the following command on all the nodes to start the shutdown facility.
# sdtool -b
1. Execute the following command on all the nodes to stop the shutdown facility.
# sdtool -e
# /opt/SMAW/SMAWsf/bin/sfcipher -c
Enter Password:
Re-Enter Password:
Xh+kSlJ8nlQ=
See
For details on how to define the configuration file, see "5.1.2.6.2 Setting up libvirt Shutdown Agent."
4. Execute the following command on all the nodes to start the shutdown facility.
# sdtool -b
- 327 -
9.4.1.3 Changing the Settings in /etc/sysconfig/libvirt-guests
This section explains the procedure for changing the settings in /etc/sysconfig/libvirt-guests after installing the PRIMECLUSTER system
in a KVM environment.
Operation procedure
1. Execute the following command on all the nodes to stop the shutdown facility.
# sdtool -e
See
For details on the settings in /etc/sysconfig/libvirt-guests, see "Setting the guest OS in the host OS (in a KVM environment)" for each
virtual environment shown below:
- When building a cluster system between guest OSes on one host OS, see "3.2.1.2 Host OS setup (after installing the operating
system on guest OS)."
- When building a cluster system between guest OSes on multiple host OSes without using Host OS failover function, see "3.2.2.2
Host OS setup (after installing the operating system on guest OS)."
- When building a cluster system between guest OSes on multiple host OSes using Host OS failover function, see "3.2.3.1.4 Host
OS setup (after installing the operating system on guest OS)."
3. Execute the following command on all the nodes to start the shutdown facility.
# sdtool -b
- 328 -
Chapter 10 Configuration change of Cluster Applications
This chapter describes how to change the configuration of cluster applications.
Before adding the cluster application or the resource, check "Design (the number of resources)" of PRIMECLUSTER Designsheets to verify
that the number of resource objects and the number of detectors that can be set in the whole PRIMECLUSTER system do not exceed their
maximum values.
Operation flow
Operation Procedure:
1. Stop RMS of all the nodes.
If RMS is running, see "7.2.1.2 Stopping RMS" and stop RMS of all the nodes.
# /opt/SMAW/SMAWRrms/bin/hvw -n testconf
- 329 -
3. Select "Configuration-Generate" from the "Main configuration menu."
# /etc/opt/FJSVcluster/bin/clrwzconfig -c
7. If the results of the cluster service check for the PRIMECLUSTER-compatible product show that the "clrwzconfig" command
output message 8050, re-register the cluster service for the PRIMECLUSTER-compatible product.
Execute the following command in any node that is part of the cluster system.
# /etc/opt/FJSVcluster/bin/clrwzconfig
- 330 -
8. Start RMS.
Start RMS as described in "7.2.1.1 Starting RMS."
Note
Be sure to stop RMS of all the nodes before deleting a cluster application and its resources. For instructions on stopping RMS, see "7.2.1.2
Stopping RMS."
Procedure
1. Stop RMS of all the nodes.
If RMS is activated, stop RMS of all the nodes as explained in "7.2.1.2 Stopping RMS."
Note
- If you have deleted an available network interface card by mistake, reregister the resources for the accidentally deleted network
interface card by executing the "clautoconfig" command.
- If the shared disk for which resources are to be deleted is registered to a GDS class, first delete the shared disk from the GDS class, and
then delete the resources of the shared disk. For instructions on how to delete a shared disk from a GDS class, refer to
"PRIMECLUSTER Global Disk Services Configuration and Administration Guide."
Note
- If you delete a userApplication, all the resources registered to the userApplication will also be deleted.
- If Gds resources are registered to the userApplication to be deleted, bring the Gds volume online. See "10.5.1 Settings made when
deleting a Gds resource."
Operation Procedure:
1. Log in to any one of the cluster nodes using system administrator access privileges.
2. Start the RMS Wizard.
Execute the "hvw -n configuration file" command. Specify a name of the configuration file in which the userApplication is defined.
- 331 -
The following example shows how to start RMS Wizard with the configuration file name "testconf."
# /opt/SMAW/SMAWRrms/bin/hvw -n testconf
4. Select the userApplication that you want to delete from the "Application selection menu."
The following example shows how to select APP2.
Note
When deleting a cluster application that is performing standby operation as a component of the cluster application in scalable
operation, change the resources of the Controller after deleting the cluster application that is performing standby operation. For details
on how to change the resource of the Controller, see "10.3 Changing the Cluster Configuration."
- 332 -
6. Select "Configuration-Activate" from the "Main configuration menu."
7. Select "QUIT" from the "Main configuration menu" to exit from the RMS Wizard.
Note
If all userApplications are deleted, you do not have to take the remaining steps.
# /etc/opt/FJSVcluster/bin/clrwzconfig -c
9. If the results of the cluster service check for the PRIMECLUSTER-compatible product show that the "clrwzconfig" command
output message 8050, re-register the cluster service for the PRIMECLUSTER-compatible product.
Execute the following command in any node that is part of the cluster system.
# /etc/opt/FJSVcluster/bin/clrwzconfig
- 333 -
Operation flow
Operation Procedure:
1. Stop RMS of all the nodes.
If RMS is running, see "7.2.1.2 Stopping RMS" and stop RMS of all the nodes.
2. Change the configuration of the cluster applications with the RMS Wizard.
1. Log in to any one of the cluster nodes using system administrator access privileges.
2. Start up the RMS Wizard.
Execute the "hvw -n configuration file" command. Specify the name of the configuration file in which the configuration is
defined.
The following example shows how to start up RMS Wizard with the configuration file name "testconf."
# /opt/SMAW/SMAWRrms/bin/hvw -n testconf
1. Select the userApplication that needs modification of configuration from "Application selection menu." If more than one
selection item is displayed, select userApplication written in capital letters. The following example shows how to select
"APP1."
- 334 -
2. When "turnkey wizard" appears, select what you want to change from the following table.
For details on the operation when you select above items, see "6.7 Setting Up Cluster Applications." After you change the
configuration, select "SAVE+EXIT" to return to the "Application selection menu." After that, select "RETURN" to return to
the "Main configuration menu."
The following example shows how to change the attribute of "AutoStartUp" of the userApplication setting from "no" to "yes":
- 335 -
2. Select "AutoStartUp."
3. Select "yes."
- 336 -
4. Confirm that "AutoStartUp" is changed to "yes," and then select "SAVE+EXIT."
- 337 -
Note
For information on how to change a cluster application performing standby operation and which forms part of a cluster
application in a scalable operation, see "When a cluster application that is performing standby operation is to be changed."
1. Select the userApplication to be reconfigured from "Application selection menu." If more than one selection item is
displayed, select userApplication written in capital letters. The following example shows how to select "APP3."
3. "Settings of application type "Controller"" is displayed. Select one of the following according to the contents to be
changed:
[Supplement]
A number is displayed in place of the "*" in "Controllers[*]". Select the cluster application in standby operation
that you want to delete. You can delete a cluster application in standby operation by specifying "NONE" on the screen
displayed after the selection.
For details on the operation to be performed after making the above selection, see "6.7 Setting Up Cluster
Applications." After you change the configuration, select "SAVE+EXIT" to return to the "Application selection menu."
After that, select "RETURN" to return to the "Main configuration menu."
The following is an example in which the "AutoStartUp" attribute of the userApplication is changed to "yes" from "no."
- 338 -
2. Select "(AutoStartUp=no)" from the "Machines+Basics" menu.
3. Select "yes."
- 339 -
4. Check that "AutoStartUp" has been changed to "yes," and then select "SAVE+EXIT."
- 340 -
3. Select "Configuration-Generate" from the "Main configuration menu."
# /etc/opt/FJSVcluster/bin/clrwzconfig -c
7. If the results of the cluster service check for the PRIMECLUSTER-compatible product show that the "clrwzconfig" command
output message 8050, re-register the cluster service for the PRIMECLUSTER-compatible product.
Execute the following command in any node that is part of the cluster system.
# /etc/opt/FJSVcluster/bin/clrwzconfig
- 341 -
8. Start RMS.
Start RMS as described in "7.2.1.1 Starting RMS."
Operation flow
Operation Procedure:
1. Stop RMS of all the nodes.
If RMS is running, see "7.2.1.2 Stopping RMS" and stop RMS of all the nodes.
2. Register the new resources to the cluster application with the RMS Wizard.
1. Log in to any one of the cluster nodes using system administrator access privileges.
2. Start up the RMS Wizard.
Execute the "hvw -n configuration file" command. Specify the name of the configuration file in which the configuration is
defined.
The following example shows how to start up RMS Wizard with the configuration file name "testconf."
# /opt/SMAW/SMAWRrms/bin/hvw -n testconf
4. Select a registered userApplication for adding resources from the "Application selection menu."
The following example shows how to select "APP1."
- 342 -
5. Register the added resources.
See "6.7.3 Setting Up Resources" and register the added resources.
- 343 -
5. Select "QUIT" from the "Main configuration menu."
# /etc/opt/FJSVcluster/bin/clrwzconfig -c
7. If the results of the cluster service check for the PRIMECLUSTER-compatible product show that the "clrwzconfig" command
output message 8050, re-register the cluster service for the PRIMECLUSTER-compatible product.
Execute the following command in any node that is part of the cluster system.
# /etc/opt/FJSVcluster/bin/clrwzconfig
8. Start RMS.
Start RMS as described in "7.2.1.1 Starting RMS."
Note
- If the Gds resource was deleted, setting for the GDS shared class is required.
See "10.5.1 Settings made when deleting a Gds resource."
- When deleting a procedure resource, first delete the procedure resource from the cluster resource management facility after deleting the
procedure resource from the cluster application. For details on how to delete a procedure resource from the cluster resource management
facility, see "D.3 Deleting a Procedure Resource."
- When deleting an Fsystem resource, delete the mount point that was being used as the resource (mount point of the line beginning with
"#RMS#") from /etc/fstab.pcl on all the nodes.
- When deleting takeover network resource, delete entries added at the time of setting up takeover network resource from the following
environment files:
- /usr/opt/reliant/etc/hvipalias
- /etc/hosts
- To delete the resource (Gds resource or Fsystem resource) that controls the shared disk in the VMware environment where the I/O
fencing function is used, make sure that userApplication is Offline on all the nodes before stopping RMS.
If an error such as a resource failure or an OS panic has occurred right before stopping RMS, take the following steps first and then delete
the resource:
- 344 -
2. Start userApplication once and then stop it.
3. Make sure that userApplication stopped in step 2 becomes Offline successfully.
Operation Procedure:
1. Stop RMS of all the nodes.
If RMS is running, see "7.2.1.2 Stopping RMS" and stop RMS of all the nodes.
2. Log in to any one of the cluster nodes using system administrator access privileges.
3. Start the RMS Wizard.
Execute the "hvw -n configuration file" command. Specify the name of the configuration file in which the resource is defined.
The following example shows how to start RMS Wizard with the configuration file name "testconf."
# /opt/SMAW/SMAWRrms/bin/hvw -n testconf
5. Select the userApplication in which the resource is registered from the "Application selection menu." The following example shows
how to select "APP1."
- 345 -
7. In "turnkey wizard", select "SAVE+EXIT" and go back to "Application selection menu." After that, select "RETURN" and go back
to "Main configuration menu."
- 346 -
10. Select "QUIT" from the "Main configuration menu" to exit from the RMS Wizard.
# /etc/opt/FJSVcluster/bin/clrwzconfig -c
12. If the results of the cluster service check for the PRIMECLUSTER-compatible product show that the "clrwzconfig" command
output message 8050, re-register the cluster service for the PRIMECLUSTER-compatible product.
Execute the following command in any node that is part of the cluster system.
# /etc/opt/FJSVcluster/bin/clrwzconfig
Operation Procedure:
Execute the following command on the node on which the Gds resource was deleted.
# /opt/SMAW/SMAWRrms/bin/hvgdsetup -d [class-name]
Point
It is possible to change the resources only when RMS is stopped.
- 347 -
Operation flow
Operation Procedure:
1. Stop RMS of all the nodes.
If RMS is running, see "7.2.1.2 Stopping RMS" and stop RMS of all the nodes.
<node name>: When the host name is changed, the CF node name also needs to be changed. Change the value of this field to the modified CF node name.
<takeover>: Change this host name when the host name associated with the takeover IP address was changed.
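For reference only, an hvipalias entry generally lists the CF node name, the takeover host name, the interface, and the netmask on one line, for example as shown below (all values, the field order, and the netmask notation are assumptions; check the hvipalias manual page for the exact format):
node1 takeover eth0 255.255.255.0
node2 takeover eth0 255.255.255.0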
Note
For changing only the IP addresses of takeover network resource but not the host names, it is not necessary to use the RMS Wizard.
See
For details on changing settings with the RMS Wizard, see "8.5 Changing the Operation Attributes of a userApplication."
- 348 -
4) Application-Edit 13) Configuration-Edit-Global-Settings
5) Application-Remove 14) Configuration-Consistency-Report
6) Application-Clone 15) Configuration-ScriptExecution
7) Configuration-Generate 16) RMS-CreateMachine
8) Configuration-Activate 17) RMS-RemoveMachine
9) Configuration-Copy
Choose an action: 4
2. Select the userApplication that needs modification of the configuration from the "Application selection menu."
Edit: Application selection menu (restricted):
1) HELP
2) QUIT
3) RETURN
4) OPTIONS
5) APP1
Application Name: 5
4. Select Interfaces[X] to set the host name to be changed from the "Ipaddresses and ipaliases menu."
Consistency check ...
5. Select the changed host name associated with the takeover IP address.
1) HELP 6) node2RMS
2) RETURN 7) takeover2
3) NONE
4) FREECHOICE
5) node1RMS
Choose an interface name: 7
6. Select "SAVE+RETURN."
Set flags for interface: takeover2
Currently set: VIRTUAL,AUTORECOVER,PING (VAProuter,l3hub)
1) HELP 4) DEFAULT 7) MONITORONLY(M)
2) - 5) BASE(B) 8) NOT:PING(P)
3) SAVE+RETURN 6) NOT:AUTORECOVER(A)
Choose one of the flags: 3
- 349 -
7. Make sure that the changed host name is displayed in Interfaces[X] in the "Ipaddresses and ipaliases menu."
Ipaddresses and ipaliases (Adr_APP1:consistent)
1) HELP 7) Interfaces[0]=VAProuter,l3hub:takeover2
2) NO-SAVE+EXIT 8) PingHostPool[0]=router
3) SAVE+EXIT 9) PingHostPool[1]=l3hub
4) REMOVE+EXIT 10) (NeedAll=yes)
5) AdditionalInterface 11) (Timeout=60)
6) AdditionalPingHost 12) (InterfaceFilter=)
Choose the setting to process:
8. If you have to change multiple objects, repeat Steps 4. to 7. for each object. After completing all changes, select "SAVE
+EXIT."
Note
In the VMware environment where the I/O fencing function is used, make sure that userApplication is Offline on all the nodes before
stopping RMS.
If an error such as a resource failure or an OS panic has occurred right before stopping RMS, take the following steps first and then change
the device:
- 350 -
2. Start userApplication once and then stop it.
3. Make sure that userApplication stopped in step 2 becomes Offline successfully.
Operation Procedure:
1. Stop RMS of all the nodes.
If RMS is running, see "7.2.1.2 Stopping RMS" and stop RMS of all the nodes.
Note
In dynamic configuration change, RMS is stopped while the cluster application continues to operate.
While RMS is stopped, a cluster application is not failed over if an error occurs in it. To minimize the time during which RMS is stopped,
check the following operation procedure carefully, and investigate and sort out the necessary operating steps in advance.
Moreover, when using middleware that reports an error while RMS is stopped, disable the failover report function or take other action if
necessary.
- 351 -
Operation flow
Operation Procedure:
1. Check the Cmdline resource names and the Online/Offline scripts.
When a Cmdline resource is included in the cluster application, check the resource name of the Cmdline resource with the
"hvdisp -T gResource" command.
If the Cmdline resource name starts with "RunScriptsAlways", the NULLDETECTOR flag is set for
that resource.
Example
When the execution result of the hvdisp command is as follows, it can be judged that the NULLDETECTOR flag is set for the
Cmdline resources RunScriptsAlways001_Cmd_APP1 and RunScriptsAlways001_Cmd_APP2.
# hvdisp -T gResource
Local System: node01RMS
Configuration: /opt/SMAW/SMAWRrms/build/config.us
- 352 -
It is necessary to add the processing described in "6.11.2.1.4 Notes When Setting the NULLDETECTOR Flag" to the Online/Offline
scripts of the Cmdline resource when the NULLDETECTOR flag is enabled.
If the necessary processing is not included, modify the scripts after stopping RMS according to the following procedure.
Example
When the execution result of the hvdisp command is the following, the operational node of app1 is node02 and the operational node
of app2 is node01.
# hvdisp -T userApplication
Local System: node01RMS
Configuration: /opt/SMAW/SMAWRrms/build/config.us
When manually determining the node that mounts the file system according to the following procedure, the information on the
operational node of the cluster application is necessary.
Information
For details on starting the volume of GDS and creating file system, see "6.7.3.2 Setting Up Fsystem Resources."
Example
This explanation uses an example in which a line is added to the /etc/fstab.pcl file in Step 8 described later.
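Such a line could take the following form (the device, mount point, and file system type are illustrative, not the values of the original example):
#RMS#/dev/sfdsk/class0001/dsk/volume0001 /mnt/swdsk1 ext4 noauto 0 0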
Execute the command below on the operational node to mount the file system.
After mounting, execute the command below to check whether the mount point is displayed (that is, whether the file system is mounted).
Additionally, check that the file system is not mounted on the standby node.
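Assuming the illustrative device and mount point above, the operations might look like the following (the output line is an example):
# mount -t ext4 /dev/sfdsk/class0001/dsk/volume0001 /mnt/swdsk1
# mount | grep /mnt/swdsk1
/dev/sfdsk/class0001/dsk/volume0001 on /mnt/swdsk1 type ext4 (rw)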
5. Stop RMS.
Execute the hvshut -L command on all the nodes to stop RMS while the cluster application is still operating.
- 353 -
Enter 'yes' in response to the warning message when the hvshut -L command is executed.
# hvshut -L
WARNING
-------
The '-L' option of the hvshut command will shut down the RMS
software without bringing down any of the applications.
In this situation, it would be possible to bring up the same
application on another node in the cluster which *may* cause
data corruption.
Do you wish to proceed ? (yes = shut down RMS / no = leave RMS running).
yes
# hvdisp -a
hvdisp: RMS is not running
7. If necessary, modify the Online/Offline scripts of the Cmdline resources for which the NULLDETECTOR flag is enabled.
If the check in Step 1 shows that the Online/Offline scripts of the Cmdline resources with the NULLDETECTOR flag enabled need
correction, see "6.11.2.1.4 Notes When Setting the NULLDETECTOR Flag" and modify the scripts.
# hvcm -a
Note
userApplication will be in the Inconsistent state on some or all of the nodes after RMS is started in Step 10 if the file system
was not mounted correctly as described in Step 4. In this case, perform the following procedure.
1. Execute the hvutil -f command on the standby node so that the state of userApplication on the standby node becomes Offline.
2. When userApplication on the standby node has transitioned to Standby, execute the hvutil -s command on the standby node.
3. Execute the hvswitch command on the operational node so that the state of userApplication on the operational node becomes
Offline.
- 354 -
Chapter 11 Changing the Operation Attributes of a Cluster
System
- Change methods
- "11.1.1 Changing the Operation Attributes (CUI)"
Explains how to change the operation attributes of the userApplication.
Note
Be sure to stop RMS before you change the operation attributes of userApplication. For instructions on stopping RMS, see "7.2.1.2 Stopping
RMS."
Note
"Application" on the CUI screen indicates a cluster application.
1. Log in to any one of the cluster nodes using system administrator access privileges.
2. Stop RMS.
If RMS is running, see "7.2.1.2 Stopping RMS" and stop RMS.
- 355 -
4. Select "Application-Edit" from the main menu of CUI. Enter a number and then press the Enter key.
5. Select the userApplication for which you want to change the operation attributes from the "Application selection menu."
The following example shows how to select "APP1."
6. When "turnkey wizard "STANDBY"" appears, select "Machines+Basics" and then change the operation attributes of the
userApplication.
- 356 -
7. Select the operation attribute that you want to change from "Machines+Basics."
Select a setup value. Enter a number and then press the Enter key.
Point
Select "RETURN" to return to the previous menu.
- 357 -
If there are multiple attributes to be changed, repeat steps 7 and 8 for each attribute.
If the attribute is other than "OnlinePriority," the menu number in step 8 will be different from that in this example.
9. Select "SAVE+EXIT" from the "Machines+Basics" screen to return to the "turnkey wizard "STANDBY"".
11. Select "Configuration-Generate" and then "Configuration-Activate" from the main menu.
Content changes will be enabled on all the cluster nodes.
- 358 -
Figure 11.9 Configuration distribution (Example of executing Configuration-Activate)
Note
When the processing is successfully done, the message "The activation has finished successfully" appears. If this message is not
displayed, the modified information contains incorrect settings. Check and correct the settings.
- 359 -
14. Check the cluster service for the PRIMECLUSTER-compatible product.
Execute the following command in any node that is part of the cluster system.
# /etc/opt/FJSVcluster/bin/clrwzconfig -c
15. If the results of the cluster service check for the PRIMECLUSTER-compatible product show that the "clrwzconfig" command
output message 8050, re-register the cluster service for the PRIMECLUSTER-compatible product.
Execute the following command in any node that is part of the cluster system.
# /etc/opt/FJSVcluster/bin/clrwzconfig
Information
For instructions on starting RMS, see "7.2.1.1 Starting RMS."
For instructions on starting the cluster application, see "7.2.2.1 Starting a Cluster Application."
See
- For details on hvenv.local, see "1.9 Environment variables" in "PRIMECLUSTER Reliant Monitor Services (RMS) with Wizard Tools
Configuration and Administration Guide."
- For details on the RMS environment variables, see "Appendix E Environment variables" in "PRIMECLUSTER Reliant Monitor
Services (RMS) with Wizard Tools Configuration and Administration Guide."
1. The maximum required time to finish the Offline processing of a cluster application
2. The maximum required time to stop BM (base monitor) (30 seconds)
Note
If the value of RELIANT_SHUT_MIN_WAIT is too small, the hvshut command may often time out before the Offline processing of a
cluster application finishes. Tune RELIANT_SHUT_MIN_WAIT carefully.
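As a worked example (the figures and the use of hvenv.local are assumptions; see the RMS manual referenced below for where the variable is set), if the Offline processing of the cluster application takes at most 770 seconds, set RELIANT_SHUT_MIN_WAIT to 770 + 30 = 800 seconds:
export RELIANT_SHUT_MIN_WAIT=800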
- 360 -
See
For details on RELIANT_SHUT_MIN_WAIT, see "RELIANT_SHUT_MIN_WAIT" of "E.2 Global environment variables" in
"PRIMECLUSTER Reliant Monitor Services (RMS) with Wizard Tools Configuration and Administration Guide."
For how to refer to or change the RMS environment variable, see "5.3.4 Displaying environment variables" or "E.1 Setting environment
variables" in "PRIMECLUSTER Reliant Monitor Services (RMS) with Wizard Tools Configuration and Administration Guide."
Note
If you set a long heartbeat time, it takes longer to detect an error. Therefore, tune the heartbeat time carefully.
To tune the heartbeat time (default: 10 seconds), perform the following procedure:
1. Add the following to the end of the "/etc/default/cluster.config" file on all the nodes that make up the cluster system. To be able to
restore the previous version of the file, make a note of its contents before changing it.
CLUSTER_TIMEOUT "second"
Example: Changing it to 30 seconds
CLUSTER_TIMEOUT "30"
2. To enable the setting, execute cfset -r at the same time on all the nodes that make up the cluster system.
# cfset -r
# cfset -a
From cfset configuration in CF module:
Note
- If you set a long heartbeat time, it takes longer to detect an error. Therefore, tune the heartbeat time carefully.
- If you set the heartbeat time shorter than the CF heartbeat time, a warning message is output during RMS startup. For details, see the notes
on "7.6 CF and RMS Heartbeats."
- 361 -
1. Stop a cluster application and RMS on all the nodes.
# hvshut -a
Example
To change the default value from 600 to 800 seconds (-h: monitoring timeout, maximum 3600 seconds):
# hvcm -c config -h 800
- 362 -
Part 5 Maintenance
This part explains the procedure for maintaining the PRIMECLUSTER system.
- 363 -
Chapter 12 Maintenance of the PRIMECLUSTER System
This chapter explains items and procedures related to maintenance of the PRIMECLUSTER system.
1. All the nodes of the running PRIMECLUSTER system shall be stopped by the administrator of the PRIMECLUSTER system.
2. Pass the operation over to field engineers.
3. Field engineers shall then perform maintenance of the erroneous location (repair or replacement). Confirm that the system operates
normally by running a test program, etc.
4. After the completion of maintenance by field engineers, check the relevant equipment and then boot the PRIMECLUSTER system.
When job hot maintenance is to be performed
1. The administrator of the PRIMECLUSTER system shall shut down the node that contains the target equipment, so as to separate it
from the operation, and then pass the operation over to field engineers.
For details on how to separate the node from the operation, see "12.2.1 Detaching Resources from Operation."
2. Field engineers shall confirm the target equipment and perform maintenance of the erroneous equipment (repair or replacement).
Operation shall be confirmed by using a test program, etc.
3. After field engineers complete the maintenance and confirm the operation of the relevant equipment, boot the node and then execute
standby restoration for the operation.
For details on standby restoration for the operation, see "12.2.2 Executing Standby Restoration for an Operating Job."
See
For details on how to determine whether the relevant node is operating, see "7.1.3.1 RMS Tree."
Stopping RMS
- 364 -
After confirming that the relevant node is in either the Offline or Standby state, stop RMS running on the relevant node by executing the
"hvshut" command.
See
For details on how to stop RMS, see "7.1.3 Stopping RMS" in "PRIMECLUSTER Reliant Monitor Services (RMS) with Wizard Tools
Configuration and Administration Guide."
Stopping a node
Execute the "shutdown(8)" command to stop the relevant node.
Procedure
1. Power on the relevant node.
2. Perform standby restoration for the relevant node (if necessary, subsequently execute failback).
See
For details on how to start up the cluster application, see "7.2.2.1 Starting a Cluster Application." For details on how to execute failover/
failback, see "7.2.2.3 Switching a Cluster Application."
- To apply an intensive correction, you must stop the node temporarily. This means that the job must be stopped, albeit temporarily. You
should consider a maintenance plan to ensure that the maintenance is completed within a specified period. You must also examine the
time and duration of the maintenance to minimize the impact on a job.
- Rolling update is a method by which software is updated while the job continues to operate: by executing job failover, each node in the
cluster is separated from the operation in turn as the standby node, and corrections are applied to the nodes one by one.
If you apply this method, the job stop time required for software update can be minimized. To perform update with this method,
however, you must satisfy the prerequisites for rolling update (the items to be corrected must be correctable with rolling update).
To apply this method, you must confirm the contents of the README file for the relevant patch and then contact field engineers.
- 365 -
12.3.2.1 Procedure for Applying Corrections by Stopping an Entire System
This section explains the procedure for applying corrections by stopping the entire cluster system. An example of a two-node 1:1 standby
configuration is used here.
Flow of operation
Procedure
Copy the correction to be applied to each node to the local file system in advance.
1. Stop RMS.
Execute hvshut -a on either cluster node to stop the operation of RMS.
5. Apply corrections.
Apply the corrections that were copied to the local file system in advance.
6. Restart.
After applying the corrections, boot the nodes by using shutdown -r.
Note
- For details on the corrections, refer to the manuals provided with the corrections.
- For details on the standby restoration of cluster applications, see "7.2.2.1 Starting a Cluster Application." For details on failback, see
"7.2.2.3 Switching a Cluster Application."
- 366 -
12.3.2.2 Procedure for Applying Correction by Rolling Update
This section explains the procedure for applying corrections by rolling update. An example of two-node 1:1 standby configuration is used
for this explanation.
Flow of operation
Procedure
1. Shut down the standby node (node1).
To apply corrections to the standby node (node1), shut down the node after stopping RMS.
Note that, as a result of this shutdown, a cutoff state transition occurs and dual instance operation is disabled until standby restoration
is performed.
- 367 -
3. Apply corrections.
Apply the necessary corrections.
9. Apply corrections.
Apply the necessary corrections.
Note
- For details on the corrections, refer to the manuals provided with the corrections.
- For details on standby restoration of cluster applications, see "7.2.2.1 Starting a Cluster Application." For details on failback, see
"7.2.2.3 Switching a Cluster Application."
- 368 -
Appendix A PRIMECLUSTER Products
PRIMECLUSTER products are as follows:
See
For details on the version levels of PRIMECLUSTER products and the range of support, see the manual of each product.
- 369 -
Appendix B Manual Pages
This appendix provides online manual page lists for CF, CIP, operator intervention, PAS, cluster resource management facility, RMS,
shutdown facility (SF), tracing failed resource, SIS, Web-Based Admin View, procedure resource, and the RMS wizards.
To view a manual page, enter the following command:
$ man man_page_name
Note:
To view these manual pages, you must set the MANPATH environment variable so that /etc/opt/FJSVcluster/man is included.
To print a hard copy of a manual page, enter the following command:
$ man man_page_name |col -b |lpr
Note
In some cases, "(1M)" may be output as the section number of the manual page that is displayed with the man command.
Should this occur, assume the section number to be "(8)."
B.1 CF
System administrator
Command Function
cfconfig Configures or unconfigures a node for a PRIMECLUSTER cluster.
cfregd CF registry synchronization daemon
cfset Applies or modifies /etc/default/cluster.config entries into the CF module.
cftool Prints the node communications state of a node or the cluster.
changeng Replaces a node group definition.
deleteng Deletes a node group.
descng Replaces a node group explanation.
detailng Displays the dynamic expansion of a node group.
newng Creates a new node group.
rcqconfig Configures or starts the quorum operation of a cluster system.
rcqquery Acquires the state of consistency (quorum) of the cluster.
showng Displays the name and definition of the node group.
B.2 CIP
System administrator
Command Function
cipconfig Starts or stops CIP 2.0.
ciptool Retrieves CIP information about local and remote nodes in the cluster.
File format
File Format
cip.cf CIP configuration file format
- 370 -
B.3 Operator Intervention
System administrator
Command Function
clreply Responds to an operator intervention request message.
B.4 PAS
System administrator
Command Function
mipcstat MIPC statistics
B.5 Cluster Resource Management Facility
System administrator
Command Function
clautoconfig Executes automatic resource registration.
clbackuprdb Saves the resource database.
clinitreset Resets the resource database.
clrestorerdb Restores the resource database.
clsetparam Checks the connections of shared disk units and sets up the operation for automatic
resource registration.
clsetup Sets up the resource database.
clstartrsc Activates a resource (GDS only).
clstoprsc Deactivates a resource (GDS only).
clsyncfile Distributes a file between cluster nodes.
User command
Point
There is also a "clgettree" command in the Web-Based System Administration tool WSA.
Command Function
clgettree Outputs tree information for the resource database.
B.6 RMS
System administrator
Command Function
hvassert Asserts (tests for) an RMS resource state.
hvcm Starts the RMS configuration monitor.
hvconfig Displays or saves the RMS configuration file.
- 371 -
Command Function
hvdisp Displays RMS resource information.
hvdispall Displays RMS resource information on all the nodes.
hvdump Collects debugging information about RMS.
hvlogclean Cleans the RMS log files.
hvshut Shuts down RMS.
hvswitch Switches control of an RMS user application or resource to another host.
hvutil Manipulates the availability of an RMS resource.
File format
File Format
hvenv.local RMS local environment variables file
B.7 Shutdown Facility (SF)
System administrator
Command Function
cldevparam Changes and displays the tunable operation environment for asynchronous monitoring.
clirmcmonctl Displays the status of the iRMC asynchronous monitoring daemon, and starts, stops,
restarts the iRMC asynchronous monitoring daemon.
clirmcsetup Registers, changes, deletes, and displays iRMC/MMB information of iRMC
asynchronous monitoring.
clmmbmonctl Displays the status of the MMB asynchronous monitoring daemon, and starts, stops,
restarts the MMB asynchronous monitoring daemon.
clmmbsetup Registers, changes, deletes, and displays MMB information of MMB asynchronous
monitoring.
clvmgsetup Registers, changes, deletes, and displays host OS information.
sdtool Interface tool for shutdown daemon
rcsd Shutdown daemon for shutdown manager
File format
File Format
rcsd.cfg Configuration file for shutdown daemon
SA_ipmi.cfg Configuration file for IPMI Shutdown Agent
SA_blade.cfg Configuration file for blade Shutdown Agent
B.8 Tracing Failed Resource
System administrator
Command Function
cldispfaultrsc Outputs a list of the current failed resources
- 372 -
B.9 SIS
System administrator
Command Function
dtcpadmin Starts the SIS administration utility.
dtcpd Starts the SIS daemon for configuring VIPs.
dtcpdbg Displays SIS debugging information.
dtcpstat Displays state information on SIS.
B.10 Web-Based Admin View
System administrator
Command Function
fjsvwvbs Starts or stops Web-Based Admin View.
wvCntl Starts, stops, or gets debugging information for Web-Based Admin View.
wvGetparam Displays the Web-Based Admin View environment variables.
wvSetparam Sets the Web-Based Admin View environment variables.
wvstat Displays the operating state of Web-Based Admin View.
B.11 Procedure Resource
System administrator
Command Function
claddprocrsc Registers an application resource that uses a state transition procedure.
cldelproc Deletes a state transition procedure.
cldelprocrsc Deletes an application resource that uses state transition procedure.
clgetproc Gets a state transition procedure.
clsetproc Registers a state transition procedure.
clsetprocrsc Changes the registered information of an application resource that uses a state transition
procedure.
User command
Command Function
cldspproc Outputs information on the resource that uses the state transition procedure.
B.12 RMS Wizards
System administrator
Command Function
clrwzconfig Sets up the linking function between the PRIMECLUSTER resource manager and the
middleware products after the RMS configuration definitions are activated.
- 373 -
The RMS Wizard manual is saved in the following directory when the SMAWRhv-do package is installed.
/usr/doc/packages/SMAWRhv-do/wizards.en
- 374 -
Appendix C Troubleshooting
This appendix explains how to collect troubleshooting information if an error occurs in the PRIMECLUSTER system.
Information
- When reporting a problem, collect the information required for an error investigation. If you do not provide the information needed to
check the problem and reproduce the error, it may take a long time to reproduce and diagnose the problem, or doing so may become
impossible.
- Collect investigation material promptly from all the nodes of the PRIMECLUSTER system. Necessary information may become lost
if a long time elapses after the error occurs. This applies especially to information collected by fjsnap, FJQSS or pclsnap.
- 375 -
- For pclsnap
/opt/FJSVpclsnap/bin/pclsnap -a output
- The file name that is the output destination for the system information collected with the fjsnap or pclsnap command is specified
("output" in the example above).
- The following messages may be output to a switchlog and /var/log/messages when the fjsnap or pclsnap command is executed while
one or more cluster nodes are stopped. However, no action is required for these messages.
(WRP, 11) Message send failed, queue id <queueid>, process <process>, <name>, to host <node>.
See
For details on the "fjsnap" command, see the "README" file included in the "FJSVsnap" package.
For details on the "pclsnap" command, see the "README" file included in the "FJSVpclsnap" package.
Information
Execution timings for the fjsnap or pclsnap command
- For problems that occur during operation, for example, if an error message is output, execute the "fjsnap" or "pclsnap" command
immediately after the problem occurs.
- If the "fjsnap" or "pclsnap" command cannot be executed because the system hangs, collect a crash dump. Then start the system in single
user mode, and execute the "fjsnap" or "pclsnap" command.
For information on how to collect a crash dump, see "C.1.3 Crash Dump."
- After an error occurs, if a node restarts automatically (the node could not be started in single-user mode) or if the node is mistakenly
started in multi-user mode, execute the "fjsnap" or "pclsnap" command.
- If investigation information cannot be collected because the "fjsnap" or "pclsnap" command results in an error, or the "fjsnap" or
"pclsnap" command does not return, then collect a crash dump.
2. The product selection menu appears. Enter the number of the product whose information you want to collect, and then press the
Enter key.
Select from the following product numbers:
- 376 -
4. After FJQSS has completed the collection, the name of the output directory for the collected information appears.
Verify that the information has been collected in that directory.
5. The following file is created in the output directory of the collected information. Please send it to field engineers.
resultYYYYMMDDHHMMSS.tar.gz
(YYYYMMDDHHMMSS: time (year, month, day, hour, minute, and second) that the collection started)
See
About FJQSS (Information Collection Tool) and its usage
You can collect the information necessary for the trouble investigation with FJQSS (Information Collection Tool). See the FJQSS User's
Guide bundled to the installation medium of the product.
To view the FJQSS User's Guide, open the following file on the installation medium of the product with a web browser.
documents/fjqss-manual_sollnx/index_en.html
Information
Crash dump directory
A crash dump is stored as a file on the node in which the error occurred.
If the guest OS has been forcibly stopped by the shutdown facility, or the guest OS has panicked in an environment where the
KVM virtual machine function is used, the crash dump is stored in the following directory on the host OS.
/var/crash/<shutdown time of the guest OS (YYYYMMDDHHMMSS)>.<Domain name for the guest OS>.core
/opt/fujitsu/SVmco/sh/getosvmco <filename>
Example:
/opt/fujitsu/SVmco/sh/getosvmco /tmp/node1_getosvmco
- 377 -
See
For details on the "getosvmco" command, see the following manuals:
Note
To use the history function of the failed resource, the resource database must be set up correctly. Also, the "AutoStartUp" and
"PersistentFault" attributes of userApplication must be set to yes(1).
For information on the resource database settings, see "PRIMECLUSTER Cluster Foundation (CF) Configuration and Administration
Guide."
To use the detection function of the failed resources, you must enable an operator intervention request. For information on the use of the
operator intervention request, see "5.2 Setting up Fault Resource Identification and Operator Intervention Request."
The operator intervention function and the failed resource history function are both dependent on the "clwatchlogd" daemon. This daemon
can be started automatically with the "rc" script in multi-user mode. The "clwatchlogd" daemon uses the "RELIANT_LOG_PATH"
environment variable of RMS. The value of this variable is set when the "rc" script starts up for the first time.
When this value is changed, you need to send the "SIGHUP" signal to clwatchlogd. When clwatchlogd receives this signal, clwatchlogd
acquires the latest value of RELIANT_LOG_PATH. After completing the above processing, start RMS.
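One way to send the SIGHUP signal described above to clwatchlogd (assuming the pgrep command is available) is:
# kill -HUP $(pgrep -x clwatchlogd)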
This manual is installed in the /etc/opt/FJSVcluster/man directory.
Before executing the "man (1)" command, add this directory to the beginning of MANPATH. Usually, a directory name is added to the line
beginning with "setenv MANPATH" within the ".cshrc" file or the line beginning with "export MANPATH" within the ".profile" file.
- 378 -
Note
If a message frame title says "Cluster resource management facility," see "3.2 CRM View Messages" and "Chapter 4 FJSVcluster Format
Messages" in "PRIMECLUSTER Messages."
Icon Meaning
Notice
Warning
Error
Other
Procedure
1. Click on the OK button to respond to the message.
2. Click the up arrow mark or down arrow mark to go to the previous or next message. Then, a message appears to remind you that you
have not yet entered a response or confirmed the displayed message.
If you subsequently enter a response, the message is cleared and the next message appears. If the next message does not appear and the
message prior to the one for which a response was entered is still available, the previous message appears. When no message for which
confirmation or a response has not yet been entered remains, the message screen closes. For information on the message contents, refer to "3.2 CRM
View Messages" in "PRIMECLUSTER Messages"; for information on how to display previous messages, refer to "C.2.2 Resource Fault
History."
Note
If you close Web-Based Admin View or Cluster Admin after this message is displayed, a fault resource message with the same contents will
not be displayed. Therefore, you are recommended to confirm the message contents if a fault resource message is displayed for the first time.
After you have closed the message, refer to the fault history on the "Resource Fault History" screen. For information on the message display
language, refer to "4.3.3.3 Setting the Web-Based Admin View Language."
If the Cluster Admin screen is not displayed on the client PC when the fault resource message is displayed, the message is transmitted only
to the client to which the management server was first connected.
Each management server administers its fault resource messages. If you change the management server after confirming the message, the
same message will be displayed again. To delete these messages, select Cluster Admin by using the GUI of Web-Based Admin View after
closing Cluster Admin, and then open Cluster Admin again.
Procedure
1. Open the "Web-Based Admin View" screen and then select Global Cluster Services.
- 379 -
2. Choose Resource Fault History.
Note
The "Resource Fault History" cannot be displayed automatically. To display the latest history information, select View -> Update
menu.
- 380 -
Menu of the fault resource list screen
The "Resource Fault History" screen contains the following menu items:
Menu Function
View -> Update latest information The duration is initialized to the present time and date. A maximum of 100
of the latest history resources are displayed.
View -> Fault Resource List A list of resources in which failures are present is displayed (see "C.2.3
Fault Resource List").
View -> Exit The "Resource Fault History" screen is cleared.
Help -> Help The GUI help screen is displayed.
- Event time - The time at which the RMS detected a resource failure is displayed.
- State - One of the following statuses is indicated.
- Responded - The operator has already responded to the message.
- Not responded - The operator has not responded to the message for which a response is required.
- Responding - The operator is currently responding to the message.
- Confirm - Notification message for which no response is required.
- Message - The message is displayed.
- Selection information - Operator intervention message information from the client that is connected to the management server is
displayed. If the message is canceled or if a response to the message is entered by executing the "clreply" command, nothing will be
displayed.
- Execution result - The result and time of the response processing are displayed.
Information field
The information related to error detection during the acquisition or read-in of the history files is displayed. The following items will be
displayed:
- 381 -
C.2.3 Fault Resource List
If you select View -> Fault Resource List on the "Resource Fault History" screen, the fault resource list is displayed as follows:
/
/var/opt/FJSVcluster/cores/FJSVcldev/devirmcd
/var/opt/FJSVcluster/cores/FJSVcldev/devirmcmonitord
/var/opt/FJSVcluster/cores/FJSVcldev/devmmbd
/var/opt/FJSVcluster/cores/FJSVcldev/devmmbmond
/var/opt/FJSVcluster/cores/FJSVcldev/devmmbmonitord
/var/opt/FJSVcluster/cores/dcmevmd
/var/opt/FJSVwvbs/logs/node
/var/opt/FJSVwvbs/logs/server
/var/opt/FJSVwvcnf
/var/opt/SMAWsf/log
/opt/SMAW/SMAWRrms
- 382 -
Current directory (command)
- 383 -
hvdet_prmd
hvdet_execproc
[After change]
Information
When Primesoft Server for a server is installed, the log volume increased per day is as follows:
Calculation formula for increased log volume per day
(number of nodes x 4) + (number of registered resources x 6) + ((number of Cmdline resources + 2) x 16) + (number of Fsystem resources
x 35) + ((number of Primesoft Server resources + number of application resources) x 6) + 540 = log volume increased per day (MB)
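As a worked example (all counts are assumptions), for 2 nodes, 10 registered resources, 2 Cmdline resources, 2 Fsystem resources, 1 Primesoft Server resource, and 1 application resource:
(2 x 4) + (10 x 6) + ((2 + 2) x 16) + (2 x 35) + ((1 + 1) x 6) + 540 = 8 + 60 + 64 + 70 + 12 + 540 = 754 (MB per day)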
- 384 -
Note
- The increased log volume varies depending on the system operation state; the value given by the formula is an approximation.
For the actual increase, check how the log volume grows under RELIANT_LOG_PATH.
- If RMS is run for one or more days with the log level changed, configure a cron job that executes the hvlogclean command in order
to avoid a shortage of disk space caused by RMS log files. For details, see "C.3.4 Rotation and Deletion of RMS Log Files."
- RELIANT_LOG_LIFE
- HV_LOG_ACTION_THRESHOLD
- HV_LOG_WARN_THRESHOLD
- HV_LOG_ACTION
The values of these environment variables can be changed according to the system requirements. For the meaning of each RMS
environment variable, see "PRIMECLUSTER Reliant Monitor Services (RMS) with Wizard Tools Configuration and Administration
Guide."
Note
1. RMS log files are deleted according to the RELIANT_LOG_LIFE setting. This processing is executed by hvlogcron, which is started by
cron.
For notes and details on hvlogcron, see "7.7 cron Processing."
2. When deleting RMS log files with RELIANT_LOG_LIFE setting, the log files that RMS is outputting are not deleted. In the
operation that RMS is operated one day or more continuously and also in the operation to dispatch old log information, which had
been created before the RELIANT_LOG_LIFE was created, from RMS log files and delete them, set the hvlogclean command to be
executed once a day to the cron configuration.
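As an illustration only, a cron entry that executes the hvlogclean command once a day might look like the following. The installation path
of hvlogclean shown here is an assumption; check the actual path on your system and adjust the schedule to your requirements.
# Hypothetical crontab entry: run hvlogclean every day at 3:00
0 3 * * * /opt/SMAW/SMAWRrms/bin/hvlogclean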
Appendix D Registering, Changing, and Deleting State
Transition Procedure Resources for
PRIMECLUSTER Compatibility
To use a procedure resource in a cluster application, you must register the procedure resource before setting up the cluster application.
This appendix explains how to register, change, and delete procedure resources.
Operation Procedure:
1. Log in with the system administrator authority to the node in which the procedure resource is to be registered.
2. Execute the "clsetproc" command to register the state transition procedure.
See
For details on the "clsetproc" command, see the manual page.
Example
To register the "/tmp/program" state transition procedure as program (file name) to the BasicApplication class
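The registration command itself is documented in the manual page. As an illustrative sketch only, a command of the following form could
be used; the -c (class) and -m (program name) options are assumptions modeled on the claddprocrsc example shown later in this appendix,
so check the clsetproc manual page for the actual syntax.
# /etc/opt/FJSVcluster/bin/clsetproc -c BasicApplication -m program /tmp/program
(The options shown are illustrative assumptions.)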
Point
To overwrite a state transition procedure that is already registered, specify the -o option.
See
For details on the "claddprocrsc" command, see the manual page for claddprocrsc .
Example
When registering a procedure resource, the procedure resource has to meet the following conditions:
# /etc/opt/FJSVcluster/bin/claddprocrsc -k SDISK -m program -c BasicApplication -s NODE1 -K AFTER -S BEFORE
Operation Procedure:
1. Log in with the system administrator authority to the node in which the state transition procedure is to be changed.
2. Execute the "clgetproc" command to retrieve the state transition procedure.
See
For details on the "clgetproc" command, see the manual page.
Example
When retrieving a state transition procedure, this procedure resource has to meet the following conditions:
Example
To register the "/tmp/program" state transition procedure as program (file name) to the BasicApplication class
Note
To change the startup priority of a state transition procedure, you need to delete a procedure resource with the procedure for changing a
cluster application configuration and create a procedure resource again.
For more details, see "Chapter 10 Configuration change of Cluster Applications."
Operation Procedure:
1. Log in with the system administrator authority to the node in which the startup priority of state transition procedure is to be changed.
2. Delete the procedure resource of the cluster application.
For deleting the procedure resource of the cluster application, refer to "10.5 Deleting a Resource."
3. Execute the "clsetprocrsc(1M)" command to change the startup priority of the state transition procedure used by the procedure
resource.
After performing this step on all the nodes where the procedure resource is registered, go to the next step.
See
For details on the "clsetprocrsc(1M)" command, see the manual page.
Example
When changing the startup priority of the state transition procedure to 10000, this procedure resource has to meet the following
conditions:
- The resource class registered in the node (NODE1) is the BasicApplication class.
- The resource name is SDISK.
Note
To change the registration information of the procedure resource, you need to delete the procedure resource with the procedure for changing
the cluster application configuration and create the procedure resource again.
For more details, see "Chapter 10 Configuration change of Cluster Applications."
Operation Procedure:
1. Log in with the system administrator authority to the node in which the registration information of procedure resource is to be
changed.
3. Execute the "clsetprocrsc(1M)" command to change the registration information of the procedure resource.
After performing this step on all the nodes where the procedure resource is registered, go to the next step.
See
For details on the "clsetprocrsc(1M)" command, see the manual page.
Example
When the procedure resource with the following conditions receives a state transition request of START RUN BEFORE in addition
to START RUN AFTER and STOP RUN BEFORE;
Operation Procedure:
1. Log in with the system administrator authority to the node from which the procedure resource is to be deleted.
2. Execute the "cldelprocrsc" command to delete the procedure resource.
See
For details on the "cldelprocrsc" command, see the manual page.
Example
When deleting a procedure resource, the procedure resource needs to meet the following conditions:
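The exact conditions and command example are given in the manual page. As an illustration only, a deletion command following the same
option pattern as the claddprocrsc example earlier in this appendix might look like the following; the options shown are assumptions, so
check the cldelprocrsc manual page for the actual syntax.
# /etc/opt/FJSVcluster/bin/cldelprocrsc -k SDISK -c BasicApplication -s NODE1
(The options shown are illustrative assumptions.)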
See
For details on the "cldelproc" command, see the manual page.
Example
When deleting a procedure resource, the procedure resource needs to meet the following conditions:
Appendix E Configuration Update Service for SA
This appendix explains Configuration Update Service for SA.
Note
- Use the same user name and password for BMC or iRMC on every node.
- If the PersistentFault attribute of RMS is set to "1," the Fault information is kept even if RMS is started on a normal spare node. (The
default value of the PersistentFault attribute is "0.")
- When you update the configuration file for the shutdown agent, the updated configuration file is distributed to the nodes with which
communication is available. The file is not distributed to nodes that are stopped or with which network communication is not
available.
In addition, when you start multiple nodes simultaneously, the configuration file for the shutdown agent is updated and distributed on
multiple nodes at the same time. In this case, inconsistencies may occur in the information of the configuration file for the shutdown
agent stored in each node.
To check that correct information is distributed to all the nodes, execute the following command on any node when all the nodes are
activated.
# /opt/SMAW/SMAWsf/bin/sfsacfgupdate -s
When the information output by the command differs between nodes, restore the service according to the procedure in
"E.6 Restoration."
- Server model
Models using the IPMI shutdown agent (SA_ipmi)
See
For details on models using the IPMI shutdown agent, see "5.1.2 Setting up the Shutdown Facility."
Note
When using Configuration Update Service for SA, only IPv4 addresses are available for BMC or iRMC.
- Operating system
The following operating systems are supported:
Note
This service is not available in a virtual machine environment.
- Required packages
Red Hat Enterprise Linux 6, Red Hat Enterprise Linux 7:
- OpenIPMI
- ipmitool
Check that the packages described above are installed by executing the rpm command. Install packages if they are not installed.
Packages are included in the installation media for the operating system.
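For example, the presence of the packages listed above can be checked as follows (the output depends on the installed versions):
# rpm -q OpenIPMI ipmitool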
E.3 Configuration
This section describes how to set up this service.
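The note below refers to checking the current run levels of the ipmi service. On RHEL6 this can be done with the chkconfig command; the
output below is a hypothetical sketch in which only run level 3 is set to "on," and the actual values depend on the system.
# /sbin/chkconfig --list ipmi
ipmi 0:off 1:off 2:off 3:on 4:off 5:off 6:off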
Note
Record the run levels that are set to "on." The above example shows that run level 3 is "on." This value is required for canceling
this service and restoring the environment.
Execute the following command on all the nodes so that the ipmi service starts at operating system startup.
# /sbin/chkconfig ipmi on
Information
You can set "on" to run levels only that you want to activate this service. In this case, specify run levels in the range from 2 to 5.
Loaded: loaded (/usr/lib/systemd/system/ipmi.service; disabled)
Active: inactive (dead)
See
For details on the sfsacfgupdate command, see "E.7 sfsacfgupdate."
# /opt/SMAW/SMAWsf/bin/sfsacfgupdate -e
Information
In the RHEL6 environment
If, in Step 2 of "E.3.1 Startup Configuration for the IPMI Service," you set "on" only for the run levels in which you want to activate this
service, specify the values of those run levels.
# /opt/SMAW/SMAWsf/bin/sfsacfgupdate -e 35
When the run levels are omitted, this service is activated in run levels 2 to 5.
# /opt/SMAW/SMAWsf/bin/sfsacfgupdate -c
Configuration file exists. [ OK ]
ipmitool command exists. [ OK ]
ipmi service has been started. [ OK ]
ipmi service's run level :
0:off 1:off 2:on 3:on 4:on 5:on 6:off
Configuration Update Service's run level :
0:off 1:off 2:on 3:on 4:on 5:on 6:off
# /opt/SMAW/SMAWsf/bin/sfsacfgupdate -c
Configuration file exists. [ OK ]
ipmitool command exists. [ OK ]
ipmi service has been started. [ OK ]
ipmi service state. [ enabled ]
Configuration Update Service state. [ enabled ]
sfsacfgupdate: ERROR: "sfsacfgupdate -e " is not executed.
Note
In the RHEL6 environment
Check that the run levels that are "on" in "Configuration Update Service's run level" are also "on" in "ipmi service's run level."
If the statuses of the run levels are not identical, some setting may be incorrect. Review Step 2 in "In the RHEL6 environment"
and "E.3.2.1 Startup Configuration for Update Service for SA."
E.3.2.3 Checking the BMC or iRMC IP Address and the Configuration Information of the
Shutdown Agent
To check the BMC or iRMC IP address and the configuration information of the shutdown agent, execute the sfsacfgupdate command on
any node.
Check that the following information is consistent with the displayed contents.
nodeA: 10.20.30.41
nodeB: 10.20.30.42
nodeC: 10.20.30.43
# /opt/SMAW/SMAWsf/bin/sfsacfgupdate -s
Node : nodeA
Node status : UP
Configuration Update Service status : ENABLE
BMC IP Address :
10.20.30.41
Configuration file :
nodeA 10.20.30.41
nodeB 10.20.30.42
nodeC 10.20.30.43
Node : nodeB
Node status : UP
Configuration Update Service status : ENABLE
BMC IP Address :
10.20.30.42
Configuration file :
nodeA 10.20.30.41
nodeB 10.20.30.42
nodeC 10.20.30.43
Node : nodeC
Node status : UP
Configuration Update Service status : ENABLE
BMC IP Address :
10.20.30.43
Configuration file :
nodeA 10.20.30.41
nodeB 10.20.30.42
nodeC 10.20.30.43
Node :
The node name is displayed.
Node status :
The startup status of the node is displayed.
When the node is running, the status is "UP." For other than "UP," the subsequent information is not displayed.
Configuration Update Service status :
The setup status of Configuration Update Service for SA is displayed.
If no problem is found in "E.3.2.2 Checking the Configuration," the status is "ENABLE." For other than "ENABLE," the subsequent
information is not displayed.
BMC IP Address :
The current BMC or iRMC IP address is displayed.
Configuration file :
The BMC or iRMC IP address of each node stored in the current configuration file for the shutdown agent is displayed.
# cp -p /etc/opt/SMAW/SMAWsf/SA_ipmi.cfg /etc/opt/SMAW/SMAWsf/SA_ipmi.cfg.bk
# vi /etc/opt/SMAW/SMAWsf/SA_ipmi.cfg
~~~
nodeA 10.20.30.41:user:pass cycle
The new address is as follows:
nodeA 255.255.255.255:user:pass cycle <- Change to an unused IP address
Note
When you change the IP address, the following message may be output to syslog. Also, when sdtool -s is executed, the state
of SA_ipmi may be displayed as "TestFailed"; however, this is not a problem.
# shutdown -r now
4. Checking the configuration file for the shutdown agent
Check that the BMC or iRMC IP address of nodeA is updated in the configuration file for the shutdown agent in nodeA.
# vi /etc/opt/SMAW/SMAWsf/SA_ipmi.cfg
~~~
nodeA 10.20.30.41:user:pass cycle
# rm -f /etc/opt/SMAW/SMAWsf/SA_ipmi.cfg.bk
E.5 Cancellation
The following describes how to cancel this service.
# /opt/SMAW/SMAWsf/bin/sfsacfgupdate -d
E.6 Restoration
This section describes restoration methods if correct information is not distributed to all the nodes when this service operates.
sfsacfgupdate: ERROR: Failed to change the access permission of <file> on node <node>.
sfsacfgupdate: ERROR: Failed to change the owner of <file> on node <node>.
If any of the above messages are output, the process for <node> has failed.
# /opt/SMAW/SMAWsf/bin/sfsacfgupdate -s
Node : nodeA
Node status : UP
Configuration Update Service status : ENABLE
BMC IP Address :
10.20.30.51 <- Changed from 10.20.30.41
Configuration file :
nodeA 10.20.30.51 <- Updated with the changed information on nodeA
nodeB 10.20.30.42
nodeC 10.20.30.43
Node : nodeB
Node status : UP
Configuration Update Service status : ENABLE
BMC IP Address :
10.20.30.42
Configuration file :
nodeA 10.20.30.41 <- Not updated with the changed information on nodeB
nodeB 10.20.30.42
nodeC 10.20.30.43
Node : nodeC
Node status : UP
Configuration Update Service status : ENABLE
BMC IP Address :
11.22.33.46
Configuration file :
nodeA 10.20.30.51 <- Updated with the changed information on nodeC
nodeB 10.20.30.42
nodeC 10.20.30.43
In the above example, you can see the BMC IP address of nodeA is not updated with the changed information in the configuration
file for the shutdown agent stored in nodeB.
# /opt/SMAW/SMAWsf/bin/sfsacfgupdate -r
# /opt/SMAW/SMAWsf/bin/sfsacfgupdate -s
Node : nodeA
Node status : UP
Configuration Update Service status : ENABLE
BMC IP Address :
10.20.30.51
Configuration file :
nodeA 10.20.30.51
nodeB 10.20.30.42
nodeC 10.20.30.43
Node : nodeB
Node status : UP
Configuration Update Service status : ENABLE
BMC IP Address :
10.20.30.42
Configuration file :
nodeA 10.20.30.51 <- Updated with the changed information on nodeB
nodeB 10.20.30.42
nodeC 10.20.30.43
Node : nodeC
Node status : UP
Configuration Update Service status : ENABLE
BMC IP Address :
11.22.33.46
Configuration file :
nodeA 10.20.30.51
nodeB 10.20.30.42
nodeC 10.20.30.43
E.7 sfsacfgupdate
The following describes how to use the sfsacfgupdate command.
Name
sfsacfgupdate(8) - Management of Configuration Update Service for SA
Synopsis
/opt/SMAW/SMAWsf/bin/sfsacfgupdate {-e [<levels>]|-d|-c|-s|-r}
Feature description
This command manages Configuration Update Service for SA.
When Configuration Update Service for SA is activated, the configuration information of the shutdown agent is automatically updated on
operating system startup. Execute this command with the system administrator authority.
Options
-e
Activates Configuration Update Service for SA.
Specify the value of run levels 2 to 5 which you want to activate for <levels>. You can specify several run levels.
For example, when you specify "-e 35," run levels 3 and 5 will be activated.
When you omit the value, all run levels from 2 to 5 will be activated.
-d
Deactivates Configuration Update Service for SA.
-c
Checks the setup status of Configuration Update Service for SA.
-s
Displays the configuration information of the shutdown agent stored in all the nodes.
-r
Restores the configuration information of the shutdown agent.
Example
# /opt/SMAW/SMAWsf/bin/sfsacfgupdate -c [Return]
Configuration file exists. [ OK ]
ipmitool command exists. [ OK ]
ipmi service has been started. [ OK ]
ipmi service's run level :
0:off 1:off 2:on 3:on 4:on 5:on 6:off
Configuration Update Service's run level :
0:off 1:off 2:on 3:on 4:on 5:on 6:off
#
Exit status
0 : Normal exit
Other than 0 : Abnormal exit
Corrective action:
Copy this message, and then contact field engineers.
Corrective action:
Create <file>.
Corrective action:
Install the ipmitool command.
Content:
The ipmi service does not start.
Corrective action:
Start the ipmi service.
Corrective action:
Check the contents in <file> and enter the correct information.
Corrective action:
Review the contents of the configuration file for the shutdown agent, and check if the correct information is entered.
Corrective action:
Check that the communication with <node> is available. After restoring the state of <node>, execute this command with the -r option
and restore the configuration information of the shutdown agent.
Corrective action:
Copy this message, and then contact field engineers.
Corrective action:
Check that the communication with <node> is available. After restoring the state of <node>, execute this command with the -r option
and restore the configuration information of the shutdown agent.
sfsacfgupdate: ERROR: Failed to change the access permission of <file> on node <node>.
Content:
Changing the mode of <file> failed on <node>.
Corrective action:
Check that the communication with <node> is available. After restoring the state of <node>, execute this command with the -r option
and restore the configuration information of the shutdown agent.
sfsacfgupdate: ERROR: Failed to change the group of <file> on node <node>.
Content:
Changing the group of <file> failed on <node>.
Corrective action:
Check that the communication with <node> is available. After restoring the state of <node>, execute this command with the -r option
and restore the configuration information of the shutdown agent.
Corrective action:
Check that the communication with <node> is available. After restoring the state of <node>, execute this command with the -r option
and restore the configuration information of the shutdown agent.
Appendix F Setting up Cmdline Resource to Control Guest
OS from Cluster Application of Host OS in
KVM Environment
This appendix explains how to set up the Cmdline resource to control the guest OS from the cluster application of host OS in KVM
environment.
<Stop script>
<Check script>
- NULLDETECTOR
Disabled (to enable Check script)
- STANDBYCAPABLE
Disabled (Standby is disabled)
- ALLEXITCODES
Disabled (Standby is disabled)
- TIMEOUT
The default value is 300 seconds. Set the timeout duration to be longer than the time until the boot/shutdown sequence of the guest OS
completes.
Information
Execute the virsh command as shown below to check the domain name of the guest OS.
(Example) The domain name of the guest OS is domain 1
# virsh list --all
Id Name Status
----------------------------------
0 Domain-0 Active
- domain1 Shutoff
Appendix G Using the Migration Function in KVM
Environment
This appendix describes design, prerequisites and operations when using the Migration function in a KVM environment.
G.1 Design
The following three types of the Migration function can be used for a cluster system in a KVM environment:
- Live Migration
Transferring an active guest OS.
- Offline Migration
Transferring a suspended guest OS.
- Migration by Export/Import
Exporting/Importing the XML setup files of stopped guest OSes.
For the cluster configurations which are available for the KVM migration function, see "2.2.1 Virtual Machine Function."
Note
In the migrated guest OS, a virtio block storage is added under the device name "vdpcl". Note the following points when adding virtio
block storages for migration.
- Keep the number of virtio block storages in guest OSes within 27 devices, excluding the device (vdpcl) added for migration.
- Do not use "vdpcl" as the device name of a virtio block storage in guest OSes.
G.2 Prerequisites
This section describes the prerequisites for the migration function in a KVM environment.
hostip
Specify the IP address of the host OS.
Available IP address formats are IPv4 and IPv6.
IPv6 link local addresses are not available.
hostname
Specify the host name of the host OS.
2. Distributing the host OS information file (host OS)
Forward the host OS information file created in Step 1 to each host OS, change the file name to "sfkvmmigrate.img," and then place
it in "/var/opt/SMAWsf".
# mkdir -p /var/opt/SMAWsf
# cp sfkvmmigrate.img.hostname /var/opt/SMAWsf/sfkvmmigrate.img
1. Stopping of guest OS
Execute the following command on the guest OS to stop the guest OS.
# /sbin/shutdown -P now
[RHEL7]
domain
Specify the domain name of the guest OS.
3. Startup of guest OS
Start the guest OS.
G.2.2 Using the Host OS failover function
Perform the following procedure on the guest OSes on which the Migration is performed and on all host OSes.
You need to perform this procedure only once, not for each Migration.
2. Set the sudo command so that the created user can execute the command as a root user.
Execute the visudo command by using the sudo command. Describe the following setting in the displayed setting file.
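As a minimal sketch only, a sudoers entry that allows the created user to execute commands as the root user without a password might look
like the following; the user name pcluser is hypothetical, and the exact entry required should be confirmed against your environment.
# Hypothetical sudoers entry for the created user
pcluser ALL=(root) NOPASSWD: ALL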
# sfcipher -c
Enter User's Password:
Re-enter User's Password:
D0860AB04E1B8FA3
4. Create /etc/opt/FJSVcluster/etc/kvmguests.conf.
Create /etc/opt/FJSVcluster/etc/kvmguests.conf with the following contents.
Create the kvmguests.conf file as a root user. Set the permission as 600.
- The kvmguests.conf file must be the same on all cluster nodes.
guest-name :Specify the domain name of the guest OS to be migrated.
host-cfname :Specify the CF node name of the host OS in which "guest-name"
is running.
If you execute "cftool -l" on the host OS in which "guest-name"
is running, you can confirm the CF node name of the node.
guest-clustername :Specify the cluster name of the guest OS.
If you execute "cftool -c" on the guest OS, you can confirm
the cluster name of the node.
guest-cfname :Specify the CF node name of the guest OS.
If you execute "cftool -l" on the guest OS, you can confirm
the CF node name of the node.
guest_IP :Specify the IP address of the guest OS.
Available IP address formats are IPv4 and IPv6 addresses.
IPv6 link local addresses are not available.
guest_user :Specify the user name for logging in to the guest OS.
Specify the root user or the user created in step 2.
guest_passwd :Specify the user password for logging in to the guest OS.
Specify the password encrypted in step 3.
Example: In a two-node configuration between guest OSes, two cluster systems are configured
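The original example is not reproduced here. As an illustrative sketch only, entries for one of the two cluster systems, written with one line
per guest OS in the field order described above, might look like the following; all values are hypothetical.
guest1 host1 cluster1 cfguest1 192.168.10.11 pcluser D0860AB04E1B8FA3
guest2 host2 cluster1 cfguest2 192.168.10.12 pcluser D0860AB04E1B8FA3
(All names, addresses, and the encrypted password above are illustrative values.)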
# /opt/SMAW/SMAWsf/bin/sfkvmtool -c
NOTICE: The check of configuration file succeeded.
# sdtool -s
If the shutdown facility has already been started, execute the following on all the nodes to restart it.
# sdtool -e
# sdtool -b
If the shutdown facility has not been started, execute the following on all the nodes to start it.
# sdtool -b
2. Registration of host OS information (host OS)
Execute the following command on all the host OSes to register the host OS information.
hostip
Specify the IP address of the host OS on which this command was executed.
Available IP address formats are IPv4 and IPv6.
IPv6 link local addresses are not available.
-w off
Specify this option if the weights of the guest OS shutdown facility and that of the host OS shutdown facility should not be linked
when migrating the guest OS.
Without this option, linkage of the weights of the guest OS shutdown facility and the host OS shutdown facility is enabled when
migrating the guest OS.
This option must be the same on all host OSes.
1. Stopping of guest OS
Execute the following command on the guest OS to stop the guest OS.
# /sbin/shutdown -P now
# /opt/SMAW/SMAWsf/bin/sfkvmmigratesetup -s domain
domain
Specify the domain name of the guest OS.
3. Startup of guest OS
Start the guest OS.
G.3 Operation
This section describes operation using the migration function in a KVM environment.
G.3.1 When performing Live Migration
# /opt/SMAW/SMAWsf/bin/sfkvmmigrate -p source-domain -g
source-domain
Domain name of guest OS to be migrated
After executing this command, the cluster settings on all the nodes of clusters among the guests specified by source-domain will be
changed as follows:
# /opt/SMAW/SMAWsf/bin/sfkvmmigrate -u source-domain -g
source-domain
Domain name of migrated guest OS
After executing this command, the cluster settings on all the nodes of clusters among the guests specified by source-domain will be
changed as follows:
# sdtool -s
Note
If TestFailed or InitFailed is displayed, there is a possibility that the settings of the shutdown facility were not changed.
Perform the procedure from Step 1 again.
G.3.1.2 When using the Host OS failover function
# /opt/SMAW/SMAWsf/bin/sfkvmmigrate -p source-domain
source-domain
Domain name of guest OS that is to be migrated
After executing this command, the cluster settings on all the nodes of clusters among the guests specified by source-domain will be
changed as follows:
# /opt/SMAW/SMAWsf/bin/sfkvmmigrate -u source-domain
source-domain
Domain name of migrated guest OS
After executing this command, the cluster settings on all the nodes of clusters among the guests specified by source-domain will be
changed as follows:
# sdtool -s
Note
If TestFailed or InitFailed is displayed, there is a possibility that the settings of the shutdown facility were not changed.
Perform the procedure from Step 1 again.
G.3.2 When performing Offline Migration
source-domain
Domain name of guest OS to be migrated
CFtimeout
Timeout of CF cluster interconnect (seconds)
For the value of CFtimeout, specify the actual time required for Offline Migration plus 300 seconds as a margin for processing delays.
After executing this command, the cluster settings on all the nodes of clusters among the guests specified by source-domain will be
changed as follows:
# /opt/SMAW/SMAWsf/bin/sfkvmmigrate -u source-domain -g
source-domain
Domain name of migrated guest OS
After executing this command, the cluster settings on all the nodes of clusters among the guests specified by source-domain will be
changed as follows:
- Timeout of CF cluster interconnect (changed from the value specified before Offline Migration [seconds] back to 10 seconds)
- Settings of the shutdown facility (IP address of host OS, CF node name of host OS, weight of SF)
- Settings of the Host OS failover function (CF node name of host OS)
- Startup of the shutdown facility
2. Checking the status of the shutdown facility (guest OS)
Execute the following command on all the nodes to check that the cluster settings are correct after Offline Migration.
# sdtool -s
Note
If TestFailed or InitFailed is displayed, there is a possibility that the settings of the shutdown facility were not changed.
Perform the procedure from Step 1 again.
source-domain
Domain name of guest OS to be migrated
CFtimeout
Timeout of CF cluster interconnect (seconds)
For the value of CFtimeout, specify the actual time required for Offline Migration plus 300 seconds as a margin for processing delays.
After executing this command, the cluster settings on all the nodes of clusters among the guests specified by source-domain will be
changed as follows:
# /opt/SMAW/SMAWsf/bin/sfkvmmigrate -u source-domain
source-domain
Domain name of migrated guest OS
After executing this command, the cluster settings on all the nodes of clusters among the guests specified by source-domain will be
changed as follows:
- Timeout of CF cluster interconnect (changed from the value specified before Offline Migration [seconds] back to 10 seconds)
- Settings of the shutdown facility (IP address of host OS, CF node name of host OS, weight of SF)
- Settings of the Host OS failover function (CF node name of host OS)
- Startup of the shutdown facility
2. Checking the status of the shutdown facility (guest OS)
Execute the following command on all the nodes to check that the cluster settings are correct after Offline Migration.
# sdtool -s
Note
If TestFailed or InitFailed is displayed, there is a possibility that the settings of the shutdown facility were not changed.
Perform the procedure from Step 1 again.
# sdtool -s
Note
If TestFailed or InitFailed is displayed, there is a possibility that the settings of the shutdown facility were not changed.
Perform the procedure in "G.3.1.2.2 Operations after Live Migration."
# sdtool -s
Note
If TestFailed or InitFailed is displayed, there is a possibility that the settings of the shutdown facility were not changed.
Perform the procedure in "G.3.1.2.2 Operations after Live Migration."
1. Setting up the guest OS (host OS/guest OS)
Take the following steps on the guest OS when the migration for this OS is no longer necessary.
You can perform this procedure on multiple guest OSes at the same time, or on each guest OS one after another.
1. Stopping of guest OS
Execute the following command on the guest OS to stop the guest OS.
# /sbin/shutdown -P now
domain
Specify the domain name of the guest OS.
3. Startup of guest OS
Start the guest OS.
# rm /var/opt/SMAWsf/sfkvmmigrate.img
1. Stopping of guest OS
Execute the following command on the guest OS to stop the guest OS.
# /sbin/shutdown -P now
# /opt/SMAW/SMAWsf/bin/sfkvmmigratesetup -r domain
domain
Specify the domain name of the guest OS.
3. Startup of guest OS
Start the guest OS.
# /opt/SMAW/SMAWsf/bin/sfkvmmigratesetup -d
Appendix H Using PRIMECLUSTER in a VMware
Environment
This appendix explains how to use PRIMECLUSTER in a VMware environment.
See
For details on VMware, see the documentation for VMware.
Note
Supported configuration
- The following functions are not available in a virtual machine in which PRIMECLUSTER is to be installed.
- Migration with VMware vCenter Converter
- Snapshot of VMware
- Backup by Data Protection
- The following hot swap operations cannot be performed on the virtual machine hardware.
- Increasing disk size
- Increasing memory
- Increasing CPU
- Using snapshot
- Over committing of memory that causes virtual swap or memory ballooning
- VMware vCenter Server is disabled, or the guest OS cannot communicate with VMware vCenter Server or cannot operate VMware
vCenter Server.
- Upgrading from the VMware environment of PRIMECLUSTER 4.3A40 or earlier in which the I/O fencing function is used.
Note
- Note the following points when using the forcible stop with the I/O fencing function:
- If a cluster partition occurs due to a failure of the cluster interconnect, the guest OS on which the cluster application is started
panics regardless of the survival priority.
- If the operating node panics when the operation is failed over, the status of the cluster application may become Online temporarily
on both the operating and standby guest OSes. However, because simultaneous access to the shared disk from both guest OSes is
prevented, there is no impact on the operation.
- The cluster application cannot be switched by the forcible stop with the VMware vCenter Server functional cooperation when
an error occurs in ESXi or in the server, and the cluster node becomes the status of LEFTCLUSTER at this time. By using
VMware vSphere HA, the cluster application can be switched when an error occurs in ESXi or in the server.
Figure H.1 Cluster Systems in a VMware Environment (VMware vCenter Server functional cooperation)
If the VMware vCenter Server functional cooperation is used with VMware vSphere HA, an operation can be failed over even in the
case of ESXi failure or server failure.
Figure H.2 Cluster Systems in a VMware Environment (VMware vCenter Server functional cooperation +
VMware vSphere HA + vSAN)
Note
A forcible stop with the I/O fencing function is disabled in the following environments:
Information
In the cluster configuration where the I/O fencing function is used, by setting the SA_icmp shutdown agent, response from the guest
OSes is checked on the network paths (administrative LAN/interconnect). The application will be switched when no response is
confirmed from the guest OSes. In this case, if the failed guest OS does not stop completely (when the OS is hanging, for example),
both guest OSes may access the shared disk at the same time. By using SCSI-3 Persistent Reservation, the I/O fencing function
prevents both guest OSes from accessing the shared disk at the same time. (To prevent the access from both guest OSes in the
configuration where the VMware vCenter Server function is used, stop the failed guest OS completely before switching the guest
OS.)
In a cluster system in which the takeover IP address is registered, the route information of the communication device is updated when
the application is switched. Therefore, the switching destination node is accessible even when the takeover IP address is activated on
multiple guest OSes. However, if the failed guest OS keeps running without being completely shut down, the route information of the
communication device may revert to the switching source node. By advertising the route information from the switching destination
node to the communication device in a 60-second cycle, the time during which the switching source node is accidentally accessed can be reduced.
The comparison table below shows the forcible stop with VMware vCenter Server functional cooperation and the forcible stop with the
I/O fencing function.
Item: Cluster application configuration
- VMware vCenter Server functional cooperation (recommended): Unlimited
- I/O fencing function: Allowed only in one of the following configurations:
  - Only one cluster application
  - Among multiple cluster applications, only one of them contains a shared disk
Item: Settings of survival priority
- VMware vCenter Server functional cooperation (recommended): Allowed
- I/O fencing function: Not allowed (regardless of the survival priority, a guest OS on which cluster applications are started panics)
Item: Shared disk
- VMware vCenter Server functional cooperation (recommended): Optional. The following disks are available:
  - Virtual disk created on the datastore that can be accessed from each ESXi host
  - RDM (Raw Device Mapping) disk
  - VMware vSAN disk
  Note: When the disk is shared between the cluster nodes, for all of the virtual disk, RDM disk, and VMware vSAN, the number of
  shared ESXi hosts must be within 8. If the number of shared ESXi hosts is within 8, up to 16 cluster nodes can share the disk.
- I/O fencing function: Required (shared RDM (Raw Device Mapping) disk supporting SCSI-3 Persistent Reservation). The following
  disks are not allowed:
  - A virtual disk created on the datastore accessible from each ESXi host
  - VMware vSAN disk
Item: Path policy for the Native Multipathing (NMP)
- VMware vCenter Server functional cooperation (recommended): All supported
- I/O fencing function: Only either of "Most Recently Used" or "Round Robin" is supported
Item: VMware vSphere HA
- VMware vCenter Server functional cooperation (recommended): Allowed
- I/O fencing function: Not allowed
Item: PRIMECLUSTER Wizard for SAP HANA
- VMware vCenter Server functional cooperation (recommended): Allowed
- I/O fencing function: Not allowed
Item: Other unsupported configurations and functions
- VMware vCenter Server functional cooperation (recommended): VMware vSphere FT, VMware vSphere DRS, VMware vSphere DPM,
  Snapshot function, Backup by Data Protection, Suspending the virtual machine
- I/O fencing function: VMware vSphere FT, VMware vSphere DRS, VMware vSphere DPM, Snapshot function, Backup by Data
  Protection, Suspending the virtual machine, FCoE connection for storages, VMware vSphere vMotion, VMware vSphere Storage vMotion
Item: Operation when an error occurs - Error in cluster interconnect
- VMware vCenter Server functional cooperation (recommended): An operating node or a standby node is forcibly stopped, and an
  operation is failed over or the standby node is cut off.
- I/O fencing function:
  - When only the cluster interconnect is specified for SA_icmp: An old operating node may panic due to the I/O fencing function even
    when the cluster application is switched.
  - When the cluster interconnect and any other networks are specified for SA_icmp: The cluster application is not switched and the
    cluster node becomes the status of LEFTCLUSTER.
Item: Operation when an error occurs - Error in operating guest OS or in virtual machine
- VMware vCenter Server functional cooperation (recommended): An operating node is forcibly stopped, and an operation is failed over.
- I/O fencing function: An operating node panics, and an operation is failed over.
Item: Operation when an error occurs - Error in standby guest OS or in virtual machine
- VMware vCenter Server functional cooperation (recommended): A standby node is forcibly stopped and then cut off.
- I/O fencing function: A standby node is cut off (the standby node does not panic). *
Item: Operation when an error occurs - Failure in ESXi or in server
- VMware vCenter Server functional cooperation (recommended):
  - If VMware vSphere HA is allowed: An operation is failed over or the standby node is cut off.
  - If VMware vSphere HA is not allowed: An operation is not failed over on a single PRIMECLUSTER. A node on the ESXi where the
    error occurred becomes LEFTCLUSTER.
- I/O fencing function: The cluster application is switched (the operating node panics) or the standby node is cut off (the standby node
  does not panic). *
Item: Operation when an error occurs - Failure in VMware vCenter Server
- VMware vCenter Server functional cooperation (recommended): A virtual machine cannot be forcibly stopped.
- I/O fencing function: -
Item: Operation when an error occurs - Failure in network between a virtual machine and VMware vCenter Server
- VMware vCenter Server functional cooperation (recommended): A virtual machine cannot be forcibly stopped.
- I/O fencing function: -
Item: Dump collection when an error occurs
- VMware vCenter Server functional cooperation (recommended): Not allowed (only a forcible stop by power-off is allowed; in this case,
  the cause of the error of the cluster node may not be determined)
- I/O fencing function: Allowed
Item: Restrictions in maintenance - When using Cold Migration
- VMware vCenter Server functional cooperation (recommended): None
- I/O fencing function: If the migration is performed so that the two nodes that configure the cluster operate on a single ESXi host, an
  operation cannot be failed over when an error occurs in a guest OS, a virtual machine, or the cluster interconnect.
* If the I/O fencing function is used, the standby node is cut off when it temporarily stops working. After the standby node becomes
operational again, the behavior is as follows.
When only the cluster interconnect is specified for SA_icmp:
The cluster application is switched to the standby node that has become operational again. The old operating node may panic due to
the I/O fencing function.
When the cluster interconnect and other networks are specified for SA_icmp:
The cluster application cannot be switched and the cluster node becomes the status of LEFTCLUSTER. Restart the OS of the standby
node.
Note
- Make sure to set either one of VMware vCenter Server functional cooperation or the I/O fencing function. A configuration with both
functions or a configuration with neither of them is not allowed.
H.2 Installation
This section describes procedures for installing PRIMECLUSTER between guest OSes on multiple ESXi hosts in a VMware environment.
Note
I/O fencing function
- The I/O fencing function must be set up at the earlier stage of configuring the cluster application.
- The I/O fencing function uses the LUN on the shared disk unit registered to a GDS disk class, or the LUN that contains the file
system managed by the Fsystem resource. When using the I/O fencing function, register the Gds resource of the disk class containing
the LUN or the disk, or register the Fsystem resource, to the cluster application.
- The I/O fencing function cannot be used in an environment where Gds resources and Fsystem resources are registered separately
in multiple cluster applications.
- Do not set the I/O fencing function for a cluster application in which no disk is managed by the Fsystem resource or GDS.
- Set the path policy for the Native Multipathing (NMP) as "Most Recently Used" or "Round Robin". No other settings are supported.
Fsystem resource
- When using the file system that is created on the shared disk as Fsystem resources, you need to register all the file systems that are
created on the same disk (LUN) or on the same disk class to the same userApplication. Due to the restriction of the I/O fencing function,
you cannot create multiple file systems on one disk (LUN) or on one disk class and register each file system to the different
userApplications to monitor and control them.
- In /etc/fstab.pcl file, add either of the following description formats to specify the devices of the file systems controlled by Fsystem
resources.
H.2.1.1 Installation and Configuration of Related Software
After installing the software related to PRIMECLUSTER, you need to make various settings for the OS and hardware before starting
operation.
Perform the following steps as necessary.
- For types of SCSI controllers, set to "LSI Logic Parallel" or "VMware Paravirtual".
- Set to "None" for sharing of the SCSI bus.
- Setting up shared disks (when using the I/O fencing function)
- Add a shared disk to be taken over in the cluster system to the virtual machines as Raw Device Mapping (RDM). Also create
a data store to be shared among multiple ESXi hosts. This data store must be different from the shared disk to be taken over
in the cluster system. On the data store, deploy the mapping file (.vmdk) of the shared disk.
- To add a shared disk to the first virtual machine, select "Raw Device Mapping".
- To add a shared disk to the second virtual machine, select "Use an existing virtual disk" and specify the mapping file of the
shared disk added to the first virtual machine.
- Set the controller number and the disk number of virtual device nodes to be consistent among all the nodes that configure the
cluster system.
- For types of SCSI controllers, set the same type as the system disk on a guest OS.
- For sharing SCSI buses, set to "Physical."
- For all the ESXi hosts on which PRIMECLUSTER runs, it is necessary to mark the disk device of Raw Device Mapping used
for the shared disk of PRIMECLUSTER as "Permanent Reservation".
Use the following esxcli command to mark the device as permanent reservation.
esxcli storage core device setconfig -d <naa.id> --perennially-reserved=true
See KB1016106 in the Knowledge Base site of VMware Inc. for configuration instructions.
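To verify the setting, the device configuration can be listed with the esxcli command; according to the referenced knowledge base article,
the device should show "Is Perennially Reserved: true." The exact output format depends on the ESXi version, so treat this as a sketch.
esxcli storage core device list -d <naa.id>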
Note
Do not mark the LUN of the VMFS datastore in which the mapping file of the shared disk is allocated as "Permanent
Reservation".
- Setting up shared disks (when using the function to stop the link with VMware vCenter Server)
- To use the virtual disk as the shared disk, create the data store shared with each ESXi host. Create the virtual disk in this data
store.
- For virtual device nodes, use a new SCSI controller which is different from the system disk.
(Example: For the SCSI disk [SCSI(X:Y)], X indicates the controller number, and Y indicates the disk number. When the
virtual device node of system disk is [SCSI(0:0)], do not use the virtual device node with the controller number 0
[SCSI(0:Y)]. Use [SCSI(1:0)] etc.)
- Set the controller number and the disk number of virtual device nodes to be consistent among all the nodes that configure the
cluster system.
- For types of SCSI controllers, set the same type as the system disk on a guest OS.
- For sharing SCSI buses, set as follows:
- In the cluster environment between guest OSes on a single ESXi host
[Virtual]
[Physical]
- For sharing the physical network adapter that is used as the cluster interconnect with multiple clusters, allocate a different
port group to each cluster system for a vSwitch. In this case, set a different VLAN ID for each port group.
Note
- When bundling the network that is specified for the interconnect by using NIC teaming of VMware, make sure to use any one
of the following configurations to set the load balancing option (active-active configuration) for NIC teaming.
1. Route based on source port ID
2. Route based on source MAC hash
3. Use explicit failover order
A redundant configuration (active-standby) is enabled in any configuration other than configurations 1 to 3 above.
- When using VMware vSphere HA, apply the settings to the destination host of the virtual machine.
- File system settings for system volume
If an I/O device where the system volume is placed fails, a cluster failover does not occur and the system operation may continue
based on the data stored on the memory.
If you want PRIMECLUSTER to trigger a cluster failover by panicking a node in the event that an I/O device where the system
volume is placed fails, set the ext3 or the ext4 file system to the system volume and perform the following setting.
Setting
Specify "errors=panic" to the mount option of each partition (the ext3 or the ext4 file system) included in the system volume.
Example: To set it in /etc/fstab (when /, /var, and /home exist in one system volume)
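A hypothetical example of such /etc/fstab entries follows; the device names and partition layout are illustrative only.
# Hypothetical /etc/fstab entries with errors=panic set on each partition
/dev/sda1 /     ext4 defaults,errors=panic 1 1
/dev/sda2 /var  ext4 defaults,errors=panic 1 2
/dev/sda3 /home ext4 defaults,errors=panic 1 2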
However, an immediate cluster failover may not occur because it can take time for an I/O error to reach the file system. Writing to the
system volume regularly increases the chance of detecting the I/O error earlier.
- Network settings
In the guest OS in the cluster system, it is necessary to make network settings such as IP addresses for the public LAN and the
administrative LAN.
Implement these settings on the guest OS that you are going to run as a cluster.
See
For details on the installation procedure, see the Installation Guide for PRIMECLUSTER.
See
For details on the kernel parameters, see "3.1.7 Checking and Setting the Kernel Parameters."
SDX_VM_IO_FENCE=on
Applicable nodes:
All the nodes on which PRIMECLUSTER is to be installed.
7. Setting up the /etc/hostid file
Set hostid that is used with the I/O fencing function.
According to the following steps, check whether setting up the /etc/hostid file is required, and then, set it up if needed.
How to check
Execute the hostid command and check the output.
When the output is other than "00000000," setting up the /etc/hostid file is not necessary.
# hostid
a8c00101
When the output is "00000000," follow the setting procedure below to set the host identifier (output of hostid) on all the nodes that
configure the cluster. For the host identifier, specify the value unique to each node. Do not set 00000000 for the value.
Setting procedure
#!/usr/bin/python
# Write the host identifier to /etc/hostid in binary form.
from struct import pack
filename = "/etc/hostid"
# Replace <hhhhhhhh> with the intended 8-digit hexadecimal host identifier.
hostid = pack("I",int("0x<hhhhhhhh>",16))
open(filename, "wb").write(hostid)
(<hhhhhhhh>: Describe the intended host identifier in base 16, 8 digit numbers.)
3. Set the execute permissions to the created script file and then, execute it.
# chmod +x <created script file name>
# ./<created script file name>
4. Execute the hostid command to check if the specified host identifier is obtained.
# hostid
hhhhhhhh
1. For VMware vCenter Server functional cooperation, add the roles to which the following authorities are applied to VMware
vCenter Server:
- Virtual machine-Interaction-Power-off
- Virtual machine-Interaction-Power-on
If the roles cannot be added, check the registered roles that have the above authorities.
2. For VMware vCenter Server functional cooperation, create the user in VMware vCenter Server.
3. Add the user created in step 2 to the authority of the virtual machine that is used as the cluster. Apply the roles that are added
or checked in step 1 to this user.
Note
- If the route from the virtual machine to VMware vCenter Server is interrupted, the virtual machine cannot be forcibly stopped.
In this case, configuring the route to VMware vCenter Server to be redundant is recommended.
- Do not include "\" in the virtual machine name. If it is included, the virtual machine cannot be forcibly stopped normally.
Note
- To activate the modified kernel parameters and the I/O fencing function of GDS, restart the guest OS after installation settings for
related software is complete.
- When using the VMware vCenter Server functional cooperation, do not include "\" in the virtual machine name. If it is included, the
virtual machine cannot be forcibly stopped normally.
See
H.2.3.1 Initial Setup of CF and CIP
Refer to "5.1.1 Setting Up CF and CIP" to set up CF and CIP on the guest OS.
H.2.3.2 Setting Up the Shutdown Facility (when using VMware vCenter Server
Functional Cooperation)
For details on survival priority, see "5.1.2.1 Survival Priority."
In VMware environments, when a failure occurs in a guest OS, the virtual machine of the guest OS where a failure is detected is powered
off forcibly by cooperating with VMware vCenter Server. By this process, an operation can be failed over.
This section explains the method for setting up the SA_vwvmr shutdown agent as the shutdown facility.
Note
Be sure to perform the following operations on all guest OSes (nodes).
# sfcipher -c
Enter User's Password:
Re-enter User's Password:
D0860AB04E1B8FA3
# comment line
CFName: cfname1
VMName: vmname1
vCenter_IP: ipaddress1
vCenter_Port: port
user: user
passwd: passwd
# comment line
CFName: cfname2
VMName: vmname2
vCenter_IP: ipaddress2
vCenter_Port: port2
user: user
passwd: passwd
cfnameX : Specify the CF node name.
vmnameX : Specify the virtual machine name that controls the guest OS described
in CFName.
ipaddressX : Specify the IP address of VMware vCenter Server that manages the virtual
machine.
Available IP addresses are IPv4 and IPv6 address.
IPv6 link local addresses are not available.
When specifying the IPv6 address, enclose it in brackets "[ ]".
(Example: [1080:2090:30a0:40b0:50c0:60d0:70e0:80f0])
portX : Specify the port number of VMware vCenter Server.
When using the default value (443), describe "vCenter_Port:" and do not specify a value.
user : Specify the user of VMware vCenter Server created in
"H.2.1.1 Installation and Configuration of Related Software."
When logging in with single sign-on (SSO), specify user@SSO_domain_name.
passwd : A login password of the account specified by "user".
Specify the encrypted password encrypted in 1.
Note
- A one-byte space and a double-byte space are treated as different characters. Use one-byte spaces when inserting a space in the file.
- Only a line that starts with "#" is treated as a comment. When "#" appears in the middle of a line, that "#" is treated as part of the setting
value.
In the following example, "vm1 # node1's virtual machine." is used as the virtual machine name.
...
VMName: vm1 # node1's virtual machine.
...
- The contents of SA_vwvmr.cfg must be the same on all the guest OSes. If not, the shutdown facility may not work correctly.
Example
##
## node1's information.
##
CFName: node1
VMName: vm1
vCenter_IP: 10.20.30.40
vCenter_Port:
user: [email protected]
passwd: D0860AB04E1B8FA3
##
## node2's information.
##
CFName: node2
VMName: vm2
vCenter_IP: 10.20.30.40
vCenter_Port:
user: [email protected]
passwd: D0860AB04E1B8FA3
##
## node1's information.
##
CFName: node1
VMName: vm1
vCenter_IP: 10.20.30.40
vCenter_Port:
user: root
passwd: D0860AB04E1B8FA3
##
## node2's information.
##
CFName: node2
VMName: vm2
vCenter_IP: 10.20.30.40
vCenter_Port:
user: root
passwd: D0860AB04E1B8FA3
CFNameX,weight=weight,admIP=myadmIP:agent=SA_vwvmr,timeout=timeout
CFNameX,weight=weight,admIP=myadmIP:agent=SA_vwvmr,timeout=timeout
Note
The rcsd.cfg file must be the same on all guest OSes (nodes). Otherwise, operation errors might occur.
Example
Below is a setting example:
node1,weight=1,admIP=10.0.0.1:agent=SA_vwvmr,timeout=45
node2,weight=1,admIP=10.0.0.2:agent=SA_vwvmr,timeout=45
4. Starting the shutdown facility
Check that the shutdown facility has started.
# sdtool -s
If the shutdown facility has already started, execute the following command to restart the shutdown facility.
# sdtool -r
If the shutdown facility is not started, execute the following command to start the shutdown facility.
# sdtool -b
# sdtool -s
H.2.3.3 Setting Up the Shutdown Facility (when using I/O fencing function)
This section explains the method for setting up the SA_icmp shutdown agent as the shutdown facility.
Note
Be sure to perform the following operations on all guest OSes (nodes).
TIME_OUT=value
cfname:ip-address-of-node:NIC-name1,NIC-name2
value : Specify the interval (in seconds) for checking whether the node is
alive. The recommended value is "5" (s).
cfname : Specify the name of the CF node.
ip-address-of-node : Specify the IP addresses of any one of the following networks
utilized for checking whether the cfname node is alive.
Checking via multiple networks is also available.
In this case, add a line for each utilized network.
To check the LAN paths, we recommend using multiple networks so that
an error can be determined reliably.
However, if you give priority to automatic switchover over reliable
error determination, set only the cluster interconnects as the LAN
paths.
If only the cluster interconnects are set as the LAN paths, automatic
switchover is available even when communication over the cluster
interconnects is disabled but is still available via another LAN
(that is, when the destination node is determined to be alive).
- Cluster interconnect (IP address of CIP)
- Administrative LAN
- Public LAN
Available IP addresses are IPv4 and IPv6 address.
IPv6 link local addresses are not available.
When specifying the IPv6 address, enclose it in brackets "[ ]".
(Example: [1080:2090:30a0:40b0:50c0:60d0:70e0:80f0])
Enter the IP address for all guest OSes (nodes) that configure the
cluster system.
NIC-nameX : Specify the network interface of the local guest OS (node) utilized
for checking whether the node defined by ip-address-of-node is alive.
If there is more than one, delimit them with commas (",").
Note
Registering network interfaces
- For duplicating by GLS, define all redundant network interfaces. (Example: eth0,eth1)
- If you are bonding NICs, define the bonding device behind the IP address. (Example: bond0)
- For registering the cluster interconnect, define all network interfaces that are used on all paths of the cluster interconnect.
(Example: eth2,eth3)
Example
Below indicates the setting example of clusters (consisted by 2 nodes) between guest OSes on multiple ESXi hosts.
- When the public LAN (duplicated (eth0,eth1) by GLS) and the administrative LAN (eth4) are set
TIME_OUT=5
node1:10.20.30.100:eth0,eth1
node1:10.20.40.200:eth4
node2:10.20.30.101:eth0,eth1
node2:10.20.40.201:eth4
CFNameX,weight=weight,admIP=myadmIP:agent=SA_icmp,timeout=timeout
CFNameX,weight=weight,admIP=myadmIP:agent=SA_icmp,timeout=timeout
(2) When TIME_OUT is less than 18: 20
Note
The rcsd.cfg file must be the same on all guest OSes (nodes). Otherwise, operation errors might occur.
Example
Below indicates the setting example to check survival of a node by using administrative LAN and public LAN when TIME_OUT
value described in the SA_icmp.cfg is 10, in a two-node configuration.
node1,weight=1,admIP=192.168.100.1:agent=SA_icmp,timeout=24 (*)
node2,weight=1,admIP=192.168.100.2:agent=SA_icmp,timeout=24 (*)
timeout = (10 (TIME_OUT value) + 2) x 2 (administrative LAN and public LAN) = 24
# sdtool -s
If the shutdown facility has already started, execute the following command to restart the shutdown facility.
# sdtool -r
If the shutdown facility is not started, execute the following command to start the shutdown facility.
# sdtool -b
# sdtool -s
H.2.3.5 Setting Up Fault Resource Identification and Operator Intervention Request
Refer to "5.2 Setting up Fault Resource Identification and Operator Intervention Request" to make the settings for identifying fault resources
and for requesting operator intervention.
1. In the Cmdline resource settings, add the Start script, the Stop script, and the Check script in the following format:
<Start script>
/opt/SMAW/bin/hvsgpr -c
<Stop script>
/opt/SMAW/bin/hvsgpr -u
<Check script>
/opt/SMAW/bin/hvsgpr -m
To create Cmdline resources, see, "6.7.3.1 Setting Up Cmdline Resources."
2. In the attribute settings of the Cmdline resources, set the AutoRecover attribute to disabled ("0"). Do not change the default
settings for other attributes.
1. In the cluster application settings, add the PreOnline and OfflineDone scripts in the following format.
<PreOnline script>
/opt/SMAW/bin/hvsgpr -r
<OfflineDone script>
/opt/SMAW/bin/hvsgpr -o
Machines+Basics (app1:consistent)
1) HELP
2) -
3) SAVE+EXIT
4) REMOVE+EXIT
5) AdditionalMachine
6) AdditionalConsole
7) Machines[0]=vm21RMS
8) Machines[1]=vm22RMS
9) (PreCheckScript=)
10) (PreOnlineScript=)
11) (PostOnlineScript=)
12) (PreOfflineScript=)
13) (OfflineDoneScript=)
14) (FaultScript=)
15) (AutoStartUp=yes)
16) (AutoSwitchOver=HostFailure|ResourceFailure|ShutDown)
17) (PreserveState=no)
18) (PersistentFault=0)
19) (ShutdownPriority=)
20) (OnlinePriority=)
21) (StandbyTransitions=ClearFaultRequest|StartUp|SwitchRequest)
22) (LicenseToKill=no)
23) (AutoBreak=yes)
24) (AutoBreakMaintMode=no)
25) (HaltFlag=yes)
26) (PartialCluster=0)
27) (ScriptTimeout=)
Choose the setting to process:10
2. Select "FREECHOICE" and enter the full path of the PreOnline script.
1) HELP
2) RETURN
3) NONE
4) FREECHOICE
Enter the command line to start prior to the application becoming ONLINE:4
>> /opt/SMAW/bin/hvsgpr -r
Machines+Basics (app1:consistent)
1) HELP
2) -
3) SAVE+EXIT
4) REMOVE+EXIT
5) AdditionalMachine
6) AdditionalConsole
7) Machines[0]=vm21RMS
8) Machines[1]=vm22RMS
9) (PreCheckScript=)
10) (PreOnlineScript='/opt/SMAW/bin/hvsgpr~-r')
11) (PostOnlineScript=)
12) (PreOfflineScript=)
13) (OfflineDoneScript=)
14) (FaultScript=)
15) (AutoStartUp=yes)
16) (AutoSwitchOver=HostFailure|ResourceFailure|ShutDown)
17) (PreserveState=no)
18) (PersistentFault=0)
19) (ShutdownPriority=)
20) (OnlinePriority=)
21) (StandbyTransitions=ClearFaultRequest|StartUp|SwitchRequest)
22) (LicenseToKill=no)
23) (AutoBreak=yes)
24) (AutoBreakMaintMode=no)
25) (HaltFlag=yes)
26) (PartialCluster=0)
27) (ScriptTimeout=)
Choose the setting to process:13
4. Select "FREECHOICE" and enter the full path of the OfflineDone script.
1) HELP
2) RETURN
3) NONE
4) FREECHOICE
Enter the command line to start prior to the application becoming ONLINE:4
>> /opt/SMAW/bin/hvsgpr -o
2. In the attribute settings of the cluster application, if the HaltFlag attribute is set to enabled ("1"), add the Fault script in the
following format.
<Fault script>
/opt/SMAW/bin/hvsgpr -f
Machines+Basics (app1:consistent)
1) HELP
2) -
3) SAVE+EXIT
4) REMOVE+EXIT
5) AdditionalMachine
6) AdditionalConsole
7) Machines[0]=vm21RMS
8) Machines[1]=vm22RMS
9) (PreCheckScript=)
10) (PreOnlineScript='/opt/SMAW/bin/hvsgpr~-r')
11) (PostOnlineScript=)
12) (PreOfflineScript=)
13) (OfflineDoneScript='/opt/SMAW/bin/hvsgpr~-o')
14) (FaultScript=)
15) (AutoStartUp=yes)
16) (AutoSwitchOver=HostFailure|ResourceFailure|ShutDown)
17) (PreserveState=no)
18) (PersistentFault=0)
19) (ShutdownPriority=)
20) (OnlinePriority=)
21) (StandbyTransitions=ClearFaultRequest|StartUp|SwitchRequest)
22) (LicenseToKill=no)
23) (AutoBreak=yes)
24) (AutoBreakMaintMode=no)
25) (HaltFlag=yes)
26) (PartialCluster=0)
27) (ScriptTimeout=)
Choose the setting to process:14
2. Select "FREECHOICE" and enter the full path of the Fault script.
1) HELP
2) RETURN
3) NONE
4) FREECHOICE
Enter the command line to start prior to the application becoming ONLINE:4
>> /opt/SMAW/bin/hvsgpr -f
3. Setting up the function to advertise the route information from the switching destination node
It is recommended to enable this function when:
2. The IPv4 address is used as the takeover IP address.
To enable this function, add the following one line at the end of /opt/SMAW/SMAWRrms/bin/hvenv.local file in each node:
export HV_VM_ENABLE_IP_ADVERTISE=1
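For example, the line can be appended on each node as follows (a minimal illustration, not part of the original procedure; the file is created if it does not exist):
# echo 'export HV_VM_ENABLE_IP_ADVERTISE=1' >> /opt/SMAW/SMAWRrms/bin/hvenv.local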
Note
In any one of the following cases, the function is not enabled even if it is set.
Information
If this function is enabled, the ARP packet is sent from the switching destination node in a 60-second cycle for a specified time.
When specifying a command other than the hvsgpr command in the PreOnline, OfflineDone, or Fault script at the same time, specify the command in one of the following ways:
- Separate the commands by a double ampersand (&&) or a semicolon (;).
Example
/opt/SMAW/bin/hvsgpr -o ; /var/tmp/command
- Create a script that runs more than one command, and then specify that script.
Example
#!/bin/sh
# Run the hvsgpr command first, then run the additional command.
/opt/SMAW/bin/hvsgpr -r
ret1=$?
/var/tmp/command
ret2=$?
# If hvsgpr succeeded, return the exit code of the additional command;
# otherwise return the exit code of hvsgpr.
if [ $ret1 = 0 ]; then
    exit $ret2
fi
exit $ret1
The table below shows how the command can be specified in each script and the notes on specifying the command.
Script              Separate the commands by     Separate the commands by   Create a script that runs more than
                    double ampersand (&&)        semicolon (;)              one command, and specify that script
PreOnline script    Y (*1)                       -                          Y (*2)
OfflineDone script  -                            Y                          Y
Fault script        -                            Y                          Y
H.3 Operations
For details on functions for managing PRIMECLUSTER system operations, see "Chapter 7 Operations."
Note
- When the hvswitch -f command is executed to start or switch the cluster application, the following message is output and starting or
switching of the cluster application may fail.
ERROR: Forcibly switch request denied, unable to kill node <SysNode name>
This message is output when the node displayed as <SysNode name> is in the LEFTCLUSTER state. Perform the procedure in "5.2
Recovering from LEFTCLUSTER" in "PRIMECLUSTER Cluster Foundation (CF) Configuration and Administration Guide." After
that, start or switch the cluster application.
- Do not perform "Suspend operation" for the virtual machine on which the cluster is running. If "Suspend" is performed by mistake, an
operation may not switch automatically. In this case, power off the virtual machine on which "Suspend" is performed, and then switch
the operation manually.
- After the operational virtual machine (VM1) is migrated, both the operational (VM2) and standby (VM1) virtual machines exist on the same ESXi host.
After recovering from the failure, migrate VM1 to another ESXi host so that VM1 and VM2 operate on different ESXi hosts.
- After the operational virtual machine (VM1) is migrated, the operational virtual machine (VM2) and the standby virtual machine (VM1) exist on different ESXi hosts.
In this case, it is not necessary to migrate VM1 to another ESXi host. However, start VM1 if it is stopped.
Note
After the migration, the status of the shutdown facility may be displayed as "KillFailed" or "KillWorked" on the operational virtual machine.
In this case, no corrective action is required. To restore the status of the shutdown facility, restart the shutdown facility.
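For example, the shutdown facility can be restarted and its status checked with the following commands (the sdtool options shown here are the same ones used elsewhere in this manual):
# sdtool -e    <= Stop the shutdown facility
# sdtool -b    <= Start the shutdown facility
# sdtool -s    <= Check the status of the shutdown facility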
H.5 Maintenance
For details on items and procedures required for maintenance of the PRIMECLUSTER system, see "Chapter 12 Maintenance of the
PRIMECLUSTER System."
Appendix I Using PRIMECLUSTER in RHOSP
Environment
In RHOSP environment, PRIMECLUSTER can be used on the virtual machine instance (hereinafter virtual machine).
See
For more information on RHOSP, refer to the RHOSP manual of Red Hat, Inc.
Note
- Building the cluster system between guest OSes on one compute node
- Building the cluster system between guest OSes on multiple compute nodes
See the table below for usages of each cluster system and notes when building each cluster system.
Cluster type Usage Note
testing a cluster application or for By using high availability configuration for
business operation. compute instances, the operation can continue. *1
*1 For more information on high availability configuration for compute instances, refer to "Red Hat OpenStack Platform High Availability
for Compute Instances."
- Building the cluster system between guest OSes on one compute node
In this configuration, the cluster system can be operated on one compute node. It is a suitable configuration for verifying the operation of userApplication operating on PRIMECLUSTER.
Figure I.1 Cluster system between guest OSes on one compute node
- Building the cluster system between guest OSes on multiple compute nodes
In this configuration, by allocating different hardware (network or disk) for each compute node, the operation can be continued by
failover even if the network or the disk fails.
Figure I.2 Cluster system between guest OSes on multiple compute nodes
Note
If an error occurs in the compute node in the environment where the high availability configuration for compute instances is not used,
the node status becomes LEFTCLUSTER. For how to recover from LEFTCLUSTER, see "I.3.2.1 If Not Using the High Availability
Configuration for Compute Instances."
By using the high availability configuration for compute instances, the operation can continue even if an error occurs in the compute node. However, the compute node and the virtual machine where the error occurred must be recovered manually. For the recovery procedure, see "I.3.2.2 If Using the High Availability Configuration for Compute Instances."
In RHOSP environment, set up the network configuration and the security groups as follows:
- Network configuration:
- The cluster interconnect must be the network independent from the administrative LAN, the public LAN, and the network used for
the mirroring among servers function of GDS.
- The virtual machines configuring the cluster can communicate with various service end points of RHOSP.
- Security groups:
Set up the following two security groups:
- The security group for both public and administrative LANs between the virtual machines configuring the cluster
- The security group for cluster interconnect that disables a communication with other than the virtual machines configuring the
cluster
I.2 Installation
This section describes how to install PRIMECLUSTER in RHOSP environment.
The installation must be done according to the following flow.
Execute the following command to check the version of the openstack-selinux package.
# rpm -q openstack-selinux
Example
# rpm -q openstack-selinux
openstack-selinux-0.8.14-1.el7ost.noarch
If the version of openstack-selinux is older than 0.8.13-1, apply errata to update the openstack-selinux package to its latest version.
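As an illustration only, assuming the RHOSP yum repositories are reachable from the virtual machine, the package could be updated as follows:
# yum update openstack-selinux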
See
For how to set up RHOSP, refer to the RHOSP manual of Red Hat, Inc.
To communicate with various service end points of RHOSP from the virtual machine, connect to the subnets of public LAN (also used as
the administrative LAN).
2. Creating Security Group for Public LAN (also used as Administrative LAN)
Set IP filter rules necessary for the PRIMECLUSTER operations to the security group for the public LAN (also used as the administrative
LAN).
Use the setting values below.
Communication direction  Communication target information  Protocol  Start port number  End port number
egress                   Not specified                     tcp       443                443
ingress                  Local security group              udp       9382               9382
egress                   Local security group              udp       9382               9382
ingress                  Local security group              udp       9796               9796
egress                   Local security group              udp       9796               9796
ingress                  Local security group              tcp       9797               9797
egress                   Local security group              tcp       9797               9797
egress                   Virtual gateway IP address        icmp      Not specified      Not specified
ingress                  Local security group              tcp       3260               3260
egress                   Local security group              tcp       3260               3260
ingress                  Client IP address (*)             tcp       8081               8081
ingress                  Client IP address (*)             tcp       9798               9798
ingress                  Client IP address (*)             tcp       9799               9799
ingress                  Local security group              tcp       9200               9263
egress                   Local security group              tcp       9200               9263
(*) If multiple clients connect to Web-Based Admin View, register IP addresses of all of the connected clients.
When building multiple cluster systems in the same tenant (project), create only one security group in the tenant (project). The security
group can be used for the multiple cluster systems in the same tenant (project).
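As an illustration only, a rule from the table above (ingress tcp 8081 from a client) could be added with the OpenStack CLI as follows; the security group name pcl-secgroup and the client address 203.0.113.10 are hypothetical:
# openstack security group rule create --ingress --protocol tcp --dst-port 8081 --remote-ip 203.0.113.10/32 pcl-secgroup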
Communication direction  Communication target information  Protocol  Start port number  End port number
egress                   Local security group              123       Not specified      Not specified
ingress                  Local security group              123       Not specified      Not specified
When building multiple cluster systems in the same tenant (project), create only one security group in the tenant (project). The security
group can be used for the multiple cluster systems in the same tenant (project).
Communication direction  Communication target information  Protocol  Start port number  End port number
ingress                  ssh client IP address             tcp       22                 22
egress                   DNS server IP address             udp       53                 53
egress                   NTP server IP address             udp       123                123
Note
When the yum command is used, use the setting values below.
Communication direction  Communication target information  Protocol  Start port number  End port number
egress                   Repository IP address             tcp       80                 80
Item name               Setting value
Server group behavior*  anti-affinity (for the cluster system between guest OSes on multiple compute nodes) or affinity (for the cluster system between guest OSes on one compute node)
* soft-affinity and soft-anti-affinity can also be set. However, they are not recommended because the compute node on which the guest OS runs may change at startup of the guest OS. If soft-affinity or soft-anti-affinity is set, be aware that the server group may work in a configuration different from the "Cluster type" selected in "I.1 Cluster System in RHOSP Environment."
Note
When creating multiple cluster systems, each cluster system needs its own server group.
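For example, a server group with the anti-affinity policy (for the cluster system between guest OSes on multiple compute nodes) could be created as follows; the group name pcl-servergroup is hypothetical:
# openstack server group create --policy anti-affinity pcl-servergroup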
- Creating Port for Public LAN (also used as the administrative LAN)
- Creating Port for Cluster Interconnect
- Creating Virtual Machine
- Connecting Storage Device (iSCSI connection) or Block Storage
- Applying errata
- Creating .curlrc
Table I.2 Port created in the subnet of cluster interconnect
Item name                  Setting value
Port name                  Any port name
Network ID                 Network ID
Subnet ID                  Subnet ID for the cluster interconnect created in "1. Creating Provider Network"
Private IP address         IP address of the cluster interconnect
ID list of security group  Security group for the cluster interconnect created in "3. Creating Security Group for Cluster Interconnect"
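As an illustration of Table I.2 (all names, IDs, and addresses below are hypothetical placeholders), the port could be created with the OpenStack CLI as follows:
# openstack port create --network <network-ID> --fixed-ip subnet=<subnet-ID>,ip-address=<interconnect-IP> --security-group <interconnect-security-group-ID> pcl-ic-port1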
See
For how to connect the iSCSI device to the virtual machine, refer to "Red Hat Enterprise Linux 6 Storage Administration Guide" or "Red
Hat Enterprise Linux 7 Storage Administration Guide."
5. Applying errata
Execute the following command to check the version of curl.
# rpm -q curl
Example
# rpm -q curl
curl-7.19.7-52.el6.x86_64
If the version of curl is 7.19.7-43 or older, apply errata to update the curl package to its latest version.
6. Creating .curlrc
Add the following line to the /root/.curlrc file. If the file does not exist, create it and add the following line.
tlsv1.2
I.2.3 Presetting
1. Disabling Firewall
[Red Hat Enterprise Linux 6]
Check if iptables and ip6tables are disabled.
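For example, the status can be checked, and the services disabled if necessary, with the standard RHEL 6 commands below (shown as an illustration):
# chkconfig --list iptables
# chkconfig --list ip6tables
# chkconfig iptables off     <= Disable iptables if it is enabled
# chkconfig ip6tables off    <= Disable ip6tables if it is enabled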
2. NTP settings
Before building the cluster, make sure to set up NTP that synchronizes the time of each node in the cluster system.
Make these settings on the guest OS before you install PRIMECLUSTER.
Note
If the OS has never been restarted after creating the virtual machine, restart the OS and then install PRIMECLUSTER.
See
For details on the installation procedure, see the Installation Guide for PRIMECLUSTER.
See
For details on the kernel parameters, see "3.1.7 Checking and Setting the Kernel Parameters."
Note
Restart OS to enable the changed kernel parameters.
See
I.2.7.1 Initial GLS Setup
When using GLS, take the following steps to set up the initial settings of GLS for the network used as the public LAN (also used as the
administrative LAN). For more information on each setting, refer to "PRIMECLUSTER Global Link Services Configuration and
Administration Guide Redundant Line Control Function."
Note
If the initial settings are not correct, you may not be able to access the system. Take a snapshot of the system disk before applying the settings.
Example
2. In the /etc/sysconfig/network-scripts/ifcfg-eth0 file, comment out TYPE, set BOOTPROTO to "static" and PEERDNS to "no", and add "HOTPLUG=no" and "DEVICETYPE=hanet".
- /etc/sysconfig/network-scripts/ifcfg-eth0
DEVICE=eth0
#TYPE=Ethernet
BOOTPROTO=static
UUID=<Fixed value depending on environment (no change necessary)>
HOTPLUG=no
ONBOOT=yes
DEVICETYPE=hanet
PEERDNS=no
- /etc/sysconfig/network-scripts/ifcfg-sha0
DEVICE=sha0
#IPADDR=
#NETMASK=
BOOTPROTO=dhcp
ONBOOT=yes
DEVICETYPE=sha
HOTPLUG=no
PEERDNS=yes
DNS1=<IP address of master DNS server>
DNS2=<IP address of sub DNS server>
Note
Do not set SHAMACADDR in the ifcfg-sha0 file.
Example
# /opt/FJSVhanet/usr/sbin/hanetconfig print
[IPv4,Patrol / Virtual NIC]
[IPv6]
# /opt/FJSVhanet/usr/sbin/hanetpathmon target
[Target List]
Name VID Target
+-------+----+----------------------------------------------------------+
sha0 - 172.16.0.1
# /opt/FJSVhanet/usr/sbin/hanetpathmon param
[Parameter List]
Name Monitoring Parameter
+-------+----------------------------------------------------------+
sha0 auto_startup = yes
interval = 3 sec
times = 5 times
repair_times = 2 times
idle = 45 sec
Auto fail-back = no
FAILOVER Status = no
# /opt/FJSVhanet/usr/sbin/hanetmask print
network-address netmask
+---------------+---------------+
172.16.0.0 255.255.255.0
# /opt/FJSVhanet/usr/sbin/hanethvrsc print
ifname takeover-ipv4 takeover-ipv6 vlan-id/logical ip address list
+----------+----------------+----------------+--------------------------------+
sha0:65 172.16.0.100 - -
# /sbin/shutdown -r now
Example
DOMAIN_NAME=primecluster_domain
PROJECT_NAME=primecluster_project
IDENTITY=https://192.168.11.11:5000
COMPUTE=https://192.168.11.11:8774
I.2.8.1 Initial Setup of Cluster
This section describes the initial setup of cluster of PRIMECLUSTER.
For more information on each setting, refer to the following sections.
Note
- After setting up the shutdown agent, conduct the forcible shutdown testing of cluster node to confirm that the correct node can be
forcibly shut down. For more information on the forcible shutdown testing of cluster node, refer to "1.4 Test."
- Contents of SA_vmosr.cfg and rcsd.cfg files must be the same on all the nodes. If not, malfunction will occur.
- If the user password created in "I.2.2.1 Creating User for Forcible Shutdown" is changed, log in with a new password and perform this
procedure again.
CFNameX,weight=weight,admIP=myadmIP:agent=SA_vmosr,timeout=125
CFNameX,weight=weight,admIP=myadmIP:agent=SA_vmosr,timeout=125
Example:
# cat /etc/opt/SMAW/SMAWsf/rcsd.cfg
node1,weight=1,admIP=192.168.1.1:agent=SA_vmosr,timeout=125
node2,weight=1,admIP=192.168.1.2:agent=SA_vmosr,timeout=125
After creating the /etc/opt/SMAW/SMAWsf/rcsd.cfg file, set the owner, group, and the access authority as follows.
# sfcipher -c
Example:
If the password is "rhospadmin$"
# sfcipher -c
Enter Password: <= Enter rhospadmin$
Re-Enter Password: <= Enter rhospadmin$
RpM9gPEcc3n1Mm3fVr77Ig==
Example:
If the CF node name of the cluster host is node1/node2, the instance name is instance1/instance2, and the user name for instance control is pcl.
# cat /etc/opt/SMAW/SMAWsf/SA_vmosr.cfg
node1 instance1 pcl RpM9gPEcc3n1Mm3fVr77Ig==
node2 instance2 pcl RpM9gPEcc3n1Mm3fVr77Ig==
Create the /etc/opt/SMAW/SMAWsf/SA_vmosr.cfg file and then set the owner, group, and access authority as shown below.
Note
- Make sure that the setting contents of the /etc/opt/SMAW/SMAWsf/SA_vmosr.cfg file are correct. If not, the shutdown facility cannot operate normally.
- Make sure that the instance name (InstanceName) corresponding to the CF node name (CFNameX) of the cluster host in the /etc/
opt/SMAW/SMAWsf/SA_vmosr.cfg file is set. If not, a different node may be forcibly shut down.
# sdtool -s
If the shutdown facility is already started, execute the following commands to restart the shutdown facility on all the nodes.
# sdtool -e
# sdtool -b
If the shutdown facility is not started, execute the following command to start the shutdown facility on all the nodes.
# sdtool -b
# sdtool -s
Note
- If "The RCSD is not running" is displayed, the settings of shutdown daemon or shutdown agent are incorrect. Perform Step 1 to
4 again.
- If the virtual machine name created in "I.2.2.4 Creating Virtual Machine for Cluster Node" is changed, perform Step 3 to 5 again.
Information
Display results of the sdtool -s command
- If "Unknown" or "Init-ing" is displayed as Init State, wait for about one minute, and then check the state again.
- If "Unknown" is displayed as the stop or initial status, it means that the SF has still not executed node stop, path testing, or SA
initialization. "Unknown" is displayed temporarily until the actual status can be confirmed.
- If "TestFailed" is displayed as the test status, it means that a problem occurred while the agent was testing whether or not the node
displayed in the cluster host field could be stopped. Some sort of problem probably occurred in the software, hardware, or
network resources being used by that agent.
- If "InitFailed" is displayed as Init State, a communication with the endpoint of RHOSP Identity or Compute service may fail, or
the settings are incorrect. Confirm the following items for resetting.
After the failure-causing problem is resolved and SF is restarted, the status display changes to InitWorked or TestWorked.
a. Execute the following command and confirm that the instance where the cluster host is operating can communicate with
the Identity service.
- errata must be applied.
When the curl version displayed after executing rpm -q curl is 7.19.7-43 or older, errata is not applied. Perform "5.
Applying errata".
- curlrc must be created.
See "6. Creating .curlrc" and make sure that .curlrc is created as indicated by the procedure.
- The RHOSP security group must be set properly.
- The virtual router of RHOSP must be created.
- The default router of cluster host must be set in the virtual router.
- The URL of Identity service endpoint is correct.
b. Execute the following command and check if the instance where the cluster host is operating can communicate with the
Compute service.
{"error": {"message": "The request you have made requires authentication.", "code": 401,
"title": "Unauthorized"}}
If messages other than the above are displayed, make sure the following settings are done correctly.
- The RHOSP security group must be set properly.
- The virtual router of RHOSP must be created.
- The default router of cluster host must be set in the virtual router.
- The URL of Compute service endpoint is correct.
I.3 Operations
For details on functions for managing PRIMECLUSTER system operations, see "Chapter 7 Operations."
For the operations required for Live Migration, refer to "I.3.1 Required Operations for Live Migration."
See
For the operations required for GDS, refer to "Operation and Maintenance" in "PRIMECLUSTER Global Disk Services Configuration and
Administration Guide", and for the operations required for GLS, refer to "GLS operation on cluster systems" in "PRIMECLUSTER Global
Link Services Configuration and Administration Guide Redundant Line Control Function."
Note
# sdtool -e
# sdtool -b
# sdtool -s
I.3.2 Corrective Actions When an Error Occurs in the Compute Node
I.3.2.1 If Not Using the High Availability Configuration for Compute Instances
If an error occurs in the compute node in the environment where the high availability configuration for compute instances is not used, the
compute node becomes LEFTCLUSTER. This section describes the recovery procedure from the LEFTCLUSTER state.
1. Make sure that the cluster node is actually stopped. Stop the node if it is operating.
2. If the cluster node where an error occurred becomes LEFTCLUSTER, perform the procedure described in "Recovering from
LEFTCLUSTER" in "PRIMECLUSTER Cluster Foundation Configuration and Administration Guide."
3. Check the compute node status and recover the compute node.
You can skip this step if the compute node is recovered automatically.
Make sure that all the CF node names are displayed in "Node" field. Also make sure that UP is displayed in "State" field.
Example
# cftool -n
Node Number State Os Cpu
node1 1 UP Linux EM64T
node2 2 UP Linux EM64T
Make sure that all the CF node names are displayed in "Node" field. Also make sure that UP is displayed in "State" field.
For the following operations, refer to "7.2 Operating the PRIMECLUSTER System."
I.3.2.2 If Using the High Availability Configuration for Compute Instances
1. Perform the following procedures on the director or the controller node to move the cluster node to another compute node.
1. Execute the following command to reset the cluster node status on the compute node where an error occurred.
Example: If the instance name of the cluster node is instance1
2. If the cluster node on the compute node where an error occurred is not moved automatically to another compute node after step
1 was executed, execute the following command to move it to another compute node.
Example: If the instance name of the cluster node is instance1
For more information on the nova command, refer to the RHOSP manual of Red Hat, Inc.
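The original commands are not reproduced above. As a hypothetical illustration only, resetting the state of an instance named instance1 (step 1) and moving it to another compute node (step 2) with the nova CLI could look as follows; confirm the actual commands in the RHOSP documentation:
# nova reset-state --active instance1
# nova evacuate instance1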
2. Execute the following command on any one node in the cluster system and make sure that all the cluster nodes have joined the cluster.
# cftool -n
Make sure that all the CF node names are displayed in "Node" field. Also make sure that UP is displayed in "State" field.
Example
# cftool -n
Node Number State Os Cpu
node1 1 UP Linux EM64T
node2 2 UP Linux EM64T
Make sure that all the CF node names are displayed in "Node" field. Also make sure that UP is displayed in "State" field.
For the following operations, refer to "7.2 Operating the PRIMECLUSTER System."
3. Check the compute node status and recover the compute node.
You can skip this step if the compute node is recovered automatically.
I.5 Maintenance
For the items and procedures required for the maintenance of PRIMECLUSTER system in RHOSP environment, refer to "Chapter 12
Maintenance of the PRIMECLUSTER System." For the maintenance of GDS, refer to "Operation and Maintenance" in "PRIMECLUSTER
Global Disk Services Configuration and Administration Guide." For the maintenance of GLS, refer to "Maintenance" in
"PRIMECLUSTER Global Link Services Configuration and Administration Guide Redundant Line Control Function."
See
Refer to "Settings Before Backing Up" of "Backing Up and Restoring System Disk" in "PRIMECLUSTER Global Disk Services
Configuration and Administration Guide."
See
For how to create snapshots, refer to the RHOSP manual of Red Hat, Inc.
See
Refer to "Settings After Backing Up" of "Backing Up and Restoring System Disk" in "PRIMECLUSTER Global Disk Services
Configuration and Administration Guide."
I.5.1.2 Restoring Virtual Machine
In either of the following cases, take the following steps for restoring:
[How to restore]
See
Refer to "Settings Before Restoring" of "Backing Up and Restoring System Disk" in "PRIMECLUSTER Global Disk Services
Configuration and Administration Guide."
4. Restore the virtual machine from the snapshot. The OS starts at the same time as the restoration.
Set up the virtual machine to be restored as follows.
Note
Make sure to use this procedure to set up the additional volume registered in GDS.
If the additional volume is not set up during this procedure, do not attach the additional volume to the restored virtual machine
but restore the virtual machine again according to this procedure. If the additional volume is attached to the restored virtual
machine, the remaining steps fail.
See
Refer to "Settings After Restoring" of "Backing Up and Restoring System Disk" in "PRIMECLUSTER Global Disk Services
Configuration and Administration Guide."
4. If the virtual machine name has been changed in Step 2, take the following steps to change the settings of the shutdown facility.
1. Execute the following command on all the nodes to stop the shutdown facility.
# sdtool -e
2. Describe the changed virtual machine name to the configuration file of the shutdown agent.
See
For the descriptions of configuration file, refer to "2. Setting up Shutdown Facility."
3. Execute the following command on all the nodes to start the shutdown facility.
# sdtool -b
4. Execute the following command on all the nodes and make sure that the shutdown facility operates normally.
# sdtool -s
Note
If "InitFailed" is displayed as the default status, or "Unknown" or "TestFailed" is displayed in the test status even after the settings
of shutdown facility are changed, the settings of agent or network may be incorrect. Check again the settings of agent or network.
Appendix J Startup Scripts and Startup Daemons, and
Port Numbers in PRIMECLUSTER
This appendix provides explanations on scripts and daemons that are started by PRIMECLUSTER, and the port numbers being used in
RHEL6.
Startup script
Name of startup script.
Function
Function of startup script and daemon.
Effect if stopped
Effect if startup script and daemon are stopped.
Startup daemon
Daemon started by startup script.
If nothing is described in "Remarks", the daemon is resident in the system regardless of the settings or configurations.
Utilized port
Port
Port number.
Protocol
Protocol - TCP or UDP.
Send/Receive
"s" if port sends data, "r" if it receives data, "s, r" for both.
Network
Utilized network - any of Cluster interconnect, administrative LAN, or public LAN.
Target
Node that uses the port.
Communication target
Port
Port number of communication target.
Target
Node or device that uses the port of the communication target.
Remarks
Remarks
init (inittab)
Function
Basic part of GDS.
Effect if stopped
GDS functions cannot be used.
Startup daemon
/usr/sbin/sdxmond
Utilized port
None.
Remarks
None.
init (inittab)
Function
Monitoring of shutdown facility.
Effect if stopped
If shutdown facility terminates abnormally, it will not be restarted.
Startup daemon
/opt/SMAW/SMAWsf/bin/rcsd_monitor
Utilized port
None.
Remarks
None.
/etc/rc3.d
S05poffinhibit
Function
Initializing kdump shutdown agent.
Effect if stopped
Forcible stop by kdump shutdown agent is disabled.
Startup daemon
None.
Utilized port
None.
Remarks
Enabled only in physical environment.
S06clonltrc
Function
Loading the driver of the online trace.
Effect if stopped
The information for investigation of the cluster resource management facility cannot be collected.
Startup daemon
None.
Utilized port
None.
Remarks
None.
S07clapi
Function
Beginning of online trace of the Cluster Resource Management facility (1).
Effect if stopped
The cluster cannot be started.
Startup daemon
None.
Utilized port
None.
Remarks
None.
S07cllkcd
Function
Initializing kdump shutdown agent.
Effect if stopped
None.
Startup daemon
None.
Utilized port
None.
Remarks
None.
S08clrms
Function
Beginning of online trace of the Cluster Resource Management facility (2).
Effect if stopped
The cluster cannot be started.
Startup daemon
None.
Utilized port
None.
Remarks
None.
S12cf
Function
Loading of CF and CIP drivers.
Effect if stopped
The cluster cannot be started.
Startup daemon
/opt/SMAW/SMAWcf/bin/cfregd
Utilized port
None.
Remarks
None.
S12zcldevmon
Function
Startup of MMB asynchronous monitoring.
Effect if stopped
MMB asynchronous monitoring cannot be used.
Startup daemon
/etc/opt/FJSVcluster/sys/devmmbd
/etc/opt/FJSVcluster/sys/devmmbmond
/etc/opt/FJSVcluster/sys/devmmbmonitord
/etc/opt/FJSVcluster/sys/devmalogd
Utilized port
Remarks
(*1) These ports are used when SA_mmbp and SA_mmbr are set in the Shutdown Facility on PRIMEQUEST.
S13SMAWsf
Function
Startup of Shutdown Facility.
Effect if stopped
Shutdown Facility cannot be used.
Startup daemon
/opt/SMAW/SMAWsf/bin/rcsd
Utilized port
Port       Protocol  Send/Receive  Network             Target        Communication target (Port / Target)
9382 (*1)  UDP       s, r          Administrative LAN  Cluster node  ANY / Remote cluster node
ANY        UDP       s, r          Administrative LAN  Cluster node  623 (*2) / BMC/iRMC
ANY        UDP       s, r          Administrative LAN  Cluster node  161 (*3) / Management blade
Remarks
These ports are used to prevent split brain.
(*1) No. 9382 is set to support the service name "sfadv."
(*2) This port is used when SA_ipmi is set in the Shutdown Facility on PRIMERGY.
(*3) This port is used when SA_blade is set in the Shutdown Facility on the Blade server.
S11hanet
Function
Startup of daemons and activation of virtual interfaces.
Effect if stopped
Creation of LAN redundancy using the Redundant Line Control function is not available.
Startup daemon
/opt/FJSVhanet/etc/sbin/hanetctld
/opt/FJSVhanet/etc/sbin/hanetselect (*1) (*2)
/opt/FJSVhanet/etc/sbin/hanetpathmd (*2)
Utilized port (*3)
Remarks
(*1) This daemon is started by hanetctld only when NIC switching mode or GS linkage mode is used. The start timing of the daemon
depends on the configuration.
(*2) Availability of startup and the number of processes rely on the configuration. Also, this may be suspended according to the
monitoring status.
(*3) The port is used only in GS linkage mode.
S24hanet2
Function
Startup of monitoring daemon and self check daemon.
Effect if stopped
The line monitoring function and the self-checking function cannot work.
Startup daemon
/opt/FJSVhanet/etc/sbin/hanetmond (*1)
Utilized port
None.
Remarks
(*1) This daemon is started only when the self-checking function is used.
S27SMAWsfex
Function
Starting Configuration Update Service for shutdown agent.
Effect if stopped
Configuration Update Service for shutdown agent does not work.
Startup daemon
None.
Utilized port
None.
Remarks
Only when Starting Configuration Update Service for shutdown agent is enabled.
S51cldbm
Function
Startup of cluster configuration management facility.
Effect if stopped
The cluster cannot be started.
Startup daemon
/etc/opt/FJSVcluster/FJSVcldbm/daemons/dcmmond
/etc/opt/FJSVcluster/FJSVcldbm/daemons/dcmmstd
/etc/opt/FJSVcluster/FJSVcldbm/daemons/dcmevmd
/etc/opt/FJSVcluster/FJSVcldbm/daemons/dcmfcpd
/etc/opt/FJSVcluster/FJSVcldbm/daemons/dcmsynd
/etc/opt/FJSVcluster/FJSVcldbm/daemons/dcmprcd
/etc/opt/FJSVcluster/FJSVcldbm/daemons/dcmcfmd
/etc/opt/FJSVcluster/FJSVcldbm/daemons/dcmdbud
/etc/opt/FJSVcluster/FJSVcldbm/daemons/dcmcomd
/etc/opt/FJSVcluster/FJSVcldbm/daemons/dcmdbcd
/etc/opt/FJSVcluster/FJSVcldbm/daemons/dcmlckd
/etc/opt/FJSVcluster/FJSVclrms/daemons/clwatchlogd
Utilized port
Port       Protocol  Send/Receive  Network       Target        Communication target (Port / Target)
9384 (*8)  TCP       s, r          Interconnect  Cluster node  ANY / Remote cluster node
Remarks
(*1) No. 9331 is set to support the service name "dcmcom."
(*2) No. 9379 is set to support the service name "dcmsync."
(*3) No. 9378 is set to support the service name "dcmlck."
(*4) No. 9377 is set to support the service name "dcmfcp."
(*5) No. 9376 is set to support the service name "dcmevm."
(*6) No. 9375 is set to support the service name "dcmmst."
(*7) No. 9383 is set to support the service name "dcmcom2."
(*8) No. 9384 is set to support the service name "dcmlck2."
S51clrmgr
Function
Startup of Cluster Resource Management facility.
Effect if stopped
The cluster cannot be started.
Startup daemon
/etc/opt/FJSVcluster/FJSVcldbm/daemons/clrmd
Utilized port
None
Remarks
None.
S51clrwz
Function
Setting of cluster applications.
Effect if stopped
Cluster applications cannot be configured correctly, or will not work correctly.
Startup daemon
None.
Utilized port
None.
Remarks
None.
S52sfdsk
Function
Basic part of GDS.
Effect if stopped
GDS functions cannot be used.
Startup daemon
/usr/sbin/sdxlogd
/usr/sbin/sdxservd
/usr/sbin/sdxexd
Utilized port
None.
Remarks
None.
S53clctrl
Function
Waiting for completion of startup of Cluster Resource Management facility.
Effect if stopped
The cluster cannot be started.
Startup daemon
/usr/sbin/sdxclc
/usr/sbin/sdxcle
/usr/sbin/sdxcld
Utilized port
None.
Remarks
None.
S53sfdsk2
Function
Basic part of GDS.
Effect if stopped
GDS functions cannot be used.
Startup daemon
None.
Utilized port
None.
Remarks
None.
S57sfcfsrm
Function
Startup control for monitoring facility of GFS shared file system, mount control for GFS shared file system.
Effect if stopped
Functions of GFS shared file system cannot be used.
Startup daemon
/usr/lib/fs/sfcfs/sfcpncd
/usr/lib/fs/sfcfs/sfcprmd
/usr/lib/fs/sfcfs/sfchnsd
/usr/lib/fs/sfcfs/sfcfrmd
/usr/lib/fs/sfcfs/sfcfsd
/usr/lib/fs/sfcfs/sfcfsmg
Utilized port
Remarks
(*1) No. 9300 is set to support the service name "sfcfsrm."
(*2) From No. 9200 to No. 9263 are set to support the service names from sfcfs-1 to sfcfs-64.
S76clprmd
Function
Startup of process monitoring facility.
Effect if stopped
Applications using the process monitoring functions will not work.
Startup daemon
/etc/opt/FJSVcluster/FJSVclapm/daemons/prmd
Utilized port
None.
Remarks
Exclusive for PRIMECLUSTER products.
S99SMAWRrms
Function
Startup of RMS.
Effect if stopped
Even if HV_RCSTART=1 is set, RMS will not start automatically at node startup.
Startup daemon
/opt/SMAW/SMAWRrms/bin/bm
/opt/SMAW/SMAWRrms/bin/hvdet_xxxx
(Detectors and applications used in cluster applications will start.)
Utilized port
Remarks
(*1) No. 9786 is set to support the service name "rmshb."
If the port number overlaps with another application, change the number used in the application to resolve the conflict.
S99fjsvwvbs
Function
Startup of daemons on Web-Based Admin View management server or monitoring nodes.
Effect if stopped
Settings and monitoring via the GUI provided by Web-Based Admin View will not be available.
Startup daemon
[For nodes working as primary or secondary management servers]
/opt/SMAW/SMAWcj2re/jre/bin/java
/opt/FJSVwvbs/etc/bin/wvAgent (2 processes)
/etc/opt/FJSVwvfrm/sbin/wvClEventd (0 to 2 processes)
/etc/opt/FJSVwvfrm/sbin/wvFaultEventd (0 to 2 processes)
[For nodes other than those described above]
/opt/FJSVwvbs/etc/bin/wvAgent (2 processes)
/etc/opt/FJSVwvfrm/sbin/wvClEventd (0 to 2 processes)
/etc/opt/FJSVwvfrm/sbin/wvFaultEventd (0 to 2 processes)
Utilized port
Remarks
(*1) No. 9799 is set to support the service name "fjwv_c."
(*2) No. 9798 is set to support the service name "fjwv_s."
(*3) No. 9797 is set to support the service name "fjwv_n."
(*4) No. 9796 is set to support the service name "fjwv_g."
(*5) Including concurrent use with cluster nodes.
(*6) PC
S99fjsvwvcnf
Function
WWW server for sending Java applets, Java classes, and HTML contents to clients.
Effect if stopped
Settings and monitoring via the GUI provided by Web-Based Admin View will not be available.
Startup daemon
/opt/FJSVwvcnf/bin/wvcnfd
Utilized port
Remarks
(*1) No. 8081 is set to support the service name "fjwv-h."
(*2) Including concurrent use with cluster nodes.
(*3) PC
For wvcnfd of the Web-Based Admin View process, there is an additional child process of the same name while processing a request
from a client. This process, however, terminates immediately after processing the request.
Necessary daemons other than PRIMECLUSTER for PRIMECLUSTER to operate are as follows:
- crond
- iscsid (*1)
- libvirtd (*2)
- ntpd
- radvd (*3)
- rsyslog (rsyslogd)
- snmptrapd (*4)
- tgtd (*1)
(*1) The iscsid daemon and the tgtd daemon are necessary when using the mirroring among servers.
(*2) The libvirtd daemon is necessary for the KVM environment.
(*3) The radvd daemon is necessary only if Fast switching mode is used as the redundant line control method of GLS, and IPv6
communication is used.
(*4) The snmptrapd daemon is necessary only when MMB asynchronous monitoring is used.
Appendix K Systemd Services and Startup Daemons, and
Port Numbers in PRIMECLUSTER
This appendix provides explanations on systemd services and daemons that are started by PRIMECLUSTER, and the port numbers being
used in RHEL7.
Name of Unit
Name of Unit.
Function
Function of Unit.
Effect if stopped
Effect if unit is stopped.
Dependence with other Units
Requires
Prerequisite Units needed by this Unit. If the Units listed here fail to start, this Unit will not be started.
Wants
Units that this Unit wants to start. Even if the Units listed here fail to start, this Unit will still be started.
Before
Other Units started after this Unit.
After
Other Units started before this Unit.
Startup daemon
Daemon started by Unit.
If nothing is described in "Remarks", the daemon is resident in the system regardless of the settings or configurations.
Utilized port
Port
Port number.
Protocol
Protocol - TCP or UDP.
Send/Receive
"s" if port sends data, "r" if it receives data, "s, r" for both.
Network
Utilized network - any of Cluster interconnect, administrative LAN, or public LAN.
Target
Node that uses the port.
Communication target
Port
Port number of communication target.
Target
Node or device that uses the port of the communication target.
Remarks
Remarks
fjsvclapi.service
Function
Beginning of online trace of the Cluster Resource Management facility (2).
Effect if stopped
The cluster cannot be started.
Dependence with other Units
Requires
None.
Wants
None.
Before
None.
After
fjsvclrmgr.service
Startup daemon
None.
Utilized port
None.
Remarks
None.
fjsvclctrl.service
Function
Waiting for completion of startup of Cluster Resource Management facility.
Effect if stopped
The cluster cannot be started.
Dependence with other Units
Requires
None.
Wants
None.
Before
None.
After
fjsvclrmgr.service
Startup daemon
/usr/sbin/sdxclc
/usr/sbin/sdxcle
/usr/sbin/sdxcld
Utilized port
None.
Remarks
None.
fjsvcldbm.service
Function
Startup of Cluster Resource Management facility (1).
Effect if stopped
The cluster cannot be started.
Dependence with other Units
Requires
None.
Wants
None.
Before
None.
After
fjsvclapi.service
fjsvclrms.service
smawcf.service
Startup daemon
/etc/opt/FJSVcluster/FJSVcldbm/daemons/dcmmond
/etc/opt/FJSVcluster/FJSVcldbm/daemons/dcmmstd
/etc/opt/FJSVcluster/FJSVcldbm/daemons/dcmevmd
/etc/opt/FJSVcluster/FJSVcldbm/daemons/dcmfcpd
/etc/opt/FJSVcluster/FJSVcldbm/daemons/dcmsynd
/etc/opt/FJSVcluster/FJSVcldbm/daemons/dcmprcd
/etc/opt/FJSVcluster/FJSVcldbm/daemons/dcmcfmd
/etc/opt/FJSVcluster/FJSVcldbm/daemons/dcmdbud
/etc/opt/FJSVcluster/FJSVcldbm/daemons/dcmcomd
/etc/opt/FJSVcluster/FJSVcldbm/daemons/dcmdbcd
/etc/opt/FJSVcluster/FJSVcldbm/daemons/dcmlckd
/etc/opt/FJSVcluster/FJSVclrms/daemons/clwatchlogd
Utilized ports
Port       Protocol  Send/Receive  Network       Target        Communication target (Port / Target)
9379 (*2)  TCP       s, r          Interconnect  Cluster node  ANY / Local and remote cluster nodes
9378 (*3)  TCP       s, r          Interconnect  Cluster node  ANY / Local cluster node
9377 (*4)  TCP       s, r          Interconnect  Cluster node  ANY / Local and remote cluster nodes
9376 (*5)  TCP       s, r          Interconnect  Cluster node  ANY / Local cluster node
9375 (*6)  TCP       s, r          Interconnect  Cluster node  ANY / Local cluster node
9383 (*7)  TCP       s, r          Interconnect  Cluster node  ANY / Remote cluster node
9384 (*8)  TCP       s, r          Interconnect  Cluster node  ANY / Remote cluster node
Remarks
(*1) No. 9331 is set to support the service name "dcmcom."
(*2) No. 9379 is set to support the service name "dcmsync."
(*3) No. 9378 is set to support the service name "dcmlck."
(*4) No. 9377 is set to support the service name "dcmfcp."
(*5) No. 9376 is set to support the service name "dcmevm."
(*6) No. 9375 is set to support the service name "dcmmst."
(*7) No. 9383 is set to support the service name "dcmcom2."
(*8) No. 9384 is set to support the service name "dcmlck2."
fjsvcldev.service
Function
Startup of iRMC/MMB asynchronous monitoring.
Effect if stopped
iRMC/MMB asynchronous monitoring cannot be used.
Dependence with other Units
Requires
None.
Wants
None.
Before
smawrrms.service
After
poffinhibit.service
y30SVmco.service
FJSVfefpcl.service
smawcf.service
fjsvclonltrc.service
FJSVossn.service
ipmi.service
snmptrapd.service
Startup daemon
PRIMEQUEST 2000 series
/etc/opt/FJSVcluster/sys/devmmbd
/etc/opt/FJSVcluster/sys/devmmbmond
/etc/opt/FJSVcluster/sys/devmmbmonitord
/etc/opt/FJSVcluster/sys/devmalogd
PRIMEQUEST 3000 series
/etc/opt/FJSVcluster/sys/devirmcd
/etc/opt/FJSVcluster/sys/devirmcmonitord
/etc/opt/FJSVcluster/sys/devmalogd
Utilized ports
Remarks
(*1) These ports are used when SA_mmbp and SA_mmbr are set in the shutdown facility on PRIMEQUEST 2000 series.
(*2) These ports are used when SA_irmcp, SA_irmcr, and SA_irmcf are set in the shutdown facility on PRIMEQUEST 3000 series.
fjsvcldev-clirmcmonctl.service
Function
Operation of iRMC asynchronous monitoring.
Effect if stopped
None.
Dependence with other Units
Requires
None.
Wants
None.
Before
None.
After
None.
Startup daemon
None.
Utilized ports
None.
Remarks
This service operates only when the clirmcmonctl command is executed and is always in the "inactive (dead)" state.
fjsvcldev-clmmbmonctl.service
Function
Operation of MMB asynchronous monitoring.
Effect if stopped
None.
Dependence with other Units
Requires
None.
Wants
None.
Before
None.
After
None.
Startup daemon
None.
Utilized ports
None.
Remarks
This service operates only when the clmmbmonctl command is started and is always in the "inactive (dead)" state.
fjsvcllkcd.service
Function
Checking the definition file for kdump.
Effect if stopped
None.
Dependence with other Units
Requires
None.
Wants
None.
Before
None.
After
None.
Startup daemon
None.
Utilized port
None.
Remarks
There is no effect if it is stopped because this service operates only at the startup and the daemon does not reside.
fjsvclonltrc.service
Function
Beginning of online trace of the Cluster Resource Management facility (1).
Effect if stopped
Information necessary for the trouble investigation cannot be collected.
Dependence with other Units
Requires
None.
Wants
None.
Before
None.
After
None.
Startup daemon
None.
Utilized port
None.
Remarks
None.
fjsvclprmd.service
Function
Startup of process monitoring facility.
Effect if stopped
Applications using the process monitoring functions will not work.
Dependence with other Units
Requires
None.
Wants
None.
Before
smawrrms.service
After
fjsvclctrl.service
Startup daemon
/etc/opt/FJSVcluster/FJSVclapm/daemons/prmd
Utilized port
None.
Remarks
Exclusive for PRIMECLUSTER products.
fjsvclrmgr.service
Function
Startup of Cluster Resource Management facility (2).
Effect if stopped
The cluster cannot be started.
Dependence with other Units
Requires
None.
Wants
None.
Before
None.
After
fjsvcldbm.service
Startup daemon
/etc/opt/FJSVcluster/FJSVcldbm/daemons/clrmd
Utilized port
None.
Remarks
None.
fjsvclrmgr2.service
Function
Startup of Cluster Resource Management facility (3).
Effect if stopped
The cluster cannot be started.
Dependence with other Units
Requires
None.
Wants
None.
Before
smawrrms.service
After
fjsvclctrl.service
Startup daemon
None.
Utilized port
None.
Remarks
None.
fjsvclrms.service
Function
Beginning of online trace of the Cluster Resource Management facility (3).
Effect if stopped
The cluster cannot be started.
Dependence with other Units
Requires
None.
Wants
None.
Before
None.
After
fjsvclonltrc.service
Startup daemon
None.
Utilized port
None.
Remarks
None.
fjsvclrwz.service
Function
Setting of cluster applications.
Effect if stopped
Cluster applications cannot be configured correctly, or will not work correctly.
Dependence with other Units
Requires
None.
Wants
None.
Before
None.
After
fjsvclctrl.service
Startup daemon
None.
Utilized port
None.
Remarks
None.
fjsvwvbs.service
Function
Startup of daemons on Web-Based Admin View management server or monitoring nodes.
Effect if stopped
Settings and monitoring via the GUI provided by Web-Based Admin View will not be available.
Dependence with other Units
Requires
None.
Wants
None.
Before
fjsvwvcnf.service
After
network.target
Startup daemon
[For nodes working as primary or secondary management servers]
/opt/SMAW/SMAWcj2re/jre/bin/java
/opt/FJSVwvbs/etc/bin/wvAgent (2 processes)
/etc/opt/FJSVwvfrm/sbin/wvClEventd (0 to 2 processes)
/etc/opt/FJSVwvfrm/sbin/wvFaultEventd (0 to 2 processes)
[For nodes other than those described above]
/opt/FJSVwvbs/etc/bin/wvAgent (2 processes)
/etc/opt/FJSVwvfrm/sbin/wvClEventd (0 to 2 processes)
/etc/opt/FJSVwvfrm/sbin/wvFaultEventd (0 to 2 processes)
Utilized port
Port       Protocol  Send/Receive  Network             Target                      Communication target (Port / Target)
9797 (*3)  TCP       s, r          Administrative LAN  Administrative server (*5)  ANY / Local and remote nodes
9796 (*4)  UDP       s, r          Administrative LAN  Administrative server (*5)  ANY / Local and remote nodes
Remarks
(*1) No. 9799 is set to support the service name "fjwv_c."
(*2) No. 9798 is set to support the service name "fjwv_s."
(*3) No. 9797 is set to support the service name "fjwv_n."
(*4) No. 9796 is set to support the service name "fjwv_g."
(*5) Including concurrent use with cluster nodes.
(*6) PC
fjsvwvcnf.service
Function
WWW server for sending Java applets, Java classes, and HTML contents to clients.
Effect if stopped
Settings and monitoring via the GUI provided by Web-Based Admin View will not be available.
Dependence with other Units
Requires
fjsvwvbs.service
Wants
None.
Before
None.
After
fjsvwvbs.service
Startup daemon
/opt/FJSVwvcnf/bin/wvcnfd
Utilized port
Remarks
(*1) No. 8081 is set to support the service name "fjwv-h."
(*2) Including concurrent use with cluster nodes.
(*3) PC
For wvcnfd of the Web-Based Admin View process, there is an additional child process of the same name while processing a request
from a client. This process, however, terminates immediately after processing the request.
fjsvgfsfsrm.service
Function
Startup control for monitoring facility of GFS shared file system, mount control for GFS shared file system.
Effect if stopped
Functions of GFS shared file system cannot be used.
Dependence with other Units
Requires
None.
Wants
None.
Before
smawrrms.service
After
fjsvclctrl.service
fjsvclrmgr2.service
WantedBy
multi-user.target
Startup daemon
/usr/lib/fs/sfcfs/sfcpncd
/usr/lib/fs/sfcfs/sfcprmd
/usr/lib/fs/sfcfs/sfchnsd
/usr/lib/fs/sfcfs/sfcfrmd
/usr/lib/fs/sfcfs/sfcfsd
/usr/lib/fs/sfcfs/sfcfsmg
Utilized ports
Remarks
(*1) No. 9300 is set to support the service name "sfcfsrm."
(*2) From No. 9200 to No. 9263 are set to support the service names from sfcfs-1 to sfcfs-64.
fjsvgfsfsrm2.service
Function
Stop control for monitoring facility of the GFS shared file system, unmount control for GFS shared file system.
Effect if stopped
The GFS shared file system cannot be stopped normally when the system is stopped.
Dependence with other Units
Requires
None.
Wants
None.
Before
smawrrms.service
After
fjsvclctrl.service
fjsvclrmgr2.service
fjsvgfsfsrm.service
Startup daemon
None.
Utilized port
None.
Remarks
None.
fjsvhanet.service
Function
Starting the daemon, activating the virtual interface, and starting the line monitoring function.
Effect if stopped
LAN cannot be duplicated by using the Redundant Line Control function.
Dependence with other Units
Requires
None.
Wants
None.
Before
network.target
After
network.service
Startup daemon
/opt/FJSVhanet/etc/sbin/hanetctld
/opt/FJSVhanet/etc/sbin/hanetselect (*1) (*2)
/opt/FJSVhanet/etc/sbin/hanetpathmd (*2)
/opt/FJSVhanet/etc/sbin/hanetmond (*3)
Utilized port (*4)
Remarks
(*1) This daemon is started by hanetctld only when NIC switching mode or GS linkage mode is used. The start timing of the daemon
depends on the configuration.
(*2) Availability of startup and the number of processes rely on the configuration. Also, this may be suspended according to the
monitoring status.
(*3) This daemon is started only when the self-checking function is used.
(*4) The port is used only for the GS linkage mode.
fjsvsdx.service
Function
Basic part of GDS.
Effect if stopped
GDS functions cannot be used.
Dependence with other Units
Requires
None.
Wants
None.
Before
fjsvclctrl.service
fjsvsdx2.service
After
iscsi.service
iscsi-shutdown.service
target.service (*1)
Startup daemon
/usr/sbin/sdxlogd
/usr/sbin/sdxexd
/usr/sbin/sdxservd
Utilized port
None.
Remarks
(*1) The target.service has a dependency with other units only when the mirroring among servers is used.
fjsvsdx2.service
Function
Basic part of GDS.
Effect if stopped
GDS functions cannot be used.
Dependence with other Units
Requires
None.
Wants
None.
Before
fjsvsdxmon.service
After
fjsvsdx.service
fjsvclctrl.service
Startup daemon
None.
Utilized port
None.
Remarks
None.
fjsvsdxmon.service
Function
Monitoring GDS.
Effect if stopped
GDS cannot be restarted when it ends abnormally.
Dependence with other Units
Requires
None.
Wants
None.
Before
None.
After
fjsvsdx2.service
Startup daemon
/usr/sbin/sdxmond
Utilized port
None.
Remarks
None.
poffinhibit.service
Function
Initializing kdump shutdown agent.
Effect if stopped
Forcible stop by kdump shutdown agent is disabled.
Dependence with other Units
Requires
None.
Wants
None.
Before
None.
After
None.
Startup daemon
None.
Utilized port
None.
Remarks
Enabled only in physical environment.
smawcf.service
Function
Loading the CF driver and the CIP driver.
Effect if stopped
The cluster cannot be started.
Dependence with other Units
Requires
None.
Wants
fjsvcldev.service
Before
smawrrms.service
After
network.target
Startup daemon
/opt/SMAW/SMAWcf/bin/cfregd
Utilized port
None.
Remarks
None.
smawrhv-to.service
Function
Initializing RMS.
Effect if stopped
The RMS function cannot be used.
Dependence with other Units
Requires
None.
Wants
None.
Before
smawrrms.service
After
None.
Startup daemon
None.
Utilized port
None.
Remarks
None.
smawrrms.service
Function
Startup of RMS.
Effect if stopped
Operation cannot be monitored or controlled by the cluster. The operation will be stopped if this Unit is stopped during the operation.
Dependence with other Units
Requires
None.
Wants
None.
Before
None.
After
network.target
Startup daemon
/opt/SMAW/SMAWRrms/bin/bm
/opt/SMAW/SMAWRrms/bin/hvdet_xxxx
(Detectors and applications used in cluster applications will start.)
Utilized ports
Remarks
(*1) No. 9786 is set to support the service name "rmshb."
If the port number overlaps with another application, change the number used in the application to resolve the conflict.
smawsf.service
Function
Startup of Shutdown Facility.
Effect if stopped
Shutdown Facility cannot be used.
Dependence with other Units
Requires
None.
Wants
None.
Before
None.
After
smawcf.service
fjsvcldev.service
Startup daemon
/opt/SMAW/SMAWsf/bin/rcsd
Utilized ports
Remarks
These ports are used to prevent split brain.
(*1) No. 9382 is set to support the service name "sfadv."
(*2) This port is used when SA_ipmi is set in the shutdown facility on PRIMERGY.
(*3) This port is used when SA_blade is set in the shutdown facility on a blade server.
smawsf-sdtool-debugoff.service
Function
Operation of the shutdown facility.
Effect if stopped
None.
Dependence with other Units
Requires
None.
Wants
None.
Before
None.
After
None.
Startup daemon
None.
Utilized port
None.
Remarks
This service operates only when the sdtool command is started and is always in the "inactive (dead)" state.
smawsf-sdtool-debugon.service
Function
Operation of the shutdown facility.
Effect if stopped
None.
Dependence with other Units
Requires
None.
Wants
None.
Before
None.
After
None.
Startup daemon
None.
Utilized port
None.
Remarks
This service operates only when the sdtool command is started and is always in the "inactive (dead)" state.
smawsfex.service
Function
Starting the configuration update service for SA.
Effect if stopped
None.
Dependence with other Units
Requires
None.
Wants
None.
Before
smawsf.service
After
smawcf.service
Startup daemon
None.
Utilized port
None.
Remarks
The configuration update service for SA works when the node is started only if it is activated by the sfsacfgupdate command.
smawsfmon.service
Function
Monitoring of shutdown facility.
Effect if stopped
If shutdown facility terminates abnormally, it will not be restarted.
Dependence with other Units
Requires
None.
Wants
None.
Before
None.
After
smawcf.service
smawsf.service
Startup daemon
/opt/SMAW/SMAWsf/bin/rcsd_monitor
Utilized port
None.
Remarks
None.
K.3 Necessary Services for PRIMECLUSTER to Operate
Necessary services other than PRIMECLUSTER for PRIMECLUSTER to operate are as follows:
- crond.service
- ipmi.service (*1)
- iscsi.service (*2)
- libvirtd.service (*3)
- ntpd.service, or chronyd.service
- radvd.service (*4)
- rsyslog.service
- target.service (*2)
(*1) The ipmi.service is necessary when SA_ipmi is set in the shutdown facility on PRIMERGY.
(*2) The iscsi.service and the target.service are necessary when using the mirroring among servers.
(*3) The libvirtd.service is necessary for the KVM environment.
(*4) The radvd.service is necessary only if Fast switching mode is used as the redundant line control method of GLS, and IPv6
communication is used.
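For example, whether these services are enabled can be checked with standard systemd commands (shown as an illustration; check whichever of ntpd.service or chronyd.service is actually in use):
# systemctl is-enabled crond.service rsyslog.service
# systemctl status chronyd.service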
Appendix L Using Firewall
When using Firewall, perform either of the following procedures; otherwise, the cluster may not operate normally.
See
- For details on firewalld, see the man manual or other related documentation for the firewalld(1) or firewall-cmd(1) command.
- For details on iptables, see the man manual or other related documentation for the iptables(8) command.
- For details on ip6tables, see the man manual or other related documentation for the ip6tables(8) command.
- firewalld
The option of the firewall-cmd command which changes the settings of firewalld differs in the following two situations. One is for when
an interface which is not registered in the zone is added to "zone=trusted". The other is for when an interface which is registered in
another zone is changed to "zone=trusted".
Add interface cip0 which is not originally registered in the zone to zone=trusted
Change zone of interface cip0 which is originally registered in another zone to trusted
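The commands themselves are not reproduced above; as an illustration of the two situations, the corresponding firewall-cmd options are --add-interface and --change-interface:
Example (interface cip0 not yet registered in any zone):
# firewall-cmd --permanent --zone=trusted --add-interface=cip0
Example (interface cip0 already registered in another zone):
# firewall-cmd --permanent --zone=trusted --change-interface=cip0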
- iptables or ip6tables
Format: -A INPUT -i <input-interface> -j ACCEPT
-A OUTPUT -o <output-interface> -j ACCEPT
- firewalld
Allow communication to specific port number
Format: firewall-cmd --permanent --zone=<zone> --add-port=<port-number>/<tcp/udp>
Example: firewall-cmd --permanent --zone=public --add-port=9383/tcp
IPv4
Format: firewall-cmd --permanent --direct --add-rule ipv4 filter INPUT 0 -p <tcp/udp> --sport <source-port-number> -j ACCEPT
Example: firewall-cmd --permanent --direct --add-rule ipv4 filter INPUT 0 -p tcp --sport 9383 -j ACCEPT
IPv6
Format: firewall-cmd --permanent --direct --add-rule ipv6 filter INPUT 0 -p <tcp/udp> --sport <source-port-number> -j ACCEPT
Example: firewall-cmd --permanent --direct --add-rule ipv6 filter INPUT 0 -p tcp --sport 9383 -j ACCEPT
- iptables or ip6tables
Format: -A <INPUT/OUTPUT> -p <tcp/udp> -m <tcp/udp> --dport <destination-port-number> -j ACCEPT
-A <INPUT/OUTPUT> -p <tcp/udp> -m <tcp/udp> --sport <source-port-number> -j ACCEPT
Note
- If you changed the configuration of firewalld by the '--permanent' option of firewall-cmd, restart the firewalld service.
- If you changed the configuration of iptables, perform one of the following operations instead of restarting the iptables service.
- Restarting the cluster node
- Reflecting the change by iptables-restore
- If you changed the configuration of ip6tables, perform one of the following operations instead of restarting the ip6tables service.
- Restarting the cluster node
- Reflecting the change by ip6tables-restore
- When using the state module in iptables or ip6tables, configure settings to allow communications of PRIMECLUSTER before the state
module settings.
In the following example, communications of cluster interconnects are allowed before the state module settings.
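The original example is not reproduced here; a minimal sketch of rules that accept cluster interconnect traffic (assumed here to arrive on interface cip0) before the state module rules might look as follows:
-A INPUT -i cip0 -j ACCEPT
-A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT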
Appendix M Cloning the Cluster System Environment
PRIMECLUSTER allows you to configure a new cluster system by cloning an already configured cluster system.
Note
- Make sure that the sizes of disks managed by GDS are the same at both copy source and copy destination.
- Before starting up the copy destination system, make sure that the NIC cables are disconnected or the copy source system is stopped, or connect the copy destination system to a network isolated from the copy source system, taking care that no IP addresses are duplicated with the copy source system.
- When you carry out cloning, you should follow the conditions of the cloning software/function to be used.
Here, the cloning procedure is explained using an example of cloning a two-node cluster system in standby operation in a physical environment.
Procedure for Configuration by Cloning
The procedure for configuration by cloning in PRIMECLUSTER is as follows.
Note
If mirroring of the system disk using GDS is set in the cluster system of the copy source, system disk mirroring must be canceled temporarily
either in the source or in the destination system of copying.
This cloning method is particularly recommendable when there are multiple copy destination systems.
1. As described in "M.1 Preparation," cancel a system disk mirroring on the copy source.
2. After the procedure described in "M.2 Copying System Image Using the Cloning Function," mirror the system disk again on the
copy source system.
3. As described in "M.3 Changing Cluster System Settings," make the settings for the system disk mirroring on the copy destination
system.
Alternatively, when canceling the system disk mirroring on the copy destination system:
1. After the procedure described in "M.2 Copying System Image Using the Cloning Function," restart the OS using the installation CD
of the OS on the copy destination system in "M.3 Changing Cluster System Settings."
3. After booting from the system disk, make the settings for the system disk mirroring.
The steps in the following execution example are described for building a cluster system with the following configuration.
M.1 Preparation
This part describes the preliminary operation executed before cloning is applied.
1. Back up the management partition information of the GFS Shared File System from the copy source server.
Execute the following command on any running node.
# sfcgetconf _backup_file_
In the above example, sfcgetconf(8) generates a shell script named _backup_file_ in the current directory.
Note
Execute the above procedure if you are going to copy data from a shared disk.
Change the node names of the copy source servers in the generated shell script to the node names of the copy destination servers.
[Before change]
#!/bin/sh
# This file is made by:
# sfcgetconf _backup_file_
# Thu May 26 09:23:04 2014
#---- fsid : 1 ----
# MDS primary (port) : host2 (sfcfs-1)
# MDS secondory (port) : host3 (sfcfs-1)
# MDS other :
# AC : host2, host3
# options :
# device : /dev/sfdsk/gfs01/dsk/volume01
sfcadm -m host2,host3 -g host2,host3 -p sfcfs-1,sfcfs-1 /dev/sfdsk/gfs01/dsk/volume01
...
[After change]
#!/bin/sh
# This file is made by:
# sfcgetconf _backup_file_
# Thu May 26 09:23:04 2014
#---- fsid : 1 ----
# MDS primary (port) : host4 (sfcfs-1)
# MDS secondory (port) : host5 (sfcfs-1)
# MDS other :
# AC : host4, host5
# options :
# device : /dev/sfdsk/gfs01/dsk/volume01
sfcadm -m host4,host5 -g host4,host5 -p sfcfs-1,sfcfs-1 /dev/sfdsk/gfs01/dsk/volume01
...
Note
If there are multiple file systems, there also are multiple lines in the execution procedure of the "sfcadm" command. Modify the node
names in all lines.
# sfcsetup -m
wait_bg
Note
This procedure is unnecessary when mirroring among servers is used.
1. Back up the local class and shared class object configurations for GDS on the copy source server.
Execute the following procedure on any node of the copy source server. If there are multiple classes, perform this operation for all
classes.
Example: The object configuration data of class Class1 is output to file /var/tmp/Class1.conf.
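A sketch of the backup command under this assumption (the class name and output file follow the example above; see the GDS guide for the exact sdxconfig syntax):
# sdxconfig Backup -c Class1 -o /var/tmp/Class1.conf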
2. Save the GDS configuration data in a file on the copy source server. Output the class configuration data of all GDS classes to files.
Example: The data of class Class1 is output to the /var/tmp/Class1.info file.
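A sketch of the command under this assumption (the sdxinfo option set is an assumption; adjust it to your environment):
# sdxinfo -c Class1 -e long > /var/tmp/Class1.info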
See
For procedure for canceling mirroring of system disks, see "PRIMECLUSTER Global Disk Services Configuration and Administration
Guide."
Note
This procedure is unnecessary if you carry out cloning while system disk mirroring is active.
Note
- Before starting up the copy destination system, make sure that the NIC cables are disconnected or the copy source is stopped, or connect
from the copy source system to an isolated network, taking care that there are no IP addresses in duplicate with the copy source system.
- The MAC addresses of the copy source system and destination system NICs are different. Depending on the cloning software/function
you are using, update the MAC addresses either by initializing the NIC settings when cloning, or by modifying the NIC settings
manually after cloning.
2. Copy the disks that are registered in a local class or a shared class of GDS.
The disks registered in local or shared classes of GDS can be copied by one of the following methods:
a. Copy the whole data of the disk including the GDS private slice.
b. Copy the data of the GDS private slice only.
c. Copy the data of the volume area only.
d. Do not copy any of the disk data.
Determine the copy range by the specifications of the cloning software or function you use for data copying (data of which area can
be copied) and by the need of copying the data from the volume area.
Note
When using the mirroring among servers, copy the local disk data used by the mirroring among servers in the range of a. or b.
See
For the setting up procedure, see "PRIMECLUSTER Global Disk Services Configuration and Administration Guide."
Note
This procedure is unnecessary if you carry out cloning while system disk mirroring is active.
See
For the method of deleting it, see "Resolution" of "System cannot be booted. (Failure of all boot disks)" in "System Disk Abnormality [EFI]"
of "PRIMECLUSTER Global Disk Services Configuration and Administration Guide."
Note
- After the end of the rescue mode in this procedure, when you boot the system, start it up in single-user mode.
- This procedure is unnecessary if you carried out cloning with temporarily canceled system disk mirroring.
3. Change the primary management server, secondary management server, httpip, and mip in the Web-Based Admin View.
1. Set the IP addresses of the primary management server and the secondary management server.
2. Set httpip.
3. Set mip.
4. Change the CF node name, CIP/SysNode name, and the cluster name.
Note
For the naming conventions (cluster name and CF node name), see "5.1.1 Setting Up CF and CIP."
1. Change the string of the CF node name within the CF node name and the CIP/SysNode name that are described in /etc/cip.cf.
[Before change]
fuji2 fuji2RMS:netmask:255.255.255.0
fuji3 fuji3RMS:netmask:255.255.255.0
[After change]
fuji4 fuji4RMS:netmask:255.255.255.0
fuji5 fuji5RMS:netmask:255.255.255.0
2. Change the string of the CF node name within the CIP/SysNode name that are described in /etc/hosts.
[Before change]
192.168.0.1 fuji2RMS
192.168.0.2 fuji3RMS
[After change]
192.168.0.3 fuji4RMS
192.168.0.4 fuji5RMS
3. Change the CF node name and the cluster name that are described in /etc/default/cluster.
[Before change]
nodename fuji2
clustername PRIMECLUSTER1
device eth2
device eth3
[After change]
nodename fuji4
clustername PRIMECLUSTER2
device eth2
device eth3
# mv /etc/opt/SMAW/SMAWsf/rcsd.cfg /etc/opt/SMAW/SMAWsf/rcsd.org
# /etc/opt/FJSVcluster/bin/clchgnodename
Note
This procedure is unnecessary when the GFS Shared File System is not being used.
Delete the information in the management partition of the GFS Shared File System. Execute the following command on all the nodes.
# rm /var/opt/FJSVsfcfs/sfcfsrm.conf
See
For details on the settings, see "PRIMECLUSTER Global Link Services Configuration and Administration Guide: Redundant Line
Control Function."
Note
The procedure depends on the data communication mode. The following procedure is for changing the IP address within the same
network as the configuration using the NIC switching mode.
[After change]
3. Modify the ifcfg-eth0 file to change the IP address of the primary physical interface.
For [primecl03]
[Before change]
DEVICE=eth0
BOOTPROTO=static
HOTPLUG=no
IPADDR=10.34.214.181
NETMASK=255.255.255.0
ONBOOT=yes
TYPE=Ethernet
[After change]
DEVICE=eth0
BOOTPROTO=static
HOTPLUG=no
IPADDR=10.34.214.191
NETMASK=255.255.255.0
ONBOOT=yes
TYPE=Ethernet
For [primecl04]
[Before change]
DEVICE=eth0
BOOTPROTO=static
HOTPLUG=no
IPADDR=10.34.214.182
NETMASK=255.255.255.0
ONBOOT=yes
TYPE=Ethernet
[After change]
DEVICE=eth0
BOOTPROTO=static
HOTPLUG=no
IPADDR=10.34.214.192
NETMASK=255.255.255.0
ONBOOT=yes
TYPE=Ethernet
See
For changing the IP address to a different network, the subnet mask of the virtual interface and the monitoring IP address of the HUB
monitoring function need to be changed. For details, see "PRIMECLUSTER Global Link Services Configuration and Administration
Guide: Redundant Line Control Function."
2. Delete all files named with the class name in the "/etc/opt/FJSVsdx/sysdb.d" directory.
# cd /etc/opt/FJSVsdx/sysdb.d
# rm Class1
4. Delete all the directories named with the class name in "/etc/opt/FJSVsdx/.devlabel" directory.
# cd /etc/opt/FJSVsdx/.devlabel
# rm -rf Class1
3. Change the IP address entries in /etc/tgt/targets.conf and the path of the by-id link.
Example: Change the IP address to "192.168.56.21" and "192.168.56.11", and the path of the by-id link to "/dev/disk/by-
id/scsi-3500000e111c56610".
[Before change]
[After change]
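As a purely illustrative sketch of what such an entry could look like (the target name and directive names are assumptions, not taken from this manual; edit your actual targets.conf entries accordingly):
<target iqn.2003-01.example:sdxnetmirror>
    backing-store /dev/disk/by-id/scsi-3500000e111c56610
    initiator-address 192.168.56.11
    initiator-address 192.168.56.21
</target>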
5. On all the nodes, change the IP addresses that are described in /etc/opt/FJSVsdx/.sdxnetmirror_ipaddr, which is the
configuration file of the mirroring among servers.
[Before change]
192.168.56.10
192.168.56.20
[After change]
192.168.56.11
192.168.56.21
[RHEL7]
The following message may be output, however, it does not affect the operation of GDS. No corrective action is required.
# cp /etc/target/saveconfig.json copy_destination_file_name
4. Change the IP address and the path of the by-id link described in the copy destination file explained in step 3 above.
Example: Change the IP address to "192.168.56.11", and the path of the by-id link to "/dev/disk/by-id/
scsi-3500000e111c56610".
[Before change]
{
"fabric_modules": [],
"storage_objects": [
{
...
"dev": "/dev/disk/by-id/scsi-3500000e111e68e00",
"name": "store1",
"plugin": " block ",
"readonly": false,
"write_back": false,
"wwn": "4a98bfb0-7d7e-4bc8-962c-0b3cf192b214"
}
...
"portals": [
{
"ip_address": "192.168.56.20",
"iser": false,
"port": 3260
}
],
...
[After change]
{
"fabric_modules": [],
"storage_objects": [
{
...
"dev": "/dev/disk/by-id/scsi-3500000e111c56610",
"name": "store1",
"plugin": "block",
"readonly": false,
"write_back": false,
"wwn": "4a98bfb0-7d7e-4bc8-962c-0b3cf192b214"
}
...
"portals": [
{
"ip_address": "192.168.56.21",
"iser": false,
"port": 3260
}
],
...
5. Apply the changes in the configuration information file of the iSCSI target modified in step 4 above to the target.
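A sketch of applying the saved configuration, assuming the RHEL 7 targetctl utility and the copy destination file name used in step 3:
# targetctl restore copy_destination_file_name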
The following message may be output, however, it does not affect the operation of GDS. No corrective action is required.
# targetcli ls
[Output example]
o- / ............................................................................. [...]
o- backstores .................................................................. [...]
| o- block ...................................................... [Storage Objects: 1]
| | o- store1 [/dev/disk/by-id/scsi-3500000e111c56610 (16.0GiB) write-thru activated]
(1)
| o- fileio ..................................................... [Storage Objects: 0]
Point
Make sure to confirm the command output about the following item.
- Applying the changed path (Example of output(1),(2))
- Applying the changed IP address (Example of output(3))
7. Save the target information restored in step 5.
# targetctl save
8. On all the nodes, change the IP addresses that are described in /etc/opt/FJSVsdx/.sdxnetmirror_ipaddr, which is the
configuration file of the mirroring among servers.
[Before change]
192.168.56.10
192.168.56.20
[After change]
192.168.56.11
192.168.56.21
# hvsetenv HV_RCSTART
1 <- Check this value
- If "0" is set, the automatic startup of RMS has been restricted. Go to Step 11.
- If "1" is set, execute the following commands to restrict the automatic startup of RMS.
# hvsetenv HV_RCSTART 0
# hvsetenv HV_RCSTART
0 <- Check "0" is output
11. After completing above procedure on all the nodes of the copy destination, start up all the nodes in multi-user mode.
# cfconfig -g
fuji4 PRIMECLUSTER2 eth1 eth2
# ping fuji5RMS
If an error occurs in the above step a or b, check if the CF node name, CIP/SysNode name, and cluster name that are set in /etc/cip.cf, /etc/default/cluster, or /etc/hosts are correct.
If an error occurs, perform the following procedure:
# /etc/opt/FJSVcluster/bin/clsetrsc -n PRIMECLUSTER2 1
# /etc/opt/FJSVcluster/bin/clsetrsc -n PRIMECLUSTER2 2
[Before change]
community-string public
management-blade-ip 10.20.30.200
fuji2 1 cycle
management-blade-ip 10.20.30.201
fuji3 1 cycle
[After change]
community-string private
management-blade-ip 10.20.30.202
fuji4 3 cycle
management-blade-ip 10.20.30.203
fuji5 3 cycle
2. For PRIMERGY, except for the Blade server, change the entries for the CF node names and the IP address for IPMI (BMC or
iRMC) in "/etc/opt/SMAW/SMAWsf/SA_ipmi.cfg".
Example: When changing the values as follows.
[Before change]
[After change]
# /etc/opt/FJSVcluster/bin/clmmbsetup -d fuji2
# /etc/opt/FJSVcluster/bin/clmmbsetup -d fuji3
c. Execute the "clmmbsetup -a" command and register the MMB information of the copy destination nodes.
For information on how to use the "clmmbsetup" command, see the "clmmbsetup" manual page.
# /etc/opt/FJSVcluster/bin/clmmbsetup -a mmb-user
Enter User's Password:
Re-enter User's Password:
For mmb-user and User's Password, enter the user and password created in Step a.
d. Check that the MMB asynchronous monitoring daemon has started on all the nodes.
# /etc/opt/FJSVcluster/bin/clmmbmonctl
If "The devmmbd daemon exists." is displayed, the MMB asynchronous monitoring daemon has started.
If "The devmmbd daemon does not exist." is displayed, the MMB asynchronous monitoring daemon has not started.
Execute the following command to start the MMB asynchronous monitoring daemon.
# /etc/opt/FJSVcluster/bin/clmmbmonctl start
4. For PRIMEQUEST 3000 series, execute the following procedure:
a. Change the setting of iRMC. For the setup instructions, see the following manual:
- "PRIMEQUEST 3000 Series Installation Manual"
You must create a user so that PRIMECLUSTER can link with iRMC. On all PRIMEQUEST 3000 instances that make
up the PRIMECLUSTER system, make sure to create a user to control iRMC.
- Both IPv4 Console Redirection Setup and IPv6 Console Redirection Setup
- PRIMEQUEST 3000 (except B model)
To create a user to control iRMC, use "set irmc user" command.
For how to use "set irmc user" command, refer to the following manual:
- "PRIMEQUEST 3000 Series Tool Reference (MMB)"
- PRIMEQUEST 3000 B model
To create a user to control iRMC, log in to iRMC Web Interface and create the user from "User Management" page
of "Settings" menu.
For how to use iRMC Web Interface, refer to the following manual page:
- "FUJITSU Server PRIMEQUEST 3000 Series Business Model iRMC S5 Web Interface"
b. Change the setting of MMB (except B model). For the setup instructions, see the following manual:
- "PRIMEQUEST 3000 Series Installation Manual"
You must create the RMCP user so that PRIMECLUSTER can link with the MMB units.
On all PRIMEQUEST 3000 instances that make up the PRIMECLUSTER system, make sure to create a user to control
the MMB units with RMCP. To create a user to control MMB with RMCP, log in to MMB Web-UI, and create the user
from "Remote Server Management" screen of "Network Configuration" menu. Create the user as shown below:
- [Privilege]: "Admin"
- [Status]: "Enabled"
For details about creating a user who uses RMCP to control the MMB units, see the following manual provided with the
unit:
- "PRIMEQUEST 3000 Series Operation and Management Manual"
# /etc/opt/FJSVcluster/bin/clirmcsetup -d fuji2
# /etc/opt/FJSVcluster/bin/clirmcsetup -d fuji3
d. Execute "clirmcsetup -a irmc" command and register the iRMC information of the copy destination node. For how to
use "clirmcsetup" command, refer to the manual page of clirmcsetup.
For irmc-user and User's Password, enter the user and password created in step a.
e. Execute "clirmcsetup -a mmb" command and register the MMB information of the copy destination node (except B
model). For how to use "clirmcsetup" command, refer to the manual page of clirmcsetup.
For mmb-user and User's Password, enter the user and password created in step b.
f. Check that the iRMC asynchronous monitoring daemon has started.
# /etc/opt/FJSVcluster/bin/clirmcmonctl
If "The devirmcd daemon exists." is displayed, the iRMC asynchronous monitoring daemon has started.
If "The devirmcd daemon does not exist." is displayed, the iRMC asynchronous monitoring daemon has not started.
Execute the following command to start the iRMC asynchronous monitoring daemon.
# /etc/opt/FJSVcluster/bin/clirmcmonctl start
# mv /etc/opt/SMAW/SMAWsf/rcsd.org /etc/opt/SMAW/SMAWsf/rcsd.cfg
6. Change the CF node names and the IP address of the administrative LAN (admIP) described in /etc/opt/SMAW/SMAWsf/
rcsd.cfg.
Example: When changing the values as follows
[Before change]
fuji2,weight=1,admIP=10.20.30.100:agent=SA_lkcd,timeout=25:SA_ipmi,timeout=25
fuji3,weight=1,admIP=10.20.30.101:agent=SA_lkcd,timeout=25:SA_ipmi,timeout=25
[After change]
fuji4,weight=1,admIP=10.20.30.102:agent=SA_lkcd,timeout=25:SA_ipmi,timeout=25
fuji5,weight=1,admIP=10.20.30.103:agent=SA_lkcd,timeout=25:SA_ipmi,timeout=25
7. When kdump is used to collect the crash dump in the PRIMERGY including the Blade server, set up the kdump shutdown
agent. Execute the following command on any one of the nodes.
# /etc/opt/FJSVcllkcd/bin/panicinfo_setup
panicinfo_setup: WARNING: /etc/panicinfo.conf file already exists.
(I)nitialize, (C)opy or (Q)uit (I/C/Q) ? <- Input I
# sdtool -b
# sdtool -s
By executing sdtool -s on all the nodes, the composition of the shutdown facility can be confirmed.
Note
Confirm that the shutdown facility operates normally from the display result of the sdtool -s command.
Even though the setup of the shutdown facility is completed, if the following is displayed, the configuration setting of the agent or
hardware may be incorrect.
M.3.4 Restoring the GDS Configuration Information
Restore the GDS configuration information to the copy destination cluster system.
Note
When using the mirroring among servers, this procedure is unnecessary.
# /etc/opt/FJSVsdx/bin/sdxdcrsc -R -c Class1
# /etc/opt/FJSVcluster/bin/clgettree
...
SHD_DISK 35 SHD_Disk35 UNKNOWN
DISK 37 sdag UNKNOWN fuji4
DISK 153 sdw UNKNOWN fuji5
...
# /etc/opt/FJSVcluster/bin/cldelrsc -r 35
# /etc/opt/FJSVcluster/bin/cldelrsc -r 37
# /etc/opt/FJSVcluster/bin/cldelrsc -r 153
6. Change of physical disk names in the Excluded List of GDS
In environments using the Excluded List, if the physical disk names entered in the Excluded List are different in the copy source and
destination systems, change the physical disk names to those entered in the Excluded List for the copy destination system. Perform
this task on all the nodes.
For details on the Excluded List, see "PRIMECLUSTER Global Disk Services Configuration and Administration Guide."
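A sketch of restoring the class configuration backed up in "M.1 Preparation" (the class name and file path reuse that example; see the GDS guide for the exact sdxconfig syntax):
# sdxconfig Restore -c Class1 -i /var/tmp/Class1.conf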
Note
After restoring with the "sdxconfig Restore" command, shared classes become local classes.
If the following message is displayed, take corrective measures with reference to "PRIMECLUSTER Global Disk Services
Configuration and Administration Guide."
ERROR: device: disk label is not matched with class class
# sdxvolume -F -c Class1
# /opt/SMAW/SMAWRrms/bin/hvgdsetup -a Class1
# sdxvolume -N -c gfs
# sdxvolume -N -c gfs01
Note
This procedure is required when using a GFS Shared File System on the copy source servers.
1. Reinitialize the management partition on one node of the copy destination servers.
Example: Initializing the /dev/sfdsk/gfs/dsk/control file as the management partition.
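A sketch of the initialization command under this assumption (the control volume path follows the example above):
# sfcsetup -c /dev/sfdsk/gfs/dsk/control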
3. On one node of the copy destination servers, redo the settings for the startup method of the sfcfrmd daemon recorded in Step 3 of
"M.1.1 Backing up the GFS Configuration Information."
Example: For setting the startup method of sfcfrmd daemon to wait_bg
# sfcsetup -m wait_bg
Note
This procedure is required when changing the startup method of the sfcfrmd daemon from the default value wait.
# sfcsetup -p
/dev/sfdsk/gfs/dsk/control
The registered node information can be confirmed by executing the "sfcsetup(8)" command without any option.
# sfcsetup
HOSTID CIPNAME MP_PATH
80000000 fuji4RMS yes
80000001 fuji5RMS yes
The startup method of the sfcfrmd daemon can be confirmed by executing the "sfcsetup(8)" command with the -m option.
# sfcsetup -m
wait_bg
5. Start the sfcfrmd daemon by executing the following command on all the nodes.
# sfcfrmstart
6. If you are not going to copy the data on the shared disk, create a GFS Shared File System.
See
For details on how to create a GFS Shared File System, see "Creating a file system," "Creating a file system," or "Selecting a
communication protocol" in "PRIMECLUSTER Global File Services Configuration and Administration Guide."
7. If you are going to copy the data on the shared disk, restore the information of the management partition.
Execute the shell script you edited in "M.1.1 Backing up the GFS Configuration Information" of the nodes on the copy destination
servers.
# sh _backup_file_
get other node information start ... end
Confirm that restoration of the management partition of GFS was successful by running the "sfcinfo(8)" command and the
"sfcrscinfo(8)" command.
# sfcinfo -a
/dev/sfdsk/gfs01/dsk/volume01:
FSID special size Type mount
1 /dev/sfdsk/gfs01/dsk/volume01(1e721) 14422 META -----
1 /dev/sfdsk/gfs01/dsk/volume01(1e721) 5116 LOG -----
1 /dev/sfdsk/gfs01/dsk/volume01(1e721) 95112 DATA -----
# sfcrscinfo -m -a
/dev/sfdsk/gfs01/dsk/volume01:
FSID MDS/AC STATE S-STATE RID-1 RID-2 RID-N hostname
1 MDS(P) stop - 0 0 0 host4
1 AC stop - 0 0 0 host4
1 MDS(S) stop - 0 0 0 host5
1 AC stop - 0 0 0 host5
See
For details on the setting procedure, see "PRIMECLUSTER Global Disk Services Configuration and Administration Guide."
# hvw -n config
3. Select "APP1" from "Application selection menu".
Edit: Application selection menu (restricted):
1) HELP
2) QUIT
3) RETURN
4) OPTIONS
5) APP1
Application Name: 5
4. If you changed any IP addresses for GLS according to step 8 of "M.3.2 Setup in Single-User Mode," change the settings for
the takeover IP address for Gls resources.
1. Select "Gls:Global-Link-Services".
Settings of turnkey wizard "STANDBY" (APP1:consistent)
1) HELP 10) Enterprise-Postgres(-)
2) READONLY 11) Symfoware(-)
3) SAVE+EXIT 12) Procedure:SystemState3(-)
4) - 13) Procedure:SystemState2(-)
5) ApplicationName=APP1 14) Gls:Global-Link-Services(Gls_APP1)
6) Machines+Basics(app1) 15) IpAddresses(-)
7) CommandLines(-) 16) LocalFileSystems(-)
8) Procedure:Application(-) 17) Gds:Global-Disk-Services(-)
9) Procedure:BasicApplication(-)
Choose the setting to process: 14
Gls (Gls_APP1:consistent)
1) HELP 5) AdditionalTakeoverIpaddress
2) NO-SAVE+EXIT 6) TakeoverIpaddress[0]=N,10.34.214.185
3) SAVE+EXIT 7) (Timeout=60)
4) REMOVE+EXIT
Choose the setting to process: 6
1) HELP 4) FREECHOICE
2) RETURN 5) SELECTED(10.34.214.185)
3) NONE 6) 10.34.214.195
Choose a takeover IP address for Gls: 6
4. Confirm that the selected IP address has been set and then select "SAVE+RETURN".
Set a flag for takeover IP address: 10.34.214.195
Currently set:
1) HELP 5) AUTORECOVER(A)
2) -
3) SAVE+RETURN
4) DEFAULT
Choose additonally one of the flags: 3
5. Select "SAVE+EXIT" to save the settings of Gls resources and exit the menu.
Gls (Gls_APP1:consistent)
1) HELP 5) AdditionalTakeoverIpaddress
2) NO-SAVE+EXIT 6) TakeoverIpaddress[0]=N,10.34.214.195
3) SAVE+EXIT 7) (Timeout=60)
4) REMOVE+EXIT
Choose the setting to process: 3
5. Select "SAVE+EXIT" to return to the "Application selection menu." After that, select "RETURN" to return to the "Main
configuration menu."
3. Select "RETURN".
4. Select "Application-Edit".
5. Select "APP1".
6. Select "Machines+Basics(app1)".
7. Select "Machines[0]" and set the SysNode names based on the changed CF node name. After that, also select
"Machines[1]" simultaneously.
8. Select "SAVE+EXIT" > "SAVE+EXIT" > "RETURN" to return to the menu immediately after hvw command was
stared.
9. After selecting "RMS-RemoveMachine", remove the unnecessary SysNode names in sequence so that only the SysNode names
created based on the changed CF node name are displayed in "Current set", and then select "RETURN".
7. Execute "Configuration-Generate" and "Configuration-Activate" in sequence and make sure that each operation ended
properly.
See
For details on the setting contents in the "hvipalias" file, see "6.7.3.6 Setting Up Takeover Network Resources."
<node name> : Change the value in this field to the changed CF node name.
<takeover> : If you changed any host names associated with takeover IP addresses, change this
host name.
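A hypothetical example of an entry after the change (the column layout, interface name, and netmask shown here are assumptions; follow the format described in "6.7.3.6 Setting Up Takeover Network Resources"):
fuji4 takeover eth0 0xffffff00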
2. Changing the cluster application information
1. In order to change these settings with the RMS Wizard, execute the "hvw" command on any node.
# hvw -n config
4. Change the settings for the host names in the takeover network resources.
If, at this point, the screen does not display the "Adr_APP1" resource in lower-case characters, select "OPTIONS" and then
"ShowAllAvailableWizards". Set "Adr_APP1" to be displayed on the screen, and then select it.
2. When the "Ipaddresses and ipaliase" menu is displayed, select the "Interfaces[X]" in which the host name to be changed
is set.
8) Interfaces[0]=V:tussd2af
Choose the setting to process:
3. From the displayed menu, select the changed name of the host associated with the takeover IP address.
(All host names in the "/etc/hosts" file are displayed in the menu.)
3. Select "RETURN".
4. Select "Application-Edit".
5. Select "APP1".
6. Select "Machines+Basics(app1)".
7. Select "Machines[0]" and set the SysNode names based on the changed CF node name. After that, also select
"Machines[1]" simultaneously.
8. Select "SAVE+EXIT" > "SAVE+EXIT" > "RETURN" to return to the menu immediately after the "hvw" command was
stared.
9. After selecting "RMS-RemoveMachine", remove the unnecessary SysNode names in sequence so that only the SysNode names
created based on the changed CF node name are displayed in "Current set", and then select "RETURN".
7. Execute the "Configuration-Generate" and "Configuration-Activate" in sequence to check that each operation ended properly.
8. Select "QUIT" to exit the "hvw" command.
3. Execute the following commands on all the nodes as required to set the automatic startup of RMS.
# hvsetenv HV_RCSTART 1
# hvsetenv HV_RCSTART
1 <- Check "1" is output.
2. Change the SysNode that configures a cluster application.
1. Select "RMS-CreateMachine".
2. After selecting "ALL-CF-HOSTS", make sure that all the SysNode names created based on the changed CF node name are
displayed in "Current set".
At this point, the SysNode names, created based on the original CF node name, are also displayed simultaneously;
however, unnecessary SysNode names are deleted in Step 9.
3. Select "RETURN".
4. Select "Application-Edit".
5. Select "APP1".
6. Select "Machines+Basics(app1)".
7. Select "Machines[0]" and set the SysNode names based on the changed CF node name. After that, also select
"Machines[1]" simultaneously.
8. Select "SAVE+EXIT" > "SAVE+EXIT" > "RETURN" to return to the menu immediately after hvw command was
stared.
9. After selecting "RMS-RemoveMachine", remove the unnecessary SysNode names in sequence so that only the SysNode names
created based on the changed CF node name are displayed in "Current set", and then select "RETURN".
3. Execute "Configuration-Generate" and "Configuration-Activate" in sequence and make sure that each operation ended
properly.
Appendix N Changes in Each Version
This appendix explains the changes made to the specifications of PRIMECLUSTER 4.5A10.
The changes are listed in the following table.
For all items below, (After change) is PRIMECLUSTER 4.5A10.
- Client Environment for Web-Based Admin View: (Before change) PRIMECLUSTER 4.3A00 or earlier
- Changes of the Behavior of CF Startup: (Before change) PRIMECLUSTER 4.3A00 or earlier
- HV_CONNECT_TIMEOUT: (Before change) PRIMECLUSTER 4.3A00 or earlier
- Changes of the ports used by RMS: (Before change) PRIMECLUSTER 4.3A10 or earlier
- Configuring the IPMI shutdown agent: (Before change) PRIMECLUSTER 4.2A00 or later - 4.3A20 or earlier
- Changing the port number used by the shutdown facility: (Before change) PRIMECLUSTER 4.3A20 or earlier
- Setting up the Host OS failover function when using it in the PRIMEQUEST KVM environment: (Before change) PRIMECLUSTER 4.3A10 or later - 4.3A40 or earlier
- Changes of the target node to forcibly shut down when a heartbeat failure occurs: (Before change) PRIMECLUSTER 4.3A20 or earlier
- Displaying Fault Traces of Resources: (Before change) PRIMECLUSTER 4.3A30 or earlier
- Change of /etc/cip.cf file: (Before change) PRIMECLUSTER 4.3A30 or earlier
- Changes in CF over IP setting window of CF Wizard: (Before change) PRIMECLUSTER 4.3A40 or earlier
- Setting up the migration function when using it in KVM environment: (Before change) PRIMECLUSTER 4.3A40
- Changing "turnkey wizard "STANDBY"" of hvw command: (Before change) PRIMECLUSTER 4.1A40 or later - 4.5A00 or earlier
Incompatible message:
- Changes of the RMS message: (Before change) PRIMECLUSTER 4.3A00 or earlier
- Changes of the importance of the message in the RMS wizard: (Before change) PRIMECLUSTER 4.3A00 or earlier
- Changes of RMS console message: (Before change) PRIMECLUSTER 4.3A20 or earlier
- Changes of the response message for the operator intervention request: (Before change) PRIMECLUSTER 4.3A20 or earlier
N.1 Changes in PRIMECLUSTER 4.5A10 from 4.0A20
Incompatible commands
The following commands of PRIMECLUSTER 4.5A10 are incompatible with PRIMECLUSTER 4.0A20.
Incompatible functions
The following functions of PRIMECLUSTER 4.5A10 are incompatible with PRIMECLUSTER 4.0A20.
Incompatible messages
The following messages of PRIMECLUSTER 4.5A10 are incompatible with PRIMECLUSTER 4.0A20.
N.1.1 clgettree(1) command
Details on incompatibilities
Cluster class resource names, which are output with the "clgettree(1)" command, are changed.
Changes
Before upgrading [PRIMECLUSTER 4.0A20]
A fixed string "cluster" is displayed when the resource management facility is configured.
After upgrading [PRIMECLUSTER 4.5A10]
The cluster class uses the same name as the CF cluster when the resource management facility is configured.
Note
None.
Changes
Before upgrading [PRIMECLUSTER 4.0A20]
StateDetails information is not displayed.
After upgrading [PRIMECLUSTER 4.5A10]
StateDetails information is displayed.
Note
None.
Changes
Before upgrading [PRIMECLUSTER 4.0A20]
Symbolic links of the ciptool commands are made for /usr/bin.
After upgrading [PRIMECLUSTER 4.5A10]
Symbolic links of the ciptool commands are not made for /usr/bin.
Note
Specify /opt/SMAW/SMAWcf/bin/ciptool by its full path when you use the ciptool command.
Changes
Before upgrading [PRIMECLUSTER 4.0A20]
The number of characters displayed by "Agent" of "sdtool -s" is 14 characters (including spaces).
The number of characters displayed by "Admin IP" of "sdtool -C" is 16 characters (including spaces).
After upgrading [PRIMECLUSTER 4.5A10]
The number of characters displayed by "Agent" of "sdtool -s" is 21 characters (including spaces).
When an IPv6 address is used for the administrative LAN of the shutdown facility, the number of characters displayed by "Admin IP"
of "sdtool -C" is 40 characters (including spaces). When an IPv4 address is used, the number of characters is not changed.
Note
None.
Point
A resource in a cluster application may not stop and may remain running because RMS ends abnormally when the hvshut command
times out.
In this situation, if the resource controls a shared disk, data corruption may occur when RMS and the cluster application containing the
resource are forcibly started on another node, because the resource is then running on multiple nodes at the same time.
Changes
Before upgrading [PRIMECLUSTER 4.0A20]
In an environment where the environment variable RELIANT_SHUT_MIN_WAIT remains at its default and the shutdown processing
of a resource by the hvshut command has not been completed within 900 seconds, the command times out and then RMS ends abnormally.
The resource does not stop and remains running at this time.
After upgrading [PRIMECLUSTER 4.5A10]
In an environment where the environment variable RELIANT_SHUT_MIN_WAIT remains at its default, the hvshut command does not
time out even when the shutdown processing of a resource by the command has not been completed.
Note
When using RMS, make sure to change this environment variable to suit the configuration.
Changes
Before upgrading [PRIMECLUSTER 4.0A20]
When using the -f option, RMS performs a forced startup of cluster applications even if nodes where RMS is not running exist in the
cluster, which may lead to data corruption.
After upgrading [PRIMECLUSTER 4.5A10]
When using the -f option, if nodes where RMS is not running exist in the cluster, RMS performs the forced startup of cluster applications
after forcibly shutting down those nodes to reduce the risk of data corruption. However, if the forced shutdown fails, the forced
startup of cluster applications is not performed.
Note
When using the -f option, confirm "7.5.1 Notes on Switching a Cluster Application Forcibly" and then execute the command.
Changes
Before upgrading [PRIMECLUSTER 4.0A20]
The default work directory is /tmp.
After upgrading [PRIMECLUSTER 4.5A10]
The default work directory is /var/tmp.
Note
None.
Changes
Before upgrading [PRIMECLUSTER 4.0A20]
When the forced startup of a cluster application is issued, the cluster application is forcibly started according to the user's operation,
even if nodes where RMS is not running exist in the cluster and the startup may cause data corruption.
After upgrading [PRIMECLUSTER 4.5A10]
To reduce the risk of data corruption when the forced startup of a cluster application is issued, the cluster application is forcibly started
after the nodes where RMS is not running are forcibly shut down.
Note
Read through "4.2 Operator Intervention Messages" in "PRIMECLUSTER Messages."
Details on incompatibilities
"SF node weight" is added as a new setting item for the shutdown facility of PRIMECLUSTER 4.5A10. For the same survival priority as
PRIMECLUSTER 4.0A20, specify 1 in "SF node weight" for all the nodes. For details, see "5.1.2 Setting up the Shutdown Facility."
Changes
Before upgrading [PRIMECLUSTER 4.0A20]
Survival priority is determined by "ShutdownPriority of userApplication."
After upgrading [PRIMECLUSTER 4.5A10]
Survival priority is determined by "ShutdownPriority of userApplication" and "SF node weight."
Note
None.
Changes
Before upgrading [PRIMECLUSTER 4.0A20]
If a resource that supports hot-standby operation is available, a standby state transition will be made.
After upgrading [PRIMECLUSTER 4.5A10]
A standby state transition will be performed only when a resource to support hot-standby operation is available and "ClearFaultRequest|
StartUp|SwitchRequest" is set to the StandbyTransitions attribute.
Note
None.
- Application-Create
- Application-Edit
- Application-Remove
- Configuration-Activate
To change the cluster application, you have to stop RMS before you execute the "hvw" command. This is the same with PRIMECLUSTER
4.0A20.
If you execute the "hvw" command after stopping RMS, the same menus as PRIMECLUSTER 4.0A20 will be displayed.
Changes
Before upgrading [PRIMECLUSTER 4.0A20]
If you execute the "hvw" command while RMS is running, the following menus will be displayed.
- Application-Create
- Application-Edit
- Application-Remove
- Configuration-Activate
After upgrading [PRIMECLUSTER 4.5A10]
If you execute the "hvw" command while RMS is running, the following menus will not be displayed.
- Application-Create
- Application-Edit
- Application-Remove
- Configuration-Activate
Note
To change the cluster application, you need to stop RMS before you execute the "hvw" command. This is the same with PRIMECLUSTER
4.0A20.
For details on changing the cluster application, see "10.3 Changing the Cluster Configuration."
Changes
Before upgrading [PRIMECLUSTER 4.0A20]
If the configuration of the node is invalid, "UNCONFIGURED" or "UNKNOWN" will be displayed as the state of the local node on the
main CF table.
After upgrading [PRIMECLUSTER 4.5A10]
If the node configuration is invalid, "INVALID" will be displayed as the state of the local node on the main CF table.
Note
None.
Changes
Before upgrading [PRIMECLUSTER 4.0A20]
The state of SysNode not joining the cluster is shown as Offline.
After upgrading [PRIMECLUSTER 4.5A10]
The state of SysNode not joining the cluster is shown as Faulted.
See "StateDetails" information displayed by hvdisp (1M) command or RMS main window to identify whether Faulted state is due to
not joining the cluster or abnormal shutdown of the node by a panic, for example.
State / StateDetails:
- When a node is not joining the cluster: Faulted / Shutdown
- Abnormal shutdown by panic, etc.: Faulted / Killed
Note
None.
Changes
1. Creating a userApplication for standby operation
Before upgrading [PRIMECLUSTER 4.0A20]
To create a userApplication for standby operation, select "CRM" from the "Application type selection menu."
2. Creating a userApplication for scalable operation
Before upgrading [PRIMECLUSTER 4.0A20]
To create a userApplication for scalable operation, select "Controller" from the "Application type selection menu."
3. Creating a procedure resource
Before upgrading [PRIMECLUSTER 4.0A20]
To create or change a procedure resource, select "CRM" from the "turnkey wizard "CRM"" menu, and then select the resource class
name.
Note: Shown below is an example of registering a procedure resource of the BasicApplication class to a userApplication.
After upgrading [PRIMECLUSTER 4.5A10]
The "turnkey wizard "CRM"" menu is not displayed.
To create or change a procedure resource, select "Procedure:resource-class-name" from the "turnkey wizard "STANDBY"" menu.
Note: Shown below is an example of registering a procedure resource of the BasicApplication class to a userApplication.
5. Changing the priority in a resource class of a procedure resource.
Before upgrading [PRIMECLUSTER 4.0A20]
To change the priority in a resource class, select "Priority[0]=priority."
Note
None.
Changes
Before upgrading [PRIMECLUSTER 4.0A20]
The mount point was defined in /etc/fstab.
After upgrading [PRIMECLUSTER 4.5A10]
It is necessary to define the mount point in /etc/fstab.pcl.
For details, see "6.7.3.2 Setting Up Fsystem Resources."
Note
None
Changes
Before upgrading [PRIMECLUSTER 4.0A20]
Linux(R) is supported as a client environment for Web-Based Admin View.
After upgrading [PRIMECLUSTER 4.5A10]
Linux(R) is not supported as a client environment for Web-Based Admin View.
Note
None.
Changes
Before upgrading [PRIMECLUSTER 4.0A20]
CF does not start unless all of the network interfaces for the cluster interconnects are recognized.
After upgrading [PRIMECLUSTER 4.5A10]
CF starts if at least one of the network interfaces for the cluster interconnects is recognized.
Note
If there are any network interfaces that are not recognized on CF startup, the following message appears:
CF: <NIC>: device not found.
<NIC> will be the name of the network interface (e.g. eth0).
This message is also available in 4.0A20.
N.1.18 HV_CONNECT_TIMEOUT
Details on incompatibilities
The default value of the RMS local environment variables HV_CONNECT_TIMEOUT is changed.
Changes
Before upgrading [PRIMECLUSTER 4.0A20]
The default value of HV_CONNECT_TIMEOUT is 5 (seconds).
After upgrading [PRIMECLUSTER 4.5A10]
The default value of HV_CONNECT_TIMEOUT is 30 (seconds).
Note
For details on HV_CONNECT_TIMEOUT, see "E.3 Local environment variables" in "PRIMECLUSTER Reliant Monitor Services (RMS)
with Wizard Tools Configuration and Administration Guide."
Changes
Before upgrading [PRIMECLUSTER 4.0A20]
The port number "11111" is used.
After upgrading [PRIMECLUSTER 4.5A10]
The port number "11111" is not used.
Note
None.
Changes
Before upgrading [PRIMECLUSTER 4.0A20]
Port number: The port number "2316" is used.
Note
None.
N.1.21 Changes of the target node to forcibly shut down when a heartbeat
failure occurs
Details on incompatibilities
The selecting method of the target node, which is forcibly shut down when a heartbeat failure occurs due to temporary causes such as
overload, is changed.
Changes
Before upgrading [PRIMECLUSTER 4.0A20]
If CF becomes temporarily disabled due to overload or other causes, and then a heartbeat failure occurs, the shutdown facility
determines the node to forcibly shut down according to the setup policy for survival priority.
After upgrading [PRIMECLUSTER 4.5A10]
If CF becomes temporarily disabled due to overload or other causes, and then a heartbeat failure occurs, the shutdown facility forcibly
stops the node on which CF cannot operate, regardless of the setup policy for survival priority.
Note
None.
Changes
Before upgrading [PRIMECLUSTER 4.0A20]
After the Offline processing of the failed resource is completed, nothing is displayed in StateDetails of the failed resource object.
After upgrading [PRIMECLUSTER 4.5A10]
After the Offline processing of the failed resource is completed, "Fault Occurred" is displayed in StateDetails of the failed resource
object.
Note
None.
Changes
Before upgrading [PRIMECLUSTER 4.0A20]
When setting an IPv4 address, options of the setting command such as ifconfig can be specified for the CIP interface.
After upgrading [PRIMECLUSTER 4.5A10]
When setting an IPv4 address, only the IP address and netmask value can be specified for the CIP interface.
Note
None.
Changes
Before upgrading [PRIMECLUSTER 4.0A20]
You can select to use or not to use the auto subnet grouping by checking/unchecking "Auto Subnet Grouping" checkbox on CF over IP
setting window of CF Wizard.
After upgrading [PRIMECLUSTER 4.5A10]
You can select to use or not to use the network broadcast on CF over IP by checking/unchecking "Use Network Broadcast" checkbox
on CF over IP setting window of CF Wizard.
Note
None.
Changes
Before upgrading [PRIMECLUSTER 4.0A20]
(SYS, 8): ERROR: RMS failed to shut down the host <host> via a Shutdown Facility, no further kill functionality is available.
The cluster is now hung.
After upgrading [PRIMECLUSTER 4.5A10]
(SYS, 8): ERROR: RMS failed to shut down the host <host> via a Shutdown Facility, no further kill functionality is available.
The cluster is now hung. An operator intervention is required.
Note
None.
Changes
Before upgrading [PRIMECLUSTER 4.0A20]
WARNING: cannot grab mount lock for dostat() check_getbdev(), returning previous state
After upgrading [PRIMECLUSTER 4.5A10]
NOTICE: cannot grab mount lock for dostat() check_getbdev(), returning previous state
Note
None.
Changes
Before upgrading [PRIMECLUSTER 4.0A20]
The use of the -f (force) flag could cause your data to be corrupted and could cause your node to be killed. Do not continue if the result
of this forced command is not clear.
The use of force flag of hvswitch overrides the RMS internal security mechanism. In particular RMS does no longer prevent resources,
which have been marked as "ClusterExclusive", from coming Online on more than one host in the cluster. It is recommended to double
check the state of all affected resources before continuing.
Do you wish to proceed ? (default: no) [yes, no]:
After upgrading [PRIMECLUSTER 4.5A10]
The use of the -f (force) flag could cause your data to be corrupted and could cause your node to be killed. Do not continue if the result
of this forced command is not clear.
The use of force flag of hvswitch overrides the RMS internal security mechanism. In particular RMS does no longer prevent resources,
which have been marked as "ClusterExclusive", from coming Online on more than one host in the cluster. It is recommended to double
check the state of all affected resources before continuing.
IMPORTANT: This command may kill nodes on which RMS is not running in order to reduce the risk of data corruption!
Ensure that RMS is running on all other nodes. Or shut down OS of the node on which RMS is not running.
Do you wish to proceed ? (default: no) [yes, no]:
Note
None.
N.1.28 Changes of the response message for the operator intervention
request
Changes
Before upgrading [PRIMECLUSTER 4.0A20]
1421 The userApplication "userApplication" did not start automatically because not all of the nodes where it can run are online.
Do you want to force the userApplication online on the SysNode "SysNode"?
Message No.: number
Do you want to do something? (yes/no)
Warning: Forcing a userApplication online ignores potential error conditions. Used improperly, it can result in data corruption.
You should not use it unless you are certain that the userApplication is not running anywhere in the cluster.
After upgrading [PRIMECLUSTER 4.5A10]
1421 The userApplication "userApplication" did not start automatically because not all of the nodes where it can run are online.
Forcing the userApplication online on the SysNode "SysNode" is possible.
Warning: When performing a forced online, confirm that RMS is started on all nodes in the cluster,
manually shutdown any nodes where it is not started and then perform it.For a forced online,
there is a risk of data corruption due to simultaneous access from several nodes.
In order to reduce the risk, nodes where RMS is not started maybe forcibly stopped.
Are you sure wish to force online? (no/yes)
Message No.: number
Note
For details, see the relevant message in "PRIMECLUSTER Messages."
Changes
Before upgrading [PRIMECLUSTER 4.0A20]
1423 On the SysNode "SysNode", the userApplication "userApplication" has the faulted resource "resource".
The userApplication "userApplication" did not start automatically because not all of the nodes where it can run are online.
Do you want to force the userApplication online on the SysNode "SysNode"?
Message No.: number
Do you want to do something? (yes/no)
Warning: Forcing a userApplication online ignores potential error conditions. Used improperly, it can result in data corruption.
You should not use it unless you are certain that the userApplication is not running anywhere in the cluster.
After upgrading [PRIMECLUSTER 4.5A10]
1423 On the SysNode "SysNode", the userApplication "userApplication" has the faulted resource "resource".
The userApplication "userApplication" did not start automatically because not all of the nodes where it can run are online.
Forcing the userApplication online on the SysNode "SysNode" is possible.
Warning: When performing a forced online, confirm that RMS is started on all nodes in the cluster,
manually shutdown any nodes where it is not started and then perform it.
For a forced online, there is a risk of data corruption due to simultaneous access from several nodes.
In order to reduce the risk, nodes where RMS is not started maybe forcibly stopped.
Are you sure wish to force online? (no/yes)
Message No.: number
Note
For details, see the relevant message in "PRIMECLUSTER Messages."
Incompatible functions
The following functions of PRIMECLUSTER 4.5A10 are incompatible with PRIMECLUSTER 4.1A20.
Incompatible messages
The following messages of PRIMECLUSTER 4.5A10 are incompatible with PRIMECLUSTER 4.1A20.
N.2.1 clgettree(1) command
Details on incompatibilities
Cluster class resource names, which are output with the "clgettree(1)" command, are changed.
Changes
Before upgrading [PRIMECLUSTER 4.1A20]
A fixed string "cluster" is displayed when the resource management facility is configured.
After upgrading [PRIMECLUSTER 4.5A10]
The cluster class uses the same name as the CF cluster when the resource management facility is configured.
Note
None.
Changes
Before upgrading [PRIMECLUSTER 4.1A20]
StateDetails information is not displayed.
After upgrading [PRIMECLUSTER 4.5A10]
StateDetails information is displayed.
Note
None.
Changes
Before upgrading [PRIMECLUSTER 4.1A20]
Symbolic links of the ciptool commands are made for /usr/bin.
After upgrading [PRIMECLUSTER 4.5A10]
Symbolic links of the ciptool commands are not made for /usr/bin.
Note
Specify /opt/SMAW/SMAWcf/bin/ciptool by its full path when you use the ciptool command.
Changes
Before upgrading [PRIMECLUSTER 4.1A20]
The number of characters displayed by "Agent" of "sdtool -s" is 14 characters (including spaces).
The number of characters displayed by "Admin IP" of "sdtool -C" is 16 characters (including spaces).
After upgrading [PRIMECLUSTER 4.5A10]
The number of characters displayed by "Agent" of "sdtool -s" is 21 characters (including spaces).
When an IPv6 address is used for the administrative LAN of the shutdown facility, the number of characters displayed by "Admin IP"
of "sdtool -C" is 40 characters (including spaces). When an IPv4 address is used, the number of characters is not changed.
Note
None.
Point
A resource in a cluster application may not stop and may remain running because RMS ends abnormally when the hvshut command
times out.
In this situation, if the resource controls a shared disk, data corruption may occur when RMS and the cluster application containing the
resource are forcibly started on another node, because the resource is then running on multiple nodes at the same time.
Changes
Before upgrading [PRIMECLUSTER 4.1A20]
In an environment where the environment variable RELIANT_SHUT_MIN_WAIT remains at its default and the shutdown processing
of a resource by the hvshut command has not been completed within 900 seconds, the command times out and then RMS ends abnormally.
The resource does not stop and remains running at this time.
After upgrading [PRIMECLUSTER 4.5A10]
In an environment where the environment variable RELIANT_SHUT_MIN_WAIT remains at its default, the hvshut command does not
time out even when the shutdown processing of a resource by the command has not been completed.
Note
When using RMS, make sure to change this environment variable to suit the configuration.
Changes
Before upgrading [PRIMECLUSTER 4.1A20]
When using the -f option, RMS performs a forced startup of cluster applications even if nodes where RMS is not running exist in the
cluster, which may lead to data corruption.
After upgrading [PRIMECLUSTER 4.5A10]
When using the -f option, if nodes where RMS is not running exist in the cluster, RMS performs the forced startup of cluster applications
after forcibly shutting down those nodes to reduce the risk of data corruption. However, if the forced shutdown fails, the forced
startup of cluster applications is not performed.
Note
When using the -f option, confirm "7.5.1 Notes on Switching a Cluster Application Forcibly" and then execute the command.
Changes
Before upgrading [PRIMECLUSTER 4.1A20]
The default work directory is /tmp.
After upgrading [PRIMECLUSTER 4.5A10]
The default work directory is /var/tmp.
Note
None.
Changes
Before upgrading [PRIMECLUSTER 4.1A20]
Notification of a resource failure or recovery will not be posted in the default setting of cluster installation.
The default value of AppWatch at cluster installation is OFF, and notification of a resource failure or recovery will not be posted.
After upgrading [PRIMECLUSTER 4.5A10]
Notification of a resource failure or recovery will be posted in the default setting of cluster installation.
A resource failure or recovery will not be posted only when the AppWatch parameter is set to OFF with clsetparam.
Note
After you have changed the AppWatch parameter with clsetparam, you have to restart all the nodes to validate the setting.
Changes
Before upgrading [PRIMECLUSTER 4.1A20]
When the forced startup of a cluster application is issued, the cluster application is forcibly started according to the user's operation,
even if nodes where RMS is not running exist in the cluster and the startup may cause data corruption.
After upgrading [PRIMECLUSTER 4.5A10]
To reduce the risk of data corruption when the forced startup of a cluster application is issued, the cluster application is forcibly started
after the nodes where RMS is not running are forcibly shut down.
Note
For details, see "4.2 Operator Intervention Messages" in "PRIMECLUSTER Messages."
Details on incompatibilities 2
With the default settings made when the cluster was installed, the operator intervention request is always enabled.
For details, see "5.2 Setting up Fault Resource Identification and Operator Intervention Request."
Changes
Before upgrading [PRIMECLUSTER 4.1A20]
The operator intervention request will not work with the default setting at installation.
The default value of AppWatch set when the cluster was installed is set to OFF, and the operator intervention request will not work with
this default value.
After upgrading [PRIMECLUSTER 4.5A10]
The operator intervention request will work with the default setting at installation.
The operator intervention request is disabled only when the AppWatch parameter is set to OFF with clsetparam.
Note
After you have changed the AppWatch parameter with clsetparam, you have to restart all the nodes to validate the setting.
Changes
Before upgrading [PRIMECLUSTER 4.1A20]
The nodes that are separated from the cluster show Offline state.
After upgrading [PRIMECLUSTER 4.5A10]
The nodes that are separated from the cluster show Faulted state.
See details on the state of the nodes (the value of the StateDetails attribute) that are displayed by hvdisp (1M) command or output on
the RMS main window to identify if Faulted state is due to the nodes being separated from the cluster or due to an abnormal shutdown
of the nodes by a panic or other errors.
State / StateDetails:
- When a node is not joining the cluster: Faulted / Shutdown
- Abnormal shutdown by panic, etc.: Faulted / Killed
Note
None.
Changes
1. Creating a userApplication for standby operation
Before upgrading [PRIMECLUSTER 4.1A20]
To create a userApplication for standby operation, select "CRM" from the "Application type selection menu."
2. Creating a userApplication for scalable operation
Before upgrading [PRIMECLUSTER 4.1A20]
To create a userApplication for scalable operation, select "Controller" from the "Application type selection menu."
3. Creating a procedure resource
Before upgrading [PRIMECLUSTER 4.1A20]
To create or change a procedure resource, select "CRM" from the "turnkey wizard "CRM"" menu, and then select the resource class
name.
Note: Shown below is an example of registering a procedure resource of the BasicApplication class to a userApplication.
After upgrading [PRIMECLUSTER 4.5A10]
The "turnkey wizard "CRM"" menu is not displayed.
To create or change a procedure resource, select "Procedure:resource-class-name" from the "turnkey wizard "STANDBY"" menu.
Note: Shown below is an example of registering a procedure resource of the BasicApplication class to a userApplication.
5. Changing the priority in a resource class of a procedure resource.
Before upgrading [PRIMECLUSTER 4.1A20]
To change the priority in a resource class, select "Priority[0]=priority."
Note
None.
Changes
Before upgrading [PRIMECLUSTER 4.1A20]
The mount point was defined in /etc/fstab.
After upgrading [PRIMECLUSTER 4.5A10]
It is necessary to define the mount point in /etc/fstab.pcl.
For details, see "6.7.3.2 Setting Up Fsystem Resources."
Note
None.
Changes
Before upgrading [PRIMECLUSTER 4.1A20]
Linux(R) is supported as a client environment for Web-Based Admin View.
After upgrading [PRIMECLUSTER 4.5A10]
Linux(R) is not supported as a client environment for Web-Based Admin View.
Note
None.
Changes
Before upgrading [PRIMECLUSTER 4.1A20]
CF does not start unless all of the network interfaces for the cluster interconnects are recognized.
After upgrading [PRIMECLUSTER 4.5A10]
CF starts if at least one of the network interfaces for the cluster interconnects is recognized.
Note
If there are any network interfaces that are not recognized on CF startup, the following message appears:
CF: <NIC>: device not found.
<NIC> will be the name of the network interface (e.g. eth0).
This message is also available in 4.1A20.
N.2.15 HV_CONNECT_TIMEOUT
Details on incompatibilities
The default value of the RMS local environment variables HV_CONNECT_TIMEOUT is changed.
Changes
Before upgrading [PRIMECLUSTER 4.1A20]
The default value of HV_CONNECT_TIMEOUT is 5 (seconds).
After upgrading [PRIMECLUSTER 4.5A10]
The default value of HV_CONNECT_TIMEOUT is 30 (seconds).
Note
For details on HV_CONNECT_TIMEOUT, see "E.3 Local environment variables" in "PRIMECLUSTER Reliant Monitor Services (RMS)
with Wizard Tools Configuration and Administration Guide."
Changes
Before upgrading [PRIMECLUSTER 4.1A20]
The port number "11111" is used.
After upgrading [PRIMECLUSTER 4.5A10]
The port number "11111" is not used.
Note
None.
Changes
Before upgrading [PRIMECLUSTER 4.1A20]
Port number: The port number "2316" is used.
Note
None.
N.2.18 Changes of the target node to forcibly shut down when a heartbeat
failure occurs
Details on incompatibilities
The selecting method of the target node, which is forcibly shut down when a heartbeat failure occurs due to temporary causes such as
overload, is changed.
Changes
Before upgrading [PRIMECLUSTER 4.1A20]
If CF becomes temporarily disabled by the overloaded or other causes, and then a heartbeat failure occurs, the shutdown facility
determines the node to forcibly shut down according to the setup policy for survival priority.
After upgrading [PRIMECLUSTER 4.5A10]
If CF becomes temporarily disabled by the overloaded or other causes, and then a heartbeat failure occurs, the shutdown facility forcibly
stops the node on which CF cannot perform regardless of the setup policy for survival priority.
Note
None.
Changes
Before upgrading [PRIMECLUSTER 4.1A20]
After the Offline processing of the failed resource is completed, nothing is displayed in StateDetails of the failed resource object.
After upgrading [PRIMECLUSTER 4.5A10]
After the Offline processing of the failed resource is completed, "Fault Occurred" is displayed in StateDetails of the failed resource
object.
Note
None.
Changes
Before upgrading [PRIMECLUSTER 4.1A20]
When setting an IPv4 address, options of the setting command, such as ifconfig, could be specified for the CIP interface.
After upgrading [PRIMECLUSTER 4.5A10]
When setting an IPv4 address, only the IP address and netmask value can be specified for the CIP interface.
Note
None.
Changes
Before upgrading [PRIMECLUSTER 4.1A20]
You can select whether or not to use auto subnet grouping by checking or unchecking the "Auto Subnet Grouping" checkbox on the CF over IP setting window of the CF Wizard.
After upgrading [PRIMECLUSTER 4.5A10]
You can select whether or not to use the network broadcast on CF over IP by checking or unchecking the "Use Network Broadcast" checkbox on the CF over IP setting window of the CF Wizard.
Note
None.
Changes
Before upgrading [PRIMECLUSTER 4.1A20]
(SYS, 8): ERROR: RMS failed to shut down the host <host> via a Shutdown Facility, no further kill functionality is available.
The cluster is now hung.
After upgrading [PRIMECLUSTER 4.5A10]
(SYS, 8): ERROR: RMS failed to shut down the host <host> via a Shutdown Facility, no further kill functionality is available.
The cluster is now hung. An operator intervention is required.
Note
None.
Changes
Before upgrading [PRIMECLUSTER 4.1A20]
WARNING: cannot grab mount lock for dostat() check_getbdev(), returning previous state
After upgrading [PRIMECLUSTER 4.5A10]
NOTICE: cannot grab mount lock for dostat() check_getbdev(), returning previous state
Note
None.
Changes
Before upgrading [PRIMECLUSTER 4.1A20]
The use of the -f (force) flag could cause your data to be corrupted and could cause your node to be killed. Do not continue if the result
of this forced command is not clear.
The use of force flag of hvswitch overrides the RMS internal security mechanism. In particular RMS does no longer prevent resources,
which have been marked as "ClusterExclusive", from coming Online on more than one host in the cluster. It is recommended to double
check the state of all affected resources before continuing.
Do you wish to proceed ? (default: no) [yes, no]:
After upgrading [PRIMECLUSTER 4.5A10]
The use of the -f (force) flag could cause your data to be corrupted and could cause your node to be killed. Do not continue if the result
of this forced command is not clear.
The use of force flag of hvswitch overrides the RMS internal security mechanism. In particular RMS does no longer prevent resources,
which have been marked as "ClusterExclusive", from coming Online on more than one host in the cluster. It is recommended to double
check the state of all affected resources before continuing.
IMPORTANT: This command may kill nodes on which RMS is not running in order to reduce the risk of data corruption!
Ensure that RMS is running on all other nodes. Or shut down OS of the node on which RMS is not running.
Do you wish to proceed ? (default: no) [yes, no]:
Note
None.
N.2.25 Changes of the response message for the operator intervention
request
Changes
Before upgrading [PRIMECLUSTER 4.1A20]
1421 The userApplication "userApplication" did not start automatically because not all of the nodes where it can run are online.
Do you want to force the userApplication online on the SysNode "SysNode"?
Message No.: number
Do you want to do something? (yes/no)
Warning: Forcing a userApplication online ignores potential error conditions. Used improperly, it can result in data corruption. You
should not use it unless you are certain that the userApplication is not running anywhere in the cluster.
After upgrading [PRIMECLUSTER 4.5A10]
1421 The userApplication "userApplication" did not start automatically because not all of the nodes where it can run are online.
Forcing the userApplication online on the SysNode "SysNode" is possible.
Warning: When performing a forced online, confirm that RMS is started on all nodes in the cluster, manually shutdown any nodes where
it is not started and then perform it.For a forced online, there is a risk of data corruption due to simultaneous access from several nodes.In
order to reduce the risk, nodes where RMS is not started maybe forcibly stopped.
Are you sure wish to force online? (no/yes)
Message No.: number
Note
For details, see the relevant message in "PRIMECLUSTER Messages."
Changes
Before upgrading [PRIMECLUSTER 4.1A20]
1423 On the SysNode "SysNode", the userApplication "userApplication" has the faulted resource "resource". The userApplication
"userApplication" did not start automatically because not all of the nodes where it can run are online.
Do you want to force the userApplication online on the SysNode "SysNode"?
Message No.: number
Do you want to do something? (yes/no)
Warning: Forcing a userApplication online ignores potential error conditions. Used improperly, it can result in data corruption. You
should not use it unless you are certain that the userApplication is not running anywhere in the cluster.
After upgrading [PRIMECLUSTER 4.5A10]
1423 On the SysNode "SysNode", the userApplication "userApplication" has the faulted resource "resource". The userApplication
"userApplication" did not start automatically because not all of the nodes where it can run are online.
Forcing the userApplication online on the SysNode "SysNode" is possible.
Warning: When performing a forced online, confirm that RMS is started on all nodes in the cluster, manually shutdown any nodes where
it is not started and then perform it. For a forced online, there is a risk of data corruption due to simultaneous access from several nodes.
In order to reduce the risk, nodes where RMS is not started maybe forcibly stopped.
Are you sure wish to force online? (no/yes)
Message No.: number
Note
For details, see the relevant message in "PRIMECLUSTER Messages."
Incompatible functions
The following functions of PRIMECLUSTER 4.5A10 are incompatible with PRIMECLUSTER 4.1A30.
Incompatible messages
The following messages of PRIMECLUSTER 4.5A10 are incompatible with PRIMECLUSTER 4.1A30.
Changes
Before upgrading [PRIMECLUSTER 4.1A30]
Symbolic links to the ciptool command are created in /usr/bin.
After upgrading [PRIMECLUSTER 4.5A10]
Symbolic links to the ciptool command are not created in /usr/bin.
Note
Specify the full path /opt/SMAW/SMAWcf/bin/ciptool when you use the ciptool command.
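For example, an invocation using the full path would look as follows; <options> stands for whatever arguments you previously passed to ciptool and is only a placeholder:

# /opt/SMAW/SMAWcf/bin/ciptool <options>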
Changes
Before upgrading [PRIMECLUSTER 4.1A30]
The number of characters displayed by "Agent" of "sdtool -s" is 14 characters (including spaces).
The number of characters displayed by "Admin IP" of "sdtool -C" is 16 characters (including spaces).
After upgrading [PRIMECLUSTER 4.5A10]
The number of characters displayed by "Agent" of "sdtool -s" is 21 characters (including spaces).
When an IPv6 address is used for the administrative LAN of the shutdown facility, the number of characters displayed by "Admin IP"
of "sdtool -C" is 40 characters (including spaces). When an IPv4 address is used, the number of characters is not changed.
Note
None.
Point
A resource in a cluster application may not stop and may remain running, because RMS ends abnormally when the hvshut command times out.
In this situation, if the resource controls a shared disk, data corruption may occur when RMS and the cluster application containing the resource are forcibly started on another node, because the resource is then running on multiple nodes at the same time.
Changes
Before upgrading [PRIMECLUSTER 4.1A30]
In an environment where the environment variable RELIANT_SHUT_MIN_WAIT remains at its default and the shutdown processing of a resource by the hvshut command has not completed within 900 seconds, the command times out and RMS ends abnormally.
The resource does not stop and remains running at this time.
After upgrading [PRIMECLUSTER 4.5A10]
In an environment where the environment variable RELIANT_SHUT_MIN_WAIT remains at its default, the hvshut command does not time out even if the shutdown processing of a resource by the command has not completed.
Note
When using RMS, make sure to change this environment variable to suit the configuration.
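If a finite timeout similar to the previous behavior is preferred, the variable can be set explicitly. A minimal sketch, assuming the variable is exported in the RMS environment file hvenv.local (the value 900 mirrors the old default and is an illustration only):

export RELIANT_SHUT_MIN_WAIT=900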
Changes
Before upgrading [PRIMECLUSTER 4.1A30]
When the -f option is used, RMS performs a forced startup of cluster applications even if nodes on which RMS is not running exist in the cluster, which may lead to data corruption.
After upgrading [PRIMECLUSTER 4.5A10]
When the -f option is used and nodes on which RMS is not running exist in the cluster, RMS forcibly shuts down those nodes before performing the forced startup of cluster applications, in order to reduce the risk of data corruption. However, if the forced shutdown fails, the forced startup of cluster applications is not performed.
Note
When using the -f option, confirm "7.5.1 Notes on Switching a Cluster Application Forcibly" and then execute the command.
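For reference, a forced switch request takes the form shown below; the userApplication name and SysNode name are hypothetical:

# hvswitch -f userApp_0 node1RMS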
Changes
Before upgrading [PRIMECLUSTER 4.1A30]
The default work directory is /tmp.
After upgrading [PRIMECLUSTER 4.5A10]
The default work directory is /var/tmp.
Note
None.
Message No.   Message overview
2700          Recovering from a resource failure
2701          Recovering from a node failure
6750          Resource failure
6751          Node failure
Changes
Before upgrading [PRIMECLUSTER 4.1A30]
Notification of a resource failure or recovery will not be posted in the default setting of cluster installation.
The default value of AppWatch at cluster installation is OFF and notification of the resource failure or recovery will not be posted.
After upgrading [PRIMECLUSTER 4.5A10]
Notification of a resource failure or recovery will be posted in the default setting of cluster installation.
A resource failure or recovery will not be posted only when the AppWatch parameter is set to OFF with clsetparam.
Note
After you have changed the AppWatch parameter with clsetparam, you have to restart all the nodes to validate the setting.
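For example, to return to the previous behavior and suppress these notifications, AppWatch can be set back to OFF with clsetparam; the option syntax below is an assumption based on the parameter name used here, so confirm it against the clsetparam manual page before use:

# clsetparam -p AppWatch OFF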
Changes
Before upgrading [PRIMECLUSTER 4.1A30]
When the forced startup of a cluster application is issued, the cluster application is forcibly started according to the user's operation even if nodes on which RMS is not running exist in the cluster, which may cause data corruption.
After upgrading [PRIMECLUSTER 4.5A10]
To reduce the risk of data corruption when the forced startup of a cluster application is issued, the cluster application is forcibly started after the nodes on which RMS is not running are forcibly shut down.
Note
For details, see "4.2 Operator Intervention Messages" in "PRIMECLUSTER Messages."
Details on incompatibilities 2
With the default settings made when the cluster was installed, the operator intervention request is always enabled.
For details, see "5.2 Setting up Fault Resource Identification and Operator Intervention Request."
Changes
Before upgrading [PRIMECLUSTER 4.1A30]
The operator intervention request will not work with the default setting at installation.
The default value of AppWatch set when the cluster was installed is set to OFF, and the operator intervention request will not work with
this default value.
After upgrading [PRIMECLUSTER 4.5A10]
The operator intervention request will work with the default setting at installation.
The operator intervention request is disabled only when the AppWatch parameter is set to OFF with clsetparam.
Note
After you have changed the AppWatch parameter with clsetparam, you have to restart all the nodes to validate the setting.
Changes
1. Creating a userApplication for standby operation
Before upgrading [PRIMECLUSTER 4.1A30]
To create a userApplication for standby operation, select "CRM" from the "Application type selection menu."
2. Creating a userApplication for scalable operation
Before upgrading [PRIMECLUSTER 4.1A30]
To create a userApplication for scalable operation, select "Controller" from the "Application type selection menu."
3. Creating a procedure resource
Before upgrading [PRIMECLUSTER 4.1A30]
To create or change a procedure resource, select "CRM" from the "turnkey wizard "CRM"" menu, and then select the resource class
name.
Note: Shown below is an example of registering a procedure resource of the BasicApplication class to a userApplication.
After upgrading [PRIMECLUSTER 4.5A10]
The "turnkey wizard "CRM"" menu is not displayed.
To create or change a procedure resource, select "Procedure:resource-class-name" from the "turnkey wizard "STANDBY"" menu.
Note: Shown below is an example of registering a procedure resource of the BasicApplication class to a userApplication.
5. Changing the priority in a resource class of a procedure resource.
Before upgrading [PRIMECLUSTER 4.1A30]
To change the priority in a resource class, select "Priority[0]=priority."
Note
None.
Changes
Before upgrading [PRIMECLUSTER 4.1A30]
The mount point was defined in /etc/fstab.
After upgrading [PRIMECLUSTER 4.5A10]
It is necessary to define the mount point in /etc/fstab.pcl.
For details, see "6.7.3.2 Setting Up Fsystem Resources."
Note
None.
Details on incompatibilities 2
The dedicated monitoring disk area is not required when using a shared disk device.
Changes
Before upgrading [PRIMECLUSTER 4.1A30]
It was required to secure a dedicated monitoring disk area.
After upgrading [PRIMECLUSTER 4.5A10]
The dedicated monitoring disk area does not need to be registered to the userApplication as an Fsystem resource. However, when migrating from an earlier version, the dedicated monitoring disk area can still be registered as an Fsystem resource.
Note
None.
N.3.10 Client Environment for Web-Based Admin View
Details on incompatibilities
Linux(R) is not supported as a client environment for Web-Based Admin View by PRIMECLUSTER 4.5A10.
Changes
Before upgrading [PRIMECLUSTER 4.1A30]
Linux(R) is supported as a client environment for Web-Based Admin View.
After upgrading [PRIMECLUSTER 4.5A10]
Linux(R) is not supported as a client environment for Web-Based Admin View.
Note
None.
Changes
Before upgrading [PRIMECLUSTER 4.1A30]
CF does not start unless all of the network interfaces for the cluster interconnects are recognized.
After upgrading [PRIMECLUSTER 4.5A10]
CF starts if at least one of the network interfaces for the cluster interconnects is recognized.
Note
If there are any network interfaces that are not recognized on CF startup, the following message appears:
CF: <NIC>: device not found.
<NIC> will be the name of the network interface (e.g. eth0).
This message also appears in 4.1A30.
N.3.12 HV_CONNECT_TIMEOUT
Details on incompatibilities
The default value of the RMS local environment variable HV_CONNECT_TIMEOUT is changed.
Changes
Before upgrading [PRIMECLUSTER 4.1A30]
The default value of HV_CONNECT_TIMEOUT is 5 (seconds).
After upgrading [PRIMECLUSTER 4.5A10]
The default value of HV_CONNECT_TIMEOUT is 30 (seconds).
Note
For details on HV_CONNECT_TIMEOUT, see "E.3 Local environment variables" in "PRIMECLUSTER Reliant Monitor Services (RMS)
with Wizard Tools Configuration and Administration Guide."
N.3.13 Changes of the ports used by RMS
Details on incompatibilities
The port used by RMS is changed.
Changes
Before upgrading [PRIMECLUSTER 4.1A30]
The port number "11111" is used.
After upgrading [PRIMECLUSTER 4.5A10]
The port number "11111" is not used.
Note
None.
Changes
Before upgrading [PRIMECLUSTER 4.1A30]
Port number: The port number "2316" is used.
Note
None.
N.3.15 Changes of the target node to forcibly shut down when a heartbeat
failure occurs
Details on incompatibilities
The method of selecting the target node to be forcibly shut down when a heartbeat failure occurs due to temporary causes such as overload is changed.
Changes
Before upgrading [PRIMECLUSTER 4.1A30]
If CF becomes temporarily disabled due to overload or other causes and a heartbeat failure then occurs, the shutdown facility determines the node to forcibly shut down according to the setup policy for survival priority.
After upgrading [PRIMECLUSTER 4.5A10]
If CF becomes temporarily disabled due to overload or other causes and a heartbeat failure then occurs, the shutdown facility forcibly stops the node on which CF cannot operate, regardless of the setup policy for survival priority.
Note
None.
Changes
Before upgrading [PRIMECLUSTER 4.1A30]
After the Offline processing of the failed resource is completed, nothing is displayed in StateDetails of the failed resource object.
After upgrading [PRIMECLUSTER 4.5A10]
After the Offline processing of the failed resource is completed, "Faulted Occurred" is displayed in StateDetails of the failed resource
object.
Note
None.
Changes
Before upgrading [PRIMECLUSTER 4.1A30]
When setting an IPv4 address, options of the setting command, such as ifconfig, could be specified for the CIP interface.
After upgrading [PRIMECLUSTER 4.5A10]
When setting an IPv4 address, only the IP address and netmask value can be specified for the CIP interface.
Note
None.
Changes
Before upgrading [PRIMECLUSTER 4.1A30]
You can select whether or not to use auto subnet grouping by checking or unchecking the "Auto Subnet Grouping" checkbox on the CF over IP setting window of the CF Wizard.
After upgrading [PRIMECLUSTER 4.5A10]
You can select whether or not to use the network broadcast on CF over IP by checking or unchecking the "Use Network Broadcast" checkbox on the CF over IP setting window of the CF Wizard.
Note
None.
Changes
Before upgrading [PRIMECLUSTER 4.1A30]
(SYS, 8): ERROR: RMS failed to shut down the host <host> via a Shutdown Facility, no further kill functionality is available.
The cluster is now hung.
After upgrading [PRIMECLUSTER 4.5A10]
(SYS, 8): ERROR: RMS failed to shut down the host <host> via a Shutdown Facility, no further kill functionality is available.
The cluster is now hung. An operator intervention is required.
Note
None.
Changes
Before upgrading [PRIMECLUSTER 4.1A30]
WARNING: cannot grab mount lock for dostat() check_getbdev(), returning previous state
After upgrading [PRIMECLUSTER 4.5A10]
NOTICE: cannot grab mount lock for dostat() check_getbdev(), returning previous state
Note
None.
Changes
Before upgrading [PRIMECLUSTER 4.1A30]
The use of the -f (force) flag could cause your data to be corrupted and could cause your node to be killed. Do not continue if the result
of this forced command is not clear.
The use of force flag of hvswitch overrides the RMS internal security mechanism. In particular RMS does no longer prevent resources,
which have been marked as "ClusterExclusive", from coming Online on more than one host in the cluster. It is recommended to double
check the state of all affected resources before continuing.
Do you wish to proceed ? (default: no) [yes, no]:
After upgrading [PRIMECLUSTER 4.5A10]
The use of the -f (force) flag could cause your data to be corrupted and could cause your node to be killed. Do not continue if the result
of this forced command is not clear.
The use of force flag of hvswitch overrides the RMS internal security mechanism. In particular RMS does no longer prevent resources,
which have been marked as "ClusterExclusive", from coming Online on more than one host in the cluster. It is recommended to double
check the state of all affected resources before continuing.
IMPORTANT: This command may kill nodes on which RMS is not running in order to reduce the risk of data corruption!
Ensure that RMS is running on all other nodes. Or shut down OS of the node on which RMS is not running.
Do you wish to proceed ? (default: no) [yes, no]:
Note
None.
Changes
Before upgrading [PRIMECLUSTER 4.1A30]
1421 The userApplication "userApplication" did not start automatically because not all of the nodes where it can run are online.
Do you want to force the userApplication online on the SysNode "SysNode"?
Message No.: number
Do you want to do something? (yes/no)
Warning: Forcing a userApplication online ignores potential error conditions. Used improperly, it can result in data corruption. You
should not use it unless you are certain that the userApplication is not running anywhere in the cluster.
After upgrading [PRIMECLUSTER 4.5A10]
1421 The userApplication "userApplication" did not start automatically because not all of the nodes where it can run are online.
Forcing the userApplication online on the SysNode "SysNode" is possible.
Warning: When performing a forced online, confirm that RMS is started on all nodes in the cluster, manually shutdown any nodes where
it is not started and then perform it.For a forced online, there is a risk of data corruption due to simultaneous access from several nodes.In
order to reduce the risk, nodes where RMS is not started maybe forcibly stopped.
Are you sure wish to force online? (no/yes)
Message No.: number
Note
For details, see the relevant message in "PRIMECLUSTER Messages."
Changes
Before upgrading [PRIMECLUSTER 4.1A30]
1423 On the SysNode "SysNode", the userApplication "userApplication" has the faulted resource "resource". The userApplication
"userApplication" did not start automatically because not all of the nodes where it can run are online.
Do you want to force the userApplication online on the SysNode "SysNode"?
Message No.: number
Do you want to do something? (yes/no)
Warning: Forcing a userApplication online ignores potential error conditions. Used improperly, it can result in data corruption. You
should not use it unless you are certain that the userApplication is not running anywhere in the cluster.
After upgrading [PRIMECLUSTER 4.5A10]
1423 On the SysNode "SysNode", the userApplication "userApplication" has the faulted resource "resource". The userApplication
"userApplication" did not start automatically because not all of the nodes where it can run are online.
Forcing the userApplication online on the SysNode "SysNode" is possible.
Warning: When performing a forced online, confirm that RMS is started on all nodes in the cluster, manually shutdown any nodes where
it is not started and then perform it. For a forced online, there is a risk of data corruption due to simultaneous access from several nodes.
In order to reduce the risk, nodes where RMS is not started maybe forcibly stopped.
Are you sure wish to force online? (no/yes)
Message No.: number
Note
For details, see the relevant message in "PRIMECLUSTER Messages."
Incompatible functions
The following functions of PRIMECLUSTER 4.5A10 are incompatible with PRIMECLUSTER 4.1A40.
- N.4.17 Changing "turnkey wizard "STANDBY"" of hvw command
Incompatible messages
The following messages of PRIMECLUSTER 4.5A10 are incompatible with PRIMECLUSTER 4.1A40.
Changes
Before upgrading [PRIMECLUSTER 4.1A40]
The number of characters displayed by "Agent" of "sdtool -s" is 14 characters (including spaces).
The number of characters displayed by "Admin IP" of "sdtool -C" is 16 characters (including spaces).
After upgrading [PRIMECLUSTER 4.5A10]
The number of characters displayed by "Agent" of "sdtool -s" is 21 characters (including spaces).
When an IPv6 address is used for the administrative LAN of the shutdown facility, the number of characters displayed by "Admin IP"
of "sdtool -C" is 40 characters (including spaces). When an IPv4 address is used, the number of characters is not changed.
Note
None.
Point
A resource in a cluster application may not stop and may remain running, because RMS ends abnormally when the hvshut command times out.
In this situation, if the resource controls a shared disk, data corruption may occur when RMS and the cluster application containing the resource are forcibly started on another node, because the resource is then running on multiple nodes at the same time.
Changes
Before upgrading [PRIMECLUSTER 4.1A40]
In an environment where the environment variable RELIANT_SHUT_MIN_WAIT remains at its default and the shutdown processing of a resource by the hvshut command has not completed within 900 seconds, the command times out and RMS ends abnormally.
The resource does not stop and remains running at this time.
After upgrading [PRIMECLUSTER 4.5A10]
In an environment where the environment variable RELIANT_SHUT_MIN_WAIT remains at its default, the hvshut command does not time out even if the shutdown processing of a resource by the command has not completed.
Note
When using RMS, make sure to change this environment variable to suit the configuration.
Changes
Before upgrading [PRIMECLUSTER 4.1A40]
When the -f option is used, RMS performs a forced startup of cluster applications even if nodes on which RMS is not running exist in the cluster, which may lead to data corruption.
After upgrading [PRIMECLUSTER 4.5A10]
When the -f option is used and nodes on which RMS is not running exist in the cluster, RMS forcibly shuts down those nodes before performing the forced startup of cluster applications, in order to reduce the risk of data corruption. However, if the forced shutdown fails, the forced startup of cluster applications is not performed.
Note
When using the -f option, confirm "7.5.1 Notes on Switching a Cluster Application Forcibly" and then execute the command.
Changes
Before upgrading [PRIMECLUSTER 4.1A40]
The default work directory is /tmp.
After upgrading [PRIMECLUSTER 4.5A10]
The default work directory is /var/tmp.
Note
None.
Message No.   Message overview
2700          Recovering from a resource failure
2701          Recovering from a node failure
6750          Resource failure
6751          Node failure
Changes
Before upgrading [PRIMECLUSTER 4.1A40]
Notification of a resource failure or recovery will not be posted in the default setting of cluster installation.
The default value of AppWatch at cluster installation is OFF and notification of the resource failure or recovery will not be posted.
After upgrading [PRIMECLUSTER 4.5A10]
Notification of a resource failure or recovery will be posted in the default setting of cluster installation.
A resource failure or recovery will not be posted only when the AppWatch parameter is set to OFF with clsetparam.
Note
After you have changed the AppWatch parameter with clsetparam, you have to restart all the nodes to validate the setting.
Changes
Before upgrading [PRIMECLUSTER 4.1A40]
The operator intervention request is always enabled.
After upgrading [PRIMECLUSTER 4.5A10]
The operator intervention request is enabled only when the AppWatch parameter is set to ON with clsetparam. The default value of AppWatch set when the cluster was installed is OFF, and the operator intervention request will not work with this default value.
Note
For details, see "4.2 Operator Intervention Messages" in "PRIMECLUSTER Messages."
Details on incompatibilities 2
With the default settings made when the cluster was installed, the operator intervention request is always enabled.
For details, see "5.2 Setting up Fault Resource Identification and Operator Intervention Request."
Changes
Before upgrading [PRIMECLUSTER 4.1A40]
The operator intervention request will not work with the default setting at installation.
The default value of AppWatch set when the cluster was installed is set to OFF, and the operator intervention request will not work with
this default value.
After upgrading [PRIMECLUSTER 4.5A10]
The operator intervention request will work with the default setting at installation.
The operator intervention request is disabled only when the AppWatch parameter is set to OFF with clsetparam.
Note
After you have changed the AppWatch parameter with clsetparam, you have to restart all the nodes to validate the setting.
Changes
Before upgrading [PRIMECLUSTER 4.1A40]
The mount point was defined in /etc/fstab.
After upgrading [PRIMECLUSTER 4.5A10]
It is necessary to define the mount point in /etc/fstab.pcl.
For details, see "6.7.3.2 Setting Up Fsystem Resources."
Note
None.
Details on incompatibilities 2
The dedicated monitoring disk area is not required when using a shared disk device.
Changes
Before upgrading [PRIMECLUSTER 4.1A40]
It was required to secure a dedicated monitoring disk area.
After upgrading [PRIMECLUSTER 4.5A10]
The dedicated monitoring disk area does not need to be registered to the userApplication as an Fsystem resource. However, when migrating from an earlier version, the dedicated monitoring disk area can still be registered as an Fsystem resource.
Note
None.
Changes
Before upgrading [PRIMECLUSTER 4.1A40]
Linux(R) is supported as a client environment for Web-Based Admin View.
After upgrading [PRIMECLUSTER 4.5A10]
Linux(R) is not supported as a client environment for Web-Based Admin View.
Note
None.
N.4.9 Changes of the Behavior of CF Startup
Details on incompatibilities
CF starts even if some of the network interfaces for the cluster interconnects are not recognized.
Changes
Before upgrading [PRIMECLUSTER 4.1A40]
CF does not start unless all of the network interfaces for the cluster interconnects are recognized.
After upgrading [PRIMECLUSTER 4.5A10]
CF starts if at least one of the network interfaces for the cluster interconnects is recognized.
Note
If there are any network interfaces that are not recognized on CF startup, the following message appears:
CF: <NIC>: device not found.
<NIC> will be the name of the network interface (e.g. eth0).
This message also appears in 4.1A40.
N.4.10 HV_CONNECT_TIMEOUT
Details on incompatibilities
The default value of the RMS local environment variable HV_CONNECT_TIMEOUT is changed.
Changes
Before upgrading [PRIMECLUSTER 4.1A40]
The default value of HV_CONNECT_TIMEOUT is 5 (seconds).
After upgrading [PRIMECLUSTER 4.5A10]
The default value of HV_CONNECT_TIMEOUT is 30 (seconds).
Note
For details on HV_CONNECT_TIMEOUT, see "E.3 Local environment variables" in "PRIMECLUSTER Reliant Monitor Services (RMS)
with Wizard Tools Configuration and Administration Guide."
Changes
Before upgrading [PRIMECLUSTER 4.1A40]
The port number "11111" is used.
After upgrading [PRIMECLUSTER 4.5A10]
The port number "11111" is not used.
Note
None.
N.4.12 Changes of the port number used by the shutdown facility
Details on incompatibilities
The port number used by the shutdown facility is changed.
Changes
Before upgrading [PRIMECLUSTER 4.1A40]
Port number: The port number "2316" is used.
Note
None.
N.4.13 Changes of the target node to forcibly shut down when a heartbeat
failure occurs
Details on incompatibilities
The method of selecting the target node to be forcibly shut down when a heartbeat failure occurs due to temporary causes such as overload is changed.
Changes
Before upgrading [PRIMECLUSTER 4.1A40]
If CF becomes temporarily disabled due to overload or other causes and a heartbeat failure then occurs, the shutdown facility determines the node to forcibly shut down according to the setup policy for survival priority.
After upgrading [PRIMECLUSTER 4.5A10]
If CF becomes temporarily disabled due to overload or other causes and a heartbeat failure then occurs, the shutdown facility forcibly stops the node on which CF cannot operate, regardless of the setup policy for survival priority.
Note
None.
Changes
Before upgrading [PRIMECLUSTER 4.1A40]
After the Offline processing of the failed resource is completed, nothing is displayed in StateDetails of the failed resource object.
After upgrading [PRIMECLUSTER 4.5A10]
After the Offline processing of the failed resource is completed, "Faulted Occurred" is displayed in StateDetails of the failed resource
object.
Note
None.
Changes
Before upgrading [PRIMECLUSTER 4.1A40]
When setting an IPv4 address, options of the setting command, such as ifconfig, could be specified for the CIP interface.
After upgrading [PRIMECLUSTER 4.5A10]
When setting an IPv4 address, only the IP address and netmask value can be specified for the CIP interface.
Note
None.
Changes
Before upgrading [PRIMECLUSTER 4.1A40]
You can select whether or not to use auto subnet grouping by checking or unchecking the "Auto Subnet Grouping" checkbox on the CF over IP setting window of the CF Wizard.
After upgrading [PRIMECLUSTER 4.5A10]
You can select whether or not to use the network broadcast on CF over IP by checking or unchecking the "Use Network Broadcast" checkbox on the CF over IP setting window of the CF Wizard.
Note
None.
Changes
Before upgrading [PRIMECLUSTER 4.1A40]
Enterprise-Postgres resource is not displayed in "turnkey wizard "STANDBY"".
After upgrading [PRIMECLUSTER 4.5A10]
Enterprise-Postgres resource is displayed in "turnkey wizard "STANDBY"".
Note
None.
Changes
Before upgrading [PRIMECLUSTER 4.1A40]
(SYS, 8): ERROR: RMS failed to shut down the host <host> via a Shutdown Facility, no further kill functionality is available.
The cluster is now hung.
After upgrading [PRIMECLUSTER 4.5A10]
(SYS, 8): ERROR: RMS failed to shut down the host <host> via a Shutdown Facility, no further kill functionality is available.
The cluster is now hung. An operator intervention is required.
Note
None.
Changes
Before upgrading [PRIMECLUSTER 4.1A40]
WARNING: cannot grab mount lock for dostat() check_getbdev(), returning previous state
After upgrading [PRIMECLUSTER 4.5A10]
NOTICE: cannot grab mount lock for dostat() check_getbdev(), returning previous state
Note
None.
Changes
Before upgrading [PRIMECLUSTER 4.1A40]
The use of the -f (force) flag could cause your data to be corrupted and could cause your node to be killed. Do not continue if the result
of this forced command is not clear.
The use of force flag of hvswitch overrides the RMS internal security mechanism. In particular RMS does no longer prevent resources,
which have been marked as "ClusterExclusive", from coming Online on more than one host in the cluster. It is recommended to double
check the state of all affected resources before continuing.
Do you wish to proceed ? (default: no) [yes, no]:
After upgrading [PRIMECLUSTER 4.5A10]
The use of the -f (force) flag could cause your data to be corrupted and could cause your node to be killed. Do not continue if the result
of this forced command is not clear.
The use of force flag of hvswitch overrides the RMS internal security mechanism. In particular RMS does no longer prevent resources,
which have been marked as "ClusterExclusive", from coming Online on more than one host in the cluster. It is recommended to double
check the state of all affected resources before continuing.
IMPORTANT: This command may kill nodes on which RMS is not running in order to reduce the risk of data corruption!
Ensure that RMS is running on all other nodes. Or shut down OS of the node on which RMS is not running.
Do you wish to proceed ? (default: no) [yes, no]:
Note
None.
Changes
Before upgrading [PRIMECLUSTER 4.1A40]
1421 The userApplication "userApplication" did not start automatically because not all of the nodes where it can run are online.
Do you want to force the userApplication online on the SysNode "SysNode"?
Message No.: number
Do you want to do something? (yes/no)
Warning: Forcing a userApplication online ignores potential error conditions. Used improperly, it can result in data corruption. You
should not use it unless you are certain that the userApplication is not running anywhere in the cluster.
After upgrading [PRIMECLUSTER 4.5A10]
1421 The userApplication "userApplication" did not start automatically because not all of the nodes where it can run are online.
Forcing the userApplication online on the SysNode "SysNode" is possible.
Warning: When performing a forced online, confirm that RMS is started on all nodes in the cluster, manually shutdown any nodes where
it is not started and then perform it.For a forced online, there is a risk of data corruption due to simultaneous access from several nodes.In
order to reduce the risk, nodes where RMS is not started maybe forcibly stopped.
Are you sure wish to force online? (no/yes)
Message No.: number
Note
For details, see the relevant message in "PRIMECLUSTER Messages."
Changes
Before upgrading [PRIMECLUSTER 4.1A40]
1423 On the SysNode "SysNode", the userApplication "userApplication" has the faulted resource "resource". The userApplication
"userApplication" did not start automatically because not all of the nodes where it can run are online.
Do you want to force the userApplication online on the SysNode "SysNode"?
Message No.: number
Do you want to do something? (yes/no)
Warning: Forcing a userApplication online ignores potential error conditions. Used improperly, it can result in data corruption. You
should not use it unless you are certain that the userApplication is not running anywhere in the cluster.
After upgrading [PRIMECLUSTER 4.5A10]
1423 On the SysNode "SysNode", the userApplication "userApplication" has the faulted resource "resource". The userApplication
"userApplication" did not start automatically because not all of the nodes where it can run are online.
Forcing the userApplication online on the SysNode "SysNode" is possible.
Warning: When performing a forced online, confirm that RMS is started on all nodes in the cluster, manually shutdown any nodes where
it is not started and then perform it. For a forced online, there is a risk of data corruption due to simultaneous access from several nodes.
In order to reduce the risk, nodes where RMS is not started maybe forcibly stopped.
Are you sure wish to force online? (no/yes)
Message No.: number
Note
For details, see the relevant message in "PRIMECLUSTER Messages."
Incompatible functions
The following functions of PRIMECLUSTER 4.5A10 are incompatible with PRIMECLUSTER 4.2A00.
- N.5.16 Change of /etc/cip.cf file
- N.5.17 Changes in CF over IP setting window of CF Wizard
- N.5.18 Changing "turnkey wizard "STANDBY"" of hvw command
Incompatible messages
The following messages of PRIMECLUSTER 4.5A10 are incompatible with PRIMECLUSTER 4.2A00.
Changes
Before upgrading [PRIMECLUSTER 4.2A00]
The number of characters displayed by "Agent" of "sdtool -s" is 14 characters (including spaces).
The number of characters displayed by "Admin IP" of "sdtool -C" is 16 characters (including spaces).
After upgrading [PRIMECLUSTER 4.5A10]
The number of characters displayed by "Agent" of "sdtool -s" is 21 characters (including spaces).
When an IPv6 address is used for the administrative LAN of the shutdown facility, the number of characters displayed by "Admin IP"
of "sdtool -C" is 40 characters (including spaces). When an IPv4 address is used, the number of characters is not changed.
Note
None.
Point
A resource in a cluster application may not stop and may remain running, because RMS ends abnormally when the hvshut command times out.
In this situation, if the resource controls a shared disk, data corruption may occur when RMS and the cluster application containing the resource are forcibly started on another node, because the resource is then running on multiple nodes at the same time.
Changes
Before upgrading [PRIMECLUSTER 4.2A00]
In an environment where the environment variable RELIANT_SHUT_MIN_WAIT remains at its default and the shutdown processing of a resource by the hvshut command has not completed within 900 seconds, the command times out and RMS ends abnormally.
The resource does not stop and remains running at this time.
After upgrading [PRIMECLUSTER 4.5A10]
In an environment where the environment variable RELIANT_SHUT_MIN_WAIT remains at its default, the hvshut command does not time out even if the shutdown processing of a resource by the command has not completed.
Note
When using RMS, make sure to change this environment variable to suit the configuration.
Changes
Before upgrading [PRIMECLUSTER 4.2A00]
When the -f option is used, RMS performs a forced startup of cluster applications even if nodes on which RMS is not running exist in the cluster, which may lead to data corruption.
After upgrading [PRIMECLUSTER 4.5A10]
When the -f option is used and nodes on which RMS is not running exist in the cluster, RMS forcibly shuts down those nodes before performing the forced startup of cluster applications, in order to reduce the risk of data corruption. However, if the forced shutdown fails, the forced startup of cluster applications is not performed.
Note
When using the -f option, confirm "7.5.1 Notes on Switching a Cluster Application Forcibly" and then execute the command.
Changes
Before upgrading [PRIMECLUSTER 4.2A00]
The default work directory is /tmp.
After upgrading [PRIMECLUSTER 4.5A10]
The default work directory is /var/tmp.
Note
None.
Details on incompatibilities
If a failure occurs in the resource or if the resource recovers from a failure, the failure or recovery of the resource can be posted by sending
the message shown below to syslogd. The default setting at installation is that notification of a resource failure or recovery is posted with
PRIMECLUSTER 4.5A10. For details, see "5.2 Setting up Fault Resource Identification and Operator Intervention Request."
Changes
Before upgrading [PRIMECLUSTER 4.2A00]
Notification of a resource failure or recovery will not be posted in the default setting of cluster installation.
The default value of AppWatch at cluster installation is OFF and notification of the resource failure or recovery will not be posted.
After upgrading [PRIMECLUSTER 4.5A10]
Notification of a resource failure or recovery will be posted in the default setting of cluster installation.
A resource failure or recovery will not be posted only when the AppWatch parameter is set to OFF with clsetparam.
Note
After you have changed the AppWatch parameter with clsetparam, you have to restart all the nodes to validate the setting.
Changes
Before upgrading [PRIMECLUSTER 4.2A00]
When the forced startup of a cluster application is issued, the cluster application is forcibly started according to the user's operation even if nodes on which RMS is not running exist in the cluster, which may cause data corruption.
After upgrading [PRIMECLUSTER 4.5A10]
To reduce the risk of data corruption when the forced startup of a cluster application is issued, the cluster application is forcibly started after the nodes on which RMS is not running are forcibly shut down.
Note
For details, see "4.2 Operator Intervention Messages" in "PRIMECLUSTER Messages."
Details on incompatibilities 2
With the default settings made when the cluster was installed, the operator intervention request is always enabled.
For details, see "5.2 Setting up Fault Resource Identification and Operator Intervention Request."
Changes
Before upgrading [PRIMECLUSTER 4.2A00]
The operator intervention request will not work with the default setting at installation.
The default value of AppWatch set when the cluster was installed is set to OFF, and the operator intervention request will not work with
this default value.
After upgrading [PRIMECLUSTER 4.5A10]
The operator intervention request will work with the default setting at installation.
The operator intervention request is disabled only when the AppWatch parameter is set to OFF with clsetparam.
Note
After you have changed the AppWatch parameter with clsetparam, you have to restart all the nodes to validate the setting.
Changes
Before upgrading [PRIMECLUSTER 4.2A00]
The mount point was defined in /etc/fstab.
After upgrading [PRIMECLUSTER 4.5A10]
It is necessary to define the mount point in /etc/fstab.pcl.
For details, see "6.7.3.2 Setting Up Fsystem Resources."
Note
None.
Details on incompatibilities 2
The dedicated monitoring disk area is not required when using a shared disk device.
Changes
Before upgrading [PRIMECLUSTER 4.2A00]
It was required to secure a dedicated monitoring disk area.
After upgrading [PRIMECLUSTER 4.5A10]
The dedicated monitoring disk area does not need to be registered to the userApplication as an Fsystem resource. However, when migrating from an earlier version, the dedicated monitoring disk area can still be registered as an Fsystem resource.
Note
None.
Changes
Before upgrading [PRIMECLUSTER 4.2A00]
Linux(R) is supported as a client environment for Web-Based Admin View.
After upgrading [PRIMECLUSTER 4.5A10]
Linux(R) is not supported as a client environment for Web-Based Admin View.
Note
None.
Changes
Before upgrading [PRIMECLUSTER 4.2A00]
CF does not start unless all of the network interfaces for the cluster interconnects are recognized.
After upgrading [PRIMECLUSTER 4.5A10]
CF starts if at least one of the network interfaces for the cluster interconnects is recognized.
Note
If there are any network interfaces that are not recognized on CF startup, the following message appears:
CF: <NIC>: device not found.
<NIC> will be the name of the network interface (e.g. eth0).
This message also appears in 4.2A00.
N.5.10 HV_CONNECT_TIMEOUT
Details on incompatibilities
The default value of the RMS local environment variable HV_CONNECT_TIMEOUT is changed.
Changes
Before upgrading [PRIMECLUSTER 4.2A00]
The default value of HV_CONNECT_TIMEOUT is 5 (seconds).
After upgrading [PRIMECLUSTER 4.5A10]
The default value of HV_CONNECT_TIMEOUT is 30 (seconds).
Note
For details on HV_CONNECT_TIMEOUT, see "E.3 Local environment variables" in "PRIMECLUSTER Reliant Monitor Services (RMS)
with Wizard Tools Configuration and Administration Guide."
Changes
Before upgrading [PRIMECLUSTER 4.2A00]
The port number "11111" is used.
After upgrading [PRIMECLUSTER 4.5A10]
The port number "11111" is not used.
Note
None.
Changes
Before upgrading [PRIMECLUSTER 4.2A00]
The following settings were unnecessary to use the IPMI shutdown agent.
Note
None.
Changes
Before upgrading [PRIMECLUSTER 4.2A00]
Port number: The port number "2316" is used.
Note
None.
N.5.14 Changes of the target node to forcibly shut down when a heartbeat
failure occurs
Details on incompatibilities
The method of selecting the target node to be forcibly shut down when a heartbeat failure occurs due to temporary causes such as overload is changed.
Changes
Before upgrading [PRIMECLUSTER 4.2A00]
If CF becomes temporarily disabled due to overload or other causes and a heartbeat failure then occurs, the shutdown facility determines the node to forcibly shut down according to the setup policy for survival priority.
After upgrading [PRIMECLUSTER 4.5A10]
If CF becomes temporarily disabled due to overload or other causes and a heartbeat failure then occurs, the shutdown facility forcibly stops the node on which CF cannot operate, regardless of the setup policy for survival priority.
Note
None.
Changes
Before upgrading [PRIMECLUSTER 4.2A00]
After the Offline processing of the failed resource is completed, nothing is displayed in StateDetails of the failed resource object.
After upgrading [PRIMECLUSTER 4.5A10]
After the Offline processing of the failed resource is completed, "Faulted Occurred" is displayed in StateDetails of the failed resource
object.
Note
None.
Changes
Before upgrading [PRIMECLUSTER 4.2A00]
When setting an IPv4 address, options of the setting command, such as ifconfig, could be specified for the CIP interface.
After upgrading [PRIMECLUSTER 4.5A10]
When setting an IPv4 address, only the IP address and netmask value can be specified for the CIP interface.
Note
None.
N.5.17 Changes in CF over IP setting window of CF Wizard
Details on incompatibilities
From PRIMECLUSTER 4.5A10, "Auto Subnet Grouping" checkbox is deleted from CF over IP setting window. Instead, "Use Network
Broadcast" checkbox is newly added.
Changes
Before upgrading [PRIMECLUSTER 4.2A00]
You can select whether or not to use auto subnet grouping by checking or unchecking the "Auto Subnet Grouping" checkbox on the CF over IP setting window of the CF Wizard.
After upgrading [PRIMECLUSTER 4.5A10]
You can select whether or not to use the network broadcast on CF over IP by checking or unchecking the "Use Network Broadcast" checkbox on the CF over IP setting window of the CF Wizard.
Note
None.
Changes
Before upgrading [PRIMECLUSTER 4.2A00]
Enterprise-Postgres resource is not displayed in "turnkey wizard "STANDBY"".
After upgrading [PRIMECLUSTER 4.5A10]
Enterprise-Postgres resource is displayed in "turnkey wizard "STANDBY"".
Note
None.
Changes
Before upgrading [PRIMECLUSTER 4.2A00]
(SYS, 8): ERROR: RMS failed to shut down the host <host> via a Shutdown Facility, no further kill functionality is available.
The cluster is now hung.
After upgrading [PRIMECLUSTER 4.5A10]
(SYS, 8): ERROR: RMS failed to shut down the host <host> via a Shutdown Facility, no further kill functionality is available.
The cluster is now hung. An operator intervention is required.
Note
None.
N.5.20 Changes of the importance of the message in the RMS wizard
Details on incompatibilities
The importance of the following message in the RMS wizard has been changed.
Changes
Before upgrading [PRIMECLUSTER 4.2A00]
WARNING: cannot grab mount lock for dostat() check_getbdev(), returning previous state
After upgrading [PRIMECLUSTER 4.5A10]
NOTICE: cannot grab mount lock for dostat() check_getbdev(), returning previous state
Note
None.
Changes
Before upgrading [PRIMECLUSTER 4.2A00]
The use of the -f (force) flag could cause your data to be corrupted and could cause your node to be killed. Do not continue if the result
of this forced command is not clear.
The use of force flag of hvswitch overrides the RMS internal security mechanism. In particular RMS does no longer prevent resources,
which have been marked as "ClusterExclusive", from coming Online on more than one host in the cluster. It is recommended to double
check the state of all affected resources before continuing.
Do you wish to proceed ? (default: no) [yes, no]:
After upgrading [PRIMECLUSTER 4.5A10]
The use of the -f (force) flag could cause your data to be corrupted and could cause your node to be killed. Do not continue if the result
of this forced command is not clear.
The use of force flag of hvswitch overrides the RMS internal security mechanism. In particular RMS does no longer prevent resources,
which have been marked as "ClusterExclusive", from coming Online on more than one host in the cluster. It is recommended to double
check the state of all affected resources before continuing.
IMPORTANT: This command may kill nodes on which RMS is not running in order to reduce the risk of data corruption!
Ensure that RMS is running on all other nodes. Or shut down OS of the node on which RMS is not running.
Do you wish to proceed ? (default: no) [yes, no]:
Note
None.
Details on incompatibilities
Message No.1421 of the operator intervention request has changed.
Changes
Before upgrading [PRIMECLUSTER 4.2A00]
1421 The userApplication "userApplication" did not start automatically because not all of the nodes where it can run are online.
Do you want to force the userApplication online on the SysNode "SysNode"?
Message No.: number
Do you want to do something? (yes/no)
Warning: Forcing a userApplication online ignores potential error conditions. Used improperly, it can result in data corruption. You
should not use it unless you are certain that the userApplication is not running anywhere in the cluster.
After upgrading [PRIMECLUSTER 4.5A10]
1421 The userApplication "userApplication" did not start automatically because not all of the nodes where it can run are online.
Forcing the userApplication online on the SysNode "SysNode" is possible.
Warning: When performing a forced online, confirm that RMS is started on all nodes in the cluster, manually shutdown any nodes where
it is not started and then perform it.For a forced online, there is a risk of data corruption due to simultaneous access from several nodes.In
order to reduce the risk, nodes where RMS is not started maybe forcibly stopped.
Are you sure wish to force online? (no/yes)
Message No.: number
Note
For details, see the relevant message in "PRIMECLUSTER Messages."
Changes
Before upgrading [PRIMECLUSTER 4.2A00]
1423 On the SysNode "SysNode", the userApplication "userApplication" has the faulted resource "resource". The userApplication
"userApplication" did not start automatically because not all of the nodes where it can run are online.
Do you want to force the userApplication online on the SysNode "SysNode"?
Message No.: number
Do you want to do something? (yes/no)
Warning: Forcing a userApplication online ignores potential error conditions. Used improperly, it can result in data corruption. You
should not use it unless you are certain that the userApplication is not running anywhere in the cluster.
After upgrading [PRIMECLUSTER 4.5A10]
1423 On the SysNode "SysNode", the userApplication "userApplication" has the faulted resource "resource". The userApplication
"userApplication" did not start automatically because not all of the nodes where it can run are online.
Forcing the userApplication online on the SysNode "SysNode" is possible.
Warning: When performing a forced online, confirm that RMS is started on all nodes in the cluster, manually shutdown any nodes where
it is not started and then perform it. For a forced online, there is a risk of data corruption due to simultaneous access from several nodes.
In order to reduce the risk, nodes where RMS is not started maybe forcibly stopped.
Are you sure wish to force online? (no/yes)
Message No.: number
Note
For details, see the relevant message in "PRIMECLUSTER Messages."
N.6 Changes in PRIMECLUSTER 4.5A10 from 4.2A30
Incompatible commands
The following commands of PRIMECLUSTER 4.5A10 are incompatible with PRIMECLUSTER 4.2A30.
Incompatible functions
The following functions of PRIMECLUSTER 4.5A10 are incompatible with PRIMECLUSTER 4.2A30.
Incompatible messages
The following messages of PRIMECLUSTER 4.5A10 are incompatible with PRIMECLUSTER 4.2A30.
Changes
Before upgrading [PRIMECLUSTER 4.2A30]
The number of characters displayed by "Agent" of "sdtool -s" is 14 characters (including spaces).
The number of characters displayed by "Admin IP" of "sdtool -C" is 16 characters (including spaces).
After upgrading [PRIMECLUSTER 4.5A10]
The number of characters displayed by "Agent" of "sdtool -s" is 21 characters (including spaces).
When an IPv6 address is used for the administrative LAN of the shutdown facility, the number of characters displayed by "Admin IP"
of "sdtool -C" is 40 characters (including spaces). When an IPv4 address is used, the number of characters is not changed.
Note
None.
Point
A resource in a cluster application may not stop and may remain running, because RMS ends abnormally when the hvshut command times out.
In this situation, if the resource controls a shared disk, data corruption may occur when RMS and the cluster application containing the resource are forcibly started on another node, because the resource is then running on multiple nodes at the same time.
Changes
Before upgrading [PRIMECLUSTER 4.2A30]
In an environment where the environment variable RELIANT_SHUT_MIN_WAIT remains at its default and the shutdown processing of a resource by the hvshut command has not completed within 900 seconds, the command times out and RMS ends abnormally.
The resource does not stop and remains running at this time.
After upgrading [PRIMECLUSTER 4.5A10]
In an environment where the environment variable RELIANT_SHUT_MIN_WAIT remains at its default, the hvshut command does not time out even if the shutdown processing of a resource by the command has not completed.
Note
When using RMS, make sure to change this environment variable to suit the configuration.
Changes
Before upgrading [PRIMECLUSTER 4.2A30]
When the -f option is used, RMS performs a forced startup of cluster applications even if nodes on which RMS is not running exist in the cluster, which may lead to data corruption.
After upgrading [PRIMECLUSTER 4.5A10]
When the -f option is used and nodes on which RMS is not running exist in the cluster, RMS forcibly shuts down those nodes before performing the forced startup of cluster applications, in order to reduce the risk of data corruption. However, if the forced shutdown fails, the forced startup of cluster applications is not performed.
Note
When using the -f option, confirm "7.5.1 Notes on Switching a Cluster Application Forcibly" and then execute the command.
Changes
Before upgrading [PRIMECLUSTER 4.2A30]
The default work directory is /tmp.
After upgrading [PRIMECLUSTER 4.5A10]
The default work directory is /var/tmp.
Note
None.
Changes
Before upgrading [PRIMECLUSTER 4.2A30]
Notification of a resource failure or recovery will not be posted in the default setting of cluster installation.
The default value of AppWatch at cluster installation is OFF and notification of the resource failure or recovery will not be posted.
After upgrading [PRIMECLUSTER 4.5A10]
Notification of a resource failure or recovery will be posted in the default setting of cluster installation.
A resource failure or recovery will not be posted only when the AppWatch parameter is set to OFF with clsetparam.
Note
After you have changed the AppWatch parameter with clsetparam, you have to restart all the nodes to validate the setting.
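A hedged sketch of checking and changing the parameter; the command paths and the -p option syntax are assumptions to be verified against the clgetparam/clsetparam manual pages:

# Display the current value of AppWatch (path assumed).
/etc/opt/FJSVcluster/bin/clgetparam -p AppWatch
# Restore the pre-4.5A10 behavior of not posting failure/recovery notifications.
/etc/opt/FJSVcluster/bin/clsetparam -p AppWatch OFF
# All nodes must then be restarted for the new value to take effect.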
N.6.6 Operator Intervention Request
Details on incompatibilities 1
When the forced startup of a cluster application is issued, data corruption may occur if the cluster application is started while nodes where RMS is not running exist in the cluster.
To deal with this issue, a function has been added that forcibly shuts down the nodes where RMS is not running before the cluster application is forcibly started.
Changes
Before upgrading [PRIMECLUSTER 4.2A30]
When the forced startup of a cluster application is issued, the cluster application is forcibly started according to the user's operation even if nodes where RMS is not running exist in the cluster, which may cause data corruption.
After upgrading [PRIMECLUSTER 4.5A10]
To reduce the risk of data corruption when the forced startup of a cluster application is issued, the cluster application is forcibly started only after the nodes where RMS is not running have been forcibly shut down.
Note
For details, see "4.2 Operator Intervention Messages" in "PRIMECLUSTER Messages."
Details on incompatibilities 2
With the default settings made when the cluster was installed, the operator intervention request is always enabled.
For details, see "5.2 Setting up Fault Resource Identification and Operator Intervention Request."
Changes
Before upgrading [PRIMECLUSTER 4.2A30]
The operator intervention request will not work with the default setting at installation.
The default value of AppWatch set when the cluster was installed is set to OFF, and the operator intervention request will not work with
this default value.
After upgrading [PRIMECLUSTER 4.5A10]
The operator intervention request will work with the default setting at installation.
The operator intervention request is disabled only when the AppWatch parameter is set to OFF with clsetparam.
Note
After you have changed the AppWatch parameter with clsetparam, you have to restart all the nodes to validate the setting.
Changes
Before upgrading [PRIMECLUSTER 4.2A30]
Securing a dedicated monitoring disk area was required.
After upgrading [PRIMECLUSTER 4.5A10]
The dedicated monitoring disk area no longer needs to be registered to the userApplication as an Fsystem resource. However, when migrating from an earlier version, the dedicated monitoring disk area can still be registered as an Fsystem resource.
Note
None.
Changes
Before upgrading [PRIMECLUSTER 4.2A30]
Linux(R) is supported as a client environment for Web-Based Admin View.
After upgrading [PRIMECLUSTER 4.5A10]
Linux(R) is not supported as a client environment for Web-Based Admin View.
Note
None.
Changes
Before upgrading [PRIMECLUSTER 4.2A30]
CF does not start unless all of the network interfaces for the cluster interconnects are recognized.
After upgrading [PRIMECLUSTER 4.5A10]
CF starts if at least one of the network interfaces for the cluster interconnects is recognized.
Note
If there are any network interfaces that are not recognized on CF startup, the following message appears:
CF: <NIC>: device not found.
<NIC> will be the name of the network interface (e.g. eth0).
This message is also output in 4.2A30.
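If this message appears, comparing the interconnect devices that CF actually recognizes with the intended configuration can help isolate the missing interface; the command below is a sketch assuming the cftool utility shipped with CF:

# List the devices currently configured for the cluster interconnects.
cftool -d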
N.6.10 HV_CONNECT_TIMEOUT
Details on incompatibilities
The default value of the RMS local environment variable HV_CONNECT_TIMEOUT is changed.
Changes
Before upgrading [PRIMECLUSTER 4.2A30]
The default value of HV_CONNECT_TIMEOUT is 5 (seconds) in the RHEL-AS environment, and 30 (seconds) in the RHEL5
environment.
After upgrading [PRIMECLUSTER 4.5A10]
The default value of HV_CONNECT_TIMEOUT is 30 (seconds).
Note
There are no incompatibilities when upgrading PRIMECLUSTER from 4.2A30 for RHEL5 to 4.5A10.
For details on HV_CONNECT_TIMEOUT, see "E.3 Local environment variables" in "PRIMECLUSTER Reliant Monitor Services (RMS)
with Wizard Tools Configuration and Administration Guide."
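If the previous 5-second behavior must be preserved after upgrading from an RHEL-AS environment, the variable can be set explicitly; the file path below is an assumption, and "E.3 Local environment variables" describes the authoritative procedure:

# /opt/SMAW/SMAWRrms/bin/hvenv.local (path assumed)
export HV_CONNECT_TIMEOUT=5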
Changes
Before upgrading [PRIMECLUSTER 4.2A30]
The port number "11111" is used.
After upgrading [PRIMECLUSTER 4.5A10]
The port number "11111" is not used.
Note
None.
Changes
Before upgrading [PRIMECLUSTER 4.2A30]
The following settings were unnecessary to use the IPMI shutdown agent.
Note
None.
Changes
Before upgrading [PRIMECLUSTER 4.2A30]
Port number: The port number "2316" is used.
sfadv 2316/udp # SMAWsf package
Note
None.
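To check whether the sfadv port definition quoted above is present on a node, something like the following can be used:

# Show the shutdown facility port definition in /etc/services, if any.
grep sfadv /etc/services
# Expected form, as quoted above:
#   sfadv 2316/udp # SMAWsf package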
N.6.14 Changes of the target node to forcibly shut down when a heartbeat
failure occurs
Details on incompatibilities
The method of selecting the target node to be forcibly shut down when a heartbeat failure occurs due to a temporary cause such as an overload has been changed.
Changes
Before upgrading [PRIMECLUSTER 4.2A30]
If CF becomes temporarily disabled due to an overload or other causes and a heartbeat failure then occurs, the shutdown facility determines the node to forcibly shut down according to the setup policy for survival priority.
After upgrading [PRIMECLUSTER 4.5A10]
If CF becomes temporarily disabled due to an overload or other causes and a heartbeat failure then occurs, the shutdown facility forcibly stops the node on which CF cannot operate, regardless of the setup policy for survival priority.
Note
None.
Changes
Before upgrading [PRIMECLUSTER 4.2A30]
After the Offline processing of the failed resource is completed, nothing is displayed in StateDetails of the failed resource object.
After upgrading [PRIMECLUSTER 4.5A10]
After the Offline processing of the failed resource is completed, "Faulted Occurred" is displayed in StateDetails of the failed resource
object.
Note
None.
Changes
Before upgrading [PRIMECLUSTER 4.2A30]
When setting an IPv4 address, options for the setting command (such as ifconfig options) can be specified for the CIP interface.
After upgrading [PRIMECLUSTER 4.5A10]
When setting an IPv4 address, only the IP address and netmask value can be specified for the CIP interface.
Note
None.
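As an illustration of the 4.5A10 restriction, a CIP definition carrying only address and netmask information might look like the sketch below; the node names fuji2/fuji3 and the exact /etc/cip.cf syntax are assumptions and must be checked against the PRIMECLUSTER CF guide:

# /etc/cip.cf (format assumed; no ifconfig-style options, only netmask)
fuji2   fuji2RMS:netmask:255.255.255.0
fuji3   fuji3RMS:netmask:255.255.255.0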
Changes
Before upgrading [PRIMECLUSTER 4.2A30]
You can select to use or not to use the auto subnet grouping by checking/unchecking "Auto Subnet Grouping" checkbox on CF over IP
setting window of CF Wizard.
After upgrading [PRIMECLUSTER 4.5A10]
You can select to use or not to use the network broadcast on CF over IP by checking/unchecking "Use Network Broadcast" checkbox
on CF over IP setting window of CF Wizard.
Note
None.
Changes
Before upgrading [PRIMECLUSTER 4.2A30]
Enterprise-Postgres resource is not displayed in "turnkey wizard "STANDBY"".
After upgrading [PRIMECLUSTER 4.5A10]
Enterprise-Postgres resource is displayed in "turnkey wizard "STANDBY"".
Note
None.
Changes
Before upgrading [PRIMECLUSTER 4.2A30]
(SYS, 8): ERROR: RMS failed to shut down the host <host> via a Shutdown Facility, no further kill functionality is available.
The cluster is now hung.
After upgrading [PRIMECLUSTER 4.5A10]
(SYS, 8): ERROR: RMS failed to shut down the host <host> via a Shutdown Facility, no further kill functionality is available.
The cluster is now hung. An operator intervention is required.
Note
None.
Changes
Before upgrading [PRIMECLUSTER 4.2A30]
WARNING: cannot grab mount lock for dostat() check_getbdev(), returning previous state
After upgrading [PRIMECLUSTER 4.5A10]
NOTICE: cannot grab mount lock for dostat() check_getbdev(), returning previous state
Note
None.
Changes
Before upgrading [PRIMECLUSTER 4.2A30]
The use of the -f (force) flag could cause your data to be corrupted and could cause your node to be killed. Do not continue if the result
of this forced command is not clear.
The use of force flag of hvswitch overrides the RMS internal security mechanism. In particular RMS does no longer prevent resources,
which have been marked as "ClusterExclusive", from coming Online on more than one host in the cluster. It is recommended to double
check the state of all affected resources before continuing.
Do you wish to proceed ? (default: no) [yes, no]:
After upgrading [PRIMECLUSTER 4.5A10]
The use of the -f (force) flag could cause your data to be corrupted and could cause your node to be killed. Do not continue if the result
of this forced command is not clear.
The use of force flag of hvswitch overrides the RMS internal security mechanism. In particular RMS does no longer prevent resources,
which have been marked as "ClusterExclusive", from coming Online on more than one host in the cluster. It is recommended to double
check the state of all affected resources before continuing.
IMPORTANT: This command may kill nodes on which RMS is not running in order to reduce the risk of data corruption!
Ensure that RMS is running on all other nodes. Or shut down OS of the node on which RMS is not running.
Do you wish to proceed ? (default: no) [yes, no]:
Note
None.
Changes
Before upgrading [PRIMECLUSTER 4.2A30]
1421 The userApplication "userApplication" did not start automatically because not all of the nodes where it can run are online.
Do you want to force the userApplication online on the SysNode "SysNode"?
Message No.: number
Do you want to do something? (yes/no)
Warning: Forcing a userApplication online ignores potential error conditions. Used improperly, it can result in data corruption. You
should not use it unless you are certain that the userApplication is not running anywhere in the cluster.
After upgrading [PRIMECLUSTER 4.5A10]
1421 The userApplication "userApplication" did not start automatically because not all of the nodes where it can run are online.
Forcing the userApplication online on the SysNode "SysNode" is possible.
Warning: When performing a forced online, confirm that RMS is started on all nodes in the cluster, manually shutdown any nodes where
it is not started and then perform it. For a forced online, there is a risk of data corruption due to simultaneous access from several nodes.
In order to reduce the risk, nodes where RMS is not started maybe forcibly stopped.
Are you sure wish to force online? (no/yes)
Message No.: number
Note
For details, see the relevant message in "PRIMECLUSTER Messages."
Changes
Before upgrading [PRIMECLUSTER 4.2A30]
1423 On the SysNode "SysNode", the userApplication "userApplication" has the faulted resource "resource". The userApplication
"userApplication" did not start automatically because not all of the nodes where it can run are online.
Do you want to force the userApplication online on the SysNode "SysNode"?
Message No.: number
Do you want to do something? (yes/no)
Warning: Forcing a userApplication online ignores potential error conditions. Used improperly, it can result in data corruption. You
should not use it unless you are certain that the userApplication is not running anywhere in the cluster.
After upgrading [PRIMECLUSTER 4.5A10]
1423 On the SysNode "SysNode", the userApplication "userApplication" has the faulted resource "resource". The userApplication
"userApplication" did not start automatically because not all of the nodes where it can run are online.
Forcing the userApplication online on the SysNode "SysNode" is possible.
Warning: When performing a forced online, confirm that RMS is started on all nodes in the cluster, manually shutdown any nodes where
it is not started and then perform it. For a forced online, there is a risk of data corruption due to simultaneous access from several nodes.
In order to reduce the risk, nodes where RMS is not started maybe forcibly stopped.
Are you sure wish to force online? (no/yes)
Message No.: number
Note
For details, see the relevant message in "PRIMECLUSTER Messages."
Incompatible functions
The following functions of PRIMECLUSTER 4.5A10 are incompatible with PRIMECLUSTER 4.3A00.
Incompatible messages
The following messages of PRIMECLUSTER 4.5A10 are incompatible with PRIMECLUSTER 4.3A00.
- N.7.21 Changes of RMS console message
- N.7.22 Changes of the response message for the operator intervention request
Changes
Before upgrading [PRIMECLUSTER 4.3A00]
The number of characters displayed by "Agent" of "sdtool -s" is 14 characters (including spaces).
The number of characters displayed by "Admin IP" of "sdtool -C" is 16 characters (including spaces).
After upgrading [PRIMECLUSTER 4.5A10]
The number of characters displayed by "Agent" of "sdtool -s" is 21 characters (including spaces).
When an IPv6 address is used for the administrative LAN of the shutdown facility, the number of characters displayed by "Admin IP"
of "sdtool -C" is 40 characters (including spaces). When an IPv4 address is used, the number of characters is not changed.
Note
None.
Point
A resource in a cluster application may not stop and may remain running, because RMS ends abnormally when the hvshut command times out.
In this situation, if the resource controls a shared disk, data corruption may occur when RMS and the cluster application containing the resource are forcibly started on another node, because the resource is then running on multiple nodes at the same time.
Changes
Before upgrading [PRIMECLUSTER 4.3A00]
If the environment variable RELIANT_SHUT_MIN_WAIT is left at its default value and the shutdown processing of a resource by the hvshut command does not complete within 900 seconds, the command times out and RMS ends abnormally. The resource does not stop and remains running in this case.
After upgrading [PRIMECLUSTER 4.5A10]
If the environment variable RELIANT_SHUT_MIN_WAIT is left at its default value, the hvshut command does not time out even when the shutdown processing of a resource has not completed.
Note
When using RMS, make sure to change this environment variable to suit your configuration.
N.7.3 hvswitch command
Details on incompatibilities
When the forced startup (using the -f option) of a cluster application is issued, data corruption may occur if cluster applications are started while nodes where RMS is not running exist in the cluster. To deal with this issue, a function has been added that forcibly shuts down the nodes where RMS is not running before cluster applications are forcibly started.
Changes
Before upgrading [PRIMECLUSTER 4.3A00]
When the -f option is used, RMS forcibly starts cluster applications even if nodes where RMS is not running exist in the cluster, which may lead to data corruption.
After upgrading [PRIMECLUSTER 4.5A10]
When the -f option is used and nodes where RMS is not running exist in the cluster, RMS forcibly starts the cluster applications only after forcibly shutting down those nodes, in order to reduce the risk of data corruption. However, if the forced shutdown fails, the cluster applications are not forcibly started.
Note
When using the -f option, check "7.5.1 Notes on Switching a Cluster Application Forcibly" and then execute the command.
Changes
Before upgrading [PRIMECLUSTER 4.3A00]
The default work directory is /tmp.
After upgrading [PRIMECLUSTER 4.5A10]
The default work directory is /var/tmp.
Note
None.
Changes
Before upgrading [PRIMECLUSTER 4.3A00]
Notification of a resource failure or recovery is not posted with the default setting at cluster installation.
The default value of AppWatch at cluster installation is OFF, so resource failures and recoveries are not posted.
After upgrading [PRIMECLUSTER 4.5A10]
Notification of a resource failure or recovery is posted with the default setting at cluster installation.
A resource failure or recovery is not posted only when the AppWatch parameter is set to OFF with clsetparam.
Note
After you have changed the AppWatch parameter with clsetparam, you have to restart all the nodes to validate the setting.
Changes
Before upgrading [PRIMECLUSTER 4.3A00]
When the forced startup of a cluster application is issued, the cluster application is forcibly started according to the user's operation even if nodes where RMS is not running exist in the cluster, which may cause data corruption.
After upgrading [PRIMECLUSTER 4.5A10]
To reduce the risk of data corruption when the forced startup of a cluster application is issued, the cluster application is forcibly started only after the nodes where RMS is not running have been forcibly shut down.
Note
For details, see "4.2 Operator Intervention Messages" in "PRIMECLUSTER Messages."
Details on incompatibilities 2
With the default settings made when the cluster was installed, the operator intervention request is always enabled.
For details, see "5.2 Setting up Fault Resource Identification and Operator Intervention Request."
Changes
Before upgrading [PRIMECLUSTER 4.3A00]
The operator intervention request will not work with the default setting at installation.
The default value of AppWatch set when the cluster was installed is set to OFF, and the operator intervention request will not work with
this default value.
After upgrading [PRIMECLUSTER 4.5A10]
The operator intervention request will work with the default setting at installation.
The operator intervention request is disabled only when the AppWatch parameter is set to OFF with clsetparam.
Note
After you have changed the AppWatch parameter with clsetparam, you have to restart all the nodes to validate the setting.
N.7.7 Setting Up Fsystem Resources
Details on incompatibilities
The dedicated monitoring disk area is not required when using a shared disk device.
Changes
Before upgrading [PRIMECLUSTER 4.3A00]
Securing a dedicated monitoring disk area was required.
After upgrading [PRIMECLUSTER 4.5A10]
The dedicated monitoring disk area no longer needs to be registered to the userApplication as an Fsystem resource. However, when migrating from an earlier version, the dedicated monitoring disk area can still be registered as an Fsystem resource.
Note
None.
Changes
Before upgrading [PRIMECLUSTER 4.3A00]
Linux(R) is supported as a client environment for Web-Based Admin View.
After upgrading [PRIMECLUSTER 4.5A10]
Linux(R) is not supported as a client environment for Web-Based Admin View.
Note
None.
Changes
Before upgrading [PRIMECLUSTER 4.3A00]
CF does not start unless all of the network interfaces for the cluster interconnects are recognized.
After upgrading [PRIMECLUSTER 4.5A10]
CF starts if at least one of the network interfaces for the cluster interconnects is recognized.
Note
If there are any network interfaces that are not recognized on CF startup, the following message appears:
CF: <NIC>: device not found.
<NIC> will be the name of the network interface (e.g. eth0).
This message is also output in 4.3A00.
N.7.10 HV_CONNECT_TIMEOUT
Details on incompatibilities
The default value of the RMS local environment variable HV_CONNECT_TIMEOUT is changed.
Changes
Before upgrading [PRIMECLUSTER 4.3A00]
The default value of HV_CONNECT_TIMEOUT is 5 (seconds) in the RHEL-AS environment, and 30 (seconds) in the RHEL5
environment.
After upgrading [PRIMECLUSTER 4.5A10]
The default value of HV_CONNECT_TIMEOUT is 30 (seconds).
Note
There are no incompatibilities when upgrading PRIMECLUSTER from 4.3A00 for RHEL5 to 4.5A10.
For details on HV_CONNECT_TIMEOUT, see "E.3 Local environment variables" in "PRIMECLUSTER Reliant Monitor Services (RMS)
with Wizard Tools Configuration and Administration Guide."
Changes
Before upgrading [PRIMECLUSTER 4.3A00]
The port number "11111" is used.
After upgrading [PRIMECLUSTER 4.5A10]
The port number "11111" is not used.
Note
None.
Changes
Before upgrading [PRIMECLUSTER 4.3A00]
The following settings were unnecessary to use the IPMI shutdown agent.
Note
None.
Changes
Before upgrading [PRIMECLUSTER 4.3A00]
Port number: The port number "2316" is used.
Note
None.
N.7.14 Changes of the target node to forcibly shut down when a heartbeat
failure occurs
Details on incompatibilities
The method of selecting the target node to be forcibly shut down when a heartbeat failure occurs due to a temporary cause such as an overload has been changed.
Changes
Before upgrading [PRIMECLUSTER 4.3A00]
If CF becomes temporarily disabled due to an overload or other causes and a heartbeat failure then occurs, the shutdown facility determines the node to forcibly shut down according to the setup policy for survival priority.
After upgrading [PRIMECLUSTER 4.5A10]
If CF becomes temporarily disabled due to an overload or other causes and a heartbeat failure then occurs, the shutdown facility forcibly stops the node on which CF cannot operate, regardless of the setup policy for survival priority.
Note
None.
Changes
Before upgrading [PRIMECLUSTER 4.3A00]
After the Offline processing of the failed resource is completed, nothing is displayed in StateDetails of the failed resource object.
After upgrading [PRIMECLUSTER 4.5A10]
After the Offline processing of the failed resource is completed, "Faulted Occurred" is displayed in StateDetails of the failed resource
object.
Note
None.
Changes
Before upgrading [PRIMECLUSTER 4.3A00]
When setting an IPv4 address, options for the setting command (such as ifconfig options) can be specified for the CIP interface.
After upgrading [PRIMECLUSTER 4.5A10]
When setting an IPv4 address, only the IP address and netmask value can be specified for the CIP interface.
Note
None.
Changes
Before upgrading [PRIMECLUSTER 4.3A00]
You can select to use or not to use the auto subnet grouping by checking/unchecking "Auto Subnet Grouping" checkbox on CF over IP
setting window of CF Wizard.
After upgrading [PRIMECLUSTER 4.5A10]
You can select to use or not to use the network broadcast on CF over IP by checking/unchecking "Use Network Broadcast" checkbox
on CF over IP setting window of CF Wizard.
Note
None.
Changes
Before upgrading [PRIMECLUSTER 4.3A00]
Enterprise-Postgres resource is not displayed in "turnkey wizard "STANDBY"".
After upgrading [PRIMECLUSTER 4.5A10]
Enterprise-Postgres resource is displayed in "turnkey wizard "STANDBY"".
Note
None.
Changes
Before upgrading [PRIMECLUSTER 4.3A00]
(SYS, 8): ERROR: RMS failed to shut down the host <host> via a Shutdown Facility, no further kill functionality is available.
The cluster is now hung.
After upgrading [PRIMECLUSTER 4.5A10]
(SYS, 8): ERROR: RMS failed to shut down the host <host> via a Shutdown Facility, no further kill functionality is available.
The cluster is now hung. An operator intervention is required.
Note
None.
Changes
Before upgrading [PRIMECLUSTER 4.3A00]
WARNING: cannot grab mount lock for dostat() check_getbdev(), returning previous state
After upgrading [PRIMECLUSTER 4.5A10]
NOTICE: cannot grab mount lock for dostat() check_getbdev(), returning previous state
Note
None.
Changes
Before upgrading [PRIMECLUSTER 4.3A00]
The use of the -f (force) flag could cause your data to be corrupted and could cause your node to be killed. Do not continue if the result
of this forced command is not clear.
The use of force flag of hvswitch overrides the RMS internal security mechanism. In particular RMS does no longer prevent resources,
which have been marked as "ClusterExclusive", from coming Online on more than one host in the cluster. It is recommended to double
check the state of all affected resources before continuing.
Do you wish to proceed ? (default: no) [yes, no]:
After upgrading [PRIMECLUSTER 4.5A10]
The use of the -f (force) flag could cause your data to be corrupted and could cause your node to be killed. Do not continue if the result
of this forced command is not clear.
The use of force flag of hvswitch overrides the RMS internal security mechanism. In particular RMS does no longer prevent resources,
which have been marked as "ClusterExclusive", from coming Online on more than one host in the cluster. It is recommended to double
check the state of all affected resources before continuing.
IMPORTANT: This command may kill nodes on which RMS is not running in order to reduce the risk of data corruption!
Ensure that RMS is running on all other nodes. Or shut down OS of the node on which RMS is not running.
Do you wish to proceed ? (default: no) [yes, no]:
Note
None.
Changes
Before upgrading [PRIMECLUSTER 4.3A00]
1421 The userApplication "userApplication" did not start automatically because not all of the nodes where it can run are online.
Do you want to force the userApplication online on the SysNode "SysNode"?
Message No.: number
Do you want to do something? (yes/no)
Warning: Forcing a userApplication online ignores potential error conditions. Used improperly, it can result in data corruption. You
should not use it unless you are certain that the userApplication is not running anywhere in the cluster.
After upgrading [PRIMECLUSTER 4.5A10]
1421 The userApplication "userApplication" did not start automatically because not all of the nodes where it can run are online.
Forcing the userApplication online on the SysNode "SysNode" is possible.
Warning: When performing a forced online, confirm that RMS is started on all nodes in the cluster, manually shutdown any nodes where
it is not started and then perform it. For a forced online, there is a risk of data corruption due to simultaneous access from several nodes.
In order to reduce the risk, nodes where RMS is not started maybe forcibly stopped.
Are you sure wish to force online? (no/yes)
Message No.: number
Note
For details, see the relevant message in "PRIMECLUSTER Messages."
N.7.22.2 Message: 1423
Details on incompatibilities
Message No.1423 of the operator intervention request has changed.
Changes
Before upgrading [PRIMECLUSTER 4.3A00]
1423 On the SysNode "SysNode", the userApplication "userApplication" has the faulted resource "resource". The userApplication
"userApplication" did not start automatically because not all of the nodes where it can run are online.
Do you want to force the userApplication online on the SysNode "SysNode"?
Message No.: number
Do you want to do something? (yes/no)
Warning: Forcing a userApplication online ignores potential error conditions. Used improperly, it can result in data corruption. You
should not use it unless you are certain that the userApplication is not running anywhere in the cluster.
After upgrading [PRIMECLUSTER 4.5A10]
1423 On the SysNode "SysNode", the userApplication "userApplication" has the faulted resource "resource". The userApplication
"userApplication" did not start automatically because not all of the nodes where it can run are online.
Forcing the userApplication online on the SysNode "SysNode" is possible.
Warning: When performing a forced online, confirm that RMS is started on all nodes in the cluster, manually shutdown any nodes where
it is not started and then perform it. For a forced online, there is a risk of data corruption due to simultaneous access from several nodes.
In order to reduce the risk, nodes where RMS is not started maybe forcibly stopped.
Are you sure wish to force online? (no/yes)
Message No.: number
Note
For details, see the relevant message in "PRIMECLUSTER Messages."
Incompatible functions
The following functions of PRIMECLUSTER 4.5A10 are incompatible with PRIMECLUSTER 4.3A10.
- N.8.13 Display of the resource fault trace
- N.8.14 Change of /etc/cip.cf file
- N.8.15 Changes in CF over IP setting window of CF Wizard
- N.8.16 Changing "turnkey wizard "STANDBY"" of hvw command
Incompatible messages
The following messages of PRIMECLUSTER 4.5A10 are incompatible with PRIMECLUSTER 4.3A10.
Changes
Before upgrading [PRIMECLUSTER 4.3A10]
A fixed string "cluster" is displayed when the resource management facility is configured.
The number of characters displayed by "Agent" of "sdtool -s" is 14 characters (including spaces).
The number of characters displayed by "Admin IP" of "sdtool -C" is 16 characters (including spaces).
After upgrading [PRIMECLUSTER 4.5A10]
The cluster class uses the same name as the CF cluster when the resource management facility is configured.
The number of characters displayed by "Agent" of "sdtool -s" is 21 characters (including spaces).
When an IPv6 address is used for the administrative LAN of the shutdown facility, the number of characters displayed by "Admin IP"
of "sdtool -C" is 40 characters (including spaces). When an IPv4 address is used, the number of characters is not changed.
Note
None.
Point
A resource in a cluster application may not stop and may remain running, because RMS ends abnormally when the hvshut command times out.
In this situation, if the resource controls a shared disk, data corruption may occur when RMS and the cluster application containing the resource are forcibly started on another node, because the resource is then running on multiple nodes at the same time.
Changes
Before upgrading [PRIMECLUSTER 4.3A10]
If the environment variable RELIANT_SHUT_MIN_WAIT is left at its default value and the shutdown processing of a resource by the hvshut command does not complete within 900 seconds, the command times out and RMS ends abnormally. The resource does not stop and remains running in this case.
After upgrading [PRIMECLUSTER 4.5A10]
If the environment variable RELIANT_SHUT_MIN_WAIT is left at its default value, the hvshut command does not time out even when the shutdown processing of a resource has not completed.
Note
When using RMS, make sure to change this environment variable to suit your configuration.
Changes
Before upgrading [PRIMECLUSTER 4.3A10]
When the -f option is used, RMS forcibly starts cluster applications even if nodes where RMS is not running exist in the cluster, which may lead to data corruption.
After upgrading [PRIMECLUSTER 4.5A10]
When the -f option is used and nodes where RMS is not running exist in the cluster, RMS forcibly starts the cluster applications only after forcibly shutting down those nodes, in order to reduce the risk of data corruption. However, if the forced shutdown fails, the cluster applications are not forcibly started.
Note
When using the -f option, check "7.5.1 Notes on Switching a Cluster Application Forcibly" and then execute the command.
Changes
Before upgrading [PRIMECLUSTER 4.3A10]
The default work directory is /tmp.
After upgrading [PRIMECLUSTER 4.5A10]
The default work directory is /var/tmp.
Note
None.
Details on incompatibilities
If a failure occurs in the resource or if the resource recovers from a failure, the failure or recovery of the resource can be posted by sending
the message shown below to syslogd. The default setting at installation is that notification of a resource failure or recovery is posted with
PRIMECLUSTER 4.5A10. For details, see "5.2 Setting up Fault Resource Identification and Operator Intervention Request."
Changes
Before upgrading [PRIMECLUSTER 4.3A10]
Notification of a resource failure or recovery is not posted with the default setting at cluster installation.
The default value of AppWatch at cluster installation is OFF, so resource failures and recoveries are not posted.
After upgrading [PRIMECLUSTER 4.5A10]
Notification of a resource failure or recovery is posted with the default setting at cluster installation.
A resource failure or recovery is not posted only when the AppWatch parameter is set to OFF with clsetparam.
Note
After you have changed the AppWatch parameter with clsetparam, you have to restart all the nodes to validate the setting.
Changes
Before upgrading [PRIMECLUSTER 4.3A10]
When the forced startup of a cluster application is issued, the cluster application is forcibly started according to the user's operation even if nodes where RMS is not running exist in the cluster, which may cause data corruption.
After upgrading [PRIMECLUSTER 4.5A10]
To reduce the risk of data corruption when the forced startup of a cluster application is issued, the cluster application is forcibly started only after the nodes where RMS is not running have been forcibly shut down.
Note
For details, see "4.2 Operator Intervention Messages" in "PRIMECLUSTER Messages."
Details on incompatibilities 2
With the default settings made when the cluster was installed, the operator intervention request is always enabled.
For details, see "5.2 Setting up Fault Resource Identification and Operator Intervention Request."
Changes
Before upgrading [PRIMECLUSTER 4.3A10]
The operator intervention request will not work with the default setting at installation.
The default value of AppWatch set when the cluster was installed is set to OFF, and the operator intervention request will not work with
this default value.
After upgrading [PRIMECLUSTER 4.5A10]
The operator intervention request will work with the default setting at installation.
The operator intervention request is disabled only when the AppWatch parameter is set to OFF with clsetparam.
Note
After you have changed the AppWatch parameter with clsetparam, you have to restart all the nodes to validate the setting.
Changes
Before upgrading [PRIMECLUSTER 4.3A10]
Securing a dedicated monitoring disk area was required.
After upgrading [PRIMECLUSTER 4.5A10]
The dedicated monitoring disk area no longer needs to be registered to the userApplication as an Fsystem resource. However, when migrating from an earlier version, the dedicated monitoring disk area can still be registered as an Fsystem resource.
Note
None.
Changes
Before upgrading [PRIMECLUSTER 4.3A10]
The port number "11111" is used.
After upgrading [PRIMECLUSTER 4.5A10]
The port number "11111" is not used.
Note
None.
Changes
Before upgrading [PRIMECLUSTER 4.3A10]
The following settings were unnecessary to use the IPMI shutdown agent.
Note
None.
Changes
Before upgrading [PRIMECLUSTER 4.3A10]
Port number: The port number "2316" is used.
Note
None.
Changes
Before upgrading [PRIMECLUSTER 4.3A10]
When using the Host OS failover function in the PRIMEQUEST KVM environment, the shutdown facility was set on a guest OS (node).
After upgrading [PRIMECLUSTER 4.5A10]
When using the Host OS failover function in the PRIMEQUEST KVM environment, the setting of the shutdown facility is required not
only on the guest OS (node) but also on the Host OS (node). This will enable you to reduce the cluster failover time between guest OSes
if a failure occurs on the Host OS.
For details on the setting, see "5.1.2.6.6 Setting up the Host OS Failover Function to the Host OS (PRIMEQUEST only)."
Note
None.
N.8.12 Changes of the target node to forcibly shut down when a heartbeat
failure occurs
Details on incompatibilities
The method of selecting the target node to be forcibly shut down when a heartbeat failure occurs due to a temporary cause such as an overload has been changed.
Changes
Before upgrading [PRIMECLUSTER 4.3A10]
If CF becomes temporarily disabled due to an overload or other causes and a heartbeat failure then occurs, the shutdown facility determines the node to forcibly shut down according to the setup policy for survival priority.
After upgrading [PRIMECLUSTER 4.5A10]
If CF becomes temporarily disabled due to an overload or other causes and a heartbeat failure then occurs, the shutdown facility forcibly stops the node on which CF cannot operate, regardless of the setup policy for survival priority.
Note
None.
Changes
Before upgrading [PRIMECLUSTER 4.3A10]
After the Offline processing of the failed resource is completed, nothing is displayed in StateDetails of the failed resource object.
After upgrading [PRIMECLUSTER 4.5A10]
After the Offline processing of the failed resource is completed, "Faulted Occurred" is displayed in StateDetails of the failed resource
object.
Note
None.
Changes
Before upgrading [PRIMECLUSTER 4.3A10]
When setting an IPv4 address, options for the setting command (such as ifconfig options) can be specified for the CIP interface.
After upgrading [PRIMECLUSTER 4.5A10]
When setting an IPv4 address, only the IP address and netmask value can be specified for the CIP interface.
Note
None.
Changes
Before upgrading [PRIMECLUSTER 4.3A10]
You can select to use or not to use the auto subnet grouping by checking/unchecking "Auto Subnet Grouping" checkbox on CF over IP
setting window of CF Wizard.
After upgrading [PRIMECLUSTER 4.5A10]
You can select to use or not to use the network broadcast on CF over IP by checking/unchecking "Use Network Broadcast" checkbox
on CF over IP setting window of CF Wizard.
Note
None.
Changes
Before upgrading [PRIMECLUSTER 4.3A10]
Enterprise-Postgres resource is not displayed in "turnkey wizard "STANDBY"".
After upgrading [PRIMECLUSTER 4.5A10]
Enterprise-Postgres resource is displayed in "turnkey wizard "STANDBY"".
Note
None.
Changes
Before upgrading [PRIMECLUSTER 4.3A10]
The use of the -f (force) flag could cause your data to be corrupted and could cause your node to be killed. Do not continue if the result
of this forced command is not clear.
The use of force flag of hvswitch overrides the RMS internal security mechanism. In particular RMS does no longer prevent resources,
which have been marked as "ClusterExclusive", from coming Online on more than one host in the cluster. It is recommended to double
check the state of all affected resources before continuing.
Do you wish to proceed ? (default: no) [yes, no]:
After upgrading [PRIMECLUSTER 4.5A10]
The use of the -f (force) flag could cause your data to be corrupted and could cause your node to be killed. Do not continue if the result
of this forced command is not clear.
The use of force flag of hvswitch overrides the RMS internal security mechanism. In particular RMS does no longer prevent resources,
which have been marked as "ClusterExclusive", from coming Online on more than one host in the cluster. It is recommended to double
check the state of all affected resources before continuing.
IMPORTANT: This command may kill nodes on which RMS is not running in order to reduce the risk of data corruption!
Ensure that RMS is running on all other nodes. Or shut down OS of the node on which RMS is not running.
Do you wish to proceed ? (default: no) [yes, no]:
Note
None.
Changes
Before upgrading [PRIMECLUSTER 4.3A10]
1421 The userApplication "userApplication" did not start automatically because not all of the nodes where it can run are online.
Do you want to force the userApplication online on the SysNode "SysNode"?
Message No.: number
Do you want to do something? (yes/no)
Warning: Forcing a userApplication online ignores potential error conditions. Used improperly, it can result in data corruption. You
should not use it unless you are certain that the userApplication is not running anywhere in the cluster.
After upgrading [PRIMECLUSTER 4.5A10]
1421 The userApplication "userApplication" did not start automatically because not all of the nodes where it can run are online.
Forcing the userApplication online on the SysNode "SysNode" is possible.
Warning: When performing a forced online, confirm that RMS is started on all nodes in the cluster, manually shutdown any nodes where
it is not started and then perform it. For a forced online, there is a risk of data corruption due to simultaneous access from several nodes.
In order to reduce the risk, nodes where RMS is not started maybe forcibly stopped.
Are you sure wish to force online? (no/yes)
Message No.: number
Note
For details, see the relevant message in "PRIMECLUSTER Messages."
Changes
Before upgrading [PRIMECLUSTER 4.3A10]
1423 On the SysNode "SysNode", the userApplication "userApplication" has the faulted resource "resource". The userApplication
"userApplication" did not start automatically because not all of the nodes where it can run are online.
Do you want to force the userApplication online on the SysNode "SysNode"?
Message No.: number
Do you want to do something? (yes/no)
Warning: Forcing a userApplication online ignores potential error conditions. Used improperly, it can result in data corruption. You
should not use it unless you are certain that the userApplication is not running anywhere in the cluster.
After upgrading [PRIMECLUSTER 4.5A10]
1423 On the SysNode "SysNode", the userApplication "userApplication" has the faulted resource "resource". The userApplication
"userApplication" did not start automatically because not all of the nodes where it can run are online.
Forcing the userApplication online on the SysNode "SysNode" is possible.
Warning: When performing a forced online, confirm that RMS is started on all nodes in the cluster, manually shutdown any nodes where
it is not started and then perform it. For a forced online, there is a risk of data corruption due to simultaneous access from several nodes.
In order to reduce the risk, nodes where RMS is not started maybe forcibly stopped.
Are you sure wish to force online? (no/yes)
Message No.: number
Note
For details, see the relevant message in "PRIMECLUSTER Messages."
Incompatible functions
The following functions of PRIMECLUSTER 4.5A10 are incompatible with PRIMECLUSTER 4.3A20.
Incompatible messages
The following messages of PRIMECLUSTER 4.5A10 are incompatible with PRIMECLUSTER 4.3A20.
Point
A resource in a cluster application may not stop and may remain running, because RMS ends abnormally when the hvshut command times out.
In this situation, if the resource controls a shared disk, data corruption may occur when RMS and the cluster application containing the resource are forcibly started on another node.
Changes
Before upgrading [PRIMECLUSTER 4.3A20]
If the environment variable RELIANT_SHUT_MIN_WAIT is left at its default value and the shutdown processing of a resource by the hvshut command does not complete within 900 seconds, the command times out and RMS ends abnormally. The resource does not stop and remains running in this case.
After upgrading [PRIMECLUSTER 4.5A10]
If the environment variable RELIANT_SHUT_MIN_WAIT is left at its default value, the hvshut command does not time out even when the shutdown processing of a resource has not completed.
Note
When using RMS, make sure to change this environment variable to suit your configuration.
Changes
Before upgrading [PRIMECLUSTER 4.3A20]
When the -f option is used, RMS forcibly starts cluster applications even if nodes where RMS is not running exist in the cluster, which may lead to data corruption.
After upgrading [PRIMECLUSTER 4.5A10]
When the -f option is used and nodes where RMS is not running exist in the cluster, RMS forcibly starts the cluster applications only after forcibly shutting down those nodes, in order to reduce the risk of data corruption. However, if the forced shutdown fails, the cluster applications are not forcibly started.
Note
When using the -f option, check "7.5.1 Notes on Switching a Cluster Application Forcibly" and then execute the command.
Changes
Before upgrading [PRIMECLUSTER 4.3A20]
The default work directory is /tmp.
After upgrading [PRIMECLUSTER 4.5A10]
The default work directory is /var/tmp.
Note
None.
Changes
Before upgrading [PRIMECLUSTER 4.3A20]
Notification of a resource failure or recovery is not posted with the default setting at cluster installation.
The default value of AppWatch at cluster installation is OFF, so resource failures and recoveries are not posted.
After upgrading [PRIMECLUSTER 4.5A10]
Notification of a resource failure or recovery is posted with the default setting at cluster installation.
A resource failure or recovery is not posted only when the AppWatch parameter is set to OFF with clsetparam.
Note
After you have changed the AppWatch parameter with clsetparam, you have to restart all the nodes to validate the setting.
Therefore, to deal with this issue, a function has been added that forcibly shuts down the nodes where RMS is not running before the cluster application is forcibly started.
Changes
Before upgrading [PRIMECLUSTER 4.3A20]
When the forced startup of a cluster application is issued, the cluster application is forcibly started according to the user's operation even if nodes where RMS is not running exist in the cluster, which may cause data corruption.
After upgrading [PRIMECLUSTER 4.5A10]
To reduce the risk of data corruption when the forced startup of a cluster application is issued, the cluster application is forcibly started only after the nodes where RMS is not running have been forcibly shut down.
Note
For details, see "4.2 Operator Intervention Messages" in "PRIMECLUSTER Messages."
Details on incompatibilities 2
With the default settings made when the cluster was installed, the operator intervention request is always enabled.
For details, see "5.2 Setting up Fault Resource Identification and Operator Intervention Request."
Changes
Before upgrading [PRIMECLUSTER 4.3A20]
The operator intervention request will not work with the default setting at installation.
The default value of AppWatch set when the cluster was installed is set to OFF, and the operator intervention request will not work with
this default value.
After upgrading [PRIMECLUSTER 4.5A10]
The operator intervention request will work with the default setting at installation.
The operator intervention request is disabled only when the AppWatch parameter is set to OFF with clsetparam.
Note
After you have changed the AppWatch parameter with clsetparam, you have to restart all the nodes to validate the setting.
Changes
Before upgrading [PRIMECLUSTER 4.3A20]
Securing a dedicated monitoring disk area was required.
After upgrading [PRIMECLUSTER 4.5A10]
The dedicated monitoring disk area no longer needs to be registered to the userApplication as an Fsystem resource. However, when migrating from an earlier version, the dedicated monitoring disk area can still be registered as an Fsystem resource.
Note
None.
Details on incompatibilities
The setting procedure to use the IPMI shutdown agent is added.
Changes
Before upgrading [PRIMECLUSTER 4.3A20]
The following settings were unnecessary to use the IPMI shutdown agent.
Note
None.
Changes
Before upgrading [PRIMECLUSTER 4.3A20]
Port number: The port number "2316" is used.
Note
None.
Changes
Before upgrading [PRIMECLUSTER 4.3A20]
When using the Host OS failover function in the PRIMEQUEST KVM environment, the shutdown facility was set on a guest OS (node).
After upgrading [PRIMECLUSTER 4.5A10]
When using the Host OS failover function in the PRIMEQUEST KVM environment, the setting of the shutdown facility is required not
only on the guest OS (node) but also on the Host OS (node). This will enable you to reduce the cluster failover time between guest OSes
if a failure occurs on the Host OS.
For details on the setting, see "5.1.2.6.6 Setting up the Host OS Failover Function to the Host OS (PRIMEQUEST only)."
Note
None.
N.9.10 Changes of the target node to forcibly shut down when a heartbeat
failure occurs
Details on incompatibilities
The method of selecting the target node to be forcibly shut down when a heartbeat failure occurs due to a temporary cause such as an overload has been changed.
Changes
Before upgrading [PRIMECLUSTER 4.3A20]
If CF becomes temporarily disabled due to an overload or other causes and a heartbeat failure then occurs, the shutdown facility determines the node to forcibly shut down according to the setup policy for survival priority.
After upgrading [PRIMECLUSTER 4.5A10]
If CF becomes temporarily disabled due to an overload or other causes and a heartbeat failure then occurs, the shutdown facility forcibly stops the node on which CF cannot operate, regardless of the setup policy for survival priority.
Note
None.
Changes
Before upgrading [PRIMECLUSTER 4.3A20]
After the Offline processing of the failed resource is completed, nothing is displayed in StateDetails of the failed resource object.
After upgrading [PRIMECLUSTER 4.5A10]
After the Offline processing of the failed resource is completed, "Faulted Occurred" is displayed in StateDetails of the failed resource
object.
Note
None.
Changes
Before upgrading [PRIMECLUSTER 4.3A20]
When setting an IPv4 address, options for the setting command (such as ifconfig options) can be specified for the CIP interface.
After upgrading [PRIMECLUSTER 4.5A10]
When setting an IPv4 address, only the IP address and netmask value can be specified for the CIP interface.
Note
None.
Changes
Before upgrading [PRIMECLUSTER 4.3A20]
You can select to use or not to use the auto subnet grouping by checking/unchecking "Auto Subnet Grouping" checkbox on CF over IP
setting window of CF Wizard.
After upgrading [PRIMECLUSTER 4.5A10]
You can select to use or not to use the network broadcast on CF over IP by checking/unchecking "Use Network Broadcast" checkbox
on CF over IP setting window of CF Wizard.
Note
None.
Changes
Before upgrading [PRIMECLUSTER 4.3A20]
Enterprise-Postgres resource is not displayed in "turnkey wizard "STANDBY"".
After upgrading [PRIMECLUSTER 4.5A10]
Enterprise-Postgres resource is displayed in "turnkey wizard "STANDBY"".
Note
None.
Changes
Before upgrading [PRIMECLUSTER 4.3A20]
The use of the -f (force) flag could cause your data to be corrupted and could cause your node to be killed. Do not continue if the result
of this forced command is not clear.
The use of force flag of hvswitch overrides the RMS internal security mechanism. In particular RMS does no longer prevent resources,
which have been marked as "ClusterExclusive", from coming Online on more than one host in the cluster. It is recommended to double
check the state of all affected resources before continuing.
Do you wish to proceed ? (default: no) [yes, no]:
After upgrading [PRIMECLUSTER 4.5A10]
The use of the -f (force) flag could cause your data to be corrupted and could cause your node to be killed. Do not continue if the result
of this forced command is not clear.
The use of force flag of hvswitch overrides the RMS internal security mechanism. In particular RMS does no longer prevent resources,
which have been marked as "ClusterExclusive", from coming Online on more than one host in the cluster. It is recommended to double
check the state of all affected resources before continuing.
IMPORTANT: This command may kill nodes on which RMS is not running in order to reduce the risk of data corruption!
Ensure that RMS is running on all other nodes. Or shut down OS of the node on which RMS is not running.
Do you wish to proceed ? (default: no) [yes, no]:
Note
None.
Changes
Before upgrading [PRIMECLUSTER 4.3A20]
1421 The userApplication "userApplication" did not start automatically because not all of the nodes where it can run are online.
Do you want to force the userApplication online on the SysNode "SysNode"?
Message No.: number
Do you want to do something? (yes/no)
Warning: Forcing a userApplication online ignores potential error conditions. Used improperly, it can result in data corruption. You
should not use it unless you are certain that the userApplication is not running anywhere in the cluster.
After upgrading [PRIMECLUSTER 4.5A10]
1421 The userApplication "userApplication" did not start automatically because not all of the nodes where it can run are online.
Forcing the userApplication online on the SysNode "SysNode" is possible.
Warning: When performing a forced online, confirm that RMS is started on all nodes in the cluster, manually shutdown any nodes where
it is not started and then perform it. For a forced online, there is a risk of data corruption due to simultaneous access from several nodes.
In order to reduce the risk, nodes where RMS is not started maybe forcibly stopped.
Are you sure wish to force online? (no/yes)
Message No.: number
Note
For details, see the relevant message in "PRIMECLUSTER Messages."
N.9.16.2 Message: 1423
Details on incompatibilities
Message No.1423 of the operator intervention request has changed.
Changes
Before upgrading [PRIMECLUSTER 4.3A20]
1423 On the SysNode "SysNode", the userApplication "userApplication" has the faulted resource "resource". The userApplication
"userApplication" did not start automatically because not all of the nodes where it can run are online.
Do you want to force the userApplication online on the SysNode "SysNode"?
Message No.: number
Do you want to do something? (yes/no)
Warning: Forcing a userApplication online ignores potential error conditions. Used improperly, it can result in data corruption. You
should not use it unless you are certain that the userApplication is not running anywhere in the cluster.
After upgrading [PRIMECLUSTER 4.5A10]
1423 On the SysNode "SysNode", the userApplication "userApplication" has the faulted resource "resource". The userApplication
"userApplication" did not start automatically because not all of the nodes where it can run are online.
Forcing the userApplication online on the SysNode "SysNode" is possible.
Warning: When performing a forced online, confirm that RMS is started on all nodes in the cluster, manually shutdown any nodes where
it is not started and then perform it. For a forced online, there is a risk of data corruption due to simultaneous access from several nodes.
In order to reduce the risk, nodes where RMS is not started maybe forcibly stopped.
Are you sure wish to force online? (no/yes)
Message No.: number
Note
For details, see the relevant message in "PRIMECLUSTER Messages."
Incompatible functions
The following functions of PRIMECLUSTER 4.5A10 are incompatible with PRIMECLUSTER 4.3A30.
Changes
Before upgrading [PRIMECLUSTER 4.3A30]
The default work directory is /tmp.
After upgrading [PRIMECLUSTER 4.5A10]
The default work directory is /var/tmp.
Note
None.
Changes
Before upgrading [PRIMECLUSTER 4.3A30]
Notification of a resource failure or recovery is not posted with the default setting at cluster installation.
The default value of AppWatch at cluster installation is OFF, so resource failures and recoveries are not posted.
After upgrading [PRIMECLUSTER 4.5A10]
Notification of a resource failure or recovery is posted with the default setting at cluster installation.
A resource failure or recovery is not posted only when the AppWatch parameter is set to OFF with clsetparam.
Note
After you have changed the AppWatch parameter with clsetparam, you have to restart all the nodes to validate the setting.
Changes
Before upgrading [PRIMECLUSTER 4.3A30]
When the forced startup of a cluster application is issued, the cluster application is forcibly started according to the user's operation even if nodes where RMS is not running exist in the cluster, which may cause data corruption.
After upgrading [PRIMECLUSTER 4.5A10]
To reduce the risk of data corruption when the forced startup of a cluster application is issued, the cluster application is forcibly started only after the nodes where RMS is not running have been forcibly shut down.
Note
For details, see "4.2 Operator Intervention Messages" in "PRIMECLUSTER Messages."
Details on incompatibilities 2
With the default settings made when the cluster was installed, the operator intervention request is always enabled.
For details, see "5.2 Setting up Fault Resource Identification and Operator Intervention Request."
Changes
Before upgrading [PRIMECLUSTER 4.3A30]
The operator intervention request will not work with the default setting at installation.
The default value of AppWatch set when the cluster was installed is set to OFF, and the operator intervention request will not work with
this default value.
After upgrading [PRIMECLUSTER 4.5A10]
The operator intervention request will work with the default setting at installation.
The operator intervention request is disabled only when the AppWatch parameter is set to OFF with clsetparam.
Note
After you have changed the AppWatch parameter with clsetparam, you have to restart all the nodes to validate the setting.
Changes
Before upgrading [PRIMECLUSTER 4.3A30]
Securing a dedicated monitoring disk area was required.
After upgrading [PRIMECLUSTER 4.5A10]
The dedicated monitoring disk area no longer needs to be registered to the userApplication as an Fsystem resource. However, when migrating from an earlier version, the dedicated monitoring disk area can still be registered as an Fsystem resource.
Note
None.
Changes
Before upgrading [PRIMECLUSTER 4.3A30]
Only the root user can log in to the guest OS via SSH.
After upgrading [PRIMECLUSTER 4.5A10]
The root user or any specified user can log in to the guest OS via SSH.
For details, see "5.1.2.6.6 Setting up the Host OS Failover Function to the Host OS (PRIMEQUEST only)."
Note
None.
Changes
Before upgrading [PRIMECLUSTER 4.3A30]
After the Offline processing of the failed resource is completed, nothing is displayed in StateDetails of the failed resource object.
After upgrading [PRIMECLUSTER 4.5A10]
After the Offline processing of the failed resource is completed, "Faulted Occurred" is displayed in StateDetails of the failed resource
object.
Note
None.
Changes
Before upgrading [PRIMECLUSTER 4.3A30]
When setting an IPv4 address, options of the setting command, such as ifconfig, can be specified for the CIP interface.
After upgrading [PRIMECLUSTER 4.5A10]
When setting an IPv4 address, only the IP address and netmask value can be specified for the CIP interface.
Note
None.
Changes
Before upgrading [PRIMECLUSTER 4.3A30]
You can select whether to use auto subnet grouping by checking or unchecking the "Auto Subnet Grouping" checkbox on the CF over IP setting window of the CF Wizard.
After upgrading [PRIMECLUSTER 4.5A10]
You can select whether to use the network broadcast on CF over IP by checking or unchecking the "Use Network Broadcast" checkbox on the CF over IP setting window of the CF Wizard.
Note
None.
Changes
Before upgrading [PRIMECLUSTER 4.3A30]
Enterprise-Postgres resource is not displayed in "turnkey wizard "STANDBY"".
After upgrading [PRIMECLUSTER 4.5A10]
Enterprise-Postgres resource is displayed in "turnkey wizard "STANDBY"".
Note
None.
N.11 Changes in PRIMECLUSTER 4.5A10 from 4.3A40
Incompatible functions
The following functions of PRIMECLUSTER 4.5A10 are incompatible with PRIMECLUSTER 4.3A40.
- N.11.1 Setting up the Host OS failover function when using it in KVM environment
- N.11.2 Changes in CF over IP setting window of CF Wizard
- N.11.3 Setting up the migration function when using it in KVM environment
- N.11.4 Changing "turnkey wizard "STANDBY"" of hvw command
N.11.1 Setting up the Host OS failover function when using it in KVM environment
Changes
Before upgrading [PRIMECLUSTER 4.3A40]
Only the root user can log in to the guest OS via SSH.
After upgrading [PRIMECLUSTER 4.5A10]
The root user or any specified user can log in to the guest OS via SSH.
For details, see "5.1.2.6.6 Setting up the Host OS Failover Function to the Host OS (PRIMEQUEST only)."
Note
None.
N.11.2 Changes in CF over IP setting window of CF Wizard
Details on incompatibilities
From PRIMECLUSTER 4.5A10, the "Auto Subnet Grouping" checkbox is removed from the CF over IP setting window. Instead, the "Use Network Broadcast" checkbox is newly added.
Changes
Before upgrading [PRIMECLUSTER 4.3A40]
You can select whether to use auto subnet grouping by checking or unchecking the "Auto Subnet Grouping" checkbox on the CF over IP setting window of the CF Wizard.
After upgrading [PRIMECLUSTER 4.5A10]
You can select whether to use the network broadcast on CF over IP by checking or unchecking the "Use Network Broadcast" checkbox on the CF over IP setting window of the CF Wizard.
Note
None.
N.11.3 Setting up the migration function when using it in KVM environment
Changes
Before upgrading [PRIMECLUSTER 4.3A40]
Only the root user can log in to the guest OS via SSH.
After upgrading [PRIMECLUSTER 4.5A10]
The root user or any specified user can log in to the guest OS via SSH.
For details, see "G.2.2 Using the Host OS failover function."
Note
None.
N.11.4 Changing "turnkey wizard "STANDBY"" of hvw command
Changes
Before upgrading [PRIMECLUSTER 4.3A40]
Enterprise-Postgres resource is not displayed in "turnkey wizard "STANDBY"".
After upgrading [PRIMECLUSTER 4.5A10]
Enterprise-Postgres resource is displayed in "turnkey wizard "STANDBY"".
Note
None.
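For reference, the turnkey wizard "STANDBY" is reached through the RMS Wizard, which is typically started with the hvw command as sketched below. The configuration name testconf is hypothetical, and the exact menu flow should be checked in the RMS Wizard Tools documentation.

# hvw -n testconf

In PRIMECLUSTER 4.5A10, the Enterprise-Postgres resource appears as a selectable resource in turnkey wizard "STANDBY".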
N.12 Changes in PRIMECLUSTER 4.5A10 from 4.4A00
Incompatible functions
The following function of PRIMECLUSTER 4.5A10 is incompatible with PRIMECLUSTER 4.4A00.
Changes
Before upgrading [PRIMECLUSTER 4.4A00]
Enterprise-Postgres resource is not displayed in "turnkey wizard "STANDBY"".
After upgrading [PRIMECLUSTER 4.5A10]
Enterprise-Postgres resource is displayed in "turnkey wizard "STANDBY"".
Note
None.
N.13 Changes in PRIMECLUSTER 4.5A10 from 4.5A00
Incompatible functions
The following function of PRIMECLUSTER 4.5A10 is incompatible with PRIMECLUSTER 4.5A00.
Changes
Before upgrading [PRIMECLUSTER 4.5A00]
Enterprise-Postgres resource is not displayed in "turnkey wizard "STANDBY"".
After upgrading [PRIMECLUSTER 4.5A10]
Enterprise-Postgres resource is displayed in "turnkey wizard "STANDBY"".
Note
None.
Appendix O Release Information
This appendix lists the locations and the descriptions changed in PRIMECLUSTER 4.5.
No | Edition | Location | Description
 | | N.3.8 Operation Procedures and Displayed Items for Cluster Application Setup and Modification |
13 | 4.5.2 | 6.7.3.2 Setting Up Fsystem Resources; 6.12.3.1 Corrective Actions for the Forced File System Check | Changed the description of forcible file system check.
14 | 4.5.2 | 6.7.7 Exclusive Relationships Between Cluster Applications | Added the information when the cluster application transits to Standby state.
15 | 4.5.2 | 6.9 Checking the Cluster Environment | Added the execution procedure of the clchkcluster command.
16 | 4.5.2 | 7.1.3.1 RMS Tree | Added the note about the RMS tree.
17 | 4.5.2 | 9.2.1 Changing the IP Address of the Public LAN; 9.2.2 Changing the IP Address of the Administrative LAN | Changed the description of the operation procedure.
18 | 4.5.2 | 9.2.3 Changing the IP Address of CF over IP | Added the note for when the administrative LAN is shared with the public LAN. Changed the description of how to edit /etc/default/cluster from per network segment to per environment.
19 | 4.5.2 | 10.5 Deleting a Resource; 10.5.1 Settings made when deleting a Gds resource | Changed the descriptions of the setting if the Gds resources are deleted.
20 | 4.5.2 | 10.5 Deleting a Resource; 10.6.2 Changing the Devices of File systems Controlled by the Fsystem Resource | Added the note about the VMware environment where the I/O fencing function is used.
21 | 4.5.2 | Appendix A PRIMECLUSTER Products | Deleted the following products from the PRIMECLUSTER products: Symfoware Server Hot Standby Option; Symfoware Server Parallel Cluster Option.
22 | 4.5.2 | Appendix A PRIMECLUSTER Products | Added FUJITSU Software Enterprise Postgres Advanced Edition to the PRIMECLUSTER products.
23 | 4.5.2 | C.3.1 Output Destination for core Files | Added the process names of core output to each directory.
24 | 4.5.2 | C.3.3 Log Volume When Changing Log Levels | Added the information about the log volume increased per day when Primesoft Server is installed.
25 | 4.5.2 | H.1 Cluster Systems in a VMware Environment | Changed the description of the forcible stop functions.
26 | 4.5.2 | H.2.1.1 Installation and Configuration of Related Software | Changed the descriptions of "Setting up shared disks (when using the I/O fencing function)."
27 | 4.5.2 | H.2.4.1 Setting Up I/O Fencing Function | Added how to set up the function to advertise the route information from the switching destination node.
28 | 4.5.2 | Appendix I Using PRIMECLUSTER in RHOSP Environment | Added and changed the descriptions for using PRIMECLUSTER in an RHOSP environment.
29 | 4.5.2 | M.3.3 Changing the Settings in Multi-User Mode | Added the setting procedures for the PRIMEQUEST 3000 series.
30 | 4.5.2 | Appendix N Changes in Each Version | Added the following incompatible function: Changing "turnkey wizard "STANDBY"" of hvw command.
Glossary
AC (Access Client)
See Access Client.
Access Client
GFS kernel module on each node that communicates with the Meta Data Server and provides simultaneous access to a shared file system.
See also Meta Data Server.
application (RMS)
A resource categorized as userApplication used to group resources into a logical collection.
attribute (RMS)
The part of an object definition that specifies how the base monitor acts and reacts for a particular object type during normal operations.
availability
Availability describes the need of most enterprises to operate applications via the Internet 24 hours a day, 7 days a week. The relationship
of the actual to the planned usage time determines the availability of a system.
BM (base monitor)(RMS)
The RMS module that maintains the availability of resources. The base monitor is supported by daemons and detectors. Each host being
monitored has its own copy of the base monitor.
CB
Clustering Base
CF (Cluster Foundation or Cluster Framework)
See Cluster Foundation.
child (RMS)
A resource defined in the configuration file that has at least one parent. A child can have multiple parents, and can either have children
itself (making it also a parent) or no children (making it a leaf object).
See also resource, object, parent, and leaf object.
CIM
Cluster Integrity Monitor
CIP
Cluster Interconnect Protocol
class (GDS)
See disk class.
CLI
command-line interface
cluster
A set of computers that work together as a single computing resource. Specifically, a cluster performs a distributed form of parallel computing.
See also RMS configuration.
Cluster Foundation
The set of PRIMECLUSTER modules that provides basic clustering communication services.
See also base cluster foundation.
cluster partition
The state in which communication with some of the nodes that constitute the cluster has been stopped.
concatenation (GDS)
The linking of multiple physical disks. This setup allows multiple disks to be used as one virtual disk that has a large capacity.
configuration file (RMS)
The RMS configuration file that defines the monitored resources and establishes the interdependencies between them. The default name
of this file is config.us.
CRM
Cluster Resource Management
daemon
A continuous process that performs a specific function repeatedly.
detector (RMS)
A process that monitors the state of a specific object type and reports a change in the resource state to the base monitor.
DLPI
Data Link Provider Interface
DOWN (CF)
A node state that indicates that the node is unavailable (marked as down). A LEFTCLUSTER node must be marked as DOWN before
it can rejoin a cluster.
See also UP, LEFTCLUSTER, node state.
EE
Enterprise Edition
Ethernet
LAN standard that is standardized by IEEE 802.3. Currently, except for special uses, nearly all LANs are Ethernets. Originally, the expression Ethernet was the name of a LAN standard for a 10 megabit per second type LAN, but now it is also used as a general term that includes high-speed Ethernets and Gigabit Ethernets.
GFS shared file system
A shared file system that allows simultaneous access from multiple Linux(R) systems that are connected to shared disk units, while
maintaining data consistency, and allows processing performed by a node to be continued by other nodes even if the first node fails.
A GFS shared file system can be mounted and used concurrently from multiple nodes.
graph (RMS)
See system graph.
group (GDS)
See disk group.
HA (high availability)
This concept applies to the use of redundant resources to avoid single points of failure.
hub
Star-type wiring device used for LAN or fibre channels.
ICF
Internode Communication Facility
interconnect (CF)
See cluster interconnect.
internode communication facility
Communication function between cluster nodes that is used by PRIMECLUSTER CF. Since this facility is designed especially for
communication between cluster nodes, the overhead is less than that of TCP/IP, and datagram communication services that also
guarantee the message arrival sequence can be carried out.
IP address
See Internet Protocol address.
IP aliasing
This enables several IP addresses (aliases) to be allocated to one physical network interface. With IP aliasing, the user can continue
communicating with the same IP address, even though the application is now running on another host.
See also Internet Protocol address.
I/F
Interface
I/O
input/output
latency (RMS)
Time interval from when a data transmission request is issued until the actual response is received.
LEFTCLUSTER (CF)
A node state that indicates that the node cannot communicate with other nodes in the cluster. That is, the node has left the cluster. The
purpose for the intermediate LEFTCLUSTER state is to avoid the network partition problem.
See also UP, DOWN, network partition, node state.
link (RMS)
Designates a child or parent relationship between specific resources.
local host
The host from which a command or process is initiated.
See also remote host.
log file
The file that contains a record of significant system events or messages. The base monitor, wizards, and detectors can have their own
log files.
MA
Monitoring Agents
MAC address
Address that identifies the station or node that is used by the MAC sublayer of a local area network (LAN).
message
A set of data transmitted from one software process to another process, device, or file.
message queue
A designated memory area which acts as a holding place for messages.
MIB
Management Information Base
mirroring (GDS)
A setup that maintains redundancy by writing the same data to multiple slices. Even if an error occurs in some of the slices, this setup
allows access to the volume to continue as long as a normal slice remains.
monitoring agent
Component that monitors the state of a remote cluster node and immediately detects if that node goes down. This component is separate
from the SA function.
mount point
The point in the directory tree where a file system is attached.
native operating system
The part of an operating system that is always active and translates system calls into activities.
network adapter
A LAN network adapter.
NIC
network interface card
node
A host which is a member of a cluster. A computer node is a computer.
NSM
Node State Monitor
object (RMS)
In the configuration file or a system graph, this is a representation of a physical or virtual resource.
See also leaf object, object definition, node state, object type.
See also generic type.
online maintenance
The capability of adding, removing, replacing, or recovering devices without shutting or powering off the host.
parent (RMS)
An object in the configuration file or system graph that has at least one child.
See also child, configuration file, and system graph.
PAS
Parallel Application Services
patrol diagnosis
A function that periodically diagnoses hardware faults.
physical IP address
IP address that is assigned directly to the interface (for example, hme0) of a network interface card.
physical machine
A server configured with actual hardware. This is used in contrast with a virtual machine, and is also referred to as a physical server.
PS
Parallel Server
public LAN
The local area network (LAN) by which normal users access a machine.
quorum
State in which integrity is maintained among the nodes that configure the cluster system. Specifically, the CF state in all the nodes that
configure the cluster system is either UP or DOWN (there is no LEFTCLUSTER node).
RAO
RMS-Add on
redundancy
This is the capability of one object to assume the resource load of any other object in a cluster, and the capability of RAID hardware and/or RAID software to replicate data stored on secondary storage devices.
remote host
A host that is accessed through a telecommunications line or LAN.
See also local host.
remote node
See remote host.
resource (RMS)
A hardware or software element (private or shared) that provides a function, such as a mirrored disk, mirrored disk pieces, or a database
server. A local resource is monitored only by the local host.
See also private resource, shared resource.
resource state (RMS)
Current state of a resource.
RMS command
Commands that enable RMS resources to be administered from the command line.
RMS configuration
A configuration in which two or more nodes are connected to shared resources. Each node has its own copy of operating system and RMS
software, as well as its own applications.
Rolling update
Update method used to apply fixes to an application or to perform maintenance within the cluster system. Fixes can be applied to each node sequentially without stopping jobs.
route
In the PRIMECLUSTER Concepts Guide, this term refers to the individual network paths of the redundant cluster interfaces that connect
the nodes to each other.
SA
Shutdown Agent. SA forcibly stops the target node by receiving instructions from the Shutdown Facility.
SC
Scalability Cluster
scalability
The ability of a computing system to dynamically handle any increase in work load. Scalability is especially important for Internet-based
applications where growth caused by Internet usage presents a scalable challenge.
scope (GDS)
The range of nodes that can share objects in the shared type disk class.
script (RMS)
A shell program executed by the base monitor in response to a state transition in a resource. The script may cause the state of a resource
to change.
SD
Shutdown Daemon
SF
Shutdown Facility
shared resource
A resource, such as a disk drive, that is accessible to more than one node.
See also private resource, resource.
Shutdown Facility
A facility that forcibly stops a node in which a failure has occurred. When PRIMECLUSTER decides that the system has reached a state in which the quorum is not maintained, it uses the Shutdown Facility (SF) to return the cluster system to the quorum state.
shutdown request
Instruction that forcibly stops the specified node so that the quorum is restored.
spare disk (GDS)
A spare disk for restoring the mirroring state in place of a failed disk.
state
See resource state.
striping (GDS)
Dividing data into fixed-size segments, and cyclically distributing and writing the data segments to multiple slices. This method
distributes I/O data to multiple physical disks and issues I/O data at the same time.
switching mode
A name of the redundant line control methods of LAN presented by GLS.
switchover
The process by which a user application transfers processes and data inherited from an operating node to a standby node, based on a user
request.
switchover (RMS)
The process by which RMS switches control of userApplication over from one monitored host to another.
See also automatic switchover, directed switchover, failover, and symmetrical switchover.
synchronized power control
When the power of one node is turned on in a cluster system configured with PRIMEPOWER, this function turns on all other powered-off nodes and the disk array units that are connected to the nodes through RCI cables.
template
See application template.
type
See object type.
UP (CF)
A node state that indicates that the node can communicate with other nodes in the cluster.
See also DOWN, LEFTCLUSTER, node state.
user group
A group that limits the environment setup, operation management, and other operations presented by Web-Based Admin View and the
Cluster Admin GUI. There are four user groups: wvroot, clroot, cladmin, and clmon. Each user ID is registered in an appropriate user
group by the operation system administrator of the management server.
VIP
Virtual Interface Provider
Virtual disk
A disk accessible from a virtual machine.
volume (GDS)
See logical volume (GDS).
Wizard (RMS)
An interactive software tool that creates a specific type of application using pretested object definitions. An enabler is a type of wizard.
WK
Wizard Kit
WT
Wizard Tools
Index
[Numbers] Checking the BMC or iRMC IP Address and the Configuration
1:1 standby.................................................................................31 Information of the Shutdown Agent.......................................398
2-tier model...............................................................................41 Checking the Cluster Environment.........................................224
3-tier model...............................................................................42 Checking the Configuration....................................................396
child (RMS)............................................................................ 645
[A] Crash dump............................................................................... 10
AC...........................................................................................644 class (GDS).............................................................................645
Access Client.......................................................................... 644 Clear fault................................................................................... 7
Activating Configuration Update Service for SA...................396 Clearing the Wait State of a Node.......................................... 275
Activating the Cluster Interconnect..........................................89 Client........................................................................................ 41
Adding, Deleting, and Changing Hardware........................... 296 cluster......................................................................................645
Adding Hardware................................................................... 296 Cluster Admin.........................................................................100
API..........................................................................................644 Cluster Admin functions.........................................................100
application (RMS).................................................................. 644 Cluster Application Operations.............................................. 274
Application building procedure and manual reference locations Cluster application setup........................................................ 224
................................................................................................ 151 Cluster Foundation................................................................. 645
Application Program Interface............................................... 644 Cluster interconnect..................................................................10
application template (RMS)....................................................644 cluster interconnect (CF)........................................................ 645
Assigning Users to Manage the Cluster................................... 90 Cluster nodes............................................................................ 41
attribute (RMS).......................................................................644 Cluster partition...................................................................... 645
Attributes................................................................................ 216 Cluster Resource Management facility...................................645
automatic switchover (RMS)..................................................644 Cluster states...........................................................................264
AutoRecover...........................................................................256 Cluster Systems in a VMware Environment.......................... 419
AutoSwitchOver.............................................................. 224,225 Cmdline.................................................................................. 234
availability.............................................................................. 644 Common................................................................................... 98
concatenated virtual disk........................................................ 645
[B]
concatenation (GDS).............................................................. 645
base cluster foundation (CF)...................................................644
Concurrent Viewing of Node and Cluster Application States280
BLADE shutdown agent...........................................................12
Configuration Change.............................................................463
BM(base monitor) (RMS)...................................................... 644
Configuration change of Cluster Applications....................... 329
BMC (Baseboard Management Controller)........................... 644
configuration file (RMS)........................................................ 646
Bringing Faulted Cluster Application to available state.........275
Configuration information or object attributes.......................272
Build Flow.................................................................................. 2
Configuration of Configuration Update Service for SA.........395
Building a cluster....................................................................103
Configuration Update Service for SA.................................... 391
Building Cluster Applications................................................ 151
Confirming Web-Based Admin View Startup..........................93
[C] Corrective Action for Failed Resources................................. 286
Cancellation of Configuration Update Service for SA...........400 Corrective Action in the event of a resource failure...............284
Cascade (using one cluster application)................................... 34 Crash Dump............................................................................377
CF.................................................................................... 101,645 Crash dump collection facility................................................646
CF Main Window................................................................... 262 Creating Scalable Cluster Applications..................................194
Changing a CIP Address.........................................................317 Creating Standby Cluster Applications.................................. 188
Changing a Node Name..........................................................312 CRM....................................................................................... 101
Changing a Procedure Resource.............................................387 CRM Main Window............................................................... 263
Changing Blade Settings........................................................ 325 custom detector (RMS)...........................................................646
Changing Hardware................................................................307 custom type (RMS).................................................................646
Changing iRMC Settings........................................................322
[D]
Changing the cluster system configuration............................ 296
daemon....................................................................................646
Changing the MMB IP Address............................................. 319
database node (SIS)................................................................ 646
Changing the Network Environment......................................313
Deactivating Configuration Update Service for SA............... 400
Changing the operation attributes of a userApplication......... 355
Deleting a cluster application................................................. 331
Changing the RMS environment variables.............................360
Deleting a Procedure Resource...............................................389
Changing the User Name and Password for Controlling the MMB
Deleting a resource................................................................. 344
with RMCP............................................................................. 320
Deleting a userApplication..................................................... 331
Checking PRIMECLUSTER designsheets...............................89
Deleting Hardware..................................................................301
Deleting the Hardware Resource ...........................................331 Global Disk Services.............................................................. 648
Detaching Resources from Operation.....................................364 Global File Services................................................................648
Detailed resource information................................................ 267 Global Link Services.............................................................. 648
Detecting a Failed Resource................................................... 378 GLS setup............................................................................... 153
Detector.................................................................................. 256 graph (RMS)........................................................................... 648
detector (RMS)....................................................................... 646 graphical user interface...........................................................648
Determining the Cluster System Operation Mode................... 30 group (GDS)........................................................................... 648
Determining the Failover Timing of Cluster Application........ 44 Guest OS setup............................................................... 65,73,82
Determining the Web-Based Admin View Operation Mode... 41 GUI......................................................................................... 648
Development...............................................................................5
DHCP........................................................................................11 [H]
directed switchover (RMS).....................................................646 HaltFlag.................................................................................. 224
disk class (GDS)..................................................................... 646 Heartbeat error........................................................................229
disk group (GDS)....................................................................646 high availability...................................................................... 648
Displayed resource types........................................................ 264 highest-order group (GDS).....................................................648
Displaying environment variables.......................................... 283 hub.......................................................................................... 648
Double fault............................................................................ 224 HV_APPLICATION.............................................................. 250
DOWN (CF)........................................................................... 646 HV_AUTORECOVER...........................................................250
Dynamic Reconfiguration................................................298,304 HV_FORCED_REQUEST.....................................................250
HV_INTENDED_STATE......................................................251
[E] HV_LAST_DET_REPORT................................................... 251
ENS (CF)................................................................................ 647 HV_NODENAME..................................................................251
Entering maintenance mode for Cluster Application............. 275 HV_SCRIPT_TYPE...............................................................251
environment variable (RMS).................................................. 647
Environment variables............................................................182 [I]
error detection (RMS).............................................................647 Initial Cluster Setup................................................................ 103
Ethernet...................................................................................647 Initial Configuration Setup..................................................... 142
Event Notification Services (CF)........................................... 647 Initial GFS Setup.................................................................... 172
exclusive relationships between cluster applications............. 216 Initial GLS setup.....................................................................153
Executing Standby Restoration for the Operating Job .......... 365 Initial RMS Setup................................................................... 153
Executing the fjsnap or pclsnap Command............................ 375 Initial setup of the cluster resource management facility....... 141
Exiting the Web-Based Admin View Screen......................... 101 Initial setup of the operation management server.....................92
Initial setup of Web-Based Admin View................................. 92
[F] Installation.................................................................... 3,425,447
Failed Resource Message....................................................... 378 Installation and environment setup of applications.................. 87
failover........................................................................................7 Installation procedure and manual reference sections................5
Failover...................................................................................224 interconnect (CF)....................................................................648
failover (RMS, SIS)................................................................647 Internet Protocol address........................................................ 648
Failure Detection and Cause Identification if a Failure Occurs284 internode communication facility...........................................649
Fast switching mode............................................................... 647 IP address................................................................................649
Fault Resource List.................................................................382 IP aliasing............................................................................... 649
fault tolerant network..............................................................647 IPMI shutdown agent................................................................12
Feature Description of Configuration Update Service for SA391
File System Creation.............................................................. 174 [K]
File system setup.................................................................... 172 kdump shutdown agent.............................................................12
Flow of Maintenance ............................................................. 364 Kernel parameter...................................................................... 10
fsck..........................................................................................258 keyword (reserved words)...................................................... 649
Fsystem...................................................................................256
[L]
Function Selection.................................................................... 16
LAN........................................................................................ 649
[G] latency (RMS).........................................................................649
gateway node (SIS).................................................................647 leaf object (RMS)................................................................... 649
GDS Configuration Setup.......................................................157 LEFTCLUSTER (CF)............................................................ 649
Generate and Activate............................................................ 215 link (RMS)..............................................................................649
generic type (RMS)................................................................ 647 local area network...................................................................649
GFS Shared File System.........................................................173 local host.................................................................................649
GFS shared file system........................................................... 648 log file.....................................................................................650
Global Cluster Services menu functions................................ 100 logical volume (GDS).............................................................650
low-order group (GDS).......................................................... 650 OSD (CF)................................................................................652
Other resource states...............................................................265
[M] Output Message (syslog)........................................................ 403
MAC address.......................................................................... 650
Maintenance............................................................................463 [P]
Maintenance Types.................................................................364 parent (RMS).......................................................................... 652
Management server...................................................................41 patrol diagnosis.......................................................................652
Manual...................................................................................... 98 physical IP address................................................................. 652
Manual Pages..........................................................................370 physical machine.................................................................... 652
MDS........................................................................................650 Planning...................................................................................... 2
message...................................................................................650 Preparation Prior to Building a Cluster.................................... 88
message queue........................................................................ 650 Preparations for starting the Web-Based Admin View screen. 89
Meta Data Server(GFS).......................................................... 650 Preparing the client environment..............................................91
mirrored volume (GDS)..........................................................650 primary host (RMS)................................................................652
mirror group (GDS)................................................................ 650 PRIMECLUSTER.................................................................. 298
mirroring (GDS)..................................................................... 650 PRIMECLUSTER Clustering Base..........................................15
monitoring agent.....................................................................650 PRIMECLUSTER Enterprise Edition...................................... 15
Monitoring Cluster Control Messages....................................284 PRIMECLUSTER HA Server.................................................. 15
Monitoring the PRIMECLUSTER System............................ 278 PRIMECLUSTER Installation................................................. 85
Monitoring the State of a Cluster Application........................279 PRIMECLUSTER Lite Pack.................................................... 15
Monitoring the State of a Node.............................................. 278 PRIMECLUSTER Products................................................... 369
Mountpoint............................................................................. 259 PRIMECLUSTER Product Selection.......................................15
mount point.............................................................................650 PRIMECLUSTER services (CF)............................................652
Mutual standby......................................................................... 32 PRIMEQUEST 2000 series...................................................... 51
Priority transferring (application of N:1 standby).................... 35
[N] private network address.......................................................... 652
N:1 standby...............................................................................33 private resource (RMS).......................................................... 652
native operating system.......................................................... 651 Product Selection......................................................................15
network adapter...................................................................... 651 public LAN............................................................................. 653
network interface card............................................................ 651
network partition (CF)............................................................ 651 [Q]
Network segment......................................................................12 queue.......................................................................................653
NIC switching mode...............................................................651 quorum....................................................................................653
node........................................................................................ 651
Node failure............................................................................ 224 [R]
node state (CF)........................................................................651 redundancy..............................................................................653
Node states..............................................................................265 Registering, Changing, and Deleting State Transition Procedure
NODE_SCRIPTS_TIME_OUT............................................. 251 Resources for PRIMECLUSTER Compatibility....................386
Notes on script creation.......................................................... 181 Registering a Procedure Resource..........................................386
NTP server................................................................................10 Registering Hardware Devices............................................... 144
Reliant Monitor Services (RMS)............................................653
[O] remote host............................................................................. 653
object (RMS).......................................................................... 651 remote node............................................................................ 653
object definition (RMS)..........................................................651 Replacement test.........................................................................8
object type (RMS).................................................................. 651 reporting message (RMS).......................................................653
online maintenance................................................................. 652 Reserved word........................................................................ 234
operating system dependent (CF)........................................... 652 resource (RMS).......................................................................653
Operating the PRIMECLUSTER System...............................273 resource database (CF)........................................................... 653
Operation and Maintenance........................................................9 resource definition (RMS)...................................................... 653
Operation Check by Restarting the System............................399 Resource failure......................................................................224
Operation Check for Configuration Update Service for SA...399 Resource Fault History.................................................... 100,379
Operation Environment of Configuration Update Service for SA Resource icons........................................................................264
................................................................................................ 394 resource label (RMS)..............................................................653
Operation menu functions........................................................ 98 resource state (RMS).............................................................. 654
Operation Mode Change.............................................................9 Resource states....................................................................... 264
Operations.................................................................262,265,461 Restoration Method When Correct Information is not Distributed
OPS.........................................................................................652 to All Nodes............................................................................400
Oracle Parallel Server.............................................................652 Restoration of Configuration Update Service for SA.............400
Restoring the Startup Configuration of the IPMI Service...... 400 simple virtual disk.................................................................. 655
RMS.................................................................................101,654 Single-Node Cluster Operation................................................ 38
RMS command....................................................................... 654 single disk (GDS)................................................................... 655
RMS configuration................................................................. 654 single volume (GDS).............................................................. 655
RMS graphs............................................................................ 282 SIS................................................................................... 101,655
RMS Main Window................................................................268 Site Preparation.........................................................................15
RMS Operation.......................................................................273 Software Installation and Setup................................................46
RMS Tree............................................................................... 268 Software Maintenance ........................................................... 365
RMS Wizard kit......................................................................654 Spanning Tree Protocol............................................................ 10
RMS Wizard Tools.................................................................654 spare disk (GDS).................................................................... 656
Rolling update.........................................................................654 Standby Operation.................................................................... 31
route........................................................................................ 654 Starting a Cluster Application................................................ 274
Starting RMS.......................................................................... 273
[S] Starting RMS Wizard............................................................. 188
Sample scripts.........................................................................180 Starting the Web-Based Admin View screen........................... 95
SAN........................................................................................ 654 Startup Configuration for the IPMI Service........................... 395
scalability................................................................................654 Startup Configuration for Update Service for SA.................. 396
Scalable Internet Services (SIS)............................................. 654 Startup test.................................................................................. 7
Scalable Operation....................................................................36 state.........................................................................................656
scope (GDS)........................................................................... 654 state transition procedure........................................................656
script (RMS)........................................................................... 655 Stop.............................................................................................8
SDX disk (GDS).....................................................................655 Stopping a Cluster Application...............................................274
SDX object (GDS)..................................................................655 Stopping RMS........................................................................ 273
service node (SIS)...................................................................655 Storage Area Network............................................................ 656
Setting Java...............................................................................95 striped group (GDS)............................................................... 656
Setting the Web-Based Admin View Language.......................93 striped virtual disk.................................................................. 656
Setting up CF and CIP............................................................ 104 striped volume (GDS).............................................................656
Setting Up Cluster Applications............................................. 184 stripe width (GDS)..................................................................656
Setting Up Cmdline Resources...............................................201 striping (GDS)........................................................................ 656
Setting Up Disk Units...............................................................47 Subsystem hang...................................................................... 231
Setting up fault resource identification and operator intervention Switching a Cluster Application.............................................274
request.....................................................................................148 switching mode.......................................................................656
Setting Up Fsystem Resources............................................... 204 Switchlogs and application logs............................................. 272
Setting Up Gds Resources...................................................... 208 switchover............................................................................7,656
Setting Up Gls Resources....................................................... 209 switchover (RMS)...................................................................656
Setting Up Hardware Monitoring with ServerView.................48 symmetrical switchover (RMS)..............................................656
Setting Up NTP........................................................................ 47 synchronized power control....................................................657
Setting Up Online/Offline Scripts.......................................... 179 System configuration modification........................................ 295
Setting Up Procedure Resources............................................ 213 System Design.......................................................................... 16
Setting Up Resources..............................................................200 System dump............................................................................ 10
Setting Up Shared Disks.........................................................164 system graph (RMS)...............................................................657
Setting Up System Disk Mirroring.........................................158
Setting Up Takeover Network Resources.............................. 210 [T]
Setting Up the Application Environment............................... 179 template.................................................................................. 657
Setting up the browser.............................................................. 95 Test............................................................................................. 6
Setting Up the Cluster High-Speed Failover Function.............48 Test for forced shutdown of cluster nodes..................................9
Setting Up the Network............................................................ 47 Time synchronization............................................................... 10
Setting Up the RMS Environment..........................................224 Troubleshooting......................................................................375
Setting up the shutdown facility............................................. 106 type......................................................................................... 657
Setting Up userApplication.................................................... 188
Setting up Web-Based Admin View when GLS is used........ 157 [U]
sfsacfgupdate.......................................................................... 402 UP (CF)...................................................................................657
shared disk connection confirmation......................................655 user group............................................................................... 657
shared resource....................................................................... 655 User groups...............................................................................90
Shared resource states.............................................................265
[V]
Shutdown Facility...................................................................655
Viewing application logs........................................................ 281
shutdown request.................................................................... 655
Viewing Detailed Resource Information................................282
Viewing Logs Created by the PRIMECLUSTER System..... 281
Viewing switchlogs................................................................ 281
Viewing the PRIMECLUSTER system operation management
screens.................................................................................... 262
virtual interface (VIP).............................................................657
Virtual Machine Function.........................................................17
volume (GDS).........................................................................657
Volume setup..........................................................................165
[W]
watchdog timer monitoring.................................................... 657
Web-Based Admin View........................................................657
Web-Based Admin View screen...............................................97
When not Using the Virtual Machine Function........................46
When Using the Virtual Machine Function..............................59
Wizard (RMS)........................................................................ 657
Work process continuity.............................................................8