Guide To IBM PowerHA System
Guide To IBM PowerHA System
Katharina Probst
Dino Quintero Matt Radford
Alex Abderrazag Bjorn Roden
Bernhard Buehler Michael Schmut
Primitivo Cervantes Isac Silva
Bharathraj Keshavamurthy Yefei Song
Kunal Langer Ben Swinney
Luciano Martins Ashraf Ali Thajudeen
Ashish Nainwal Marian Tomescu
Minh Pham Sascha Wycisk
ibm.com/redbooks
International Technical Support Organization
August 2014
SG24-8167-00
Note: Before using this information and the product it supports, read the information in Notices on
page ix.
This edition applies to IBM AIX 7.1 TL3 SP1, IBM PowerHA SystemMirror 7.1.2 SP3, IBM PowerHA
SystemMirror 6.1 running on IBM AIX 6.1, IBM PowerHA SystemMirror 7.1.2 running on IBM AIX 7.1.2, IBM
DB2 9.7.0.8, GSKit 8.0.50.10, TDS 6.3.0.24, IBM Tivoli Monitoring 6.2.2.
Notices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ix
Trademarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .x
Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xi
Authors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xi
Now you can become a published author, too . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xv
Comments welcome. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xv
Stay connected to IBM Redbooks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xv
Chapter 1. Introduction to IBM PowerHA SystemMirror for AIX 7.1.3, Standard and
Enterprise Editions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1 How IBM PowerHA SystemMirror helps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2 Disaster recovery . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.2.1 High-availability criteria for designing your systems deployment . . . . . . . . . . . . . . 5
1.2.2 Differences in disaster recovery solution tiers. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.3 Storage replication and mirroring . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
Chapter 4. Migration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
4.2 PowerHA SystemMirror 7.1.3 requirements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
4.2.1 Software requirements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
4.2.2 Hardware requirements. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
Contents v
8.4 HyperSwap reference architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 218
8.4.1 In-band storage management . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 218
8.4.2 AIX support for HyperSwap . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 220
8.4.3 AIX view of HyperSwap disks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 221
8.5 HyperSwap functions on PowerHA SystemMirror 7.1.3, Enterprise Edition . . . . . . . . 221
8.6 Limitations and restrictions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 222
8.7 HyperSwap environment requirements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 222
8.8 Planning a HyperSwap environment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 223
8.9 Configuring HyperSwap for PowerHA SystemMirror. . . . . . . . . . . . . . . . . . . . . . . . . . 224
8.10 HyperSwap storage configuration for PowerHA node cluster . . . . . . . . . . . . . . . . . . 225
8.11 HyperSwap Metro Mirror Copy Services configuration . . . . . . . . . . . . . . . . . . . . . . . 225
8.12 HyperSwap PowerHA SystemMirror cluster node configuration . . . . . . . . . . . . . . . . 227
8.12.1 Change the multipath driver . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 227
8.12.2 Change Fibre Channel controller protocol device attributes . . . . . . . . . . . . . . . 229
8.13 Configure disks for the HyperSwap environment . . . . . . . . . . . . . . . . . . . . . . . . . . . 229
8.14 Node-level unmanage mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 237
8.15 Single-node HyperSwap deployment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 238
8.15.1 Single-node HyperSwap configuration steps . . . . . . . . . . . . . . . . . . . . . . . . . . 239
8.15.2 Oracle single-instance database with Automatic Storage Management in
single-node HyperSwap . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 239
8.16 Dynamically adding new disk in ASM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 250
8.17 Testing HyperSwap . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 257
8.18 Single-node HyperSwap tests . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 258
8.18.1 Single-node HyperSwap: Planned HyperSwap. . . . . . . . . . . . . . . . . . . . . . . . . 258
8.18.2 Single-node HyperSwap: Storage migration . . . . . . . . . . . . . . . . . . . . . . . . . . . 260
8.18.3 Single-node HyperSwap: Unplanned HyperSwap . . . . . . . . . . . . . . . . . . . . . . 274
8.19 System mirror group: Single-node HyperSwap . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 278
8.19.1 Planned swap system mirror group. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 280
8.19.2 Unplanned swap of a system mirror group . . . . . . . . . . . . . . . . . . . . . . . . . . . . 281
8.20 Oracle Real Application Clusters in a HyperSwap environment . . . . . . . . . . . . . . . . 283
8.20.1 Oracle Real Application Clusters: PowerHA Enterprise Edition stretched cluster
configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 285
8.20.2 Adding new disks to the ASM configuration: Oracle RAC HyperSwap . . . . . . . 294
8.20.3 Planned HyperSwap: Oracle RAC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 297
8.20.4 Unplanned HyperSwap: Failure of Storage A nodes in Site A . . . . . . . . . . . . . 300
8.20.5 Unplanned HyperSwap: Storage A unavailable for both sites . . . . . . . . . . . . . 306
8.20.6 Tie breaker considerations: Oracle RAC in a HyperSwap environment . . . . . . 311
8.20.7 Unplanned HyperSwap: Site A failure, Oracle RAC . . . . . . . . . . . . . . . . . . . . . 312
8.20.8 CAA dynamic disk addition in a HyperSwap environment . . . . . . . . . . . . . . . . 317
8.21 Online storage migration: Oracle RAC in a HyperSwap configuration . . . . . . . . . . . 322
8.21.1 Online storage migration for Oracle RAC in a HyperSwap configuration . . . . . 323
8.22 Troubleshooting HyperSwap . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 337
Contents vii
Related publications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 495
IBM Redbooks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 495
Other publications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 495
Online resources . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 495
Help from IBM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 496
This information was developed for products and services offered in the U.S.A.
IBM may not offer the products, services, or features discussed in this document in other countries. Consult
your local IBM representative for information on the products and services currently available in your area. Any
reference to an IBM product, program, or service is not intended to state or imply that only that IBM product,
program, or service may be used. Any functionally equivalent product, program, or service that does not
infringe any IBM intellectual property right may be used instead. However, it is the user's responsibility to
evaluate and verify the operation of any non-IBM product, program, or service.
IBM may have patents or pending patent applications covering subject matter described in this document. The
furnishing of this document does not grant you any license to these patents. You can send license inquiries, in
writing, to:
IBM Director of Licensing, IBM Corporation, North Castle Drive, Armonk, NY 10504-1785 U.S.A.
The following paragraph does not apply to the United Kingdom or any other country where such
provisions are inconsistent with local law: INTERNATIONAL BUSINESS MACHINES CORPORATION
PROVIDES THIS PUBLICATION AS IS WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR
IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF NON-INFRINGEMENT,
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some states do not allow disclaimer of
express or implied warranties in certain transactions, therefore, this statement may not apply to you.
This information could include technical inaccuracies or typographical errors. Changes are periodically made
to the information herein; these changes will be incorporated in new editions of the publication. IBM may make
improvements and/or changes in the product(s) and/or the program(s) described in this publication at any time
without notice.
Any references in this information to non-IBM websites are provided for convenience only and do not in any
manner serve as an endorsement of those websites. The materials at those websites are not part of the
materials for this IBM product and use of those websites is at your own risk.
IBM may use or distribute any of the information you supply in any way it believes appropriate without incurring
any obligation to you.
Any performance data contained herein was determined in a controlled environment. Therefore, the results
obtained in other operating environments may vary significantly. Some measurements may have been made
on development-level systems and there is no guarantee that these measurements will be the same on
generally available systems. Furthermore, some measurements may have been estimated through
extrapolation. Actual results may vary. Users of this document should verify the applicable data for their
specific environment.
Information concerning non-IBM products was obtained from the suppliers of those products, their published
announcements or other publicly available sources. IBM has not tested those products and cannot confirm the
accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on the
capabilities of non-IBM products should be addressed to the suppliers of those products.
This information contains examples of data and reports used in daily business operations. To illustrate them
as completely as possible, the examples include the names of individuals, companies, brands, and products.
All of these names are fictitious and any similarity to the names and addresses used by an actual business
enterprise is entirely coincidental.
COPYRIGHT LICENSE:
This information contains sample application programs in source language, which illustrate programming
techniques on various operating platforms. You may copy, modify, and distribute these sample programs in
any form without payment to IBM, for the purposes of developing, using, marketing or distributing application
programs conforming to the application programming interface for the operating platform for which the sample
programs are written. These examples have not been thoroughly tested under all conditions. IBM, therefore,
cannot guarantee or imply reliability, serviceability, or function of these programs.
The following terms are trademarks of the International Business Machines Corporation in the United States,
other countries, or both:
AIX HACMP Redbooks
BladeCenter HyperSwap Redpaper
DB2 IBM Redbooks (logo)
developerWorks Lotus System p
Domino Parallel Sysplex System p5
DS8000 POWER System Storage
eServer Power Systems System x
FileNet POWER6 System z
GDPS POWER7 SystemMirror
Geographically Dispersed Parallel PowerHA Tivoli
Sysplex PowerVM WebSphere
Global Technology Services PureFlex XIV
GPFS Rational
Linux is a trademark of Linus Torvalds in the United States, other countries, or both.
Microsoft, Windows, and the Windows logo are trademarks of Microsoft Corporation in the United States,
other countries, or both.
Java, and all Java-based trademarks and logos are trademarks or registered trademarks of Oracle and/or its
affiliates.
UNIX is a registered trademark of The Open Group in the United States and other countries.
Other company, product, or service names may be trademarks or service marks of others.
This IBM Redbooks publication for IBM Power Systems with IBM PowerHA
SystemMirror Standard and Enterprise Editions (hardware, software, practices, reference
architectures, and tools) documents a well-defined deployment model within an IBM Power
Systems environment. It guides you through a planned foundation for a dynamic
infrastructure for your enterprise applications.
This information is for technical consultants, technical support staff, IT architects, and IT
specialists who are responsible for providing high availability and support for the IBM
PowerHA SystemMirror Standard and Enterprise Editions on IBM POWER systems.
Authors
This book was produced by a team of specialists from around the world working at the
International Technical Support Organization, Poughkeepsie Center.
Dino Quintero is a Complex Solutions Project Leader and an IBM Senior Certified IT
Specialist with the ITSO in Poughkeepsie, NY. His areas of knowledge include enterprise
continuous availability, enterprise systems management, system virtualization, technical
computing, and clustering solutions. He is an Open Group Distinguished IT Specialist. Dino
holds a Master of Computing Information Systems degree and a Bachelor of Science degree
in Computer Science from Marist College.
Alex Abderrazag is a Consulting IT Specialist on the worldwide IBM education events team.
Alex has more than 20 years experience working with UNIX systems and has been
responsible for managing, teaching, and developing the IBM Power, IBM AIX, and Linux
education curriculum. Alex is a Chartered Member of the British Computer Society and a
Fellow of the Performance and Learning Institute. Alex holds a BSc (Honors) degree in
Computer Science and has many AIX certifications, including IBM Certified Advanced
Technical Expert and IBM Certified Systems Expert for IBM PowerHA (exam developer).
Kunal Langer is a Technical Solutions Architect for Power Systems in STG Lab Services,
India. He has more than seven years of experience in IBM Power Systems, with expertise in
the areas of PowerHA, PowerVM, and AIX security. He conducts PowerHA Health Checks,
AIX Health Checks, and IBM PowerCare Services Availability Assessments for IBM clients in
the ISA and EMEA regions. He has co-authored a few IBM Redbooks publications and written
articles for IBM developerWorks and IBM Systems Magazine. He holds a Bachelors in
Engineering degree in Computer Science and Technology.
Ashish Nainwal is a Managing Consultant for Power Systems in Systems Technology Group
Lab Services, ASEAN. With almost nine years of experience in IBM Power Systems, he has
expertise in the Power suite, including PowerHA, PowerVM, and AIX Security. He is a vetted
PowerCare Services consultant and has conducted performance, availability, and security
engagements for IBM clients in all countries of ASEAN region. He has published articles on
IBM developerWorks and holds an IBM patent for security architecture. Ashish holds a
Bachelors in Technology degree from the National Institute of Technology, India, and an MBA
from Symbiosis International University, in India.
Minh Pham is a Development Support Specialist for PowerHA and Cluster Aware AIX in
Austin, Texas. She has worked for IBM for 13 years, including six years in System p
microprocessor development and seven years in AIX development support. Her areas of
expertise include core and chip logic design for IBM System p and AIX with PowerHA. Minh
holds a Bachelor of Science degree in Electrical Engineering from the University of Texas at
Austin.
Katharina Probs is a member of the IBM/SAP porting team from the IBM Lab in Bblingen,
Germany. Her focus is the enablement and optimization of SAP solutions on the IBM
infrastructure. She is a recognized worldwide expert on high availability and disaster recovery
solutions for SAP landscapes.
Matt Radford is the Team Leader for the Front Office European Support team for PowerHA.
He has seven years of experience in AIX support and PowerHA. He holds a Bsc (Honors)
degree in Information Technology from the University of Glamorgan. He is co-authored
previous Redbooks publications on PowerHA versions 7.10 and 7.12.
Michael Schmut is a a Software Engineer at SAP, working on AIX development. His focus is
on running SAP on PowerHA installations. Before he started at the IBM development lab, he
worked on many client projects related to SAP and AIX in the role of Software Architect.
Isac Silva is an IT Specialist and IT Infrastructure Architect with more than 14 years of
experience in IBM Power Systems. His areas of expertise are IBM AIX and IBM PowerHA. He
is a Technical Leader and, along those years of experience, he has had roles and
responsibility in support, deployment, development, installation, problem determination,
disaster recovery, infrastructure, and networking (TCP/IP, firewall, QoS). He is currently a
Development Support Specialist at IBM in Austin, Texas. He is working with the worldwide
Level 2 IBM PowerHA and Cluster Aware AIX technology.
Ben Swinney is a Senior Technical Specialist for IBM Global Technology Services in
Melbourne, Australia. He has more than 14 years of experience in AIX, IBM Power Systems,
PowerVM, and PowerHA. He has worked for IBM for more than six years, both within IBM UK
and IBM Australia. His areas of expertise include infrastructure design and implementation,
high availability solutions, server administration, and virtualization.
Marian Tomescu has 15 years of experience as an IT Specialist and currently works for IBM
Global Technologies Services in Romania. Marian has nine years of experience in Power
Systems. He is a certified specialist for IBM System p Administration, PowerHA for AIX, and
for Tivoli Storage Management Administration Implementation, an Oracle Certified
Associated, an IBM eServer and Storage Technical Solutions Certified Specialist, and a
Cisco Information Security Specialist. His areas of expertise include Tivoli Storage Manager,
PowerHA, PowerVM, IBM System Storage, AIX, General Parallel File System (GPFS), IBM
VMware, Linux, and Microsoft Windows. Marian has a Masters degree in Electronics Images,
Shapes and Artificial Intelligence from Polytechnic University - Bucharest, Electronics and
Telecommunications, in Romania.
Preface xiii
Special acknowledgement goes to Ken Fleck and his team at the Poughkeepsie Benchmark
Center for lending the team hardware to create sample scenarios for this publication.
Esdras Cruz-Aguilar, Dwip Banerjee, Shawn Bodily, Michael Coffey, Glen Corneau, Paul
Desgranges, Zhi-Wei Dai, Ken Fleck, P. I. Ganesh, Michael Herrera, Kam Lee, Gary Lowther,
Robert Luther, Bill Miller, Philip Moore, Paul Moyer, Suriyan Ramasami, Brian Rousey, and
Ravi Shankar
IBM USA
Peter Juerss, Stephen Lutz, Michael Mueller, Donal O'Connell, Jorge Rodriguez, and
Ingo Weber
IBM Germany
Jon Kowszun
IBM Switzerland
Xiao Er Li
IBM China
Hiroyuki Tanaka
IBM Japan
Philippe Herms
IBM France
Tony Steel
IBM Australia
Find out more about the residency program, browse the residency index, and apply online at:
ibm.com/redbooks/residencies.html
Comments welcome
Your comments are important to us.
We want our books to be as helpful as possible. Send us your comments about this book or
other IBM Redbooks publications in one of the following ways:
Use the online Contact us review Redbooks form:
ibm.com/redbooks
Send your comments by email:
[email protected]
Mail your comments:
IBM Corporation, International Technical Support Organization
Dept. HYTD Mail Station P099
2455 South Road
Poughkeepsie, NY 12601-5400
Preface xv
xvi Guide to IBM PowerHA SystemMirror for AIX, Version 7.1.3
1
This book is helpful to IBM Power Systems specialists who use the PowerHA SystemMirror
solution for high availabiity and went to align their resources with the best disaster recovery
model for their environment. With each technology refresh and new server consolidation, it is
not unreasonable to consider using your existing servers in your recovery environment. If you
are looking for an entry point into a high-availability solution that incorporates disaster
recovery, you can use your existing hardware and select the replication mechanism that fits
your needs.
One of the PowerHA SystemMirror main goals is to help continuous business services
operations even after one (or more) components fails. Unexpected failures can be related to
human errors or other errors. Either way, the PowerHA SystemMirror design phase is
intended to remove any single point of failure (SPOF) from the environment by using
redundant components and automated PowerHA SystemMirror procedures.
It is important to remember that any hardware component can fail and cause application
disruptions. So, when you plan a high availability environment, you must check all
components, from disk access to power circuits, for redundancy.
Replication of data between sites is a good way to minimize business disruption because
backup restores can take too long to meet business requirements or equipment might be
damaged, depending on the extent of the disaster, and not available for restoring data.
Recovery options typically range in cost, with the least expensive involving a longer time for
recovery to the most expensive providing the shortest recovery time and being the closest to
having zero data loss. A fully manual failover normally requires many specialists to coordinate
and perform all of the necessary steps to bring the services up to another site. Even with a
good disaster recovery plan, it can take longer than business requirements allow. High
availability software minimizes downtime of services by automating recovery actions when
failures are detected on the various elements of the infrastructure.
Figure 1-1 on page 3 shows an example of an environment that has no redundant hardware
components, so it would not tolerate failure of any component.
With this configuration, if any component fails, for example the data center network switch or
the SAN switch, the application that runs on the IBM Power 750 server becomes unavailable
becaused it lacks redundancy, or a failover alternative. The IBM Power 750 server
experiences a disruption until all failing components are replaced or fixed. Depending on
which component fails, it can take from hours to weeks to fix it, which affects service
availability and, in the worst case, data availability.
Figure 1-2 on page 4 shows a sample client environment with redundant network connections
via dual network switches to ensure connectivity between server and storage.
Chapter 1. Introduction to IBM PowerHA SystemMirror for AIX 7.1.3, Standard and Enterprise Editions 3
Figure 1-2 Environment with a redundant network switch
The configuration in Figure 1-2 enables the IBM Power 750 server to be more resilient in
response to environmental issues. This resiliency keeps business services available even
with failures in parts of the company infrastructure.
Note: High availability solutions help eliminate SPOFs through appropriate design,
planning, selection of hardware, configuration of software, and carefully controlled change
management discipline. High availability does not mean that there is no interruption to the
application. Therefore, it is called fault resilient rather than fault tolerant.
Documentation: For more information, see the IBM PowerHA SystemMirror for AIX V7.1
documentation in the IBM Knowledge Center:
http://ibm.co/1t5pZ9p
As a result of these outages, your business might incur losses due to damages to the
infrastructure and costs to restore systems to operation. Even a bigger cost is the data loss
caused by outages. Millions of bytes of valuable information can never be restored if there is
no proper planning during the design phase of the deployment. Proper planning might include
making regular data backups, synchronizing or replicating data with different data storage
systems in different geographical zones, and planning for a redundant data storage system
that comes online if the primary node goes down due to outages.
Note: In some instances, the application might also manage the replication of the data to
the disaster recovery site.
PowerHA SystemMirror 7.1.3 for AIX, Standard and Enterprise Editions, helps automate the
recovery actions when failures are detected on the nodes.
There are certain questions to ask when planning a disaster recovery solution to achieve an
adequate return on investment. For example, does it account for the time for planned
maintenance? If so, have you backtracked to make sure that you understand the planned
maintenance or downtime window?
The Five Nines of Availability (Table 1-1) give us performance criteria only for unplanned
downtime, but it is essential to plan for planned downtime each year, too. Version 7 of
SystemMirror does not support a nondisruptive upgrade. Therefore, you must consider the
impact of other service interruptions in the environment that often require the services to go
offline for a certain amount of time, such as upgrades to the applications, the IBM AIX
operating system, and the system firmware. These must be included in the planned downtime
considerations. For more information on the difference between planned and unplanned
downtime, see the shaded box titled Planned downtime versus unplanned downtime, which
follows.
Chapter 1. Introduction to IBM PowerHA SystemMirror for AIX 7.1.3, Standard and Enterprise Editions 5
The Standard and Enterprise Editions of PowerHA SystemMirror for AIX 7.1.3 reliably
orchestrate the acquisition and release of cluster resources from one site to another. They
also provide quick failover if there is an outage or natural disaster.
Solutions in the other tiers can all be used to back up data and move it to a remote location,
but they lack the automation that the PowerHA SystemMirror provides. By looking over the
recovery time axis (Figure 1-3 on page 7), you can see how meeting an RTO of less than four
hours can be achieved with the implementation of automated multisite clustering.
Unplanned downtime is when all system operations shut down after a catastrophe or
accident, such as fires, power outages, earthquakes, or hurricanes. Unplanned downtimes
are unexpected and incur undetermined repair costs and revenue losses due to service
unavailability. Unplanned downtime can occur any time during any period for many
reasons. Therefore, infrastructure architects should include unplanned downtime during
the design and deployment phases of IT solutions.
High availability and disaster recovery requires a balance between recovery time
requirements and cost. Various external studies are available that cover dollar loss estimates
for every bit of downtime that is experienced as a result of service disruptions and unexpected
outages. Decisions must be made about what parts of the business are important and must
remain online to continue business operations.
Beyond the need for secondary servers, storage, and infrastructure to support the replication
bandwidth between two sites, it is important to answer the following questions:
Where does the staff go in the event of a disaster?
What if the technical staff that manages the environment is unavailable?
Are there facilities to accommodate the remaining staff, including desks, phones, printers,
desktop PCs, and so on?
Is there a documented disaster recovery plan that can be followed by non-technical staff, if
necessary?
Figure 1-3 Tiers of disaster recovery solutions, IBM PowerHA SystemMirror 7.1.3 Enterprise Edition
Replicating the data addresses only one problem. In a well-designed disaster recovery
solution, a backup and recovery plan must also exist. Tape backups, snapshots, and flash
memory copies are still an integral part of effective backup and recovery. The frequency of
these backups at both the primary and remote locations must also be considered for a
thorough design.
Tip: An effective backup and recovery strategy should leverage a combination of tape and
point-in-time disk copies to protect unexpected data corruption. Restoration is very
important, and regular restore tests need to be performed to guarantee that the disaster
recovery is viable.
Chapter 1. Introduction to IBM PowerHA SystemMirror for AIX 7.1.3, Standard and Enterprise Editions 7
There are two types of storage replication: synchronous and asynchronous:
Synchronous replication considers only the I/O completed after the write is done on both
storage repositories. Only synchronous replication can guarantee that 100% of
transactions were correctly replicated to the other site. But because this can add a
considerable amount of I/O time, the distance between sites must be considered for
performance criteria.
This is the main reason that asynchronous replication is used between distant sites or with
I/O-sensitive applications. In synchronous mirroring, both the local and remote copies
must be committed to their respective subsystems before the acknowledgment is returned
to the application. In contrast, asynchronous transmission mode allows the data
replication at the secondary site to be decoupled so that primary site application response
time is not affected.
Asynchronous transmission is commonly selected when it is known that the secondary
sites version of the data might be out of sync with the primary site by a few minutes or
more. This lag represents data that is unrecoverable in the event of a disaster at the
primary site. The remote copy can lag behind in its updates. If a disaster strikes, it might
never receive all of the updates that were committed to the original copy.
Although every environment differs, the farther that the sites reside from each other, the more
contention and disk latency are introduced. However, there are no hard-set considerations
that dictating whether you need to replicate synchronously or asynchronously. It can be
difficult to provide an exact baseline for the distance to delineate synchronous versus
asynchronous replication.
For both Standard and Enterprise Editions, the IBM Systems Director server can be enabled
to manage clusters with its integrated GUI by installing the PowerHA plug-in which was
enhanced to support the disaster recovery enablement features added in PowerHA
SystemMirror version 7.1.2 Enterprise Edition (for example, storage replication). The
PowerHA SystemMirror Enterprise Edition gives you the ability to discover the existing
PowerHA SystemMirror clusters, collect information and a variety of reports about the state
and configuration of applications and clusters, and receive live and dynamic status updates
for clusters, sites, nodes, and resource groups. A single sign-on capability gives you full
access to all clusters with only one user ID and password, access and search log files. You
can display a summary page where you can view the status of all known clusters and
resource groups, create clusters, add resource groups with wizards, and apply updates to the
PowerHA SystemMirror Agent by using the Systems Director Update Manager.
One of the main goals of the PowerHA SystemMirror is to provide continuous business
services even after multiple component failures. Unplanned or unexpected failures can occur
at any time. They can be related to human errors, or not. Either way, the intention of the
PowerHA design phase is to remove any single point of failure (SPOF) by using redundant
components wherever possible.
It is important to understand that any component can fail and cause application disruptions.
When planning a high availability environment, you must provide redundancy and check all
components.
Figure 2-1 on page 11 shows an environment without fault tolerance or redundancy. If any
component fails (for example, a network switch or a SAN switch), workloads running on an
IBM Power server become unavailable.
System Storage
2 0 0 5 -B 1 6
Power 75 0
0 1 2 3 8 9 1 0 1 1
4 5 6 7 1 2 1 3 1 4 1 5
D3 D4 D5 D6 D7 D8 D9 D1 0
SAN Switch
POWER7 Server
Company Network
If a failure occurs in this environment, users experience disruption in services until all failing
components are replaced or fixed. Depending on which component has failed, it might take
from a few hours to a few weeks to fix, so it impacts service or data availability.
Figure 2-2 on page 12 shows a sample cluster environment with redundant network
connections, and dual SAN switches for disk access. This configuration enables the Power
server to be more resilient to failures, keeping business services available even with some
service issues in part of the company infrastructure.
System Storage
2 0 0 5 -B 1 6 2 0 0 5 -B 1 6
0 1 2 3 8 9 1 0 1 1 0 1 2 3 8 9 1 0 1 1
4 5 6 7 1 2 1 3 1 4 1 5 4 5 6 7 1 2 1 3 1 4 1 5
Power 75 0
D3 D4 D5 D6 D7 D8 D9 D1 0
POWER7 Server
Company Network
Even without using PowerHA, this configuration (Figure 2-2) is resilient to some possible
failures. If an IP network switch goes down, the server has a secondary network connection
on a redundant switch. If a SAN switch goes down, the server can get storage access through
a secondary SAN switch. This makes the customer environment more resilient and flexible to
unexpected issues, allowing business services to be active and continue.
PowerHA SystemMirror for AIX requires redundancy for most of its components, for example:
Network access
SAN disk access
Local disk
SAN disk formatting (RAID)
Note: A high availability solution, such as PowerHA SystemMirror, ensures that the failure
of any component of the solution, whether hardware, software, or other, does not cause the
application and its data to be inaccessible. This is achieved through the elimination or
masking of both planned and unplanned downtime. High availability solutions must
eliminate all single points of failure wherever possible through design, planning, selection
of hardware. and carefully controlled change management.
Note: For more information about PowerHA architecture and concepts, download the
PowerHA SystemMirror Concepts document from the PowerHA SystemMirror 7.1 for AIX
PDFs page in the IBM Knowledge Center:
http://ibm.co/1nTups9
Important: Starting with PowerHA 7.1.0, the RSCT topology services subsystem is
deactivated, and all of its functions are performed by Cluster Aware AIX (CAA) topology
services.
RSCT
Layer
LVM TCP/IP
Layer Layer
Figure 2-3 RSCT placement on an IBM AIX server
Even though CAA is primarily intended to provide a reliable layer of clustering infrastructure to
high-availability software, such as PowerHA, you can directly use the CAA layer functions to
aid your management tasks in your own computer environment.
In PowerHA 7.1.0, if a repository disk fails, the nodes shut down automatically. In PowerHA
7.1.1, enhancements were implemented for CAA, and a new feature called repository disk
resilience was introduced to enable administrators to perform cluster maintenance tasks even
after the failure of the repository disk.
CAA also supports online repository disk replacement with no cluster impact. Release 7.1.2
of PowerHA introduced the concept of a backup repository disk, which allows administrators
to define an empty disk to be used for rebuilding the cluster repository in case the current
repository disk encounters any failure. For more information about repository disk resilience
or backup repository, see the IBM Redbooks publications titled IBM PowerHA SystemMirror
Standard Edition 7.1.1 for AIX Update, SG24-8030, and the IBM PowerHA SystemMirror
7.1.2 Enterprise Edition for AIX, SG24-8106.
Figure 2-4 shows a high-level architectural view of how PowerHA uses the Reliable Scalable
Clustering Technology (RSCT) and the CAA architecture.
Note: The cluster repository disk can be re-created but cannot make cluster changes if the
disk is not available. Implement mirroring if you want to make changes to the cluster while
the disk is not available.
One possibility is to make the repository disk highly available by mirroring it at the hardware
level over multiple storage servers, as shown in Figure 2-5.
caavg_privat
3 4
Storage-Cluster
4 3
provides 1 mirrored
LUN to AIX for caavg_privat
Copy 1 Copy 2
PowerHA cluster
A cluster is set of computer systems connected together and sharing application data. They
can all be in the same geographic place, in the same data center. or they can be in distant
places, even worldwide.
By adopting cluster technologies, companies can increase service availability and reliability to
their customers or even make disasters not visible to their customers. A clustered
environment can help presentyour business as a better service providers.
When a PowerHA cluster is set up, a logical name (cluster name) must be assigned, as
shown in Figure 2-6. This name is used by PowerHA procedures to deal with specific groups
of servers, services, and information.
COMMAND STATUS
[TOP]
Cluster Name: oracluster
Cluster Connection Authentication Mode: Standard
Cluster Message Authentication Mode: None
Cluster Message Encryption: None
Use Persistent Labels for Communication: No
Repository Disk: hdisk2
Cluster IP Address: 228.1.1.30
There are 2 node(s) and 2 network(s) defined
NODE sapnfs1:
Network net_ether_01
oracle_svc1 172.16.21.65
[MORE...21]
Figure 2-6 shows the cluster topology. It can be checked by using the smitty sysmirror
command and selecting Cluster Nodes and Networks Manage the Cluster Display
PowerHA SystemMirror Configuration.
FC card1 FC card1
Disk
FC card2 subsystem
FC card2
Company network
Figure 2-7 Standard two-node PowerHA cluster hardware
Figure 2-7 shows a standard cluster configuration with two nodes and redundant network and
SAN access. The data is shared with the use of a shared disk subsystem.
PowerHA networks
For PowerHA, networks are paths through which cluster nodes communicate with each other
and with the outside world. CAA heartbeat messages are also sent.
When defining a network, you can choose any name for the network, making it easier to
identify it within PowerHA architecture. If you do not specify a name, PowerHA automatically
assigns a network name by using the net_ether_XX pattern, as shown in Figure 2-8 on
page 19.
Starting with PowerHA 7.1.1, the networks can be public or private. The main difference
between public and private networks is that CAA does not perform heartbeat operations over
a private network.
Change/Show a Network
[Entry Fields]
* Network Name net_ether_01
New Network Name []
* Network Type [ether]
+
* Netmask(IPv4)/Prefix Length(IPv6) [255.255.254.0]
* Network attribute public
+
Private network
service ent0 ent0 service
address private private address
persistent persistent
address address
Company network
Also, PowerHA treats applications use the same approach, as explained previously. Because
each application can have specific procedures for startup and shutdown, PowerHA requires
specific shell scripts to perform applications start and stop operations. This PowerHA control
structure is called application controller scripts.
You need to specify which scripts will be used to start and stop the application services when
brought up or down by the PowerHA cluster, as shown in Figure 2-10.
[Entry Fields]
* Application Controller Name [application01]
* Start Script [/fs1/app01_start.ksh]
* Stop Script [/fs1/app01_stop.ksh]
Application Monitor Name(s) +
Application startup mode [background] +
PowerHA applications can be created by using the smitty sysmirror command and selecting
Cluster Nodes and Networks Cluster Applications and Resources Resources
Configure User Applications (Scripts and Monitors) Application Controller
Scripts Add Application Controller Scripts or by using fast path:
smitty cm_add_app_scripts
A resource is any component that is required to bring one service application up. Using
PowerHA, the resource is able to move from one cluster node to another. A resource can be
any of the following components:
File systems
Raw devices
Volume groups
IP addresses
NFS shares
Applications
Workload partitions (WPARs)
Custom-defined component
To start one application, a set of these components is usually required, and they need to be
grouped together. This logical entity (combined set of resources) in PowerHA is known as a
resource group.
Figure 2-11 on page 22 shows a sample cluster with shared components (IP address, file
systems, volume groups, and so on.).
Shared VG
Shared Filesystems
FC card1 FC card1
Disk
FC card2 subsystem
FC card2
Company network
When defining a resource group by using System Management Interface Tool (SMIT), you
see the panel shown in Figure 2-12.
[Entry Fields]
* Resource Group Name [rg1]
* Participating Nodes (Default Node Priority) [node1 node2] +
Figure 2-12 shows three types of resource group policies that must be configured during
cluster implementation.
Also, regarding resource group startup, there is a parameter that can be customized called
the settling time. When using the settling time, any cluster node waits for the configured time
to make sure that any other higher-priority node is not about to join the cluster. It is an
interesting parameter that can be helpful to use when you have a multiple node cluster, and
all of them start simultaneously.
Fallover policy
The second mandatory policy is called the fallover policy (or failover policy). In a running
cluster, this policy defines the behavior of resource groups when the resource group that
owns the node fails. These are the options for the fallover policy:
1. Fallover to the next priority node in the list: When the node that owns an online
resource group fails, if the resource group is not online on all available nodes, it is brought
online on the next node according to the resource groups participant nodes list
(Figure 2-12 on page 22).
2. Fallover using dynamic node priority: When the node that owns an online resource
group fails, the resource group is moved to another node according to the dynamic node
priority policy that is defined. These policies are based on RSCT variables, such as the
node with the most memory available. Keep in mind that if you choose this option without a
dynamic node priority policy defined, you will encounter an error when you synchronize a
cluster configuration.
3. Bring offline (on the error node only): When the node that owns an online resource
fails, no fallover action will be taken. If the resource group is online at one node per time,
the services will be unavailable until an administrator action. When the resource group is
online on all available nodes, the resource will be offline only on the failing node, and the
resource continues to work properly on all another nodes.
Note: For fallback timer policy configurations, use smitty sysmirror, and then select
Cluster Applications and Resources Resource Groups Configure Resource
Group Run-Time Policies Configure Delayed Fallback Timer Policies or use the
smitty cm_timer_menu fast path.
PowerHA allows customization of predefined cluster events and creation of new events. When
creating new events, it is important to check first whether there is any standard event that
covers the action or situation.
All standard cluster events have their own meanings and functions. Table 2-2 on page 25 lists
examples of cluster events.
Note: All events have detailed use description in the script files. All standard events are in
the /usr/es/sbin/cluster/events directory.
Standby configuration
The simplest cluster configuration is when a physical node is running all services for a
resource group while the other nodes are idle, ready to host resource group services in case
of a main node failure.
Figure 2-13 on page 27 shows that when the sample standby cluster starts, all DB Prod RG
resource group services are brought online at Node 1. However, Node 2 remains idle with no
production service running on it. It is only in the case of a Node 1 failure that the DB Prod RG
resource group will be automatically moved to Node 2.
DB Prod DB Prod
RG Shared VG RG
(active) Shared Filesystems (standby)
FC card1 FC card1
Disk
FC card2 subsystem
FC card2
Company network
Takeover configuration
This allows a more efficient hardware use when all cluster nodes are running parts of the
production workload. A takeover configuration can be split into two possible
sub-configurations: One-sided takeover or mutual takeover. Details of these possibilities are
shown in Figure 2-14 on page 28 and in Figure 2-15 on page 29.
DB Prod DB Prod
RG RG
Shared VG
(active) (standby)
Shared Filesystems
Presentation Server
FC card1 FC card1
Disk
FC card2 subsystem
FC card2
Company network
Note: PowerHA does not use the shared disk capability of CAA.
DB Prod Web
RG RG
(active) Web Shared VG (active) DB Prod
RG Shared Filesystems RG
(standby) (standby)
FC card1 FC card1
Disk
FC card2 subsystem
FC card2
Company network
As shown in Figure 2-15, in a mutual takeover cluster configuration, all application parts are
highly available and managed by resource groups (DB Prod RG and Web RG). When Node 1
has services running in it and that node fails, its services are moved automatically to Node 2.
And in a Node 2 failure, services will be brought online automatically on Node 1. So, any kind
of node crash can be covered by the PowerHA cluster structure with minimal impact to users.
To avoid these issues, PowerHA provides a way to facilitate administrative tasks on all nodes
inside a PowerHA cluster. This is called the Cluster Single Point of Control (C-SPOC).
Using C-SPOC, you can do the following tasks on all cluster nodes:
Control PowerHA services:startup and shutdown
Manage cluster resource groups and applications
Manage cluster nodes communication interfaces
Manage file collections
View and manage logs
Manage AIX user and groups across all cluster nodes
Perform Logical Volume Manager (LVM)tasks
Handle IBM General Parallel File System (IBM GPFS) file system tasks
Open a smitty session on any specific node
http://ibm.co/1s4CRe1
PowerHA SmartAssists
SmartAssists are PowerHA tools that help system administrators include applications in a
cluster infrastructure. Using SmartAssists, you can configure the application in a highly
available cluster and manage the availability of the application with start and stop scripts.
There are many SmartAssist versions available, as shown in Figure 2-16 on page 31.
+--------------------------------------------------------------------------+
| Select a Smart Assist From the List of Available Smart Assists |
| |
| Move cursor to desired item and press Enter. |
| |
| DB2 UDB non-DPF Smart Assist # smass1 smass2 |
| DHCP Smart Assist # smass1 smass2 |
| DNS Smart Assist # smass1 smass2 |
| Lotus Domino smart assist # smass1 smass2 |
| FileNet P8 Smart Assist # smass1 smass2 |
| IBM HTTP Server Smart Assist # smass1 smass2 |
| SAP MaxDB Smart Assist # smass1 smass2 |
| Oracle Database Smart Assist # smass1 smass2 |
| Oracle Application Server Smart Assist # smass1 smass2 |
| Print Subsystem Smart Assist # smass1 smass2 |
| SAP Smart Assist # smass1 smass2 |
| Tivoli Directory Server Smart Assist # smass1 smass2 |
| TSM admin smart assist # smass1 smass2 |
| TSM client smart assist # smass1 smass2 |
| TSM server smart assist # smass1 smass2 |
| WebSphere Smart Assist # smass1 smass2 |
| WebSphere MQ Smart Assist # smass1 smass2 |
+--------------------------------------------------------------------------+
Figure 2-16 Smart Assists available in PowerHA 7.1.3
PowerHA can be used with both virtual and physical devices. It can detect hardware failures
on these servers but there are special considerations when you are designing the virtual
infrastructure:
Use a dual Virtual I/O Server (VIOS) setup for redundancy (strongly recommended).
Configure shared Ethernet adapter fallover.
Configure the netmon.cf file to check the status of the network behind the virtual switch.
Use multiple paths for network and storage devices (strongly recommended).
For cluster nodes that use virtual Ethernet adapters, there are multiple configurations
possible for maintaining high availability at the network layer. Consider these suggestions:
Configure dual VIOS to ensure high availability of virtualized network paths.
Use the servers that are already configured with virtual Ethernet settings because no
special modification is required. For a VLAN-tagged network, the preferred solution is to
use SEA fallover; otherwise, consider using the network interface backup.
One client-side virtual Ethernet interface simplifies the configuration; however, PowerHA
might miss network events. For a more comprehensive cluster configuration, configure two
virtual Ethernet interfaceson the cluster LPAR to enable PowerHA. Two network interfaces
are required by PowerHA to track network changes, similar to physical network cards. It is
recommended to have two client-side virtual Ethernet adapters that use different SEAs.
This ensures that any changes in the physical network environment can be relayed to the
PowerHA cluster using virtual Ethernet adapters, such as in a cluster with physical
network adapters.
For more information: Architectural details of some of the possible PowerHA solutions
using virtual Ethernet are mentioned in section 3.4.1 of the IBM PowerHA SystemMirror
Standard Edition 7.1.1 for AIX Update, SG24-8030.
Important: You can perform LPM on a PowerHA SystemMirror LPAR that is configured
with SAN communication. However, when you use LPM, the SAN communication is not
automatically migrated to the destination system. You must configure the SAN
communication on the destination system before you use LPM. Full details can be found
at:
http://www-01.ibm.com/support/knowledgecenter/SSPHQG_7.1.0/com.ibm.powerha.admn
gd/ha_admin_config_san.htm
PowerHA SystemMirror 7.1.3 supports SAN-based heartbeat within a site.The SAN heartbeat
infrastructure can be created in two ways, depending on the configuration of the nodes that
are members of the cluster:
Using real or physical adapters on cluster nodes and enabling the storage framework
capability (sfwcomm device) of the HBAs. Currently, FC and SAS technologies are
supported. See Setting up cluster storage communication in the IBM Knowledge Center
for more information about the HBAs and the required steps to set up the storage
framework communication:
http://ibm.co/1o5IxTv
In a virtual environment, where the nodes in the clusters are VIO Clients. Enabling the
sfwcomm interface requires activating the target mode (the tme attribute) on the real
adapters in the VIOS and defining a private virtual LAN (VLAN) with VLAN ID 3358 for
communication between the partitions that contain the sfwcomm interface and VIOS. The
real adapter on VIOS needs to be a supported HBA.
Using FC for SAN heartbeat requires zoning of the corresponding FC adapter ports (real
FC adapters or virtual FC adapters on VIOS).
Command output:
8. On the cluster nodes, run the cfgmgr command, and check for the virtual Ethernet adapter
and sfwcomm with the lsdev command.
9. No other configuration is required at the PowerHA level. When the cluster is configured
and running, you can check the status of SAN heartbeat by using the lscluster -i
command:
# lscluster -i sfwcomm
Note: To implement the HyperSwap functionality with the IBM PowerHA SystemMirror
Enterprise Edition 7.1.3 a DS88xx and higher is required.
For more information, see Chapter 7, Smart Assist for SAP 7.1.3 on page 131.
Connectivity for communication must already be established between all cluster nodes.
Automatic discovery of cluster information runs by default when you use the initial cluster
setup (typical) menus found under the SMIT menu Cluster Nodes and Networks. After you
have specified the nodes to add and their established communication paths, PowerHA
SystemMirror automatically collects cluster-related information and configures the cluster
The cluster configuration is stored in a central repository disk, and PowerHA SystemMirror
assumes that all nodes in the cluster have common access to at least one physical volume or
disk. This common disk cannot be used for any other purpose such as hosting application
data. You specify this dedicated shared disk when you initially configure the cluster.
In PowerHA SystemMirror 7.1.3, a new feature has been added that enables Cluster Aware
AIX environment to select IP unicast or IP multicast for heartbeat exchange. This gives
additional flexibility during the configuration of the CAA environment.
Note: Unicast is the default for new created clusters, but multicast is the heartbeat
exchange mechanism for 7.1 migrated clusters.
Note: For more information on cluster setup, see Configuring a PowerHA SystemMirror
cluster in the PowerHA SystemMirror 7.1 section of the IBM Knowledge Center:
http://ibm.co/1qhVwQp
Both of these options can be used to set the host name dynamically. For more details about
how to use it see Chapter 10, Dynamic host name change (host name takeover) on
page 359.
For the complete details of the Cluster Aware AIX (CAA) updates, see 2.2.2, Cluster Aware
AIX (CAA) on page 14.
Node names as shown in Example 3-1 on page 40 are now valid and may be used.
The output can be generated for the whole cluster configuration or limited to special
configuration items such as:
nodeinfo
rginfo
lvinfo
fsinfo
vginfo
dependencies
Tip: For a full list of available options, use the clmgr built-in help function:
clmgr view report -h
Help requirements:
The clmgr man page must be installed.
Either the more or less pager must be installed.
If verbose mode fails, the standard mode is attempted. If the standard mode fails, the original
simple help is displayed. See Example 3-2 for a syntactical help output of the clmgr command
view report.
Max DB 7.8
A cluster partition is when failures isolate a subset of cluster nodes from the rest of the
cluster, for example:
Failure of the links between sites
Multiple failures within a site (requires failures of Ethernet, SAN, and repository access)
The process of partitioning is referred to as a split
The isolated subset of nodes is referred to as a partition
The following are definitions to remember for the split and merge policies:
Split policy A cluster split event can occur between sites when a group of nodes
cannot communicate with the remaining nodes in a cluster. For
example, in a linked cluster, a split occurs if all communication links
between the two sites fail. A cluster split event splits the cluster into
two or more partitions.
When you use the SMIT interface in PowerHA SystemMirror 7.1.3 to configure split and
merge policies, you must stop and restart cluster services on all nodes in the cluster. You can
stop a cluster service before you complete the following steps, or you can configure split and
merge policies in an active cluster and restart cluster services after verification and
resynchronization of the cluster is complete.
To configure a split and merge policy in PowerHA SystemMirror 7.1.3, or later, complete the
following steps:
1. From the command line, enter smitty sysmirror.
2. In the SMIT interface, select Custom Cluster Configuration Cluster Nodes and
Networks Initial Cluster Setup (Custom) Configure Cluster Split and Merge
Policy, and press Enter.
3. Complete the fields as shown in Table 3-2 on page 45, and press Enter.
Split handling policy
  Select None, the default setting, for the partitions to operate independently of each other
  after the split occurs.
  Select Tie breaker to use the disk that is specified in the Select tie breaker field after a split
  occurs. When the split occurs, one site wins the SCSI reservation on the tie breaker disk.
  The site that loses the SCSI reservation uses the recovery action that is specified in the
  policy setting.
  Note: If you select the Tie breaker option in the Merge handling policy field, you must
  select Tie breaker for this field.
  Select Manual to wait for manual intervention when a split occurs. PowerHA SystemMirror
  does not perform any actions on the cluster until you specify how to recover from the split.
  Note: If you select the Manual option in the Merge handling policy field, you must select
  Manual for this field.
Merge handling policy
  Select Majority to choose the partition with the highest number of nodes as the primary
  partition.
  Select Tie breaker to use the disk that is specified in the Select tie breaker field after a
  merge occurs.
  Note: If you select the Tie breaker option in the Split handling policy field, you must select
  Tie breaker for this field.
  Select Manual to wait for manual intervention when a merge occurs. PowerHA
  SystemMirror does not perform any actions on the cluster until you specify how to handle
  the merge.
Split and merge action plan
  Select Reboot to reboot all nodes in the site that does not win the tie breaker.
Select tie breaker
  Select an iSCSI disk or a SCSI disk that you want to use as the tie breaker disk.
Notify Interval
  The interval, in seconds, between the notification messages that inform the operator of the
  need to choose which site will continue after a split or merge. The supported values are
  between 10 and 3600.
Default Surviving Site
  If the operator has not responded to a request for a manual choice of the surviving site on a
  split or merge, this site is allowed to continue. The other site takes the action chosen under
  Action Plan. The time that the operator has to respond is Notify Interval times (Maximum
  Notifications + 1).
Note: You can use the SMIT interface to configure split and merge policies in PowerHA
SystemMirror 7.1.2 or earlier as shown in the following website:
http://ibm.co/1tXTdEa
To manually respond to a cluster that goes offline and uses a split policy or a merge policy,
perform the following steps in the IBM Systems Director console:
1. Log in to the IBM Systems Director console.
2. On the Welcome page, click the Plug-ins tab and select PowerHA SystemMirror
Management.
3. In the Cluster Management section, click Manage Clusters.
4. Right-click the cluster that you do not want to use a split policy or a merge policy, and
select Recovery > Manual response to cluster split or merge.
5. Select the site that recovers the cluster, and click OK.
Chapter 4. Migration
This chapter covers the most common migration scenarios from IBM PowerHA 6.1 or
PowerHA 7.1.x to PowerHA 7.1.3. It includes the following topics:
Introduction
PowerHA SystemMirror 7.1.3 requirements
clmigcheck explained
Migration options
Automate the cluster migration check
Note: This chapter does not cover migration from High Availability Cluster Multi-Processing
(IBM HACMP) 5.5 or earlier versions. See 4.4.1, Legacy rolling migrations to PowerHA
SystemMirror 7.1.3 on page 53 for more information on how to migrate from earlier
releases of PowerHA (HACMP).
The success of a migration depends on careful planning. There are important items to keep in
mind before starting a migration:
Create a backup of rootvg from all nodes in the cluster.
Save the existing cluster configuration.
Migrating from PowerHA SystemMirror 6.1 or earlier requires the installation of the following
AIX filesets:
bos.cluster.rte
bos.ahafs
bos.clvm.enh
devices.common.IBM.storfwork.rte
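A quick way to verify whether these filesets are already present is shown in the following sketch; install any missing filesets from the AIX installation media before starting the migration:

  # List the required filesets; any that are missing are reported as not installed
  lslpp -L bos.cluster.rte bos.ahafs bos.clvm.enh devices.common.IBM.storfwork.rte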
7. WebSMIT (replaced by the IBM Systems Director plug-in, Enterprise Edition only)
Although PowerHA Enterprise Edition was never supported with WebSMIT, PowerHA
SystemMirror Enterprise Edition 7.1.2 and later is supported with the IBM Systems
Director plug-in.
Important: If your cluster is configured with any of the features listed in points 1
through 4 (above), your environment cannot be migrated. You must either change or
remove the features before migrating, or simply remove the cluster and configure a new
one with the new version of PowerHA.
Important: If nodes in a cluster are running two different versions of PowerHA, the cluster
is considered to be in a mixed cluster state. A cluster in this state does not support any
configuration changes until all of the nodes have been migrated. It is highly recommended
to complete either the rolling or non-disruptive migration as soon as possible to ensure
stable cluster functionality.
Tip: After migration is finished, the following line is added to the /etc/syslog.conf file:
*.info /var/adm/ras/syslog.caa rotate size 1m files 10
The first purpose of this utility is to validate the existing PowerHA cluster configuration. The
tool detects deprecated features, such as the disk heartbeat network, so you can decide
either to remove them before the migration or to let the migration process remove them when
the migration finishes.
The second purpose of this utility is to obtain the necessary information to create the
underlying CAA cluster.
Note: The last node in the cluster to run /usr/sbin/clmigcheck creates the underlying
CAA cluster.
2. Migrate to PowerHA 7.1.3 from 6.1.
See the example for migrating from PowerHA 6.1 to 7.1.3 in the next section, 4.4.2, Rolling
migration from PowerHA SystemMirror 6.1 to PowerHA SystemMirror 7.1.3 (AIX 7.1 TL3 or
6.1 TL9) on page 54.
Note: You might also find it helpful to watch the PowerHA v6.1 to v7.1.3 Rolling Migration
demo on YouTube:
https://www.youtube.com/watch?v=MaPxuK4poUw
Example 4-1 shows that the cluster topology includes a disk heartbeat network. This type of
network is deprecated, and it is automatically removed when the very last node starts cluster
services.
HACMPnode:
name = "hacmp37"
object = "COMMUNICATION_PATH"
value = "hacmp37"
node_id = 1
node_handle = 1
version = 15
HACMPnode:
name = "hacmp38"
object = "COMMUNICATION_PATH"
value = "hacmp38"
node_id = 2
node_handle = 2
version = 15
#[hacmp37] hostname
hacmp37
#[hacmp38] hostname
hacmp38
Note: If the value of COMMUNICATION_PATH does not match the AIX hostname
output, /usr/sbin/clmigcheck displays the following error message:
------------[ PowerHA System Mirror Migration Check ]-------------
This error requires user intervention to correct the environment before proceeding with
the migration.
<select 1>
1 = DEFAULT_MULTICAST
2 = USER_MULTICAST
3 = UNICAST
6. Per Example 4-3 on page 56, choose one of the following CAA heartbeat mechanisms:
1 DEFAULT MULTICAST
CAA will automatically assign a cluster Multicast IP address.
2 USER MULTICAST
User will assign a cluster Multicast IP address.
3 UNICAST
The unicast mechanism was introduced in PowerHA SystemMirror 7.1.3. Select this
option if the cluster network environment does not support multicast.
Example 4-4, as part of the migration steps, shows the selection of the repository disk.
1 = 000262ca102db1a2(hdisk2)
2 = 000262ca34f7ecd9(hdisk5)
Note: The following warning message always appears when UNICAST has been
selected (if a repository disk has been assigned, the message can be ignored):
Note - If you have not completed the input of repository disks and multicast
IP addresses, you will not be able to install PowerHA System Mirror.
Additional details for this session may be found in
/tmp/clmigcheck/clmigcheck.log.
10.On hacmp38, stop cluster services with the Move Resource Groups option, and move
them over to hacmp37.
11.Verify that all nodes hostnames are included in /etc/cluster/rhosts:
# cat /etc/cluster/rhosts
hacmp37
hacmp38
12.Refresh the PowerHA cluster communication daemon clcomd:
# refresh -s clcomd
13.Run /usr/sbin/clmigcheck on node hacmp38, as shown in Example 4-6 on page 59.
Example 4-8 Verifying that unicast is in place for CAA inter-node communication
# lscluster -m
Calling node query for all nodes...
Node query number of nodes examined: 2
Node name: hacmp37.austin.ibm.com
Cluster shorthand id for node: 1
UUID for node: b90a2f9e-611e-11e3-aa68-0011257e4371
State of node: UP NODE_LOCAL
Smoothed rtt to node: 0
Mean Deviation in network rtt to node: 0
Number of clusters node is a member in: 1
CLUSTER NAME SHID UUID
cluster3738 0 b9b87978-611e-11e3-aa68-0011257e4371
SITE NAME SHID UUID
LOCAL 1 51735173-5173-5173-5173-517351735173
----------------------------------------------------------------------------
Note: The lscluster -m output on the remote node shows the reverse unicast network
direction:
-----------------------------------------------------------------------
tcpsock->01 UP IPv4 none 10.1.1.38->10.1.1.37
-----------------------------------------------------------------------
16.Install all PowerHA 7.1.3 filesets on node hacmp38 (use smitty update_all).
17.Start cluster services on node hacmp38 (smitty clstart).
18.Verify that the cluster has completed the migration on both nodes, as shown in
Example 4-9.
Note: Both nodes must show CLversion: 15. Otherwise, the migration has not
completed successfully. In that case, call IBM Support.
Note:
The running AIX level for the following migration is AIX 7.1 TL3 SP0.
The running PowerHA Level is PowerHA 7.1.0 SP8.
Remember the requirements for PowerHA 7.1.3:
AIX 6.1 TL9 SP0 or AIX 7.1 TL3 SP0
1. On node hacmp37 (the first node to be migrated), stop cluster services (smitty clstop)
with the option to Move Resource Groups (this action moves over the resource groups to
hacmp38).
2. Install all PowerHA 7.1.3 filesets (use smitty update_all).
3. Start cluster services on node hacmp37 (smitty clstart).
4. The output of the lssrc -ls clstrmgrES command on node hacmp37 is shown in
Example 4-11 on page 62.
Example 4-11 lssrc -ls clstrmgrES output from node hacmp37
#lssrc -ls clstrmgrES
Current state: ST_STABLE
sccsid = "@(#)36 1.135.1.118
src/43haes/usr/sbin/cluster/hacmprd/main.C,hacmp.pe,61haes_r713,1343A_hacmp713
10/21/"
build = "Oct 31 2013 13:49:41 1344B_hacmp713"
i_local_nodeid 0, i_local_siteid -1, my_handle 1
ml_idx[1]=0 ml_idx[2]=1
There are 0 events on the Ibcast queue
There are 0 events on the RM Ibcast queue
CLversion: 12 <--- This means the migration is still in progress!
local node vrmf is 7130
cluster fix level is "0"
5. On hacmp38, stop cluster services with the option Move Resource Groups.
6. Install all PowerHA 7.1.3 filesets (use smitty update_all).
7. Start cluster services on node hacmp38 (smitty clstart).
8. Verify that the cluster has completed the migration on both nodes as shown in
Example 4-12.
Updating ACD HACMPnode stanza with node_id = 2 and version = 15 for object
finishMigrationGrace: Migration is complete
Note:
In the following migration example, the nodes are running AIX 7.1 TL2.
This migration example also applies to:
Nodes running PowerHA 7.1.0 and AIX 6.1 TL8 or AIX 7.1 TL2.
Nodes running PowerHA 7.1.1 and AIX 6.1 TL8 or AIX 7.1 TL2.
1. On node hacmp37 (the first node to be migrated), stop cluster services (smitty clstop)
with the option to Move Resource Groups (moves the RGs over to node hacmp38).
2. Apply AIX TL3 - SP1 on node hacmp37, then reboot node hacmp37.
3. Install all PowerHA 7.1.3 filesets (use smitty update_all).
4. Start cluster services on node hacmp37 (smitty clstart).
5. The output of the lssrc -ls clstrmgrES command on node hacmp37 is shown in
Example 4-14.
6. On hacmp38, stop cluster services with the option to Move Resource Groups.
7. Apply AIX TL3 - SP1 on node hacmp38, and then reboot node hacmp38.
8. Install all PowerHA 7.1.3 filesets (use smitty update_all).
9. Start cluster services on node hacmp38 (smitty clstart).
10.Verify that the cluster has completed migration on both nodes, as shown in Example 4-15
on page 64.
Example 4-15 Verifying completed migration
# odmget HACMPcluster | grep cluster_version
cluster_version = 15
Updating ACD HACMPnode stanza with node_id = 2 and version = 15 for object
finishMigrationGrace: Migration is complete
Note: Both nodes must show CLversion: 15. Otherwise, the migration has not
completed successfully. In that case, call IBM Support.
Tip: A demo of performing a snapshot migration from PowerHA v6.1 to PowerHA v7.1.3 is
available at:
https://www.youtube.com/watch?v=1pkaQVB8r88
The following sections discuss the limitations, dependencies, and steps to prepare for a
cluster migration without running the clmigcheck script.
4.5.1 Limitations
Rolling migrations cannot be performed without using clmigcheck. This is related to the
change of the cluster services from RSCT (Reliable Scalable Cluster Technology) to CAA
(Cluster Aware AIX) during a rolling migration. The migration must be done at a specific point
in time to ensure a successful migration without causing an outage.
3. The communications path for the node must be set to hostname on all nodes.
4. When migrating to multicast:
a. Choose a free multicast address or use a cluster-defined multicast address (see section
3.1.2, Network considerations, in IBM PowerHA SystemMirror 7.1.2 Enterprise Edition
for AIX, SG24-8106).
b. Test the communication with mping (see the sketch after this list).
5. Choose the future repository disks and note the PVIDs of the disks.
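The following is a minimal sketch of an mping test, assuming the multicast address 224.10.10.65 used in Example 4-21; the exact flags can differ by AIX level, so check the mping usage output first:

  # On the first node, start a multicast receiver
  mping -v -r -a 224.10.10.65

  # On the second node, send a few packets to the same multicast address
  mping -v -s -a 224.10.10.65 -c 5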
Most of these steps can be performed in parallel, because the entire cluster will be offline:
1. Stop cluster services on all nodes.
Choose to bring the resource groups offline.
2. Create a cluster snapshot if you have not previously created one, and copy it to /tmp as
backup, as shown in Example 4-17.
Note: Depending on the number of managed service addresses and aliases, it can take
several minutes to convert the snapshot. Be patient while the conversion is running. If you
want to ensure that the process is still working, run the proctree command against the PID
of the clconvert_snapshot process several times and watch for changing output.
13.Restore the cluster configuration from the converted snapshot with the clsnapshot
command, as shown in Example 4-19. The command also executes the mkcluster
command that creates the CAA cluster. After the command finishes, the defined hdisk
should display as caavg_private.
Retrieving data from available cluster nodes. This could take a few minutes.
a2
b2
Chapter 4. Migration 67
WARNING: Network option "ipsrcrouterecv" is set to 0 and will be set to 1 on
during PowerHA SystemMirror startup on the following nodes:
a2
b2
Adding any necessary PowerHA SystemMirror for AIX entries to /etc/inittab and
/etc/rc.net for IP Address Takeover on node b2.
Example 4-20 is used for a two-node cluster that uses unicast communication. Unicast
communication is supported with PowerHA 7.1.3 and later.
Example 4-21 is used for a two-node cluster that is using multicast communication, where the
operator has assigned a special multicast address.
Example 4-21 clmigcheck.txt for stretched cluster using user-defined multicast communication
CLUSTER_TYPE:STRETCHED
CLUSTER_REPOSITORY_DISK:00c4c9f2eafe5b06
CLUSTER_MULTICAST:224.10.10.65
Example 4-22 can be used for a stretched cluster with multicast communication. But this time,
the cluster itself defines the multicast address during migration. This is done by a clearly
defined process that is explained in 3.1.2 Network Considerations, in the IBM Redbooks
publication titled IBM PowerHA SystemMirror 7.1.2 Enterprise Edition for AIX, SG24-8106.
Example 4-22 clmigcheck.txt for stretched cluster using cluster defined multicast communication
CLUSTER_TYPE:STANDARD
CLUSTER_REPOSITORY_DISK:00c0fb32fcff8cc2
CLUSTER_MULTICAST:NULL
Note: This is a preferred way to migrate several clusters of the same application type:
Use the clmigcheck script on one of the clusters to ensure compliance with the
requirements, and generate the files for the other clusters.
Tip: A demo of performing an offline migration from PowerHA v6.1 to PowerHA v7.1.2 is
available on YouTube. The only differences compared to v7.1.3 are that the order of choosing
the repository disk and the IP address is reversed and that v7.1.3 adds a new option to choose unicast.
http://youtu.be/7kl0JtcL2Gk
Choose to bring resource groups offline.
2. Upgrade AIX (if needed).
3. Install the additional requisite filesets that are listed in 4.2.1, Software requirements on
page 50.
Reboot.
4. Verify that clcomd is active:
lssrc -s clcomd
5. Update /etc/cluster/rhosts.
Enter either cluster node hostnames or IP addresses, only one per line.
6. Run refresh -s clcomd.
7. Execute clmigcheck on one node.
a. Choose option 1 to verify that the cluster configuration is supported (assuming no
errors).
b. Then choose option 3.
i. Choose default multicast, user multicast, or unicast for heartbeat.
ii. Choose a repository disk device to be used (for each site, if applicable).
iii. Exit the clmigcheck menu.
c. Review the contents of /var/clmigcheck/clmigcheck.txt for accuracy.
8. Upgrade PowerHA on one node.
a. Install base-level images only (apply service packs later).
b. Review the /tmp/clconvert.log file.
9. Execute clmigcheck and upgrade PowerHA on the remaining node.
When executing clmigcheck on each additional node, the menu does not appear and no
further actions are needed. On the last node, it creates the CAA cluster.
10.Restart cluster services.
1. On node hacmp37 (the first node to be migrated), stop cluster services (smitty clstop)
with the option to Unmanage Resource Groups as shown in Example 4-23.
5. On node hacmp38, stop cluster services with the option to Unmanage Resource Groups
as shown in Example 4-25.
cluster fix level is "0"
Updating ACD HACMPnode stanza with node_id = 2 and version = 15 for object
finishMigrationGrace: Migration is complete
Note: Both nodes must show CLversion: 15. Otherwise, the migration has not
completed successfully. In that case, call IBM Support.
2. Change the heartbeat mechanism from multicast to unicast, as shown in Example 4-29.
[Entry Fields]
* Cluster Name cluster3738
* Heartbeat Mechanism Unicast
Repository Disk 000262ca102db1a2
Cluster Multicast Address 228.3.44.37
(Used only for multicast heartbeat)
3. Verify and synchronize the cluster (smitty sysmirror > Cluster Nodes and Networks >
Verify and Synchronize Cluster Configuration).
Verify that the new CAA communication mode is now set to unicast, as shown in
Example 4-30 on page 73.
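The same change can also be scripted with clmgr. The following is a sketch; the attribute name and value are assumptions based on the SMIT fields above, so confirm them with the clmgr built-in help before use:

  # Switch the cluster heartbeat mechanism to unicast (attribute name assumed)
  clmgr modify cluster HEARTBEAT_TYPE=unicast

  # Verify and synchronize the cluster so that the change takes effect
  clmgr sync cluster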
The following subsections describe the main IBM Systems Director components.
Management server
The management server is the main entity of the topology, and it has the Systems Director
server packages installed. The management server works as the central point for controlling
the environment inventory, performing the operations on resources, and managing the IBM
DB2 database, where all information is stored.
Common agent
The Common Agent is the basic agent for all managed servers. The agent allows the
Systems Director server to view and manipulate server information and configuration,
including security management and deployment functions. This component is installed on all
servers that the environment administrator wants to have managed by IBM Systems Director.
Each PowerHA node is considered to be a Common Agent because it runs the Common
Agent services (CAS). All PowerHA nodes must be discovered by the IBM Systems Director
server running on the management server.
Additional plug-ins
For specific use, there are additional plug-ins that can be downloaded from the IBM Systems
Director download web page:
http://www.ibm.com/systems/director/downloads/plugins.html
Note: For more information about the IBM Systems Director installation, management, and
operations, see the IBM Knowledge Center:
http://www.ibm.com/support/knowledgecenter/SSAV7B/welcome
The information displayed on the console comes from local XML configuration files, and any
changes performed in the PowerHA console are saved into local XML configuration files. This
XML format is new with the 7.1.3 release. It is used as an interchange format between the
PowerHA console and the IBM Systems Director database on one side and the PowerHA base
product on the IBM AIX side (the PowerHA node side) on the other. As explained later in this section, in
the Planning mode of the cluster simulator, the XML configuration files that are built on the
PowerHA console can be used on PowerHA nodes to deploy the corresponding configuration.
The IBM Systems Director PowerHA console works in two different modes:
Online mode: In Online mode, the console works the way that it has always worked. The
PowerHA console can be used to create a new PowerHA cluster or to manage an already
configured and running PowerHA cluster. Management tasks are performed on a real,
running PowerHA environment (real PowerHA nodes, real PowerHA cluster, and so on),
the same way as before the 7.1.3 release. But new with the 7.1.3 release, this real
configuration can now be exported to an XML configuration file.
Simulated mode: In Simulated mode, the console works as a cluster simulator. It works in
a disconnected fashion (disconnected from real PowerHA nodes) with no impact and no
risk for a real PowerHA environment. Two modes are then possible:
Offline mode: This mode is entirely artificial. All information related to hostnames, IP
addresses, volume groups, file systems, and services is fake and comes from a
hardcoded XML environment file. In this mode, you interact only with an XML
configuration file. You do not interact with a real PowerHA cluster, and you do not even
need to have connection to any PowerHA nodes. In this mode, you use the Systems
Director PowerHA console to create, display, change, or delete your PowerHA
configuration and save it to an XML configuration file, with no possible risk to
production environments. Several offline XML files are delivered, ready to use as
starting points.
Note: To run the simulator in Offline mode, only the PowerHA SystemMirror Director
Server plug-in needs to be installed. Agent nodes are not needed.
Operating system: To run the PowerHA plug-in for Systems Director, the minimum
operating system version is AIX 6.1 TL9 or AIX 7.1 TL3. For a managed server, any
operating system supported by Systems Director 6.3 can run the plug-in.
Note: To check all supported environments to run Systems Director 6.3, see the
Operating Systems and Software Requirements section of the IBM Knowledge Center:
http://ibm.co/1uBLLRB
Systems Director server: To support the cluster simulator feature, the minimum Systems
Director server version is 6.3.2 or later.
PowerHA SystemMirror: The minimum PowerHA version supported for the cluster
simulator feature is PowerHA SystemMirror 7.1.3.
The installation is simple. After downloading and uncompressing the plug-in installation
package for AIX, Linux, or Microsoft Windows running a Systems Director server, run the
IBMSystemsDirector_PowerHA_sysmirror_Setup.bin binary file that is included in the
package. The installation proceeds as Example 5-1 shows (this example was run on an AIX 7.1
operating system).
Launching installer...
Graphical installers are not supported by the VM. The console mode will be used
instead...
===============================================================================
Choose Locale...
----------------
1- Deutsch
->2- English
3- Espanol
4- Francais
5- Italiano
6- Portuguese (Brasil)
===============================================================================
Introduction
------------
It is strongly recommended that you quit all programs before continuing with
this installation.
Respond to each prompt to proceed to the next step in the installation. If you
want to change something on a previous step, type 'back'.
===============================================================================
IBM Director Start
------------------
IBM Systems Director is currently running. Do you want IBM Systems Director to be
restarted automatically after IBM PowerHA SystemMirror is installed? Although it
does not need to be stopped in order to install IBM PowerHA SystemMirror, it will
need to be restarted before IBM PowerHA SystemMirror functions are available.
1- Yes
->2- No
ENTER THE NUMBER FOR YOUR CHOICE, OR PRESS <ENTER> TO ACCEPT THE DEFAULT:: 1
===============================================================================
Installing...
-------------
[==================|==================|==================|==================]
[------------------|------------------|------------------|------------------]
Thu Dec 19 10:07:50 CST 2013 PARMS: stop
Thu Dec 19 10:07:50 CST 2013 The lwi dir is: :/opt/ibm/director/lwi:
Thu Dec 19 10:07:50 CST 2013 localcp:
/opt/ibm/director/lwi/runtime/USMiData/eclipse/plugins/com.ibm.usmi.kernel.persist
Considering that the cluster nodes are servers controlled by the managed server, only two
packages must be installed on them:
Systems Director Common Agent 6.3.3 or later
PowerHA SystemMirror for Systems Director 7.1.3 or later
Note: Remember that only the PowerHA plug-in from version 7.1.3 or later allows the
cluster simulator feature.
Both packages can be downloaded from the IBM Systems Director download page at:
http://www.ibm.com/systems/director/downloads/plugins.html
Install the cluster.es.director.agent fileset and the Common Agent on each node of the
cluster that you want to manage.
Some subsystems are added as part of the installation: platform_agent and cimsys.
Online mode
From the server side (ISD side), you can export your real configuration to an XML file by using
either the console (see Option A: Export to XML by using the console on page 83) or the
command line (see Option B: Export to XML by using the command line on page 88).
However, the next step is mandatory when exporting the XML configuration, whether using
the console or the command line.
The Export cluster definition options panel opens, as shown in Figure 5-4.
Figure 5-5 Adding the cluster name and selecting the cluster
You can choose to work in Planning mode, but the Planning mode works only with the file in
xml/planning, not with the files in xml/export. Therefore, you must complete the following
manual steps:
1. Change to the directory:
cd /opt/ibm/director/PowerHASystemMirror/eclipse/plugins/com.ibm.director.power.ha.systemmirror.common_7.1.3.1/bin
4. Select the radio button to Continue a configuration from an existing planning file, as
shown in Figure 5-8 on page 87.
You are then in Planning mode with the file that you specified as input, as shown in
Figure 5-9.
Figure 5-9 Planning mode using the selected XML input file
Before running the command shown in Example 5-3, check to make sure that you have the
most recent version of the XML env file. If necessary, run the command smcli discoverenv
with the following flags:
-h|-?|--help Requests help for this command.
-v|--verbose Requests maximum details in the displayed information.
Example 5-4 shows the smcli exportcluster command and the output messages.
Notes:
You can work with the console in Planning mode using this file, but this time the exported
file is already in xml/planning (xml/planning/expclu_data_20140418_054125.xml).
Therefore, there is no need for manual copy from xml/export to xml/planning.
PowerHA console mode management can be done by using the command line. To get
help, use the smcli modemngt -h -v command.
Example 5-5 Mandatory steps for creating the discovered environment XML file
# smcli discoverenv
xml files dir
/opt/ibm/director/PowerHASystemMirror/eclipse/plugins/com.ibm.director.power.ha.sy
stemmirror.common_7.1.3.1/bin
Starting discovery ...
Option C: Export to XML agent side and deploy from XML agent side
This section shows how to use the command line (Example 5-6) on the agent side to export a
real configuration to XML files and then use the generated XML files to deploy the
configuration. The example deploys the XML files that are generated on the agent side, but
you can also deploy agent-side XML files that were exported from the server side in
Planning mode.
Example 5-6 Export and deploy XML from the agent side
cd /tmp
mkdir xml
mkdir xml/planning
export PATH=/usr/java6/jre/bin:/usr/java6/bin:$PATH
ClmgrExport
ClmgrExport : Help Verbose
java -DCAS_AGENT=/var/opt/tivoli/ep -cp
/usr/es/sbin/cluster/utilities/clmgrutility.jar
com.ibm.director.power.ha.systemmirror.agent.impl.ClmgrExport --help --verbose
Usage :
Verbose usage :
ClmgrDeploy
ClmgrDeploy Help
java -DCAS_AGENT=/var/opt/tivoli/ep -cp
/usr/es/sbin/cluster/utilities/clmgrutility.jar
com.ibm.director.power.ha.systemmirror.agent.impl.ClmgrDeploy --help --verbose
ClmgrDeploy -h -v
Verbose usage :
-x|--xml : to display contents of xml files, and check them, without deploying
them.
[-i|--isd] : to indicate the command is launched from ISD.
[-D|--Debug {0|1|2|3} ]
0 for no trace info,
1 for trace to Console,
2 for trace to file /tmp/deploy_output.txt,
3 for both.
[ -L|--Level : {SEVERE|WARNING|INFO|CONFIG|FINE|FINER|FINEST} ]
logger level
[ -d|--dir <xmlFilesDir>] : xml files dir, default is /tmp ]
-a|--data <xmlDataFile] : xml file containing the data.
-e|--env <xmlEnvFile] : xml file containing the environment.
Note: PowerHA console XML deployment can be done by using the command line. To get help,
use this command:
smcli deploycluster -h -v
The syntax of the XML file (powerha_systemmirror.xsd) contains a full PowerHA data model
(58 enumerative types, 70 entity object types, 30 ref entity object types). The naming in the
XSD file matches the naming of the clmgr command line, and it is very legible, as shown in
Example 5-7 on page 94 and Example 5-8 on page 95.
There are two types of XML files for persistence and separation of logic:
XML env file
XML data file
</Cluster></PowerhaDataConfig></PowerhaConfig>
Note: One XML data file is linked with one XML env file, and one XML env file can be
shared by several XML data files.
A configuration can be prepared by using the console, where it can be adjusted and reviewed
to get it ready for later exporting and deployment. The configurations go into the
xml/planning directory.
The operating system for the test installation was SUSE Linux Enterprise Server 11 SP3 i586.
The 32-bit installation usually includes all libraries that are required for the Systems Director
installation. When using the 64-bit SUSE Linux Enterprise server, some 32-bit libraries are
requested during the installation process, such as this example:
/usr/lib/libstdc++.so.5
For general information regarding the Systems Director installation, see the IBM Systems
Director 6.3 Best Practices: Installation and Configuration, REDP-4932.
When the base Systems Director is installed and the latest updates have been applied, install
the PowerHA plug-in as described in the PowerHA SystemMirror for IBM Systems Director
section of the IBM Knowledge Center:
http://ibm.co/1kE9i2y
Example 5-11 shows the import of the XML file into the Systems Director database to make it
available in the web GUI.
Successful
PowerHA SystemMirror Console settings :
- Configuration from /opt/ibm/director/data/powerhasystemmirror.{dat|idx} files.
- Console mode : xml planning mode.
- xml files dir :
/opt/ibm/director/PowerHASystemMirror/eclipse/plugins/com.ibm.director.power.ha.sy
stemmirror.common_7.1.3.0/bin
- With env xml file : xml/planning/mysnap_env.xml
- With data xml file : xml/planning/mysnap.xml
Now, the configuration is available in the Systems Director management GUI. Figure 5-12 on
page 99 shows the screen after logging in to the web GUI and clicking Availability >
PowerHA SystemMirror in the left menu.
From the drop-down menu, choose your configuration file, as shown in Figure 5-13.
The Planning mode offers several possibilities to display and manage the cluster and
resource groups. Figure 5-14 on page 100 shows the displayed configuration in Planning
mode after the configuration has been loaded.
There is no need to manually copy the XML files to the cluster node. When running in
Planning mode, the context menu on the selected cluster has a Deploy action that copies the
XML files to the agent node and deploys the cluster.
Although this book focuses on PowerHA mechanisms and techniques, DB2 high availability is
shown here without the DB2 Disaster Recovery support module.
Note: For more information about IBM DB2 high availability and disaster recovery options,
see the IBM Redbooks publication titled High Availability and Disaster Recovery Options
for DB2 for Linux, UNIX, and Windows, SG24-7363:
http://publib-b.boulder.ibm.com/abstracts/sg247363.html?Open
DB2 installation in a cluster requires many prerequisites for proper design and operation. For
this scenario, a basic two-node cluster was built to perform all DB2 drills and high availability
testing.
As Figure 6-1 shows, DB2 services are designed to work on DB2 Server01 server while DB2
Server02 is in standby with no services running. In case of a planned outage, such as
maintenance on DB2 Server01, or an unplanned one, such as a hardware failure on DB2
Server01, PowerHA mechanisms automatically switch the DB2 services to DB2 Server02 to
reduce the service outage duration for users.
Important: Using DB2 high availability with PowerHA does not cover an outage caused by
a total data loss on the shared storage end. If the solution design requires resilience for
data loss, a storage synchronization solution or DB2 High Availability and Disaster
Recovery (DB2 HADR) must be considered.
Note: Remember that all changes must be applied on all cluster nodes.
A standard DB2 recommendation for the min_free value is 4096 for systems with less than 8
GB of RAM and 8192 for systems with more than 8 GB of RAM.
The second memory-related parameter is max_free. This parameter defines the number of
free pages on the VMM free list at which VMM stops stealing pages from memory. A generic
recommendation for this parameter is min_free + 512, which means that if min_free was defined
as 4096, for example, max_free should be 4608.
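On AIX, these recommendations map to the vmo tunables minfree and maxfree. The following sketch applies the example values above for a system with less than 8 GB of RAM; adjust the numbers to your sizing:

  # Set minfree and maxfree now and persistently across reboots
  vmo -p -o minfree=4096 -o maxfree=4608

  # Display the resulting values
  vmo -o minfree -o maxfree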
With the standard values for maxperm% (90), maxclient% (90), strict_maxclient (1), minperm%
(3), and lru_file_repage (0), no changes are required unless any specific performance
behavior is detected that requires customization of these parameters.
To configure IOCP on AIX 7.1, first check the current device state, as shown in Example 6-1.
As can be seen in Example 6-1, the initial state of the IOCP device on AIX 7.1 is Defined.
Therefore, before starting the DB2 installation, IOCP must be configured. Type smitty iocp >
Change / Show Characteristics of I/O Completion Ports to configure it. For the
STATE to be configured at system restart field, select Available, as shown in
Example 6-2 on page 107.
Note: If the iocp0 device stays in the Defined state even after the procedure described above,
enter the mkdev -l iocp0 command as root and check it again. If it remains Defined,
reboot the system.
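The same check and change can be done from the command line, as in the following sketch. The autoconfig attribute used to persist the state is an assumption based on the SMIT field above; verify it with lsattr -El iocp0:

  # Check the current state of the I/O Completion Ports device (Defined or Available)
  lsdev -l iocp0

  # Make the device available immediately
  mkdev -l iocp0

  # Persist the Available state across reboots (attribute name assumed)
  chdev -l iocp0 -P -a autoconfig=available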
As a best practice on a DB2 system, the following paging space organization guidelines can
be followed:
The default paging space (hd6) stored on rootvg, with a size of at least 512 MB.
Multiple paging spaces spread across all available disks, with a size of up to 64 GB each.
It is highly recommended to have only one paging space area per disk.
It is recommended that each of these users, due to their special rights, be placed in a dedicated
group to make administration of permissions easier. For this example, we created the following
groups:
db2iadm1 for the instance owner
db2fadm1 for the fenced user
dasadm1 for the DAS user
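A minimal sketch of creating these groups and the corresponding users on AIX follows. The group IDs, user names, and home directories are assumptions for this scenario; the instance owner's home directory is placed on the shared disk, as discussed next:

  # Create the administrative groups (the GIDs are examples)
  mkgroup id=990 db2iadm1
  mkgroup id=991 db2fadm1
  mkgroup id=992 dasadm1

  # Create the users with their primary group and home directory
  mkuser pgrp=db2iadm1 groups=db2iadm1 home=/db2/db2inst1 db2inst1
  mkuser pgrp=db2fadm1 groups=db2fadm1 home=/db2/db2fenc1 db2fenc1
  mkuser pgrp=dasadm1 groups=dasadm1 home=/home/dasusr1 dasusr1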
Because the servers being deployed are intended to be part of a cluster sharing
configuration, it is important to create all users' home directories on the shared disks to make
sure that any modifications to user data are reflected and accessible on all cluster nodes.
For this scenario, the IP addresses shown in Example 6-4 were defined as to be used in the
cluster as cluster IP addresses.
# Cluster addresses
After the disks are assigned and recognized, an enhanced concurrent capable volume group
and JFS2 file systems are created by using these shared disks. All file systems are created with
no auto mount, and the volume group is defined with auto varyon set to Off, as shown in
Example 6-6. The 10 GB disk is left untouched and is used by CAA during cluster creation.
Note: To avoid SCSI locking issues, it is important to define cluster shared disks with no
reservation policy:
root@blues(/)# chdev -l hdisk4 -a reserve_policy=no_reserve
hdisk4 changed
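A minimal sketch of creating such a shared volume group and one of its file systems follows. The disk name, major number, mount point, and size are assumptions for this scenario; the major number matches the importvg -V 65 command used later in this chapter:

  # Create an enhanced concurrent capable volume group with a fixed major number
  mkvg -C -V 65 -y db2vg hdisk2

  # Disable automatic varyon at boot; PowerHA controls the volume group
  chvg -an db2vg

  # Create a JFS2 file system that is not mounted automatically at boot
  crfs -v jfs2 -g db2vg -m /db2/db2inst1 -a size=10G -A no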
First, the DB2 installation images must be downloaded from the DB2 for Linux, UNIX and
Windows web page:
http://www.ibm.com/software/data/db2/linux-unix-windows/downloads.html
Note: In case you do not have a valid DB2 license to apply, the trial license is automatically
activated. The software then works for only 90 days.
During this installation scenario, the instance owner user is used for the installation to keep
the process simple. However, keep the considerations in Table 6-1 on page 110 in mind when
choosing the user ID that performs the installation (root or non-root ID).
After downloading and uncompressing the installation images, as instance owner user
(db2inst1), change to the directory where the images are copied and run the db2_install
installation command, as shown in Figure 6-2.
Then, from the list, click the Install New button, which is just after the DB2 Version 10.5 Fix
Pack 2 Workgroup, Enterprise, and Advanced Editions text section.
Figure 6-3 on page 111 shows the initial installation screen for the DB2 Enterprise Server
installation.
Click Next.
In the next window, the installation type must be selected. For this scenario, we chose
Typical, as shown in Figure 6-5 on page 113.
Click Next.
In the next window, you can choose either to install the server or to create a response file. For
this scenario, we chose Install DB2 Server Edition on this computer, as shown in
Figure 6-6 on page 114.
In the next window, the installation directory is chosen, as shown in Figure 6-7 on page 115.
This installation scenario was performed with a non-root user, so the installation directory is
automatically defined as the installation user's home directory (/db2/db2inst1).
Click Next.
The next window shows an Installation Summary (Figure 6-8). If incorrect information
appears, click Back and correct it. If everything is fine, click Finish to begin the installation
process.
After the process is finished, the new sample database can be checked by looking at the
output of the db2 list database directory command, run as the db2inst1 user, as shown in
Example 6-7.
PID NAME
-----------------------------------
100-100-01 Snow Shovel, Basic 22 inch
100-101-01 Snow Shovel, Deluxe 24 inch
100-103-01 Snow Shovel, Super Deluxe 26 inch
100-201-01 Ice Scraper, Windshield 4 inch
4 record(s) selected.
After the DB2 installation, the db2nodes.cfg file has the local hostname information, as shown
in Example 6-9.
There are many ways to accomplish the configuration through DB2, PowerHA, and AIX
mechanisms. In this section, we explain several of the options.
Basically, an alias must be added after the local hostname in the /etc/hosts file, and then the
same alias is added to the db2nodes.cfg configuration file, as shown in Example 6-10 on page 119.
Now DB2 can be correctly started on all cluster nodes with no major changes as shown in
Example 6-11.
db2inst1@blues(/db2/db2inst1/sqllib)$ db2start
12/12/2013 09:28:45 0 0 SQL1063N DB2START processing was successful.
SQL1063N DB2START processing was successful.
db2inst1@jazz(/db2/db2inst1/sqllib)$ db2start
12/12/2013 09:37:08 0 0 SQL1063N DB2START processing was successful.
SQL1063N DB2START processing was successful.
Basically, the db2nodes.cfg file has a simple standard as shown in Example 6-12.
Where nodenumber represents the unique ID for a database server (default is 0), hostname
represents the server (according to the /etc/hosts file), and the logical port represents the
database partition (default is 0).
Considering a two-node cluster composed of the hosts blues and jazz, the PowerHA scripts
must dynamically generate the two db2nodes.cfg file variations shown in Example 6-13.
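As a sketch of what those two variations can look like, based on the format described above (one line per database partition; the actual content of Example 6-13 may differ), a start script could rewrite the file depending on the node it runs on:

  # When the instance is brought up on blues
  echo "0 blues 0" > /db2/db2inst1/sqllib/db2nodes.cfg

  # When the instance is brought up on jazz
  echo "0 jazz 0" > /db2/db2inst1/sqllib/db2nodes.cfg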
Example 6-14 Changing DB2 service hostname by using the db2gcf command
db2inst1@jazz(/db2/db2inst1)$ cat sqllib/db2nodes.cfg
0 blues 0
Instance : db2inst1
DB2 Start : Success
Partition 0 : Success
First, you must set in the DB2 registry the remote shell command to be used. In
Example 6-15, SSH is chosen.
Example 6-15 DB2 register parameters for the remote shell command
db2set DB2RSHCMD=/usr/bin/ssh
After SSH is chosen, a passwordless SSH connection between all cluster nodes is required
for the db2inst1 user. Be aware that some security policies in certain environments deny the
use of SSH connections without passwords.
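A minimal sketch of enabling passwordless SSH for the db2inst1 user from blues to jazz follows (repeat in the opposite direction so that both nodes can reach each other; the file locations are the OpenSSH defaults):

  # As db2inst1 on blues: generate a key pair without a passphrase
  ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa

  # Append the public key to the authorized keys of db2inst1 on jazz
  cat ~/.ssh/id_rsa.pub | ssh db2inst1@jazz "mkdir -p ~/.ssh && cat >> ~/.ssh/authorized_keys"

  # Verify that no password is requested
  ssh db2inst1@jazz hostname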
When all requirements are met, the only changes in the PowerHA application startup scripts
include extra parameters for the db2start command, as shown in Example 6-16.
With all services stopped on the first node, unmount all DB2 file systems and vary off the DB2
volume group, as shown in Example 6-18.
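A sketch of those steps, assuming the file system and volume group names used in this scenario (repeat the umount for every file system that belongs to db2vg):

  # Unmount the DB2 file systems
  umount /db2/db2inst1

  # Vary off the shared volume group so that the second node can import it
  varyoffvg db2vg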
Then, all Logical Volume Manager (LVM) information must be imported on the second cluster
node, as shown in Example 6-19.
Example 6-19 Importing DB2 LVM information on the second cluster node
root@jazz(/)# importvg -V 65 -y db2vg hdisk2
db2vg
0516-783 importvg: This imported volume group is concurrent capable.
Therefore, the volume group must be varied on manually.
root@jazz(/)# chvg -an db2vg
db2inst1@jazz(/db2/db2inst1)$ db2start
12/12/2013 13:02:00 0 0 SQL1063N DB2START processing was successful.
SQL1063N DB2START processing was successful.
NAME
--------------------------
Snow Shovel, Basic 22 inch
Snow Shovel, Deluxe 24 inch
Snow Shovel, Super Deluxe 26 inch
Ice Scraper, Windshield 4 inch
4 record(s) selected.
The packages installed on both cluster nodes, blues and jazz, are shown in Example 6-21.
To create an initial cluster configuration, verify that no file systems are mounted on any
cluster node and that db2vg is varied off.
This opens the panel that is shown in Figure 6-11 on page 125.
[Entry Fields]
* Cluster Name [db2cluster]
New Nodes (via selected communication paths) [jazz] +
Currently Configured Node(s) blues
Figure 6-11 Creating DB2 cluster topology
After the cluster is created, define the repository disk for Cluster Aware AIX (CAA) by running
smitty sysmirror > Define Repository Disk and Cluster IP Address and choosing the
disk to be used, as shown in Figure 6-12.
[Entry Fields]
* Cluster Name db2cluster
* Heartbeat Mechanism Unicast +
* Repository Disk [(00f623c591941681)] +
Cluster Multicast Address []
(Used only for multicast heartbeat)
Figure 6-12 Defining the repository disk for the DB2 cluster
NODE blues:
Network net_ether_01
bluespriv 172.10.10.203
Network net_ether_010
blues 129.40.119.203
NODE jazz:
Network net_ether_01
jazzpriv 172.10.10.225
Network net_ether_010
jazz 129.40.119.225
[Entry Fields]
* IP Label/Address cluster1 +
Netmask(IPv4)/Prefix Length(IPv6) []
* Network Name net_ether_01
Next, create the application, which is basically the DB2 services and their start and stop
scripts. Enter smitty sysmirror > Cluster Applications and Resources > Resources >
Configure User Applications (Scripts and Monitors) > Application Controller Scripts >
Add Application Controller Scripts, as shown in Example 6-24.
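As an illustration, minimal start and stop scripts for such an application controller might look like the following sketch. The script paths and the stop sequence are assumptions for this scenario, not the exact scripts shown in Example 6-24:

  #!/bin/ksh
  # /usr/local/ha/start_db2.sh - run by PowerHA on the node that acquires db2rg
  su - db2inst1 -c "db2start"
  exit 0

  #!/bin/ksh
  # /usr/local/ha/stop_db2.sh - run by PowerHA before releasing db2rg
  su - db2inst1 -c "db2 force application all; db2stop force"
  exit 0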
After creating all of the resources, type smitty sysmirror > Cluster Applications and
Resources > Resource Groups > Add a Resource Group to create the DB2 resource group
shown in Example 6-25 on page 127.
With all configurations complete, do another cluster verification and synchronization with
smitty sysmirror > Cluster Nodes and Networks > Verify and Synchronize Cluster
Configuration.
After this synchronization operation, the DB2 cluster is fully configured and ready to be tested
and validated.
After several seconds, the output of the clRGinfo cluster command shows that the cluster is
active and the db2rg resource group is enabled on the blues cluster node, as shown in
Example 6-27 on page 128.
Node Directory
Node 1 entry:
db2 =>
Database 1 entry:
With the DB2 client properly configured, the connection to the SAMPLE database as
R_SAMPLE can be validated. Example 6-29 on page 129 shows that DB2 is working in the
cluster and answering properly to the network requests.
NAME
--------------------------------
Snow Shovel, Basic 22 inch
Snow Shovel, Deluxe 24 inch
Snow Shovel, Super Deluxe 26 inch
Ice Scraper, Windshield 4 inch
4 record(s) selected.
db2 =>
db2 =>
Waiting for the cluster to process the resource group movement request....
After the cluster stabilizes, it is time to verify that the database connection that points to the
cluster IP address is working, as shown in Example 6-31.
Example 6-31 Testing the database connection on the secondary cluster node
db2 => connect to r_sample user db2inst1
Enter current password for db2inst1:
NAME
-------------------------------
Snow Shovel, Basic 22 inch
Snow Shovel, Deluxe 24 inch
Snow Shovel, Super Deluxe 26 inch
Ice Scraper, Windshield 4 inch
4 record(s) selected.
By checking the test results (Example 6-31), you can determine that DB2 is working on both
cluster nodes and that PowerHA is managing the DB2 services.
Note: For more information about DB2 v10.5 administration, see IBM DB2 10.1 for Linux,
UNIX, and Windows documentation in the IBM Knowledge Center:
http://pic.dhe.ibm.com/infocenter/db2luw/v10r1/index.jsp
This chapter is an implementation guide that is based on the design for SAP NetWeaver
non-database components: SAP Central Services, enqueue replication server, application
server instances, and SAP global variables. It covers installation by using the
IBM PowerHA SystemMirror 7.1.3 installation automation tool: Smart Assist for SAP. The
installation was tested with PowerHA 7.1.3, SAP NetWeaver 7.30, and IBM DB2 10.1.
This chapter also documents customization options and deployment alternatives. It includes
the following topics:
Introduction to SAP NetWeaver high availability (HA) considerations
Introduction to Smart Assist for SAP
Installation of SAP NetWeaver with PowerHA Smart Assist for SAP 7.1.3
Install SAP NetWeaver as highly available (optional)
Smart Assist for SAP automation
OS script connector
Additional preferred practices
Migration
Administration
Documentation and related information
In 2013, SAP enhanced this functionality with the SAP HA API. The API links SAP and cluster
products. The major benefits are planned downtime reduction and operational improvements.
For more information, see Achieving High Availability for SAP Solutions:
http://scn.sap.com/docs/DOC-7848
This documentation complements but does not replace the official SAP guides.
Deployment options
Smart Assist for SAP supports the SAP Business Suite 7 for several variations, as described
in the following sections:
The infrastructure design
The software and middleware stack
The front end
Invincible Supply Chain - SAP APO Hot Standby liveCache on IBM Power Systems
http://www.ibm.com/support/techdocs/atsmastr.nsf/WebIndex/WP100677
Chapter 6, Implementing DB2 with PowerHA on page 103, describes additional best
practices while setting up PowerHA in general.
Although not in the scope of this chapter, the following disaster recovery (DR) technologies
should be considered (this is a general statement, because DR is not supported with Smart
Assists):
Limited distance (synchronous replication):
IBM HyperSwap (see Chapter 8, PowerHA HyperSwap updates on page 215)
IBM SAN Volume Controller (SVC) stretched cluster
Synchronous mirroring
Important: It is absolutely essential that each SAP instance is configured with its own
virtual IP, regardless of whether it is controlled by the cluster, because this is a decision
made during installation. As soon as an instance is included in the SAP Landscape
Virtualization Manager (LVM) or in a cluster, this becomes a prerequisite.
Load balancing is an SAP extension that is not included in Smart Assist for SAP functions.
However, PowerHA can be used to make a SAP Web Dispatcher highly available. For help in
creating an HA design for the Web Dispatcher, there are several information pages on the
SAP Community Network and in sections in the SAP installation and planning guides.
Note: Controlling the SAP application server instances from inside PowerHA does not
remove the requirement of setting up application servers with identical capabilities on
different nodes, as documented by SAP.
For all nodes, it is essential that there is no possibility of placing multiple instances that
have the same instance numbers on the same node.
It is important to understand that for a high availability (HA) installation, there is no longer a
traditional central instance, because the enqueue and message server entities are separated
and put into a new instance called the Central Services instance (CS).
In Figure 7-2, we assume that Application Server A is responsible for spool requests. This
would require a second application server on a different node that is also responsible for
spool requests and has sufficient resources to handle the entire workload in case Application
Server A goes down. If Application Server B is responsible for batch processing, there must
be an Application Server B that can handle the workload in addition to its normal load on a
second node.
Note: SAP has a dedicated installation option in the SWPM/sapinst for HA setups. There
are special installation guides that explain the required steps.
The ERS can be enabled by three different methods that are supported by SAP:
SAP polling SAP polling is used if the cluster cannot control the ERS and locate
this instance appropriately. This mechanism depends on a
configuration in the SAP instance profiles and a script to monitor the
local node. Each node has an ERS instance running (either active or
inactive).
Cluster-controlled The cluster-controlled approach is what PowerHA supports. The
enablement is to be performed by setting certain SAP instance profiles
(see Change the instance profiles of the AS, ASCS, SCS, and ERS
on page 191 for details). With this approach, no matter how many
nodes are included, there is only one ERS. In the event of a failover,
the CS instance is automatically relocated to the active ERS node.
Hardware solution This is a unique feature of IBM System z, where the coupling facility is
used.
A cold standby brings the database into an inactive state for rollbacks until operations can be
continued. This can cause an outage of business operations for minutes or even hours.
A hot standby database can roll back the currently failed job. It can be connected by the
application for other transactions in typically less than three minutes and have full
performance capability.
SAP provides a broad selection of database solutions as well as Multiple Components in One
Database (MCOD), three- or two-tier deployments.
To address this, Smart Assist is built in a modular fashion. Just add another application to
Smart Assist for SAP by selecting the corresponding Smart Assist solution. The
documentation can be found in 7.10, Documentation and related information on page 214.
You can request guidance through the ISICC information service ([email protected]) or
through your IBM Business Partner.
The SAP transport directory is used to transport updates across the SAP landscape. Therefore, it
is shared by multiple SAP systems, as compared to /<sapmnt>/<SAPSID>/, which is shared
among the instances of a single system.
Figure 7-5 File system layout of an ABAP stack (ERP or SCM) without the ERS directory
In Figure 7-6, we have two SAP systems, called PRD and QAS. This figure does not describe
best practices for how to locate PRD and QAS instances and how to make the shares
available. The figure is intended to show mount requirements in the context of the SAP
landscape.
/sapmnt/QAS/
/usr/sap/trans
The SCS and ERS instances of the PRD system are configured on Host A and Host B. This
system is extremely important, so it is made highly available by PowerHA. To handle the expected
workload, the PRD application server instances are divided into three hosts: Host A, Host B,
and Host C. This division requires the /sapmnt/PRD shared file system to be mounted to all
three nodes.
Due to the SAP transport landscape requirement of having to transport new developments
and upgrades from a system, the /usr/sap/trans file system must be mounted in all nodes.
Note: For special cases, SAP also provides the option of operating in a configuration
where /usr/sap/trans is not a shared file system. This must be handled according to the
regular SAP documentation. In that case, it is not relevant to clustering.
It is a prerequisite for starting an SAP instance that the SAP global directory be highly
available. Therefore, take special care of the redundancy of the components, including
storage redundancy, virtualization, and capabilities of the chosen technology.
There are different technologies that provide HA capabilities for a shared file system on IBM
AIX, which are valid for a PowerHA based cluster with SAP NetWeaver:
PowerHA crossmounts for each SAP system. This is proven PowerHA technology at no
additional license cost. There are two different deployment options:
Use Smart Assist for NFS to set up an NFS crossmount. This provides the option to set
up an NFSv4 or NFSv3.
Create an NFSv3 crossmount manually.
Check that all remaining hosts have the permission and automated setup to mount the
shared file systems.
A central, highly available NFS server that uses PowerHA (typical deployment option).
Centralizing the NFS server brings benefits in maintenance, patching, and operations.
However, an outage of this cluster has a larger effect compared to separated crossmounts
for each SAP cluster.
A storage filer, such as the V7000U (NFS) or SONAS (GPFS), can be used to provide the
shared file systems to all nodes.
GPFS as an AIX file system is a separately purchased item, but it provides robustness.
The decision for which mount strategy to use is based on what can be maintained best by the
existing skills onsite, the overall architecture, and where the SAP log files should be written.
When using a local disk approach, PowerHA does not monitor the availability of the file
system.
The storage layout must separate LUNs that are used for a local file system that is specific to
a node from those file systems that are moving between nodes. Furthermore, the file systems
being moved must be separated for each instance to allow for independent moves. Therefore,
each SAP instance should have its own LUNs and file systems.
Note: Older dual stack installations often combined both SAP Central Services instances
(ASCS and SCS) into one resource group on one disk with one IP. Also, ERS instances in
older releases were often installed with the hostname of the LPAR or system as an SAP
dependency. This setup can be migrated from a PowerHA base product. But it cannot be
moved to the new Smart Assist for SAP capabilities without first meeting the required
prerequisites.
However, to remain compatible with an earlier version, PowerHA supports all current
Business Suite 7 releases for SAP NetWeaver without the SAP HA API.
In earlier days, if an SAP system was stopped from the SAP Management Console (MMC) or
other tools, the cluster reacted with a failover, which interrupted upgrades and other SAP
maintenance tasks.
SAP provides the option to integrate cluster products with SAP for start, stop, and move
activities. Integrating the start, stop, and move operations of SAP with the cluster allows SAP
operators and SAP tools to perform these activities automatically without interruption, and it
allows special processes to link with the cluster administrator from a technical point of view.
SAP HA API version 1.0 is implemented with the Smart Assist for SAP 7.1.3 release. The
enablement is optional, and it can be enabled or disabled at the SAP instance level.
Note: The SAP HA certification certifies that mandatory components (SCS and ERS) are
separated and the Software Upgrade Manager (SUM) can safely operate on clustered
instances. It does not cover cluster robustness and test considerations.
You still have the freedom to create a cluster with homemade scripts or to customize
solutions. However, using Smart Assist brings four significant advantages, at no additional
cost, compared to custom solutions:
Speed of deployment (relevant to TCO):
The setup effort is reduced to a few hours, compared to weeks for a custom solution
(especially for larger applications, such as SAP).
Repeatable and proven setup:
Smart Assist is pretested, and you benefit from a lifecycle and migration support when
SAP provides new features. This can improve cluster uptime.
Three-phase approach for deployment:
a. Discovers running application
b. Verifies setup before clustering
c. Adds cluster configuration and scripts
Full IBM product support, including scripts and cluster configuration, in addition to the
base product support.
The key element in Figure 7-7 on page 141 is the sapstartsrv process of each instance, which
requires the PowerHA start, stop, and monitor scripts to plug into this infrastructure. These
scripts can still serve all of the 7.20 kernel-based NetWeaver releases, starting with
NetWeaver 7.00.
Figure 7-8 on page 142 shows how PowerHA plugs into the framework.
By SAP design, this software covers only the SCS and ERS. IBM has added functionality to
handle application server instances. Databases are not enabled in SAP HA API version 1.0.
Each of them creates a dedicated PowerHA resource group with a share-nothing approach.
Each can move independently, as shown in Figure 7-2 on page 134 and Figure 7-4 on
page 136.
Keep the following considerations in mind while using Smart Assist for SAP:
Smart Assist naming conventions should not be changed, although this is possible and
supported.
Each SAP instance must have its own virtual IP.
Plan on making your infrastructure highly available by using the required disk layout.
Additional resource groups for other applications running on the same cluster nodes can
be created. However, any dependencies between the resource groups should be avoided.
Dependencies can easily cause side effects that can be hard to test for, because they
typically occur only in some cases.
Plan
Review the PowerHA installation prerequisites that are described in 4.2, PowerHA
SystemMirror 7.1.3 requirements on page 50.
The following are the required minimum releases of the IBM software:
AIX operating system (OS) version 6.1 or later
PowerHA version 7.1.3 or later
Smart Assist for SAP supports traditional two-node and three-node cluster deployments,
although Smart Assist supports the same maximum number of nodes as PowerHA. Evaluate
your business requirements against the increased complexity. A typical PowerHA setup for
SAP NetWeaver consists of two nodes.
Ensure that all nodes are appropriately sized for the expected workload, including disk
sizes, CPU, and memory. For assistance, you can request support from the ISICC sizing
team at [email protected].
Install
After the operating system is installed, preferably on a dedicated disk and in a dedicated
volume group, install the PowerHA software.
On each node, the following PowerHA software components must be installed as a minimum
to create the base cluster and configure an NFS crossmount and SAP NetWeaver:
cluster.adt.es
cluster.doc.en_US.es.pdf
cluster.doc.en_US.assist.sm
cluster.es.assist.
cluster.es.server
cluster.license
cluster.es.migcheck
cluster.es.cspoc
cluster.man.en_US.es.data
cluster.es.nfs
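A quick way to confirm on each node that these filesets are present is an lslpp query. This is only a minimal sketch; the exact fileset levels depend on your service pack:
lslpp -l "cluster.*" | egrep "assist|server|nfs|license|cspoc|migcheck"   # list installed PowerHA filesets and levels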
Verify that you have downloaded the latest PowerHA Service Pack, and be sure to update the
following files:
/etc/hosts: Insert all node names and all service IPs that you plan to include in the
cluster.
/etc/cluster/rhosts: Insert all nodes by IP.
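The following is a minimal sketch of such entries for the two-node scenario used later in this chapter (as0003lx and as0004lx, with as0009lx as the NFS service IP label); the IP addresses shown are placeholders, not values from the test environment:
/etc/hosts:
10.1.1.3    as0003lx
10.1.1.4    as0004lx
10.1.1.9    as0009lx
/etc/cluster/rhosts:
10.1.1.3
10.1.1.4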
Plan
The application disks must be separated from the operating system disks. This results in
a set of disks for each of the following elements:
The rootvg
The SAP code under /usr/sap
The independently moving SAP instances
Additional disks might be required for the SAP global and the transport directories and for the
database.
The following sections describe the disk considerations and options, which are grouped by
categories:
Basic disks
SAP SCS and ERS
PowerHA crossmount for SAP directories and SAP application server instances
Note: The database disk layout is described in the documentation of the database
deployment in 7.10, Documentation and related information on page 214.
Basic disks
Table 7-1 shows the disks that are available for the basic installation of SAP, AIX, and
PowerHA. It is recommended to separate the components on different disks, not just different
file systems.
Table 7-1 Basic disks for an SAP NetWeaver, AIX, and PowerHA installation
(Columns: Disk, Instance, Mount point, Nodes)
A local disk stays with the node, and the file system structure must be copied to the second
node (instructions are given later in the implementation flow under SAP sapcpe copy tool on
page 207). The advantage is an easy disk attachment, because there is a 1:1 relationship
between the LUN and the host. Disadvantages include a larger storage requirement and the
inability of PowerHA to monitor the local disk in the same manner as a shared disk. Also
consider the SAP logging implication:
With a local disk approach (see Table 7-3), the SAP logs are written per node.
With a shared disk approach (see Table 7-2), SAP continuously writes to the same log.
Either of the two approaches works for the cluster. A key consideration is whether the
approach fits the available administrative skill set and your overall strategy. Also, consider the
SAP Landscape Virtualization Manager (LVM) requirements.
Although each disk option is available for each installation and can be used independently of
the others, it is highly recommended to use only one option (local or shared disk) for all of
your implementations, for consistency.
As an alternative, a local file system can be implemented so that the instances reside on
a dedicated disk. However, this implementation consists of additional subdirectories in
/usr/sap/<SID> within the same file system.
Install
The scenario in this example uses an IBM SAN Volume Controller stretched cluster base. It
includes VDisk mirroring for data redundancy through multiple Fibre Channel attachments,
using NPIV adapters in a VIOS implementation. You can use other disk architectures that
provide similar capabilities, granularities, and high availability.
For setup details, see IBM SAN Volume Controller Stretched Cluster with PowerVM and
PowerHA, SG24-8142:
http://www.redbooks.ibm.com/abstracts/sg248142.html?Open
Verify
When the disks are made available to the operating system, the attachment can be verified by
using the lspv command and comparing the physical volume ID (pvid) between the nodes. A
shared disk displays identical pvids, but a local disk displays different pvids.
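For example, running lspv on each node and comparing the second column is usually sufficient (a minimal sketch):
lspv | sort -k2     # run on every node; shared disks show identical pvids, local disks show different ones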
Table 7-5 shows a prepared PowerHA Smart Assist for SAP cluster where the local disks are
active and the shared disks are not.
Note: The hdisk numbers are not necessarily identical on each node for shared disks.
Table 7-5 shows examples highlighted in blue for shared disk pvids.
The size of the disks can be verified by using the getconf DISK_SIZE /dev/hdisk<x>
command.
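For example, for the shared disk hdisk5 used later in this scenario:
getconf DISK_SIZE /dev/hdisk5     # reports the disk size in MB
bootinfo -s hdisk5                # alternative way to display the size in MB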
Also browse the SAP Notes page to verify whether additional SAP notes were published after
publication of this book:
https://service.sap.com/notes
Note: Check for updates according to the referenced SAP OSS notes.
Plan
There are two significant differences in the design of PowerHA 7.1 and later, compared to
PowerHA 6.1, which should be considered in the planning stage:
Multicast IP:
PowerHA 7.1 and later uses multicasting for heartbeat and cluster communication.
Therefore, the switches in the environment should be enabled for multicast traffic. If
necessary, modify the switch settings. The mping test tool functions similarly to the
point-to-point IP test tool, ping, and it can be used to test multicast connections. Use the
mping tool first at the AIX level to make sure that the multicast packets are flowing between
the nodes. The mping tool requires that you start mping on the receive node first, to look for
a particular multicast address, and then send a packet from the other node, using mping
for that particular multicast address. Any multicast communication issues must be
resolved before starting the cluster.
This implies that all networks that are defined to PowerHA need to be multicast-enabled.
Also, starting with PowerHA 7.1.3, Unicast IP is again supported as with PowerHA 6.1.
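As a minimal sketch of the mping test described above (the multicast address is an example value, not a requirement):
mping -v -r -a 228.168.101.43      (run first, on the receiving node)
mping -v -s -a 228.168.101.43      (run on the sending node)
If the packets sent by the second node are not reported by the receiver, resolve the switch multicast configuration before starting the cluster.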
[Entry Fields]
* Cluster Name [SAP_DUAL_localdisk]
New Nodes (via selected communication paths) [as0004lx] +
Currently Configured Node(s) as0003lx
3. Select Unicast as the heartbeat mechanism and select the repository disk, as shown in
Example 7-4. In this case, no multicast address is needed.
smitty cm_setup_menu → Define Repository Disk and Cluster IP Address
[Entry Fields]
* Cluster Name SAP_DUAL_localdisk
* Heartbeat Mechanism Unicast +
* Repository Disk [] +
Cluster Multicast Address []
(Used only for Multicast Heart Beating)
+--------------------------------------------------------------------------+
| Repository Disk |
| |
| Move cursor to desired item and press Enter. |
| |
| hdisk2 (00f6ecb511226888) on all cluster nodes |
| hdisk5 (00f6ecb5112266c9) on all cluster nodes |
| |
| F1=Help F2=Refresh F3=Cancel |
F1=Help| F8=Image F10=Exit Enter=Do |
F5=Rese| /=Find n=Find Next |
F9=Shel+--------------------------------------------------------------------------+
Nothing should be defined on this disk: no volume group or logical volumes. PowerHA
finds suitable disks for selection, and then creates its own volume group. This disk is used
for internal cluster information sharing and heartbeat information, and it is fully reserved
for the cluster.
[Entry Fields]
* Start now, on system restart or both now +
Start Cluster Services on these nodes [as0003lx, as0004lx] +
* Manage Resource Groups Automatically +
BROADCAST message at startup? false +
Startup Cluster Information Daemon? true +
Ignore verification errors? false +
Automatically correct errors found during Interactively +
cluster start?
[Entry Fields]
* Cluster Name [SAP_DUAL_SharedDisk]
New Nodes (via selected communication paths) [as0008lx]
Currently Configured Node(s) as0007lx
[Entry Fields]
* Cluster Name SAP_DUAL_SharedDisk
* Heartbeat Mechanism Multicast +
* Repository Disk [(00f6ecb5acf72d66)] +
Cluster Multicast Address [ ]
(Used only for Multicast Heart Beating)
+---------------------------------------------------------------------+
| Repository Disk |
| |
| Move cursor to desired item and press Enter. |
| |
| hdisk2 (00f6ecb5acf72d66) on all cluster nodes |
| |
| F1=Help F2=Refresh F3=Cancel |
| F8=Image F10=Exit Enter=Do |
| /=Find n=Find Next |
+---------------------------------------------------------------------+
Nothing should be defined on this disk: no volume group or logical volumes. PowerHA
finds suitable disks for selection and then creates its own volume group. This disk is used
for internal cluster information sharing and heartbeat information. It is fully reserved for the
cluster.
Example 7-7 leaves the choice to PowerHA to select the right Multicast IP.
4. From the smitty cm_setup_menu, select Setup a Cluster, Nodes and Networks and then
Verify and Synchronize Cluster Configuration.
5. Start cluster services on both nodes (clstart). This is a prerequisite for the subsequent
tasks.
This section gives instructions for the PowerHA user management option.
Plan
The required users (Table 7-7 on page 154) and groups (Table 7-6) are named according to
the SAP SID that is planned for this installation. If the database is also installed, additional
groups and users are required. In Table 7-7 on page 154, the <sid> placeholder is to be
replaced by the SID (in lowercase letters).
Make a list of all LPAR instances where the SAP system is running. For distributed user ID
management, PowerHA also provides a Smart Assist to make your LDAP directory highly
available. In addition, SAP and third-party tools provide user ID management. Select the
technology that best fits your business requirements.
Important: Do not use the SAP global file system as the home directory for the SAP users.
Smart Assist has removed all runtime dependencies on that directory to avoid disruption to
business operations in case of an NFS outage.
Note: You can find a detailed list of prerequisites for users and groups on the SAP
NetWeaver 7.4 page on the SAP website:
http://help.sap.com/nw_platform#section2
Users might change between releases, so please verify the correctness of this information.
sapinst false
sapsys false
Create OS groups
The following steps describe how to create OS groups:
1. smitty cl_usergroup → Groups in a PowerHA SystemMirror cluster → Add a
Group to the cluster.
2. Select the method to use. We used LOCAL for this scenario, as shown in Example 7-9.
3. The design is based on a modular approach, so the group needs to be created on all
nodes. Therefore, press Enter in the screen that follows without entering any details.
4. Create all groups (Example 7-10 on page 155) only with the credentials defined in
Table 7-6 on page 153. Ensure that the same group IDs are defined on all cluster nodes.
Create OS users
The following steps describe how to create OS users:
1. smitty cl_usergroup → Users in a PowerHA SystemMirror cluster → Add a user to
the cluster.
2. Select the method to use. For this example, we selected LOCAL.
3. The design is based on a modular approach, so the users need to be created on all nodes.
Therefore, press Enter in the screen that follows without entering any details.
4. Create all users (Example 7-11) only with the credential defined in the Table 7-7 on
page 154.
Verify
1. Verify on all LPARs where the SAP system runs that the users and groups have the same
names, IDs, and tunables, as shown in Example 7-12.
2. Make sure that the ulimit of the root user is set to -1 (unlimited) as shown in Example 7-13.
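A minimal sketch of these checks, assuming the example SID HA1 used later in this chapter (so that ha1adm is the <sid>adm user):
lsgroup -a id sapsys sapinst      # the group IDs must be identical on every node
lsuser -a id pgrp groups ha1adm   # the user ID and primary group must be identical on every node
ulimit -a                         # run as root; the limits should report unlimited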
Plan
For the overall SAP architecture, especially clusters (SAP LVM and other SAP tools), the full
capabilities are achieved only when installing each instance with a dedicated virtual IP.
The network infrastructure with its different zones should be planned by an SAP architect,
because considerations range from communication aspects to security rules for specific SAP
business applications. Ensure that the layout is fully redundant from a path perspective.
(Table fragment) Application Server 1: all nodes, or local to node A per the SAP design; optional, for test,
development, and non-production use.
Install
Prepare the virtual IPs by adding them into the /etc/hosts file on all nodes. They are brought
online later.
Verify
Ensure that the /etc/hosts is the same on all nodes.
Plan
Establishing the directory layout depends on the type of disk attachment chosen in 7.3.2,
Storage disk layout for SAP NetWeaver on page 145.
Install
The setup of disks requires three steps:
1. Set up a local file system bound to a single node.
2. Set up a shared moving file system.
3. Set up NFS crossmount (optional).
Besides the rootvg volume group, with its file systems and disks, the SAP file systems shown
in Table 7-9 need to be created. The table shows the mount point and the suggested VG and
logical volume (LV) naming conventions.
First, verify that the hdisks are not shared between nodes using the lspv command.
In Example 7-14, the AIX commands are listed for manual VG, LV, and file system creation.
On each node, the commands are executed for local VGs.
Example 7-14 Commands to manually create VG, LV, and file system
mkvg -y <vgname> -S hdisk<no>
varyonvg <vgname>
mklv -y'<lvname>' -t'jfs2' -e'x' -u4 <vgname> <490> hdisk<no>
mkdir <mnt>
crfs -A -v jfs2 -d <lvname> -m <mnt> -p 'rw' -a agblksize=<4096> -a logname=INLINE
Note: This is a sample scenario. Sizes, types, and log names might differ.
Example 7-16 Select the physical volume, for the shared disk
+--------------------------------------------------------------------------+
| Physical Volume Names |
| |
| Move cursor to desired item and press F7. |
| ONE OR MORE items can be selected. |
| Press Enter AFTER making all selections. |
| |
| > 00f6ecb5112266c9 ( hdisk5 on all cluster nodes ) |
| |
| F1=Help F2=Refresh F3=Cancel |
| F7=Select F8=Image F10=Exit |
| Enter=Do /=Find n=Find Next |
+--------------------------------------------------------------------------+
4. Configure the volume group as shown in Figure 7-9, and adjust your sizing and mirroring
prerequisites.
In our example and as a best practice, the mirroring is performed at the storage level,
using IBM SVC technology. Therefore, this is not configured here.
+--------------------------------------------------------------------------+
| Volume Group Type |
| |
| Move cursor to desired item and press Enter. |
| |
| Legacy |
| Original |
| Big |
| Scalable |
| |
| F1=Help F2=Refresh F3=Cancel |
| F8=Image F10=Exit Enter=Do |
| /=Find n=Find Next |
+--------------------------------------------------------------------------+
[Entry Fields]
Node Names as0003lx,as0004lx
Resource Group Name [] +
PVID 00f6ecb5112266c9
VOLUME GROUP name [<vgname>]
Physical partition SIZE in megabytes 4 +
Volume group MAJOR NUMBER [39] #
Enable Fast Disk Takeover or Concurrent Access Fast Disk Takeover +
Volume Group Type Scalable
CRITICAL volume group? no +
6. To create the logical volumes (one or more) for the VGs that you created, select smitty
cspoc → Storage → Logical Volumes → Add a Logical Volume.
Inline logs have the advantage of moving automatically along with the file system
whenever the owning RG is moved. A dedicated file system log is generally a good choice,
performance-wise. See Figure 7-11 on page 161.
Rule of thumb: For database data volumes, the dedicated file system log is preferable.
For SCS and ERS, where few changes apply, inline logs are fine.
7. Create the file systems for the previously created volume groups by using C-SPOC as
follows: smitty cspoc → Storage → File Systems → Add a File System. Select the
appropriate volume group and create the file systems as enhanced journaled file systems
according to the deployment-specific sizing. See Figure 7-12.
The following are alternatives, but they are not within the scope of this chapter:
Using NFSv4, the Smart Assist for NFS provides the capability to assist in the setup.
In all other cases, ensure that the file systems are mounted or attached on all nodes and
are eligible to be mounted automatically.
4. Create the mount directories /sapmnt and /usr/sap/trans on both nodes (mkdir -p
<dir>).
5. Select smitty sysmirror → Cluster Applications and Resources → Make Applications
Highly Available (Use Smart Assists) → NFS Export Configuration Assistant → Add
a Resource Group with NFS exports. See Figure 7-14 on page 163.
[Entry Fields]
* Resource Group Name [rg_nfs]
* Primary Node [as0003lx] +
* Takeover Nodes [as0004lx] +
* Service IP Label [as0009lx] +
Netmask(IPv4)/Prefix Length(IPv6) []
* Volume Groups [vgsapmnt vgtrans] +
* Filesystems/Directories to Export (NFSv2/3) [/export/sapmnt /export/usr/sap/trans] +
* Filesystems/Directories to Export (NFSv4) [NONE] +
* Filesystems/Directories to NFS Mount [/sapmnt:/export/sapmnt] +
* Stable Storage Path (NFSv4) [AUTO_SELECT] +
6. Edit the resource group configuration and change it to Online on first available node
rather than Home node only.
7. Synchronize and verify the cluster.
Note: If this step fails, you might need to remove the rg_nfs resource group manually
before retrying. Otherwise, an error message appears.
Verify
1. Use the follow AIX commands to identify and verify the mapping between the storage disk
and hdisk:
getconf DISK_SIZE /dev/hdisk<x>
lspv (compare pvids)
lscfg -vpl hdisk<x>
lsdev -Cc disk
lsdev -Cc disk -F "name physloc"
2. Ensure that the local file systems are identical on all nodes.
3. Verify LPARs external to the cluster that host additional application server instances.
Note: Check to be sure that the cluster services are running and the cluster is
synchronized and verified. NFS crossmounts must be mounted. You can use the clmgr
online cluster command to assist with this task.
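A minimal sketch of this check, assuming default clmgr options:
clmgr online cluster WHEN=now            # start cluster services on all nodes if they are not already running
lssrc -ls clstrmgrES | grep -i state     # expect ST_STABLE on both nodes
mount | grep sapmnt                      # the /sapmnt crossmount must be listed as an NFS mount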
On node 1
On node 1, bring online the following IP resources:
All node-specific IPs that belong to node 1 (for example, the application server instances).
All service IPs that move along with a resource group. At this point, you can activate the
IPs of SCSes or ASCSes and of ERSes on the same node. This makes the SAP
installation process more convenient. Smart Assist handles all other necessary items.
On node 2
On node 2, bring online the following IP resources:
All node-specific IPs that belong to node 2 (for example, the application server instances).
All IPs that have not been brought online on node 1 yet but are required for the SAP
installation
Install
In this section, we use a few AIX commands to configure the environment.
AIX commands
Execute the following commands on the node that the resource belongs to.
Troubleshooting varyonvg -c
In some cases, varyonvg -c does not work due to disk reservations. If you check and the
VG is not varied on by any other node, you can recover as Example 7-17 on page 165
shows.
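As a hedged sketch of the typical checks before retrying (the actual recovery steps are those in Example 7-17 and depend on the reservation found):
lspv | grep <vgname>            # confirm on every node that the VG is not varied on anywhere else
devrsrv -c query -l hdisk<x>    # check whether a stale SCSI reservation is held on the disk
varyonvg -c <vgname>            # retry the concurrent varyon after the reservation has been cleared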
A list of SAP ports can be found during the installation or in the installation guide on the SAP
NetWeaver 7.4 web page:
http://help.sap.com/nw_platform
The SAP installation manuals can be found on the SAP NetWeaver 7.4 web page:
http://help.sap.com/nw_platform
The SAP software used for this chapter is the SAP NetWeaver 7.40 with an SAP Kernel of
7.20, Patch 402, and patches for the enqueue server executable file of level 423.
Start on the host where the majority of the file systems and virtual IP are brought online
during the node preparation. Then continue with the other hosts.
1. Establish an X forward. In this scenario, VNC was used. The following steps are required:
a. Install the VNC server executable on AIX.
b. Download the VNC client to your workstation and start the VNC server on the LPAR:
#vncserver
You will require a password to access your desktops.
Password:
Verify:
New X desktop is <host>:<display no e.g. 1>
Creating default startup script /home/root/.vnc/xstartup
Starting applications specified in /home/root/.vnc/xstartup
Log file is /home/root/.vnc/<host>:1.log
c. Export the display on the host:
export DISPLAY=:1
2. Connect through the VNC client:
<host>:1
3. Verify that it is working from the client. For example, execute xclock to verify whether a
window is opened.
4. Create the SAP installation directories:
a. Use a directory at least two levels below the root directory (/). Typically: /tmp/<my sapinstdir>
b. Allocate enough space to /tmp (min 5 GB free space).
For this book, we focus on the SCS and ERS along with a primary application server. Select
all options that apply to your specific setup. See Figure 7-15.
In the following two panels, select your SAP kernel CD in the Media Browser as input and
verify the parameters in the Parameter Summary overview.
Review the results and resolve any conflicts as shown in Figure 7-17.
In this case (Figure 7-17), the CPIC_MAX_CONV variable was missing in .sapenv_<>.sh or .csh
in the <sid>adm user home directory.
Central Services (CS) stands for either of the two stacks: ABAP and Java. If a dual stack is
installed for a cluster, this task is performed twice: once for ASCS and once for SCS. The
examples shown here are from an ABAP stack.
1. Start the SWPM installer:
./sapinst SAPINST_USE_HOSTNAME=<ip alias for CS instance>
2. Provide basic configuration data for the SAP system and SAP users. The next screens
require the following information:
Choose to use either the FQDN (fully qualified domain name) or short hostname
setups. All verifications based on the example installation are performed by using short
hostname installations.
In the Media Browser, provide the Kernel media.
In the Master Password panel, choose your master password for all users.
In the two SAP System Administrator panels that follow, ensure that the User and
Group IDs match the previously created IDs. If you have skipped this preparation step,
ensure that these IDs are available on all nodes, and create the users accordingly. See
Figure 7-19 on page 170.
If a dual-stack environment is installed, repeat the previous steps with dedicated IPs, ports,
and numbers. Also, configure the tuning and sizing to the specific workload that is expected.
This dedicated LPAR choice has its advantages, especially when using hot standby database
technology. Also, maintenance and administration are easier and less risky with a clear
separation between the database server and the application server (AS). This should also
help in terms of uptime and migrations.
Depending on the type and vendor of the database, you can find documentation links in 7.10,
Documentation and related information on page 214.
For more information about clusters for SAP landscapes, contact your IBM representative or
the ISICC information service ([email protected]).
The following installation is based on an ABAP stack. Perform these steps on redundant
hosts located on different physical servers.
Run sapinst from a new or empty installation directory. If you want to keep information from a
previous installation, save it before you empty the directory and continue. If you encounter
problems during the installation, you will need the installation logs. We recommend keeping all
installation logs until the installation completes successfully.
Provide basic configuration data for the SAP system and users
Enter the SAP profile directory, as shown in Figure 7-24.
Provide the database-connect users for the ABAP or Java schema. For the example, we used
the following defaults:
Provide the remaining required media as prompted by the Media Browser panel.
Figure 7-26 AS installation: Provide message server ports for SCS instance
Figure 7-27 shows the memory requirements for our example environment.
Enter the Internet Communication Manager (ICM) user password, as shown in Figure 7-28.
Enter the administrator and communication user passwords, as shown in Figure 7-30.
If the daaadm user is not in the profile, you will get the warning shown in Figure 7-35.
Figure 7-37 DAA installation: SLD destination for the diagnostic agent
Note: Repeat the same process on a second host or node to meet the required
redundancy of the SAP system.
Use the SWPM to install the agents according to the SAP installation guide as shown in
Figure 7-41.
The DAA instance installation option automatically pops up when using the SWPM on your
first AS installation.
No special actions are required for PowerHA. This is a pure SAP prerequisite, outside of any
cluster control.
Select all OS users by not checking the selection box, as shown in Figure 7-43.
Enter the SAP system ID and database parameters, as shown in Figure 7-44.
Now finalize the process and verify the users and groups created.
Note: This does not apply to the daaadm user, because the DAA instance is per node.
However, using identical IDs is still a recommended approach.
7.5.1 Prerequisites
The logic of Smart Assist requires SAP to comply with the following prerequisites (these
prerequisites might change in future releases):
The SAP file, kill.sap, located inside the instance working directory, is built with
NetWeaver 7.30. This file must contain only the sapstart PID and no other PIDs.
Verify the SAP copy function: the sapcpe utility must be configured so that the monitor and
stop scripts inside PowerHA can act independently of NFS for all instances. Otherwise, set up
sapcpe as described in 7.7, Additional preferred practices on page 207.
The root user must have a PATH defined and permissions set to be able to execute
cleanipc for each NetWeaver instance under the cluster control. In the scripts, cleanipc is
invoked as this example shows:
eval "cd ${EXE_DIR}; ./cleanipc ${INSTANCE_NO} remove"
7.5.2 Prepare
This section describes the steps to prepare the environment before using Smart Assist for
SAP.
Note: Repeat this for all available instance numbers on the cluster nodes.
Note: The DAA instance belongs to the application server and, in this case, is not
copied.
Provide basic configuration data for the SAP system and users
Begin by entering the profile directory, as shown in Figure 7-47.
Enter the Java database connect user ID, as shown in Figure 7-52.
Note: The additional application server instance (see Figure 7-54) is not aware of the other
dialog instance installed on node A. If you plan to use SAP LVM or cluster application
servers, ensure that the instance number differs from the remote instance.
Typically, use items that are preselected even if they differ from what Figure 7-56 shows.
Start the sapinst installation process. The SAP host agent should be installed automatically
along with the instance.
Change the instance profiles of the AS, ASCS, SCS, and ERS
The SAP stack requires that you actively enable the instances for the enqueue replication
facility. This involves all three instance types. Also keep these factors in mind:
The ERS must be defined with sufficient resources to execute properly and keep its
resources available.
The SCS instance must know where to replicate the state to.
The AS instances must know that, in case of an outage, they should stay active and wait so
that they can reconnect shortly after the failover.
To make the instance profile changes effective, an SAP restart is required after the changes.
The database can stay online. Any external AS instances must go through the same
procedure as the clustered AS instances.
For updates, see the Setting Up the Replication Server page on the SAP.com website:
http://bit.ly/1vbLQ0O
Note: PowerHA 7.1.3 does not support ERS polling as part of Smart Assist for SAP. The
SAP ERS enablement options are discussed in 7.1.1, SAP NetWeaver design and
requirements for clusters on page 132.
Example 7-21 shows the SCS and ERS sapstartsrv processes from the referenced
installation.
The Smart Assist discovery tool can be executed by the root user:
# /usr/es/sbin/cluster/sa/sap/sbin/cl_sapdiscover -t [GFS/AS/SCS/ERS/DB]
The discovery tool returns 0 if no instance can be found and 1 if one or more matching
instances are found. See Example 7-22 on page 194.
Note: The term SAPNW_7.0_* in the output of Example 7-22 is a legacy naming
convention in Smart Assist. It supports NetWeaver versions above 7.0.
In case the instances cannot be discovered on the node where sapstartsrv is running,
troubleshooting can be done as follows:
# cd /usr/es/sbin/cluster/sa/sap/sbin
# export VERBOSE_LOGGING="high"
# ./cl_sapdiscover -t [GFS/AS/SCS/ERS/DB]
Bring the PowerHA cluster software to the INIT state on both nodes
Ensure that the sapstartsrv processes are running, but not the instances. Then, stop the
cluster with the unmanage all RGs option to bring the cluster into ST_INIT state.
The mount command still shows that all SAP file systems are mounted, including the SAP
global file system.
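A minimal sketch of this step, assuming the clmgr syntax for an unmanaged stop:
ps -ef | grep sapstartsrv              # the sapstartsrv processes must run, the SAP instances must not
clmgr offline cluster MANAGE=unmanage  # stop cluster services but leave all resource groups unmanaged
lssrc -ls clstrmgrES | grep -i state   # expect ST_INIT on both nodes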
The ERS, AS, and (A)SCS instances are configured similarly but with different instance
names, IP addresses, and VGs. In the following sections, only an ASCS addition is
described. The instance type differences are highlighted if there are any.
Note: To perform the addition of an ERS instance, the SCS resource groups must be
put into offline state. This is required as the SCS and ERS instances have
dependencies which must be configured in the SCS resource group and its ODM. This
is only possible when it is offline.
5. Finalize the configuration parameters. For the following example, we use the SMIT panel
as shown in Figure 7-62.
Repeat all of these steps for each SAP instance that is part of the cluster. Run a PowerHA
synchronization and verification thereafter.
3. Ensure that the following log file is created and can be written to by root after the
Discovery and Addition. To obtain the log file location, use the following command:
/usr/es/sbin/cluster/sa/sbin/clquerysaapp -a SAP_GLOBALS_HA1 | grep
LOGGER_LOGFILE
4. Ensure that the following log file is created and can be written to by root and the SAP user
<sid>adm after the discovery and addition. To get the location, use the following
commands:
a. Retrieve the SALOGFILEPATH directory:
#cat /usr/es/sbin/cluster/sa/sap/etc/SAPGlobals | grep SALOGFILEPATH | grep
-v KSSLOGFILE
SALOGFILEPATH=$(/usr/es/sbin/cluster/utilities/clodmget -n -q
"name=sapsa.log" -f value HACMPlogs)
b. Execute the clodmget command to obtain the directory.
5. The file is named as defined in /usr/es/sbin/cluster/sa/sap/etc/SAPGlobals:
OSCON_LOG_FILE=$(echo "$SALOGFILEPATH/sap_powerha_script_connector.log")
6. Review the following sections of this chapter to finalize the cluster:
7.5.5, Customize on page 198
7.6, OS script connector on page 206
7.7, Additional preferred practices on page 207
7.5.5 Customize
To address IBM clients' demands, Smart Assist for SAP provides options to change between
valid behaviors. In addition, different logging and alerting mechanisms can be chosen.
In the Smart Assist menu (smitty clsa), the following highlighted SMIT panels can be used
to change behaviors and resources including verification. See Figure 7-64 on page 199.
Note: Do not attempt to use any SMIT panels outside of the Smart Assist menus for tasks
that can be performed from within those menus. If you do that, you miss the verification
processes and you have no certainty that the appropriate changes actually took place.
The SMIT panel shown in Figure 7-65 is taken from a SCS instance called ASCS00 with a
SID HA1. If the values or panels differ for AS or ERS instances, this is highlighted. All other
fields are defaults that are valid for CS, ERS, and AS instances.
Change or show the resources that are associated with your application
For the following explanation of changing resources associated with the custom resource
group, see Figure 7-66 on page 201.
1. Select smitty clsa → Change/Show the Resources Associated with Your Application.
Note: At the time of writing this chapter, the SMIT panels were being updated, so they
might look slightly different from the SMIT panel shown in Figure 7-67.
Is NFS [1] +
SAPMNT Export Directory [/export/sapmnt]
NFS IP [as0009lx]
Notification Script []
Attention: Set up sapcpe to copy all appropriate information. After each SAP
upgrade, re-verify the sapcpe setup, because an SAP upgrade might overwrite settings.
Attention: PowerHA 7.1.3 has a different return code behavior from previous
releases.
EXIT_CODE_MONITOR_sapstartsrv_unavailable
EXIT_CODE_MONITOR_failover_on_gw_outage:
These two variables define the return code of the PowerHA Application Monitor in case
the sapstartsrv process is not available or in case the gateway is not available.
Depending on the landscape and your needs, this might already be an issue that
requires a failover, or it might be a condition under which operation can continue.
The preceding values are per instance. The following values are effective for all instances:
Is this an NFS mountpoint?
Set this to 1 if the SAP global file system is served by an NFS server, regardless of whether
it is served from within this cluster or from outside of it.
SAPMNT Export Directory, NFS IP:
Defines the NFS export directory and IP if Is this an NFS mount point? is set to 1.
SAPADMUSER:
This is the SAP OS <sid>adm user.
Attention: The SAP <sid>adm user is called with the LANG=C environment. If the env
output differs when LANG=C is set, either the environment of the SAP user for LANG=C
must be updated, or a PMR must be opened to request a change of the ODM entry.
LOGGER LOGFILE:
Defines the log file where advanced logging information is written.
CS OS Connector, ERS OS Connector and AS OS Connector:
Online on/off switch for the SAP HA Script connector. Default is 0. As soon as the script
connector is enabled, it must be set to 1 manually.
Not defining any startafter dependency might bring the cluster resources online in a
non-specific order. If the environment starts properly this way, this is the preferred approach.
Test bringing both nodes online, bringing single nodes online, and crashing and reintegrating a node.
If you configure startafter RG dependencies, start with the SCS instances, followed by the
ERS instances. This results in a consistent startup. However, clusters with or without
startafter dependencies can both work.
7.6.1 Plan
The planning involves verifying whether the SAP release is capable of supporting this function
and which instances should be activated for this (see 7.5.5, Customize on page 198 for
instance attribute settings for CS OS Connector, ERS OS Connector, and AS OS Connector).
Attention: Enable this function only if the following SAP prerequisites are met:
Install with a stand-alone enqueue (CS) and enqueue replication (ERS).
Minimum SAP NetWeaver, kernel, and patch level requirements are met.
Ensure that you are compliant with SAP Note 897933 (Start and stop sequence for SAP
systems).
Debug
The following SAP profile variable can be used to control the debug level of the HAlib:
service/halib_debug_level = <value> (value range 0..3)
Setting this variable to a value of 2 or higher in the SAP instance profile causes more detailed
information to be written to the sapstartsrv.log. To activate it, you need to restart
sapstartsrv.
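A minimal sketch, assuming the example SID HA1 and instance number 00 (the sapcontrol call is one common way to restart sapstartsrv):
In the instance profile (for example, /sapmnt/HA1/profile/HA1_ASCS00_<virtual hostname>):
service/halib_debug_level = 2
Then restart sapstartsrv:
su - ha1adm -c "sapcontrol -nr 00 -function RestartService"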
7.6.3 Verify
Perform start/stop operations for all instances from the SAP Microsoft management console
(MMC). Verify that the SAP application behavior is as expected and verify the log files. The
log files can be found as configured in /usr/es/sbin/cluster/sa/sap/etc/SAPGlobals.
The following sections provide the required steps for protection against such outages.
Plan
The sapcpe tool physically copies the executable from the SAP global directory
(/sapmnt/SID/) into the instance directory, based on list files, or directories, as specified in
the SAP instance profile. The list files are included with the SAP kernel. Examples are
scs.lst and ers.lst.
These list files do not copy all executables and libraries into the instance directories. To get
full protection, you must extend the function to copy all executables into the instance
directories by either creating your own .lst file, manually copying the entire executable
directory, or modifying the existing list files.
Table 7-11 on page 208 shows the sapcpe enablement methods and gives an overview of the
deployment effort and a pro-and-con decision aid for the different options. The
recommendation is to copy all executables.
Modify existing list files: Add executables to the list files.
  Pro: Easy to enable initially.
  Con: A kernel upgrade will typically overwrite these edited files, and all extensions will be
  lost. Manually adding executables includes the risk of missing one.
Add new list files: Create a list file and enable it inside the instance profile.
  Pro: List files do not get silently overwritten by a kernel upgrade.
  Con: Manually adding executables includes the risk of missing one. The list can change
  between kernels.
Copy the entire set of executables: Change the sapcpe command in the instance profile to
copy a full directory.
  Pro: Do it once.
  Con: 2.5 - 3 GB of space is required for each enabled instance.
Install
In this section, changes in the instance profile for the recommended option to copy all
executables are described.
For each SAP instance, the following change in the instance profile (and for older SAP
releases the instance Startup profile) must be made, as shown in Example 7-26.
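As a hedged illustration only (not the exact content of Example 7-26), such a change typically replaces the list-file argument of the existing sapcpe entry in the instance profile with a full-directory copy. The source: argument shown here is an assumption; verify the exact syntax against Example 7-26 and the SAP kernel documentation:
_CPARG0 = list:$(DIR_CT_RUN)/scs.lst     (original entry: list-file based copy)
_CPARG0 = source:$(DIR_CT_RUN)           (changed entry: copy the entire executable directory)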
7.7.2 Logging
Smart Assist provides a method of fine-tuning the logging to avoid log flooding. Before
handover to production, the appropriate log levels must be defined according to the available
space and requirements. A full log file directory can result in outages. Therefore, alerts should be
implemented to protect from full file systems.
Besides hacmp.out, PowerHA has Smart Assist-specific logs. Of special relevance is the
/var/hacmp/log/sapsa.log. Besides the default PowerHA logging, two tunable pairs can be
used for advanced logging.
To log detailed SAP command output, select smitty clsa → Change/Show the SAP
Instance Attributes.
For each instance, repeat these steps according to the requirements of the business
application.
The first tunable specifies the log level (0 - 3), and the second tunable specifies the
location to write the logs to.
Besides the standard logging, the SAP commands called to start, stop, and monitor will be
logged. For your quality assurance tests, it is recommended to set the level to 3. The
runtime default is 0. See Example 7-27.
Example 7-27 Change/show SAP ERS Instances (s) attribute details menu
Change/Show SAP ERS Instance(s) attributes Details
[TOP] [Entry Fields]
* SAP SYSTEM ID HA2
* SAP ERS Instance(s) Name(s) ERS22
* Application Name SAP_HA2_ERS22
[]
SA SAP XPLATFORM LOGGING [3] +
[]
LOGGER LOGFILE [/var/hacmp/log/SAPutils.log]
7.7.3 Notification
In addition to the standard PowerHA Application Monitor Notification methods, Smart Assist
for SAP provides the option to give advanced alerts and optimization information by enabling
internal notification methods about events. The start, stop, and monitor scripts can report
states in which the cluster continues to operate but which should be verified manually, because
the current situation might degrade the productivity of the business application. This also
helps to optimize the timeout values and other settings over time.
For each instance, repeat the following steps according to the relevance for the business
application. Set the notification level and specify the notification script.
The script is called with the following input parameters, which can be used to define the message
inside the notification:
<my notification script>.sh Instance ${INSTANCE} of ${SID} - <description>.\n
7.8 Migration
Migrating to PowerHA 7.1.3 in an SAP landscape has two different aspects:
The first aspect is to migrate the PowerHA base product. This is described in Chapter 4,
Migration on page 49.
The second aspect, which requires more planning, is to also ensure that the business
application logic survives the migration. In general, there are two approaches:
Run the same logic and capabilities with a new cluster base product and ensure that
the transition works.
Enrich the cluster functionality by using new Smart Assist or base product capabilities.
Before considering a Smart Assist migration, verify the capability of your setup to be
rediscovered with the new SAP-integrated solution:
The SCS instances with dual stack must be split into separate resource groups. Verify that
a dedicated service IP alias has been assigned to each of them.
Verify that each of them has its own VG in case no local file system approach is used.
The ERS instances must follow the same rules as the SCS instances:
Verify whether a dedicated service IP alias has been assigned to each of them.
Verify whether each of them has its own VG in case no local file system approach is used.
If the installation is not compliant with these requirements, you have two options:
1. Reinstall the instances to meet the IP alias and file system requirements. This affects only
the SCS and ERS instances, which can be fast and easy to reinstall. However, the
downtime is a consideration.
The detailed installation steps and Smart Assist addition are described in this chapter,
starting from 7.3, Installation of SAP NetWeaver with PowerHA Smart Assist for SAP
7.1.3 on page 143.
2. Migrate only the base PowerHA product and stay with the 7.1.2 scripts.
7.9 Administration
This section provides administration information.
Note: After the cluster shutdown and before the instance is moved back under cluster
control, ensure that it is in the same operational status (online or offline) and on the same
node as it was before the maintenance mode was enabled.
This maintenance can be performed in two ways as described in the following sections.
Maintenance mode
PowerHA has a maintenance mode that brings the RGs into an unmanaged state and leaves
the entire infrastructure up, without any kind of cluster protection.
The effect is that no recovery action, even in the case of a node failure, is triggered. This is in
effect for all resources that are controlled in this cluster.
The process starts with the smitty cl_stop SMIT menu, as shown in Example 7-30.
When re-managing the RGs, the cluster manager puts the RGs into the same state as
before the unmanage action. This means that if the application was running, it will be put into
a running state. To do so, start the nodes with the smitty cl_start menu.
As soon as the monitor is reactivated, the instance is also reactivated as part of the cluster
logic. This applies if the instance was stopped or was brought online on a different node than
the associated RG.
In the following panels, select the PowerHA Application Monitor and the associated resource
group to deactivate them.
To enable the monitor again, select smitty cl_admin → Resource Groups and
Applications → Suspend/Resume Application Monitoring → Resume Application
Monitoring.
The HyperSwap technology concept on IBM Power Systems has its roots in IBM System z
mainframe servers, where HyperSwap is managed through the IBM Geographically
Dispersed Parallel Sysplex (IBM GDPS). In Power Systems, PowerHA SystemMirror
Enterprise Edition is the managing software that provides the capability to handle remote
copy and automate recovery procedures for planned or unplanned outages (based on
HyperSwap function).
The HyperSwap feature swaps a large number of devices and enhances application
availability in the event of storage errors by using the IBM DS8000 Metro Mirror Copy Services.
Currently, the HyperSwap function can handle IBM DS8000 Metro Mirror (formerly
Peer-to-Peer Remote Copy, PPRC) relationships (two-site synchronous mirroring
configurations). Additional enhancements are being considered for Global Mirror
configurations. Therefore, configurations with the IBM DS88xx storage systems can be used
for HyperSwap configurations.
The HyperSwap function provides storage swap for application input/output (I/O) if errors
occur on the primary storage. It relies on in-band communication with the storage systems by
sending control storage management commands through the same communication channel
that is used for data I/O.
To benefit from the HyperSwap function, the primary and auxiliary volume groups (LUNs) must be
reachable on the same node. Traditional Metro Mirror (PPRC) can coexist. In that case, the
volume group from the primary storage is visible on one site and the secondary volume group
on the secondary site.
The AIX operating system changes for HyperSwap support are described in 8.4.2, AIX
support for HyperSwap on page 220.
Figure 8-1 Out-of-band storage system (the AIX host sends disk I/O directly to the enterprise storage,
while management commands flow through a separate storage agent, for example an HMC)
As storage systems evolve in size and complexity, the out-of-band architecture becomes
inadequate. The original consideration was to move the storage management communication
out of the data path, to eliminate any impact on the performance of the critical data throughput.
Therefore, it becomes necessary to replace the TCP/IP network for the storage management
to support more storage systems. In-band communication is best suited for this purpose.
Figure 8-2 shows an example of in-band management of a storage system.
Both data and storage management share the same Fibre Channel (FC) network. This offers
two key advantages:
The FC network is usually faster than a TCP network (lower latency).
The separate storage agent (for example, the storage Hardware Management Console)
that is used in the out-of-band structure is no longer needed. The management
communication between host server and storage controller becomes more direct and, as
such, more reliable and faster.
Note: Reliable Scalable Cluster Technology (RSCT) is a set of software components and
tools that provide a comprehensive clustering environment for AIX. It is or has been used by
products such as PowerHA and GPFS, among others.
Before configuring and enabling HyperSwap, AIX sees PPRC-paired disks hdisk1 and hdisk2,
one from each of two storage subsystems, DS8K1 and DS8K2. In our example, hdisk1 is in
DS8K1 and hdisk2 is in DS8K2. These two disks are both in available state. The AIX node
has four FC paths to hdisk1 and four FC paths to hdisk2.
A new disk attribute, migrate_disk, has been implemented for HyperSwap. When one of the
PPRC paired disks, say hdisk1, has been configured as migrate_disk, its peer-paired disk,
hdisk2, is changed to the defined state. At that point, AIX can see eight paths to hdisk1, which
is in the available state. In case the AIX node cannot access the PPRC source (hdisk1), the
disk from DS8K1, the AIX kernel extension changes the path to access the disk on DS8K2
while still using hdisk1 in AIX. This is called HyperSwap, and it is usually transparent to the
application.
The previous statements are logical consequences of how mirror groups are managed in a
HyperSwap environment. The disks that belong to a mirror group are swapped as a group.
The mirror groups, in turn, can be swapped one by one in the case of a manual swap, or all
together in the case of an unplanned HyperSwap.
These are the requirements for AIX and the DS8800 microcode:
AIX 7.1 TL3 or later, or AIX 6.1 TL9 or later.
PowerHA SystemMirror 7.1.3 Enterprise Edition.
If all file sets of PowerHA SystemMirror 7.1.3, Enterprise Edition, are not installed, check
that the following HyperSwap-specific file sets are installed:
cluster.es.genxd.cmds
cluster.es.genxd.rte
devices.common.IBM.storfwork.rte
devices.common.IBM.mpio.rte
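These filesets can be verified quickly with lslpp:
lslpp -L cluster.es.genxd.cmds cluster.es.genxd.rte \
      devices.common.IBM.storfwork.rte devices.common.IBM.mpio.rte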
When other LUNs from the same LSSes exist and only some disks from those LSSes are in
the mirror group, you cannot activate HyperSwap. HyperSwap is the application that ensures
data consistency on the target storage.
HyperSwap does not automatically transfer the SCSI reservations (if any) from the primary to
the secondary disks.
The host connect for every PowerHA SystemMirror node must be defined in the storage side
as having this profile: IBM pSeries - AIX with Powerswap support. If the host connection has
been defined in the storage system, it can be changed easily by using the chhostconnect
command at the storage level, as shown in Example 8-1.
All available profiles in the DS8800 storage system can be found with the lsportprof
storage_image_id command. The storage image ID is obtained with the lssi command.
The IBM pSeries - AIX with Powerswap support profile does not have a default association for
hostconnect HostType. Therefore, modifying the corresponding PowerHA SystemMirror node
hostconnect can be done only by using the chhostconnect -profile "IBM pSeries - AIX
with Powerswap support" <host_id> command.
For example, we configure a Metro Mirror relationship for a pair of volumes. Assuming that the
LSS planning has been done, we validate for our LSS C9 in storage IBM.2107-75NR571 and
LSS EA on storage IBM.2107-75LY981 that we have available the PPRC ports, as shown in
Example 8-2. In our example, the dscli version 6.6.0.305 is used.
Example 8-2 List of available PPRC ports between IBM.2107-75NR571 and IBM.2107-75LY981
dscli> lssi
Date/Time: February 6, 2014 4:49:03 PM CST IBM DSCLI Version: 6.6.0.305 DS: -
Name ID Storage Unit Model WWNN State ESSNet
=============================================================================
ds8k5 IBM.2107-75NR571 IBM.2107-75NR570 951 5005076309FFC5D5 Online Enabled
dscli>
dscli> lssi
Date/Time: February 6, 2014 4:48:34 PM CST IBM DSCLI Version: 6.6.0.305 DS: -
Name ID Storage Unit Model WWNN State ESSNet
=============================================================================
ds8k6 IBM.2107-75LY981 IBM.2107-75LY980 951 5005076308FFC6D4 Online Enabled
dscli> lsavailpprcport -remotedev IBM.2107-75NR571 -remotewwnn 5005076309FFC5D5 ea:c9
Date/Time: February 6, 2014 4:17:37 PM CST IBM DSCLI Version: 6.6.0.305 DS:
IBM.2107-75LY981
Local Port Attached Port Type
=============================
I0000 I0130 FCP
I0000 I0131 FCP
I0000 I0132 FCP
I0000 I0133 FCP
I0001 I0130 FCP
I0001 I0131 FCP
...................<<snippet>>.........
I0202 I0131 FCP
I0202 I0132 FCP
I0202 I0133 FCP
The desired LSSes are not yet available on the storage side because they do not own any
volume. So we create new volumes, as shown in Example 8-3.
Now that the LSS is available and shown by using the lslss command, we create the PPRC
paths for the chosen LSSes, as shown in Example 8-4 on page 227.
Now the PPRC relationship has been established and the disks can be configured at the
operating system level.
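As a hedged illustration of these steps (not the exact content of Examples 8-3 and 8-4), a dscli sequence of this kind can be used; the extent pool, capacity, volume IDs, and port pair shown here are assumptions:
dscli> mkfbvol -dev IBM.2107-75LY981 -extpool P1 -cap 10 -name hs_vol EA04
dscli> mkfbvol -dev IBM.2107-75NR571 -extpool P1 -cap 10 -name hs_vol C904
dscli> mkpprcpath -dev IBM.2107-75LY981 -remotedev IBM.2107-75NR571 -remotewwnn 5005076309FFC5D5 -srclss EA -tgtlss C9 I0000:I0130
dscli> mkpprc -dev IBM.2107-75LY981 -remotedev IBM.2107-75NR571 -type mmir EA04:C904
dscli> lspprc -dev IBM.2107-75LY981 -l EA04      (wait for the pair to reach the Full Duplex state)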
The multipath driver used for specific storage families in the AIX operating system
configuration can be found easily and configured by using the manage_disk_drivers
command, as shown in Example 8-6.
Use the same command for activating AIX_AAPCM as the default driver, as shown in
Example 8-7. Changing the multipath driver requires a system reboot.
After reboot, verify the present configured driver for the 2107DS8K device that represents the
DS8xxxx storage family, as shown in Example 8-8.
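As a consolidated, hedged illustration of these commands:
manage_disk_drivers -l                         # list the storage families and the drivers currently in use
manage_disk_drivers -d 2107DS8K -o AIX_AAPCM   # make the AIX PCM the default driver for the DS8000 family
shutdown -Fr                                   # a reboot is required for the driver change to take effect
manage_disk_drivers -l | grep 2107DS8K         # verify the active driver after the reboot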
Note: The DS8800 SDDPCM driver is not supported. If the DS8K Subsystem Device
Driver Path Control Module (SDDPCM) driver is installed, it should be removed. By using
the NO_OVERRIDE option, you can use SDDPCM to manage DS8000 storage system
families. We do not have SDDPCM installed on our system, so we left the NO_OVERRIDE
value unchanged.
Since AIX 7.1 TL2, the ODM unique type field for DS8K disks managed by the AIX Path Control
Module changed from disk/fcp/mpioosdisk to disk/fcp/aixmpiods8k. This change does
not affect the SDDPCM software.
The command for changing the attributes in Table 8-1 is shown in Example 8-9.
Note: By default, dynamic tracking is enabled on all systems that are running AIX 7.1.
The command available for verifying and cancelling the disk reservation while PCM is the
default driver is devrsrv -c query -l hdisk_name. The command output for hdisk31 is shown
in Example 8-10.
Note: The reservation policy can also be changed to no_reserve by using the chdev -a
reserve_policy=no_reserve -l hdisk_number command.
The lsattr -Rl hdisk_name -a san_rep_device command does not provide information
regarding expected values.
Note: When san_rep_device is yes, the hdisk is configured for PPRC. The unique_id is
based on the Copy Relation ID from Inquiry Page 0x83, rather than the LUN ID Descriptor
from Inquiry Page 0x80.
The san_rep_cfg attribute determines what types of devices are configured as HyperSwap
disks:
none                [DEFAULT] Devices are not to be configured for PPRC.
revert_disk         The selected hdisk is reverted to not PPRC-configured and keeps its
                    existing name. The -U flag, used along with the revert_disk parameter
                    in the chdev command, allows you to keep the hdisk name while it is
                    reverted, but for the secondary device.
migrate_disk        The selected hdisk is converted to PPRC-configured and keeps its
                    existing name.
new                 Newly defined hdisks are configured as PPRC, if capable.
new_and_existing    New and existing hdisks are configured as PPRC, if capable. Existing
                    hdisks get a new logical hdisk instance name, and the previous hdisk
                    name remains in the Defined state.
In Example 8-11 on page 231, hdisk31 is replicated by Metro Mirror to hdisk71. This pair of
disks is configured for HyperSwap on AIX. The storage disk membership is shown in
Table 8-2.
Example 8-11 Replication status for hdisk31 and hdisk71 on the storage side
lspprc -fullid -l 4404
Date/Time: November 25, 2013 4:44:54 PM CST IBM DSCLI Version: 6.6.0.305 DS: IBM.2107-75TL771
ID State Reason Type Out Of Sync Tracks
Tgt Read Src Cascade Tgt Cascade Date Suspended SourceLSS Timeout (secs) Critical Mode
First Pass Status Incremental Resync Tgt Write GMIR CG PPRC CG isTgtSE DisableAutoResync
================================================================================================
================================================================================================
=========================================================================================
IBM.2107-75TL771/4404:IBM.2107-75LY981/0004 Full Duplex - Metro Mirror 0
Disabled Disabled Invalid - IBM.2107-75TL771/44 60 Disabled
Invalid Disabled Disabled N/A Disabled Unknown -
Example 8-12 Disk attributes before transforming into HyperSwap capable disks
root@r6r4m51:/testhyp> lsattr -El hdisk31 |egrep 'reserve_policy|san_rep_cfg|san_rep_device'
reserve_policy no_reserve Reserve Policy True+
san_rep_cfg none SAN Replication Device Configuration Policy True+
san_rep_device detected SAN Replication Device False
root@r6r4m51:/testhyp> lsattr -El hdisk71 |egrep 'reserve_policy|san_rep_cfg|san_rep_device'
reserve_policy no_reserve Reserve Policy True+
san_rep_cfg none SAN Replication Device Configuration Policy True+
san_rep_device detected SAN Replication Device False
Using the lspprc command at the AIX operating system level, verify the disks so that you
know the exact current Peer-to-Peer Remote Copy status, as shown in Example 8-13.
Configure hdisk31 for HyperSwap as the principal (source) disk, as shown in Example 8-14.
Manufacturer..................IBM
Machine Type and Model........2107900
ROS Level and ID..............2E313336
Serial Number.................75LY9810
Device Specific.(Z7)..........0004
Device Specific.(Z0)..........000005329F101002
Device Specific.(Z1)..........004
Device Specific.(Z2)..........075
Unique Device Identifier......200B75LY981000407210790003IBMfcp
Logical Subsystem ID..........0x00
Volume Identifier.............0x04
Subsystem Identifier(SS ID)...0xFF00
Control Unit Sequence Number..00000LY981
Storage Subsystem WWNN........5005076308ffc6d4
Logical Unit Number ID........4000400400000000
Manufacturer..................IBM
Machine Type and Model........2107900
ROS Level and ID..............2E393330
Serial Number.................75TL7714
Device Specific.(Z7)..........4404
Device Specific.(Z0)..........000005329F101002
Device Specific.(Z1)..........404
Device Specific.(Z2)..........075
Unique Device Identifier......200B75TL771440407210790003IBMfcp
Logical Subsystem ID..........0x44
Volume Identifier.............0x04
Subsystem Identifier(SS ID)...0xFF44
Control Unit Sequence Number..00000TL771
Storage Subsystem WWNN........500507630affc16b
After configuring the disk for HyperSwap, hdisk71 is in the Defined state, and only hdisk31
is available. The lspprc command indicates the paths for the HyperSwap disk, as shown in
Example 8-15.
Note: At any time, only one of the two path groups is selected for I/O operations to the
hdisk. The selected path group is identified in the output by (s).
At this time, the HyperSwap disk configuration has been performed without unmounting file
systems or stopping the application.
The disk that is migrated to a HyperSwap disk should be the primary disk. Otherwise, if the
auxiliary disk is chosen instead of the primary, and the primary disk is part of a volume group,
the message in Example 8-16 appears.
Example 8-16 Choosing hdisk71 as migrated disk for HyperSwap instead of hdisk31
root@r6r4m51:/testhyp> chdev -l hdisk71 -a san_rep_cfg=migrate_disk -U
Method error (/usr/lib/methods/chgdisk):
0514-062 cannot perform the requested function because the
specified device is busy.
Important: If the primary disk does not belong to any volume group, or the volume group is
varied off, the chdev command succeeds on the auxiliary disk (the PPRC target). In this case,
even if the PPRC replication direction is reversed on the storage side, the AIX operating
system does not show the disk with the required information, and the entire disk migration
process must be redone.
Enabling HyperSwap by using the secondary disk is therefore not a recommended method.
Manufacturer................IBM
Machine Type and Model......2107900
Part Number.................
ROS Level and ID............2E313336
Serial Number...............75LY9811
...........................<<snipped text>>..............................
Device Specific.(Z7)........1004
We swap hdisk45 to the auxiliary storage and unconfigure it as a HyperSwap disk by using the
chdev command, as shown in Example 8-19. At the end, the disk is still configured as hdisk45,
but as a reverted disk on the auxiliary storage.
Hdisk45 is swapped
Manufacturer..................IBM
Machine Type and Model........2107900
ROS Level and ID..............2E313336
Serial Number.................75LY9811
Device Specific.(Z7)..........1004
Device Specific.(Z0)..........000005329F101002
Device Specific.(Z1)..........004
Device Specific.(Z2)..........075
Unique Device Identifier......200B75LY981100407210790003IBMfcp
Logical Subsystem ID..........0x10
Volume Identifier.............0x04
Subsystem Identifier(SS ID)...0xFF10
Control Unit Sequence Number..00000LY981
Storage Subsystem WWNN........5005076308ffc6d4
Logical Unit Number ID........4010400400000000
root@r6r4m51:/work> lspprc -v hdisk10
Invalid device name hdisk10
root@r6r4m51:/work> cfgmgr
Manufacturer..................IBM
In some cases, if HyperSwap is enabled on the disk that is the target of the Metro Mirror
replication, the disk is not usable for HyperSwap. To make it HyperSwap functional, you must
set its san_rep_cfg attribute to revert_disk and then repeat the procedure for activating
HyperSwap on the primary disk.
This is an important feature, because you can reconfigure HyperSwap mirror groups by
adding or removing disks in the mirror group configuration. It is no longer necessary to bring
the resources offline while the HyperSwap mirror group is reconfigured, as was required in the
previous version, PowerHA SystemMirror 7.1.2 Enterprise Edition.
Node-level unmanage mode is the main feature used when mirror group reconfiguration is
required.
Follow these steps to add or replace disks in a resource group that is protected by
HyperSwap (a command-line sketch follows the list):
1. Configure new disks for HyperSwap to have the same Metro Mirror replication direction.
2. Stop PowerHA SystemMirror cluster services, and leave the resource groups in an Unmanaged state.
3. Modify the mirror group configuration by adding or removing disks.
4. Configure the corresponding resource groups to reflect the configuration of the new mirror
group definition.
5. Start PowerHA SystemMirror services, leaving the resource groups in Unmanaged state.
6. Verify and synchronize the cluster configuration.
7. Bring resource groups online.
8. Validate the configuration.
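The same flow can also be driven from the command line. The following sketch assumes the clmgr interface of PowerHA SystemMirror 7.1 and the ORARG resource group used later in this chapter; verify the exact option names at your software level:

clmgr stop cluster MANAGE=unmanage      # stop services, leave resource groups unmanaged
# reconfigure the mirror group and resource group definitions (steps 3 and 4), then:
clmgr start cluster MANAGE=manual       # start services without managing resource groups
clmgr sync cluster                      # verify and synchronize the cluster configuration
clmgr online resource_group ORARG       # bring the resource group online
clRGinfo -p                             # validate the resource group state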
Adding and removing disks is shown in the Oracle Node HyperSwap and Oracle RAC
active-active configuration in 8.20.2, Adding new disks to the ASM configuration: Oracle RAC
HyperSwap on page 294.
The requirements for single-node HyperSwap deployment are the same as the requirements
for single node in multi-node cluster deployment:
DS88xx with microcode level 86.30.49.0 or higher
AIX version 6.1 TL9 or AIX 7.1 TL3
PowerHA SystemMirror 7.1.3
FC, FCoE, and NPIV are the only supported host attachment connections
Functions:
It offers storage protection if the primary storage fails.
It is configured on a single node, so it does not require a second node to form a cluster.
Limitations:
Extending a single-node HyperSwap cluster by adding other nodes to the configuration is not
possible, because sites cannot be added in single-node HyperSwap mode. The entire cluster
configuration must be redone after the cluster-wide policies are set to disabled; in that case,
the single-node HyperSwap configuration is lost.
While a node is configured for single-node HyperSwap, it cannot be added as a node
into another stretched or linked cluster.
It does not provide protection against node failure, because there is only one node.
The Oracle single-node database installation has a grid infrastructure and a database home
directory on the file systems that are created on a volume group, with disks that are
configured for HyperSwap.
The destination of the database data files is the raw disks that are managed by the ASM
configuration and HyperSwap.
The disks were configured in the PowerHA SystemMirror node. Their designated roles are
shown in Table 8-4. The procedure for configuring a disk for the HyperSwap environment is
described in 8.13, Configure disks for the HyperSwap environment on page 229.
In this example, we use a single file system for the grid home and the database home, created
on the oravg volume group and mounted at /u01, with ORACLE_BASE set to /u01/app/oracle.
ASM uses the hdisk41, hdisk61, and hdisk63 disks.
Configuring disks for HyperSwap requires careful planning regarding the LSSes that are used,
the replication direction for the HyperSwap disks, and which disk on the system is configured
as the primary disk when the san_rep_cfg=migrate_disk and reserve_policy=no_reserve
attributes are set.
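Before and after the migration, the replication direction and the active path group of each candidate disk can be checked from AIX. A minimal sketch, using the lspprc and lsattr forms that appear throughout this chapter:

# show the active path group (marked with (s)) and the storage WWNNs per disk
lspprc -Ao | egrep 'hdisk41|hdisk61|hdisk63'
# confirm the HyperSwap-related attributes of one of the disks
lsattr -El hdisk41 | egrep 'reserve_policy|san_rep_cfg|san_rep_device'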
In Example 8-20, the HyperSwap-configured disks are shown using the disks_hyp.sh script
for a quick view of the HyperSwap pair disks.
The storage device ID in Site A is 75TL771, and the storage device ID on Site B is 75LY981.
The replication direction is from Storage A to Storage B for all Metro Mirror replicated disks.
The cluster has been created, and the output is shown in Example 8-22.
[TOP]
Cluster Name: one_node_hyperswap
Cluster Type: Stretched
Heartbeat Type: Unicast
Repository Disk: None
Cluster IP Address: None
NODE r6r4m51:
Network net_ether_01
r6r4m51 9.3.207.109
r6r4m51:
Hdisk: hdisk0
PVID: 00cdb3119e416dc6
[MORE...428]
F1=Help F2=Refresh F3=Cancel
F6=Command
F8=Image F9=Shell F10=Exit
/=Find
n=Find Next
We create the repository disk on hdisk4, which is not a HyperSwap disk in the first phase. The
disk attributes values are shown in Example 8-23 on page 242. The reserve_policy attribute
is also set as no_reserve.
Example 8-24 Define a repository disk and cluster IP address for a single-node HyperSwap
Define Repository Disk and Cluster IP Address
[Entry Fields]
* Cluster Name one_node_hyperswap
* Heartbeat Mechanism Unicast +
* Repository Disk [(00cdb3119ad0e49a)] +
Cluster Multicast Address []
(Used only for multicast heartbeat)
The cluster repository has been configured and verified, as shown in Example 8-25 on
page 243.
HACMPsircol:
name = "one_node_hyperswap_sircol"
id = 0
uuid = "0"
ip_address = ""
repository = "00cdb3119ad0e49a"
backup_repository = ""
The key configuration point for single-node HyperSwap is the activation of the cluster-wide
HyperSwap policies. This activation brings two sites into the configuration; they are defined
internally and are not visible in the site configuration menu.
The sites are configured automatically, using cluster-wide HyperSwap policies. The cluster
definition and node selection are preliminary steps in the configuration.
Example 8-26 Define cluster-wide HyperSwap policies for single-node HyperSwap activation
Define Cluster wide HyperSwap Policies
[Entry Fields]
Single node HyperSwap Enabled +
The field for a single-node HyperSwap is shown enabled by default. We choose to leave that
default setting.
The activation procedure requires that no sites are already configured in PowerHA
SystemMirror, because enabling the single-node HyperSwap feature automatically adds the
sites to the cluster configuration.
Activating the cluster-wide HyperSwap policies returns only the completion status of the
command. Behind the scenes, two sites are configured for storage site association only. The
HACMPsite ODM class is not populated, and the sites are not shown as site definitions. The
sites Site1_primary and Site2_secondary are used internally by the clxd daemon.
In the next steps, we configure the DS8000 Metro Mirror (in-band) resources by using one of
these smitty paths to define and configure both storage systems, as shown in Example 8-27 on
page 244: smitty cm_cfg_strg_systems, or smitty sysmirror Cluster Applications and
[Entry Fields]
* Storage System Name [STG_A]
* Site Association +
* Vendor Specific Identifier IBM.2107-00000LY981 +
* WWNN 500507630AFFC16B +
                         Site Association

   Move cursor to desired item and press Enter.

     Site1_primary
     Site2_secondary
In the same way, the secondary storage is added and configured, with the site association as
Site2_secondary. Both storage systems are defined as Metro Mirror resources, as shown in
Example 8-28.
NAME="STG_B"
TYPE="ds8k_inband_mm"
VENDOR_ID="IBM.2107-00000TL771"
WWNN="500507630AFFC16B"
SITE="Site2_secondary"
ATTRIBUTES=""
root@r6r4m51:/usr/es/sbin/cluster/utilities>
The associated data for the storage configurations can be obtained by using the odmget
HACMPxd_storage_system command, as shown in Example 8-29 on page 245.
HACMPxd_storage_system:
xd_storage_tech_id = 5
xd_storage_system_id = 7
xd_storage_system_name = "STG_A"
xd_storage_vendor_unique_id = "IBM.2107-00000LY981"
xd_storage_system_site_affiliation = "Site1_primary"
xd_storage_system_wwnn = "5005076308FFC6D4"
HACMPxd_storage_system:
xd_storage_tech_id = 5
xd_storage_system_id = 8
xd_storage_system_name = "STG_B"
xd_storage_vendor_unique_id = "IBM.2107-00000TL771"
xd_storage_system_site_affiliation = "Site2_secondary"
xd_storage_system_wwnn = "500507630AFFC16B
The next configuration step is the mirror group definition. The definition specifies the logical
collection of volumes that must be mirrored to a storage system on the remote site.
Note: For volume replication or path recovery, when the Re-sync Action is set to Automatic,
the HyperSwap function performs the resynchronization automatically. When it is set to
Manual, a recommended user action is displayed in errpt for a HyperSwap-enabled mirror
group and in hacmp.out for a HyperSwap-disabled mirror group. It is best to configure a split
and merge policy when using the automatic setting.
In this step, we add the oravg volume group to the user mirror group configuration and select
hdisk41, hdisk61, and hdisk63 as raw disks.
[Entry Fields]
* Mirror Group Name [ORA_MG]
Volume Group(s) oravg
Raw Disk(s) hdisk61:3be20bb3-2aa1> +
HyperSwap Enabled +
Consistency Group Enabled +
Unplanned HyperSwap Timeout (in sec) [60] #
HyperSwap Priority Medium
Recovery Action Manual +
We leave the Unplanned HyperSwap Timeout unchanged for now. This value represents how
long a connection remains unavailable before an unplanned HyperSwap site failover occurs.
After the mirror group is defined, the associated storage systems are automatically added to
the mirror group definition, based on the Metro Mirror replication direction, as shown in
Example 8-31.
[Entry Fields]
Mirror Group Name ORA_MG
New Mirror Group Name []
Volume Group(s) oravg +
Raw Disk(s) [3be20bb3-2aa1-e421-ef> +
Associated Storage System(s) STG_A STG_B +
HyperSwap Enabled +
Consistency Group Enabled +
Unplanned HyperSwap Timeout (in sec) [60] #
HyperSwap Priority Medium
Recovery Action Manual +
Re-sync Action Automatic +
[Entry Fields]
* Resource Group Name [ORARG]
* Participating Nodes (Default Node Priority) [r6r4m51] +
Next, the resource group configuration is performed, and the previously defined mirror group
is specified in the DS8000-Metro Mirror (In-band) Resources entry field, as shown in
Example 8-33. In this step, as for the user mirror group definition, we select the oravg volume
group and the raw disks hdisk41, hdisk61, and hdisk63, based on each disk's UUID.
Example 8-33 Resource group attributes for using HyperSwap Mirror Group
Change/Show All Resources and Attributes for a Resource Group
Service IP Labels/Addresses [] +
Application Controllers [] +
[BOTTOM]
Specifying the DS8000-Metro Mirror (In-band) Resources is required for all resource groups
that have disks protected by the HyperSwap function.
Note: The raw disks are identified by UUID. Disk UUID can be obtained using lspv -u.
Adding raw disks by UUID does not require a PVID.
The DS8000-Metro Mirror (In-band) Resources field is not automatically populated when you
indicate the disks or volume groups that are protected by HyperSwap, even if the volume
groups are already part of a resource group.
We proceed with verifying and cluster synchronization, and then we start the PowerHA
SystemMirror services.
Note: During the verify and synchronize step, this message appears:
Mirror Group ORA_MG has the Recovery Action set to manual. In case of a site
outage, this resource will not be automatically failed-over, and a manual intervention will
be required to resolve the situation and bring the RG online on the secondary site.
Because we want more flexibility for disk management and do not want to depend on an
exact disk location on the system, the ASM disks are configured to use a special file that is
created with the mknod command, as shown in Example 8-35 on page 249. The required
permissions are also added on the corresponding raw disks.
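A minimal sketch of this mknod-based setup follows. The pseudo device name /dev/asm_data1, the major and minor numbers, and the oracle:dba ownership are illustrative placeholders; use the values reported by ls for the raw hdisk and the owner and group of your ASM installation:

# identify the major and minor numbers of the raw disk device
ls -l /dev/rhdisk41
# create a character special file with the same major,minor pair
# (18 and 5 are placeholders; substitute the values shown by ls)
mknod /dev/asm_data1 c 18 5
# give the Oracle ASM owner the required access
chown oracle:dba /dev/asm_data1
chmod 660 /dev/asm_data1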
Now, it is time to install the Oracle single-instance database, with the database data files
configured to use the ASM disk group +DATA. The configured database resource is shown in
Example 8-36.
See Oracle Grid Infrastructure for a Standalone Server on the Oracle Database Installation
Guide web page for details about installation and configuration of Oracle single-instance
database:
http://docs.oracle.com/cd/E11882_01/install.112/e24321/oraclerestart.htm
$ crs_stat -t
Name Type Target State Host
------------------------------------------------------------
ora.DATA.dg ora....up.type ONLINE ONLINE r6r4m51
ora....ER.lsnr ora....er.type ONLINE ONLINE r6r4m51
ora.asm ora.asm.type ONLINE ONLINE r6r4m51
ora.cssd ora.cssd.type ONLINE ONLINE r6r4m51
ora.diskmon ora....on.type OFFLINE OFFLINE
ora.evmd ora.evm.type ONLINE ONLINE r6r4m51
ora.ons ora.ons.type OFFLINE OFFLINE
We also create and configure the itsodb database, which has database files on ASM disk
group +DATA. Available resources are shown in Example 8-37.
Before the new ASM disk addition, we start the database loading by using Swingbench and
execute the procedure for inserting data into the table. The load runs continuously until the
reconfiguration is finished.
The Swingbench workload starts at 14:24, continuing until the new disk is added. The entire
load is shown in Figure 8-6.
Figure 8-6 I/O Megabytes and I/O Requests per second during Swingbench load
We introduce a new disk, hdisk42, to the existing configuration; it has the same Metro Mirror
replication direction, as shown in Example 8-38. First, the disk is configured with the same
permissions that ASM requires. We also create the special pseudo device file by using the
mknod command.
We monitor the database alert log and the ASM log to track the disk addition, as shown in
Example 8-39.
We stop PowerHA services and leave the resource group in Unmanaged state, as shown in
Example 8-40.
[Entry Fields]
* Stop now, on system restart or both now +
Stop Cluster Services on these nodes [r6r4m51] +
BROADCAST cluster shutdown? false +
* Select an Action on Resource Groups Unmanage Resource Gro> +
root@r6r4m51:/> clRGinfo -p
$ sqlplus / as sysdba
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, Automatic Storage Management, OLAP, Data Mining
and Real Application Testing options
STATUS
------------
OPEN
We start the PowerHA SystemMirror services, as shown in Example 8-41, without bringing
the resource group online.
[Entry Fields]
* Start now, on system restart or both now +
Start Cluster Services on these nodes [r6r4m51] +
* Manage Resource Groups Manually +
BROADCAST message at startup? true +
Startup Cluster Information Daemon? true +
Ignore verification errors? false +
Automatically correct errors found during Interactively +
cluster start?
COMMAND STATUS
We update the mirror group configuration and add the new disk to the resource group. Note
that all of the disks must be selected again from the list.
This time, the database is not affected. During the synchronization, the clxd.log shows that
the mirror group is reconfigured with the new disks, as shown in Example 8-42.
Now, we bring the resource group online, as shown in Example 8-43. We also monitor the
database activity.
COMMAND STATUS
[TOP]
Attempting to bring group ORARG online on node r6r4m51.
Waiting for the cluster to process the resource group movement request....
We validate that there were no error messages and verify the clxd.log, the hacmp.out log, and
the database alert log, as shown in Example 8-45.
We configure the newly added disk and verify the addition in the ASM configuration, as shown
in Example 8-46.
Adding the new ASM disk to the DATA disk group is shown in Example 8-47.
Diskgroup altered.
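For reference, the disk group extension is a single ALTER DISKGROUP statement run against the ASM instance. The following is a sketch only; the pseudo device name /dev/asm_data2 and the rebalance power are illustrative, and the exact statement that we used is the one in Example 8-47:

# run from the grid infrastructure environment, connected to the ASM instance
sqlplus / as sysasm <<'EOF'
ALTER DISKGROUP DATA ADD DISK '/dev/asm_data2' REBALANCE POWER 4;
EOF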
We verify that the disks in the configuration have the same data replication direction. As
expected, all disk sources are on the storage with WWNN 500507630affc16b, and their
replicated targets are on the auxiliary storage with WWNN 5005076308ffc6d4, as shown in
Example 8-48.
In case of a storage failure in one site, the I/O is transparently routed to the remaining site by
the HyperSwap function. One cluster node in the configuration must remain active to monitor
and keep the application running.
To provide a workload on the configured database that has data files on ASM (besides writing
directly to the disks during the tests where ACFS is configured), we use Swingbench as a load
generator and as a benchmark. See the Swingbench website for installation and configuration
details.
In our tests, we use the benchmark order entry, which provides a PL/SQL stress test model.
This test is based on static PL/SQL with a small set of tables that are heavily queried and
updated.
Also see the Oracle white paper titled Evaluating and Comparing Oracle Database Appliance
Performance:
http://www.oracle.com/technetwork/server-storage/engineered-systems/database-appli
ance/documentation/oda-eval-comparing-performance-1895230.pdf
In parallel with Swingbench, we use a PL/SQL procedure to insert data into the database.
This data is composed of a generated sequence, corresponding system timestamp, and the
name of the instance when the insert is performed. The procedure is shown in Example 8-49.
These operations are performed, and each is described further in this section:
1. Planned HyperSwap from Storage A to Storage B
2. Storage migration: New storage is added to the PowerHA configuration, and the HyperSwap
configuration is used for migration between Storage B and Storage C
3. Unplanned HyperSwap, performed after the migration
As stated previously, because this is a single-node cluster, there is no flexibility to move the
applications to another site. Only the storage disks can be swapped.
In this scenario, we perform a planned HyperSwap operation for the ORA_MG mirror group
while the Swingbench OE benchmark runs. To do that, you can use the smitty fast path
cm_user_mirr_gp or this path: smitty cspoc > Storage > Manage Mirror Groups >
Manage User Mirror Group(s).
We select the desired mirror group for which we perform the planned swap as shown in
Example 8-50.
Example 8-50 Performing planned swap for the ORA_MG user mirror group
Manage User Mirror Group(s)
[Entry Fields]
* Mirror Group(s) ORA_MG +
* Operation Swap +
The HyperSwap operation is triggered, and we verify the disk status, as shown in
Example 8-51.
Note: The single-node HyperSwap recovery action is set to Manual. As such, you must
swap back the mirror group before the cluster is restarted. If manual recovery is not
performed and you restart the cluster, you will be able to bring up the resource groups but
the mirror group relation will not be started.
All process events are logged in the clxd.log as shown in Example 8-52.
When a physical storage relocation is required, and the imposed replication limit of a maximum
of 100 km is maintained, HyperSwap can be used to achieve storage-related business
continuity without a scheduled outage. In this case, all of the disks are swapped to the storage
that remains in place. If the entire site is relocated, a HyperSwap PowerHA SystemMirror
active-active configuration can be put in place.
The storages and disk configurations are shown in Table 8-5 and Table 8-6 on page 262.
The storage migration is performed with a database configured on the node and an
appropriate workload generated by the Swingbench load generator.
The disks provided from each storage are shown in Table 8-6 on page 262.
On Storage_C, the r6r4m51 node is not configured for HyperSwap, as shown in
Example 8-53.
Configuring the node for in-band communication and HyperSwap capability requires the host
profile of the corresponding hostconnect to be IBM pSeries - AIX with Powerswap support,
as shown in Example 8-54.
We change the hostconnect to this profile, as shown in Example 8-55.
Example 8-55 Changing hostconnect profile to IBM pSeries - AIX with Powerswap support
dscli> chhostconnect -profile "IBM pSeries - AIX with Powerswap support" 0058
Date/Time: December 19, 2013 6:57:18 PM CST IBM DSCLI Version: 6.6.0.305 DS:
IBM.2107-75NR571
CMUC00013I chhostconnect: Host connection 0058 successfully modified.
dscli> chhostconnect -profile "IBM pSeries - AIX with Powerswap support" 0059
Date/Time: December 19, 2013 6:57:30 PM CST IBM DSCLI Version: 6.6.0.305 DS:
IBM.2107-75NR571
CMUC00013I chhostconnect: Host connection 0059 successfully modified.
The disks are located on Storage_A. Because Storage_A is being migrated, we swap the disks
to Storage_B. The disks are swapped to site SITE_B, and their configuration is shown in
Example 8-57.
Example 8-57 Validating disk location on remaining storage after swap operation
root@r6r4m51:/> lspprc -Ao |egrep 'hdisk31|hdisk61|hdisk63|hdisk41|hdisk42'
hdisk31 Active 1(s) 0 5005076308ffc6d4 500507630affc16b
hdisk41 Active 1(s) 0 5005076308ffc6d4 500507630affc16b
hdisk42 Active 1(s) 0 5005076308ffc6d4 500507630affc16b
hdisk61 Active 1(s) 0 5005076308ffc6d4 500507630affc16b
hdisk63 Active 1(s) 0 5005076308ffc6d4 500507630affc16b
At this time, we stop the cluster services and leave the resource group in Unmanaged state
as shown in Example 8-58.
DATE
-----------------
19-12-13 19:12:14
Using the disk reverting procedure, we revert the disks so that they keep the same hdisk
numbers when the primary storage is removed, as shown in Example 8-59.
Manufacturer..................IBM
Machine Type and Model........2107900
ROS Level and ID..............2E313336
Serial Number.................75LY9810
Device Specific.(Z7)..........0004
Device Specific.(Z0)..........000005329F101002
Device Specific.(Z1)..........004
Device Specific.(Z2)..........075
Unique Device Identifier......200B75LY981000407210790003IBMfcp
Logical Subsystem ID..........0x00
Volume Identifier.............0x04
Subsystem Identifier(SS ID)...0xFF00
Control Unit Sequence Number..00000LY981
Storage Subsystem WWNN........5005076308ffc6d4
Logical Unit Number ID........4000400400000000
Manufacturer..................IBM
Machine Type and Model........2107900
ROS Level and ID..............2E313336
Serial Number.................75LY9810
Device Specific.(Z7)..........0E04
Device Specific.(Z0)..........000005329F101002
Device Specific.(Z1)..........E04
Device Specific.(Z2)..........075
Unique Device Identifier......200B75LY9810E0407210790003IBMfcp
Logical Subsystem ID..........0x0e
Volume Identifier.............0x04
Subsystem Identifier(SS ID)...0xFF0E
Control Unit Sequence Number..00000LY981
Storage Subsystem WWNN........5005076308ffc6d4
Logical Unit Number ID........400e400400000000
Manufacturer..................IBM
Machine Type and Model........2107900
ROS Level and ID..............2E313336
Serial Number.................75LY9810
Device Specific.(Z7)..........0E05
Device Specific.(Z0)..........000005329F101002
Device Specific.(Z1)..........E05
Device Specific.(Z2)..........075
Unique Device Identifier......200B75LY9810E0507210790003IBMfcp
Logical Subsystem ID..........0x0e
Volume Identifier.............0x05
Subsystem Identifier(SS ID)...0xFF0E
Control Unit Sequence Number..00000LY981
Storage Subsystem WWNN........5005076308ffc6d4
Logical Unit Number ID........400e400500000000
Manufacturer..................IBM
Machine Type and Model........2107900
ROS Level and ID..............2E313336
Serial Number.................75LY981E
Device Specific.(Z7)..........E204
Device Specific.(Z0)..........000005329F101002
Device Specific.(Z1)..........204
Device Specific.(Z2)..........075
Unique Device Identifier......200B75LY981E20407210790003IBMfcp
Logical Subsystem ID..........0xe2
Volume Identifier.............0x04
Subsystem Identifier(SS ID)...0xFFE2
Control Unit Sequence Number..00000LY981
Storage Subsystem WWNN........5005076308ffc6d4
Logical Unit Number ID........40e2400400000000
Manufacturer..................IBM
Machine Type and Model........2107900
ROS Level and ID..............2E313336
Serial Number.................75LY981E
Device Specific.(Z7)..........E700
Device Specific.(Z0)..........000005329F101002
Device Specific.(Z1)..........700
Device Specific.(Z2)..........075
Unique Device Identifier......200B75LY981E70007210790003IBMfcp
Logical Subsystem ID..........0xe7
Volume Identifier.............0x00
Subsystem Identifier(SS ID)...0xFFE7
We remove the PPRC relationships for the corresponding disks, as shown in Example 8-62.
Depending on the disk load, it is recommended that you use the pausepprc command to pause
all mirrored pairs before removing the relationships with the rmpprc command.
Note: The rmpprc command with the -quiet switch can be used to suppress the confirmation prompt.
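A minimal DSCLI sketch of this step follows, using the volume pair from Example 8-11 (4404 on IBM.2107-75TL771 replicated to 0004 on IBM.2107-75LY981) purely as illustrative IDs; substitute the storage image IDs and volume pairs of your own configuration:

dscli> pausepprc -dev IBM.2107-75TL771 -remotedev IBM.2107-75LY981 4404:0004
dscli> rmpprc -dev IBM.2107-75TL771 -remotedev IBM.2107-75LY981 -quiet 4404:0004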
We check whether the disks are visible on the system, as shown in Example 8-63.
The configured disks are in bold in Example 8-63. The rest of the disks have the ID shown in
Example 8-64.
We remove the disks, as shown in Example 8-65. Because the storage is removed from
our configuration, we do not have to change the LUN masking for the corresponding disks on
the storage side.
Only after the chdev -l hdisk# -a san_rep_cfg=revert_disk -U operation do the disks that are
taken out show none for the volume group. If volume groups exist on these hdisks, they are
still seen by the system, and you must run exportvg and importvg for these volume groups
after the disks are removed.
At this time, we create the volumes and the PPRC paths, and we start the Metro Mirror
replication for the corresponding volumes on the newly attached storage subsystem, as shown
in Example 8-66.
Example 8-66 Create pprcpaths between remaining storage and the new auxiliary storage
On the storage Storage_B
dscli> lssi
Date/Time: December 19, 2013 8:21:45 PM CST IBM DSCLI Version: 6.6.0.305 DS: -
Name ID Storage Unit Model WWNN State ESSNet
=============================================================================
ds8k6 IBM.2107-75LY981 IBM.2107-75LY980 951 5005076308FFC6D4 Online Enabled
dscli> lssi
Date/Time: December 19, 2013 8:31:46 PM CST IBM DSCLI Version: 6.6.0.305 DS: -
Name ID Storage Unit Model WWNN State ESSNet
=============================================================================
ds8k5 IBM.2107-75NR571 IBM.2107-75NR570 951 5005076309FFC5D5 Online Enabled
We establish the Metro Mirror relationships for the corresponding disks as shown in
Example 8-67.
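A hedged DSCLI sketch of these two steps follows. It assumes Storage_B (IBM.2107-75LY981) as the Metro Mirror source and the new Storage_C (IBM.2107-75NR571, WWNN 5005076309FFC5D5) as the target; the LSS numbers, the I/O port pair, and the volume pair are placeholders that must be replaced with the values of your configuration:

dscli> mkpprcpath -dev IBM.2107-75LY981 -remotedev IBM.2107-75NR571 -remotewwnn 5005076309FFC5D5 -srclss 10 -tgtlss C3 I0204:I0130
dscli> mkpprc -dev IBM.2107-75LY981 -remotedev IBM.2107-75NR571 -type mm 1004:C304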
On the node, we observe that five new disks were added, as shown in Example 8-68.
Setting up the disk attributes is a required task for the newly added disks from storage
STORAGE_C before migrating them to the HyperSwap configuration, as shown in
Example 8-69 on page 270.
Now we configure the previously reverted disks with san_rep_cfg=migrate_disk, as shown in Example 8-70.
Because the storage has been changed, we must reconfigure the DS8000 Metro Mirror
(In-Band) Resources to reflect the changes. The existing storage systems remain as defined,
and the new storage system (STG_C) is added on the primary site, as shown in
Example 8-71.
Example 8-71 Adding the new storage system on the primary site
Add a Storage System
[Entry Fields]
* Storage System Name [STG_C]
* Site Association Site1_primary +
* Vendor Specific Identifier IBM.2107-00000NR571 +
* WWNN 5005076309FFC5D5 +
We configure the mirror group again, adding the disks to the configuration, as shown in
Example 8-72 on page 271.
[Entry Fields]
Mirror Group Name ORA_MG
New Mirror Group Name []
Volume Group(s) oravg +
Raw Disk(s) [f64bde11-9356-53fe-68> +
Associated Storage System(s) STG_C STG_B +
HyperSwap Enabled +
Consistency Group Enabled +
Unplanned HyperSwap Timeout (in sec) [60] #
HyperSwap Priority Medium
Recovery Action Manual +
Re-sync Action Automatic +
The storage systems appear with the new relationship after re-adding the disks.
Note: You do not have to reconfigure the Associated Storage System(s) field. Even if this
field is modified, it shows the relationship from the primary site to the secondary site.
We also configure the resource group disks to reflect the new changes, and verify and
synchronize the cluster at this point.
We bring the resource group online and validate that the database is still open and functional
as shown in Example 8-73.
Select a Resource Group
Because we have only one cluster node, a message appears in the clxd.log, as shown in
Example 8-74.
Therefore, we stop the services again, put the resource group in Unmanaged mode, and
reverse the PPRC relationship so that it runs from STG_C to STG_B. The process is shown in
Example 8-75.
On the storage side, we reverse the Metro Mirror relationship to run from STG_C to STG_B,
as shown below:
At this time, we start the PowerHA services and bring online the resource group. The
operation status is displayed in Example 8-76.
..............................<<snipet>>........................................
Starting Cluster Services on node: r6r4m51
This may take a few minutes. Please wait...
r6r4m51: start_cluster: Starting PowerHA SystemMirror
r6r4m51: Dec 19 2013 21:31:26 Starting execution of
/usr/es/sbin/cluster/etc/rc.cluster
r6r4m51: with parameters: -boot -N -A -b -i -C interactive -P cl_rc_cluster
r6r4m51:
r6r4m51: Dec 19 2013 21:31:26 Checking for srcmstr active...
r6r4m51: Dec 19 2013 21:31:26 complete.
START_TIME
-----------------------------
19-dec-2013 15:54:06
This scenario simulates an unplanned HyperSwap by deactivating the zones on the SAN
switches. In Example 8-78, we present the existing zoning configuration for the cluster node
r6r4m51.
Example 8-78 Two zones defined for r6r4m51 per attached storage
zone: r6r4m51_fcs0_ds8k5
DS8K5_I0130; DS8K5_I0131; DS8K5_I0132; r6r4m51_fcs0
zone: r6r4m51_fcs0_ds8k6
r6r4m51_fcs0; DS8K6_I0204; DS8K6_I0205
zone: r6r4m51_fcs1_ds8k5
DS8K5_I0130; DS8K5_I0131; DS8K5_I0132; r6r4m51_fcs1
zone: r6r4m51_fcs1_ds8k6
r6r4m51_fcs1; DS8K6_I0204; DS8K6_I0205
We validate the replication direction for all disks that are swapped from Storage_C to
Storage_B. The disk configuration is shown in Example 8-79.
Before deactivating the SAN zones, we generate traffic to load the database and, in parallel,
generate activity on the disks that are used for the application binaries. In Example 8-80, we
observe the disk activity by using the iostat monitoring tool at the ASM level.
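One way to watch this activity at the ASM level is the asmcmd iostat command. A small sketch, assuming that the grid infrastructure environment is set for the ASM instance and that the DATA disk group is the one being monitored:

# report I/O statistics for the DATA disk group every 5 seconds
asmcmd iostat -G DATA 5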
We deactivate the zones between node r6r4m51 and the active storage DS8k5, as shown in
Example 8-81 on page 275.
Using the Enterprise Manager, we observe how the database behaves while the disks are
swapped in the auxiliary storage. The graphic is shown in Figure 8-8.
Figure 8-8 Database load and behavior when unplanned swap is performed
We observe the logs that were produced after the unplanned swap operation was triggered.
Example 8-82 shows the swap operation events logged by syslog in the
/var/hacmp/xd/log/syslog.phake file.
We also find errors in errpt that indicate failed paths and failed PPRC LUNs, as shown in
Example 8-84.
The recovery procedure for an unplanned swap scenario, after the root cause of the storage
failure has been resolved, must take the disk replication direction into account, because the
recovery action is manual only.
If the cluster has been restarted, you must manually reverse the disk replication direction,
start the cluster, and check the clxd.log for completion of the mirror group start. Otherwise, if
the cluster was not restarted, you can use the C-SPOC menu to swap the disks after all
prerequisites for starting the mirror group are met (the disks are correctly seen as
HyperSwap-enabled on the system, the disk paths are not failed, the disk status is not
suspended, and so on).
If the LSSes of the disks that are configured in the system mirror group overlap with the LSSes
of other disks (disks configured in another mirror group, or other disks on your system with
the same LSS), you get the error RC=22 when verification and synchronization is performed.
In our system, we use the pair of disks shown in Example 8-85 on page 279.
[Entry Fields]
* Mirror Group Name [rvg_mg]
Volume Group(s) rootvg
* HyperSwap Enabled
Consistency Group Enabled
Unplanned HyperSwap Timeout (in sec) [60] #
HyperSwap Priority High
.........................<<snippet>>............................
r6r4m51: MG_NAME:ACTIVE_SITE:SECONDARY_SITE:STORAGE_SYSTEM_ON_ACTIVE_SITE
r6r4m51: rvg_mg:Site1_primary:Site2_secondary:STG_C
We validate the PPRC paths at the AIX level, as shown in Example 8-89.
Then, we use iostat to start monitoring the activity of hdisk84, as shown in Example 8-91.
Meanwhile, we perform a swap operation for the system mirror group and verify the clxd.log,
as shown in Example 8-92 on page 281.
We observe that during the swap the disk was unavailable for only one second, as shown in
Example 8-91 on page 280.
We start writing on the rootvg hdisk by using the dd command, as shown in Example 8-93.
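A minimal sketch of this kind of load and its monitoring follows, with an illustrative file name, block size, and count (the exact dd invocation used in the test is the one in Example 8-93):

# generate a sustained write load on a file system that resides on rootvg
dd if=/dev/zero of=/tmp/hswap_write.test bs=1024k count=10240
# in another session, watch the disk that backs rootvg (hdisk84 in this test)
iostat -d hdisk84 2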
We deactivate the zones communication between host r6r4m51 and the storage DS8K5, as
shown in Example 8-94.
Example 8-94 Deactivating the zones communication between the host and storage
hastk5-12:admin> zoneremove "r6r4m51__ds8k5", "DS8K5_I0130;
DS8K5_I0131;DS8K5_I0132"
hastk5-12:admin> cfgsave
You are about to save the Defined zoning configuration. This
action will only save the changes on Defined configuration.
Any changes made on the Effective configuration will not
take effect until it is re-enabled.
Do you want to save Defined zoning configuration only? (yes, y, no, n): [no] y
Updating flash ...
We observe the transition status at a write rate of 240 MB/s, as shown in Example 8-95,
and we note that the swap took 25 seconds.
..........................<<snippet>>.......................................
Also, we validate the paths at the AIX operating system level, as shown in Example 8-97.
Concurrent workloads across sites, such as Oracle Real Application Clusters (RAC), are
supported. Concurrent resource groups are also supported in stretched clusters and linked
clusters that are using HyperSwap-enabled mirror groups.
Inter-site communication is critical, because it carries both network traffic and storage
replication data. When inter-site communication is lost, a split-brain situation occurs. To avoid
this, define a decision mechanism that determines where the activity must continue.
HyperSwap relies on IBM DS8000 Metro Mirror Copy Services. Synchronous data replication
is recommended for a maximum distance of 100 km. This distance limit is imposed by the
speed of light in fiber, which is about 66% of the speed of light in a vacuum. Also, many
network components can sit between the sites, and each hop adds packet processing time,
which increases communication latency. Therefore, when you plan to deploy a stretched
cluster, take all network communication parameters into account in terms of latency,
bandwidth, and specific equipment configuration, such as buffer credits at the SAN switch
level.
PowerHA SystemMirror 7.1.3 Enterprise Edition added unicast heartbeat support. This
provides an alternative to the existing multicast heartbeat method. Either heartbeat option
can be used within a site, but only unicast is used across sites.
Many applications require a specific network setup for deployment, especially when they are
meant to be configured in a stretched cluster. To provide the required network configuration
when an application takes advantage of PowerHA SystemMirror HyperSwap protection in a
stretched cluster, technologies such as Multiprotocol Label Switching (MPLS), Overlay
Transport Virtualization, and QFabric (which limits the distance between sites to 80 km)
should be taken into account.
In our test configuration, we deploy an Oracle Real Application Cluster (RAC) on a PowerHA
SystemMirror Enterprise Edition stretched cluster, with the HyperSwap function enabled, with
two sites with two nodes per site. Details and information about Oracle RAC can be found on
the Oracle Real Application Clusters page on the Oracle website:
http://www.oracle.com/technetwork/database/options/clustering/overview/index.html
Also see the Oracle white paper titled Oracle RAC and Oracle RAC One Node on Extended
Distance (Stretched) Clusters:
http://www.oracle.com/technetwork/products/clustering/overview/extendedracversion1
1-435972.pdf
The requirements for the AIX operating system and the DS88xx microcode level are
mentioned in 8.7, HyperSwap environment requirements on page 222.
The Oracle RAC version used in our tests is 11.2.0.3 with patch 6 applied (Patch 16083653).
It is highly recommended that you apply the latest available and recommended grid
infrastructure and database patches.
In this section, we describe various tests of HyperSwap functionality with Oracle RAC in the
following scenarios:
Planned HyperSwap
Unplanned HyperSwap: Storage failure for Site_A but not for Site_B
Unplanned HyperSwap: Storage from Site_A unavailable for both sites
Unplanned HyperSwap: Site A failure
Tie breaker disk consideration in a HyperSwap environment
CAA dynamic disk addition in a HyperSwap environment
Online storage migration in ORACLE RAC: HyperSwap
Figure 8-9 Oracle RAC, two sites and two nodes per site
The lines between the LPARs and the storage systems are the SAN zones. The required
networks for Oracle RAC are stretched across the sites by using Overlay Transport
Virtualization, MPLS, QFabric, and so on. Also, the storage subsystems are configured for
IBM Metro Mirror PPRC.
HOSTS -------------------------------------------------------------------------
satsspc2.austin.ibm.com
-------------------------------------------------------------------------------
hdisk84 IBM.2107-75NR571/C304
hdisk85 IBM.2107-75NR571/C305
hdisk86 IBM.2107-75NR571/C404
hdisk88 IBM.2107-75NR571/C501
hdisk89 IBM.2107-75NR571/C502
hdisk97 IBM.2107-75NR571/C901
HOSTS -------------------------------------------------------------------------
satsspc4.austin.ibm.com
-------------------------------------------------------------------------------
hdisk83 IBM.2107-75NR571/C304
hdisk84 IBM.2107-75NR571/C305
hdisk85 IBM.2107-75NR571/C404
hdisk87 IBM.2107-75NR571/C501
hdisk88 IBM.2107-75NR571/C502
hdisk99 IBM.2107-75NR571/C901
HOSTS -------------------------------------------------------------------------
r6r4m52.austin.ibm.com
-------------------------------------------------------------------------------
HOSTS -------------------------------------------------------------------------
satsspc2.austin.ibm.com
-------------------------------------------------------------------------------
hdisk84 Active 0(s) 1 5005076309ffc5d5 5005076308ffc6d4
hdisk85 Active 0(s) 1 5005076309ffc5d5 5005076308ffc6d4
hdisk86 Active 0(s) 1 5005076309ffc5d5 5005076308ffc6d4
hdisk88 Active 0(s) 1 5005076309ffc5d5 5005076308ffc6d4
hdisk89 Active 0(s) 1 5005076309ffc5d5 5005076308ffc6d4
hdisk97 Active 0(s) 1 5005076309ffc5d5 5005076308ffc6d4
HOSTS -------------------------------------------------------------------------
satsspc4.austin.ibm.com
-------------------------------------------------------------------------------
hdisk83 Active 0(s) 1 5005076309ffc5d5 5005076308ffc6d4
hdisk84 Active 0(s) 1 5005076309ffc5d5 5005076308ffc6d4
hdisk85 Active 0(s) 1 5005076309ffc5d5 5005076308ffc6d4
hdisk87 Active 0(s) 1 5005076309ffc5d5 5005076308ffc6d4
hdisk88 Active 0(s) 1 5005076309ffc5d5 5005076308ffc6d4
hdisk99 Active 0(s) 1 5005076309ffc5d5 5005076308ffc6d4
Configuring a stretched cluster for Oracle RAC requires the following configuration steps (a short command sketch for steps 1 and 2 follows the list):
1. Populate /etc/cluster/rhosts with the corresponding IP addresses.
2. Verify that the clcomd service status is active.
3. Set up the cluster, sites, nodes, and networks.
4. Define the repository disk.
5. Define the storage subsystems for each site.
6. Define the mirror groups, taking into account the required fields for HyperSwap
enablement and behavior.
7. Define the resource groups and the desired startup policies.
8. Modify the resource groups and add the corresponding mirror group relationship.
9. Verify and synchronize the cluster configuration.
10.Start cluster services and bring the resource groups online.
11.Verify the cluster status and logs.
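A short sketch for steps 1 and 2 follows; the node names are the ones used in this scenario, and clcomd is the CAA cluster communication daemon:

# /etc/cluster/rhosts must contain one resolvable host name or IP address per line
cat >> /etc/cluster/rhosts <<'EOF'
r6r4m51
r6r4m52
satsspc2
satsspc4
EOF
# make clcomd reread the file and confirm that it is active
refresh -s clcomd
lssrc -s clcomd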
After following these configuration steps, we create the cluster, choosing the appropriate
nodes on each site and the type of cluster, as shown in Example 8-99. Then, we start
configuring the cluster on the r6r4m51 node.
[Entry Fields]
* Cluster Name [orahyp1]
[Entry Fields]
* Cluster Name orahyp1
* Heartbeat Mechanism Unicast +
Repository Disk 00ce123feacbbf49
Cluster Multicast Address
(Used only for multicast heartbeat)
We define the storage subsystems, which are attached at our hosts, for both sites
(Example 8-101), using fast path smitty cm_add_strg_system.
Example 8-101 Adding storage subsystems for Site A and for Site B
Site A
[Entry Fields]
* Storage System Name [STG_A]
* Site Association SITE_A +
* Vendor Specific Identifier IBM.2107-00000NR571 +
* WWNN 5005076309FFC5D5 +
..................................<snippet>............................
[Entry Fields]
* Storage System Name [STG_B]
* Site Association SITE_B
+
* Vendor Specific Identifier IBM.2107-00000LY981 +
* WWNN 5005076308FFC6D4 +
..................................<snippet>............................
We also configure the mirror group, activating the HyperSwap function for the group of disks
that is designated for the ASM configuration. Using the smitty fast path cm_cfg_mirr_gps,
we configure the ORA_MG user mirror group, as shown in Example 8-102.
[Entry Fields]
* Mirror Group Name [ORA_MG]
Volume Group(s) +
Raw Disk(s) hdisk41:f64bde11-9356-53fe-68bb-6a2aebc647a1 hdisk42:2198648b-a136-2416-d66f-9aa04> +
HyperSwap Enabled +
Consistency Group Enabled +
Unplanned HyperSwap Timeout (in sec) [ 60] #
HyperSwap Priority Medium
Recovery Action Automatic +
Re-sync Action Automatic
We maintain the Unplanned HyperSwap Timeout value at the default of 60 seconds. The
value represents how long a connection remains unavailable before an unplanned
HyperSwap site failover occurs.
Depending on the desired results, the parameter can be lowered to accommodate the
environment requirements. For databases, a value of 30 seconds for HyperSwap Timeout is
acceptable, taking into account the maximum time allotted for queue full operation.
When multiple disks are configured to be protected by the mirror group, the consistency group
parameter should be enabled. Based on this parameter, HyperSwap with PowerHA
SystemMirror acts as a consistency-group-aware application, ensuring data consistency on
the target storage within the extended long busy state window.
By default, for fixed block volumes, the extended long busy timeout is 60 seconds when the
consistency group parameter is enabled at the PPRC path level. Because it is a good practice
not to overlap mirror group LSSes, we can also reduce the extended long busy state window
on the storage side to 30 seconds by modifying it at the LSS level on both storage subsystems
(by using xtndlbztimout), as shown in Example 8-103 on page 290.
For more information about data consistency in the DS8xxx Metro Mirror Peer-to-Peer
Remote Copy, see the IBM Redbooks publication titled IBM System Storage DS8000 Copy
Services for Open Systems, SG24-6788:
http://www.redbooks.ibm.com/redbooks/pdfs/sg246788.pdf
The next step is to configure the resource group, which, in practice, is brought online on all
nodes and across the sites as part of the startup policy. The startup, failover, and fallback
policies are shown in Example 8-104.
[Entry Fields]
* Resource Group Name [ORARG]
In the same way as for the single-node HyperSwap configuration, we add the configured
mirror group to the resource group definition by using the fast path smitty
cm_change_show_rg_resource (Change/Show Resources and Attributes for a Resource
Group). We pick ORARG from the list and add the desired mirror group and all disks, raw or
configured in volume groups, as shown in Example 8-105 on page 291.
Then, we verify and synchronize the cluster configuration. If any inconsistencies between the
resource group-configured disks and the mirror group-defined disks are detected, an error
message appears, and the configuration for the corresponding mirror group and RG should
be redone.
After finalizing the cluster configuration, we start cluster services and bring the ORARG
resource group online on all available nodes. The resource group status is shown in
Example 8-106.
To start our tests, we install and configure Oracle Real Application Cluster on all nodes. The
status of the resources in the cluster is shown in Example 8-107.
In our environment, the grid infrastructure and the Oracle database have the application
binaries installed on local disks, and the database files on the disks are managed by ASM
(Example 8-108).
The itsodb database data files are located in the DATA disk group, which is managed by ASM,
as shown in Example 8-110.
9 rows selected.
8.20.2 Adding new disks to the ASM configuration: Oracle RAC HyperSwap
Bringing new disks into the ASM configuration, as in the single Oracle database instance
case, requires additional procedures and takes into account the disk configuration for
HyperSwap on each Oracle RAC node.
Taking advantage of the Unmanaged HyperSwap level, we put the resource groups in
Unmanaged mode by stopping cluster services, as shown in Example 8-111 on page 295.
[Entry Fields]
* Stop now, on system restart or both now +
Stop Cluster Services on these nodes [satsspc4,r6r4m51,sats> +
BROADCAST cluster shutdown? false +
* Select an Action on Resource Groups Unmanage Resource Gro> +
COMMAND STATUS
root@r6r4m51:/u01/app/11.2.0/grid/bin> clRGinfo
-----------------------------------------------------------------------------
Group Name State Node
-----------------------------------------------------------------------------
ORARG UNMANAGED r6r4m51@SITE_A
UNMANAGED r6r4m52@SITE_A
UNMANAGED satsspc4@SITE_
UNMANAGED satsspc2@SITE_
root@r6r4m51:/u01/app/11.2.0/grid/bin>
We add two new disks to the configuration, hdisk80 and hdisk89, from the same LSS, c5, as
shown in Example 8-112.
[Entry Fields]
Mirror Group Name ORA_MG
New Mirror Group Name []
Volume Group(s) +
We modify the corresponding resource group and perform a verify and synchronize cluster
configuration. We bring the resource groups online and validate the clxd.log as shown in
Example 8-113.
Example 8-113 All five disks appear as being part of ORA_MG mirror group
INFO |2014-01-23T19:50:20.395011|Number of Opaque Attributes Values = '0'
INFO |2014-01-23T19:50:20.395039|HyperSwap Policy = Enabled
INFO |2014-01-23T19:50:20.395067|MG Type = user
INFO |2014-01-23T19:50:20.395096|HyperSwap Priority = medium
INFO |2014-01-23T19:50:20.395125|Unplanned HyperSwap timeout = 60
INFO |2014-01-23T19:50:20.395173|Raw Disks = f64bde11-9356-53fe-68bb-6a2aebc647a1
INFO |2014-01-23T19:50:20.395203|Raw Disks = 2198648b-a136-2416-d66f-9aa04b1d63e6
INFO |2014-01-23T19:50:20.395233|Raw Disks = 866b8a2f-b746-1317-be4e-25df49685e26
INFO |2014-01-23T19:50:20.395262|Raw Disks = 46da3c11-6933-2eba-a31c-403f43439a37
INFO |2014-01-23T19:50:20.395292|Raw Disks = 420f340b-c108-2918-e11e-da985f0f8acd
INFO |2014-01-23T19:50:20.396019|old_mg_name is: ORA_MG
INFO |2014-01-23T19:50:20.409919|old_mg_name is: ORA_MG
INFO |2014-01-23T19:50:20.503417|Successfully changed a Mirror Group 'ORA_MG'
When we bring the resource group online, we get the output shown in Example 8-114. Ignore
the Failed message, because it is a known problem that will be addressed in a future service
pack, but the movement of the resource group is successful.
COMMAND STATUS
[MORE...17]
We issue the mknod command for disks hdisk80 and hdisk89. Now the disks are protected by
PowerHA and can be added to ASM, as shown in Example 8-115.
We use 20 users to load the database with mostly writes, reaching almost 5 K I/O operations
per second and 23 K transactions per minute. The workload is monitored by the Enterprise
Control Manager, as shown in Figure 8-11. The disk swap was performed at 05:11 PM; the
database load was started at 05:04 PM.
We verify the PowerHA SystemMirror resource group status, as shown in Example 8-116.
Example 8-116 Mirror groups active path status and resource group availability
For USER MIRROR Group ORA_MG
COMMAND STATUS
r6r4m51: MG_NAME:ACTIVE_SITE:SECONDARY_SITE:STORAGE_SYSTEM_ON_ACTIVE_SITE
r6r4m51: ORA_MG:SITE_A:SITE_B:STG_A
r6r4m52: MG_NAME:ACTIVE_SITE:SECONDARY_SITE:STORAGE_SYSTEM_ON_ACTIVE_SITE
r6r4m52: ORA_MG:SITE_A:SITE_B:STG_A
satsspc4: MG_NAME:ACTIVE_SITE:SECONDARY_SITE:STORAGE_SYSTEM_ON_ACTIVE_SITE
satsspc4: ORA_MG:SITE_A:SITE_B:STG_A
satsspc2: MG_NAME:ACTIVE_SITE:SECONDARY_SITE:STORAGE_SYSTEM_ON_ACTIVE_SITE
satsspc2: ORA_MG:SITE_A:SITE_B:STG_A
r6r4m51: MG_NAME:ACTIVE_SITE:SECONDARY_SITE:STORAGE_SYSTEM_ON_ACTIVE_SITE
r6r4m51: CAA_MG:SITE_A:SITE_B:STG_A
r6r4m52: MG_NAME:ACTIVE_SITE:SECONDARY_SITE:STORAGE_SYSTEM_ON_ACTIVE_SITE
r6r4m52: CAA_MG:SITE_A:SITE_B:STG_A
satsspc2: MG_NAME:ACTIVE_SITE:SECONDARY_SITE:STORAGE_SYSTEM_ON_ACTIVE_SITE
satsspc2: CAA_MG:SITE_A:SITE_B:STG_A
satsspc4: MG_NAME:ACTIVE_SITE:SECONDARY_SITE:STORAGE_SYSTEM_ON_ACTIVE_SITE
satsspc4: CAA_MG:SITE_A:SITE_B:STG_A
root@r6r4m51:/> clRGinfo -v
We also confirm the status of the Oracle RAC resource, as Example 8-107 on page 292
shows.
Example 8-117 Active path for MG ORA_MG and CAA_MG mirror groups and logged events
root@r6r4m51:/> lspprc -Ao |egrep 'hdisk41|hdisk42|hdisk61|hdisk80|hdisk89|hdisk91'
hdisk41 Active 1(s) 0 5005076308ffc6d4 5005076309ffc5d5
hdisk42 Active 1(s) 0 5005076308ffc6d4 5005076309ffc5d5
hdisk61 Active 1(s) 0 5005076308ffc6d4 5005076309ffc5d5
hdisk80 Active 1(s) 0 5005076308ffc6d4 5005076309ffc5d5
hdisk89 Active 1(s) 0 5005076308ffc6d4 5005076309ffc5d5
hdisk91 Active 1(s) 0 5005076308ffc6d4 5005076309ffc5d5
root@r6r4m51:/>
Because the swap of the CAA_MG mirror group was performed later, we find the swap event in the clxd.log:
...............................<<snippet>>..................................
INFO |2014-02-05T17:14:04.832259|Swap Mirror Group 'CAA_MG' completed.
The planned HyperSwap operation is now complete. The latency shown during the swap
operation is between 1.2 ms and 2.4 ms.
In this scenario, we load the database by using the Swingbench load generator, started with
the OE benchmark. We capture the events in hacmp.out, clxd.log, and syslog.caa, and also in
the file indicated in syslog.conf, /var/hacmp/xd/log/syslog.phake, which is used for kernel
debugging.
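The syslog.phake file is populated through a standard /etc/syslog.conf entry. The following is a sketch with illustrative rotation settings; the facility and the exact rotation values should be adapted to your environment:

# create the log file first; AIX syslogd does not create it automatically
touch /var/hacmp/xd/log/syslog.phake
# add an entry such as the following to /etc/syslog.conf:
#   kern.debug   /var/hacmp/xd/log/syslog.phake rotate size 500k files 7
# then refresh the syslog daemon
refresh -s syslogd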
We verify the cluster status again, as well as the disk replication direction, as shown in
Example 8-118.
Example 8-118 Identifying the source disks and the resource group status
root@r6r4m51:/> lspprc -Ao |egrep
'hdisk41|hdisk42|hdisk61|hdisk80|hdisk89|hdisk91|hdisk100'
hdisk41 Active 0(s) 1 5005076309ffc5d5 5005076308ffc6d4
hdisk42 Active 0(s) 1 5005076309ffc5d5 5005076308ffc6d4
hdisk61 Active 0(s) 1 5005076309ffc5d5 5005076308ffc6d4
hdisk80 Active 0(s) 1 5005076309ffc5d5 5005076308ffc6d4
hdisk89 Active 0(s) 1 5005076309ffc5d5 5005076308ffc6d4
hdisk91 Active 0(s) 1 5005076309ffc5d5 5005076308ffc6d4
hdisk100 Active 0(s) 1 5005076309ffc5d5 5005076308ffc6d4
root@r6r4m51:/> clRGinfo
-----------------------------------------------------------------------------
Group Name State Node
-----------------------------------------------------------------------------
ORARG ONLINE r6r4m51@SITE_A
ONLINE r6r4m52@SITE_A
ONLINE satsspc4@SITE_
ONLINE satsspc2@SITE_
We modify the zones between the nodes r6r4m51 and r6r4m52 and the DS5K storage, as
shown in Example 8-119.
On the Enterprise Control Manager, we validate the continuous load and the latency during
the swap, as shown in Figure 8-13.
We also observe the status of the disk paths, as shown in Example 8-120 on page 303. The
paths from nodes r6r4m51 and r6r4m52 to the storage with WWNN 5005076309ffc5d5 are
missing, and on satsspc2 and satsspc4, the disks are swapped to Storage B from Site B.
root@r6r4m52:/> /work/status_disks.sh
hdisk81 Active 0, 1(s) -1 5005076309ffc5d5,5005076308ffc6d4
hdisk82 Active 0, 1(s) -1 5005076309ffc5d5,5005076308ffc6d4
hdisk83 Active 1(s) 0 5005076308ffc6d4 5005076309ffc5d5
hdisk85 Active 0, 1(s) -1 5005076309ffc5d5,5005076308ffc6d4
hdisk86 Active 0, 1(s) -1 5005076309ffc5d5,5005076308ffc6d4
hdisk94 Active 1(s) 0 5005076308ffc6d4 5005076309ffc5d5
All disks that belong to CAA_MG and ORA_MG were swapped to Storage B. Example 8-121
shows the operation status and significant events that were captured during the swap.
Note: For an unplanned HyperSwap, the clxd.log does not record the events.
We validate the active paths from the C-SPOC by using fast path, as shown in
Example 8-122:
smitty -C cm_user_mirr_gp
Example 8-122 Showing active paths for the ORA_MG mirror group
COMMAND STATUS
r6r4m51: MG_NAME:ACTIVE_SITE:SECONDARY_SITE:STORAGE_SYSTEM_ON_ACTIVE_SITE
r6r4m51: ORA_MG:SITE_B:SITE_A:STG_B
r6r4m52: MG_NAME:ACTIVE_SITE:SECONDARY_SITE:STORAGE_SYSTEM_ON_ACTIVE_SITE
r6r4m52: ORA_MG:SITE_B:SITE_A:STG_B
satsspc4: MG_NAME:ACTIVE_SITE:SECONDARY_SITE:STORAGE_SYSTEM_ON_ACTIVE_SITE
satsspc4: ORA_MG:SITE_B:SITE_A:STG_B
satsspc2: MG_NAME:ACTIVE_SITE:SECONDARY_SITE:STORAGE_SYSTEM_ON_ACTIVE_SITE
satsspc2: ORA_MG:SITE_B:SITE_A:STG_B
After activating the zones, we perform the operation for both mirror groups, as shown in
Example 8-123.
[Entry Fields]
* Mirror Group(s) ORA_MG +
* Operation Refresh +
Then, we validate the disk configuration and replication direction. In this scenario, we expect
to have only the disks of Storage A swapping to Storage B, as in previous scenarios, and we
write directly to the Oracle ACFS file system. The ACFS file system configuration is shown in
Example 8-124.
ASMCMD> volinfo -a
Diskgroup Name: DATA
The Cluster Synchronization Services (CSS) heartbeat values set in our test system are
shown in Example 8-125.
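As an illustration of how such values can be queried (the Grid Infrastructure home path is our assumption), crsctl reports the CSS timeouts on any RAC node:
# /u01/app/grid/bin/crsctl get css misscount
# /u01/app/grid/bin/crsctl get css disktimeout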
We start writing on the ACFS file system, as shown in Example 8-126, and start iostat for
the hdisk80 disk.
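As an illustration (not necessarily the exact invocation used for Example 8-128), per-disk throughput can be sampled every second with the following command:
# iostat -d hdisk80 1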
We deactivate the zones for the DS5k storage for all nodes, as shown in Example 8-127.
Example 8-128 shows the iostat output. The number of kilobytes written drops to 0 when the zone deactivation is detected.
The ASM disks were unavailable for 79 seconds, and the write rate is more than 105 MB/s.
Consulting the log, we verify the start and end swap time for every mirror group defined in our
cluster, as shown in Example 8-129 on page 310.
Also, the ocssd.log shows the time during which there were missed disk heartbeats, as
shown Example 8-130.
r6r4m51: MG_NAME:ACTIVE_SITE:SECONDARY_SITE:STORAGE_SYSTEM_ON_ACTIVE_SITE
r6r4m51: ORA_MG:SITE_B:SITE_A:STG_B
r6r4m52: MG_NAME:ACTIVE_SITE:SECONDARY_SITE:STORAGE_SYSTEM_ON_ACTIVE_SITE
r6r4m52: ORA_MG:SITE_B:SITE_A:STG_B
satsspc4: MG_NAME:ACTIVE_SITE:SECONDARY_SITE:STORAGE_SYSTEM_ON_ACTIVE_SITE
The tie breaker disk has the following requirements and restrictions:
SCSI-3 persistent reservation support is required for I/O fencing. Technologies such as iSCSI, SCSI, or FCoE are supported.
The disk must be accessible on all cluster nodes.
The CAA repository disk cannot be used as a tie breaker.
Oracle RAC disks cannot be used as tie breakers.
A third location is required.
PowerHA SystemMirror stretched cluster configuration takes advantage of all CAA cluster
communication mechanisms through these channels:
IP network
SAN fabric
Repository disk
We recommend providing a high level of redundancy for all components and devices that are part of the cluster configuration and eliminating all single points of failure. For example, use multiple network interface cards per network type in each cluster node, redundant communication links between sites, and redundant network devices.
Nevertheless, when all communication between sites is lost, the cluster mechanisms
determine how the cluster reacts. In a HyperSwap environment, only PowerHA SystemMirror
determines whether the disks must be swapped to the auxiliary storage, based on the split
and merge policies.
When a site failure event occurs, the Metro Mirror source disks are located on the storage at the failed site, and the defined split policy is None, the messages from Example 8-132 appear in the syslog.phake log. The nodes at the surviving site are rebooted.
It is highly recommended that you use the tie breaker disk for any application that is
configured in a stretched cluster, in addition to the hardware redundancy that is required for
such an implementation.
Warning messages also appear during the verify and synchronize operation, for example:
The possibility of cluster/site partition can be minimized by adding redundancy in
communication paths and eliminating all single-point-of-failures.
In PowerHA, it is easy to configure the policies that determine how the cluster behaves when a split or merge event takes place. Use this SMIT fast path to configure the tie breaker disk:
smitty -C cm_cluster_split_merge
We configure the Split and Merge PowerHA policies as shown in Example 8-133 on
page 313, indicating the tie breaker disk. The disk must be seen on all cluster nodes.
[Entry Fields]
Split Handling Policy Tie Breaker +
Merge Handling Policy Tie Breaker +
We use the Swingbench load generator to simulate a database workload. We run a PL/SQL procedure to determine when the last database insert was done, until when the failing instance was up, and at what time the first insert was committed by using the new instance.
We start execution of the PL/SQL procedure by using the sqlplus client. We verify our
SQL*Net connection string for connection to the remote listener, as shown in Example 8-134.
We deactivate the zones to the DS5K storage for all four nodes, as shown in Example 8-136 on page 315.
The nodes r6r4m51 and r6r4m52 are powered off from the HMC by using the immediate option.
In the syslog.phake file, we observe when the ORA_MG mirror group has been fully processed, and we monitor the messages from the Oracle RAC cluster reconfiguration, as shown in Example 8-137.
Example 8-137 Oracle RAC reconfiguration and ORA_MG mirror group swap
Feb 10 01:17:34 satsspc4 kern:debug unix: phake_event.c: 35127383:
post_sfw_action(): Posting of Action 'PPRC_ACT_DO_NOTHING' to SFW for
event_handle='0xF100010037567F20' MG[ORA_MG 9] RDG[pha_9654761964rdg1] completed
with rc=22
Feb 10 01:17:34 satsspc4 kern:debug unix: phake_event.c: 35127383:
process_sfw_event(): Processing of SFW Event '0x40000' for MG[ORA_MG 9] @
'0xF100010FE8F76800' completed with rc=0.
We validate the cluster status after the site failure, as shown in Example 8-138 on page 316.
First, we verify the existing repository disk configuration, as shown in Example 8-140.
HACMPsircol:
name = "orahyp1_sircol"
id = 0
uuid = "0"
ip_address = ""
repository = "00cdb31104eb34c3"
backup_repository = ""
root@r6r4m51:/> lscluster -d
Storage Interface Query
Node r6r4m51.austin.ibm.com
Node UUID = c5b720be-8eda-11e3-9fc8-001a64b94abd
Number of disks discovered = 1
hdisk22:
State : UP
uDid : 200B75TL771520507210790003IBMfcp
uUid : 872ba55b-b512-a9b4-158b-043f8bc50000
Site uUid : 51735173-5173-5173-5173-517351735173
Node satsspc2.austin.ibm.com
Node UUID = c5b723f2-8eda-11e3-9fc8-001a64b94abd
Number of disks discovered = 1
hdisk57:
State : UP
uDid : 200B75TL771520507210790003IBMfcp
uUid : 872ba55b-b512-a9b4-158b-043f8bc50000
Site uUid : 51735173-5173-5173-5173-517351735173
Type : REPDISK
Node r6r4m52.austin.ibm.com
Node UUID = c5b72334-8eda-11e3-9fc8-001a64b94abd
Number of disks discovered = 1
hdisk52:
State : UP
uDid : 200B75TL771520507210790003IBMfcp
uUid : 872ba55b-b512-a9b4-158b-043f8bc50000
Site uUid : 51735173-5173-5173-5173-517351735173
Type : REPDISK
Node satsspc4.austin.ibm.com
Node UUID = c5b7249c-8eda-11e3-9fc8-001a64b94abd
Number of disks discovered = 1
hdisk54:
State : UP
uDid : 200B75TL771520507210790003IBMfcp
uUid : 872ba55b-b512-a9b4-158b-043f8bc50000
Site uUid : 51735173-5173-5173-5173-517351735173
Type : REPDISK
We check that hdisk100 is configured as a HyperSwap disk on host r6r4m51 and on the other hosts, as shown in Example 8-141.
Manufacturer..................IBM
Machine Type and Model........2107900
ROS Level and ID..............2E393330
Serial Number.................75NR571C
Device Specific.(Z7)..........C901
Device Specific.(Z0)..........000005329F101002
Manufacturer..................IBM
Machine Type and Model........2107900
ROS Level and ID..............2E313336
Serial Number.................75LY981E
Device Specific.(Z7)..........EA01
Device Specific.(Z0)..........000005329F101002
Device Specific.(Z1)..........A01
Device Specific.(Z2)..........075
Unique Device Identifier......200B75LY981EA0107210790003IBMfcp
Logical Subsystem ID..........0xea
Volume Identifier.............0x01
Subsystem Identifier(SS ID)...0xFFEA
Control Unit Sequence Number..00000LY981
Storage Subsystem WWNN........5005076308ffc6d4
Logical Unit Number ID........40ea400100000000
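This level of detail for both storage subsystems of a HyperSwap-capable disk can be displayed with the lsmpio query option; the exact command behind Example 8-141 is not shown here, so treat the following as an assumption:
# lsmpio -ql hdisk100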
We also validate the UUID for the new CAA hdisk, as shown in Example 8-142.
Example 8-142 Validating the UUID for the new CAA disk
root@r6r4m51:/> -a561-ae19-311fca3ed3f7|dshbak -c <
HOSTS -------------------------------------------------------------------------
r6r4m51.austin.ibm.com
-------------------------------------------------------------------------------
hdisk100 00cdb3110988789d caavg_private active
352037354e52353731433930310052f416e907210790003IBMfcp af87d5be-0ac
c-a561-ae19-311fca3ed3f7
HOSTS -------------------------------------------------------------------------
r6r4m52.austin.ibm.com
-------------------------------------------------------------------------------
hdisk94 00cdb3110988789d caavg_private active
352037354e52353731433930310052f416e907210790003IBMfcp af87d5be-0ac
c-a561-ae19-311fca3ed3f7
HOSTS -------------------------------------------------------------------------
satsspc2.austin.ibm.com
-------------------------------------------------------------------------------
HOSTS -------------------------------------------------------------------------
satsspc4.austin.ibm.com
-------------------------------------------------------------------------------
hdisk99 00cdb3110988789d caavg_private active
352037354e52353731433930310052f416e907210790003IBMfcp af87d5be-0ac
c-a561-ae19-311fca3ed3f7
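The UUID can be collected on all nodes with a distributed lspv query. A hedged form of such a command (our reconstruction, not necessarily the exact invocation used in Example 8-142) is:
# dsh "lspv -u | grep af87d5be" | dshbak -c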
After the verification, we add a new Cluster_Repository mirror group by accessing the fast
path, as shown in Example 8-143:
smitty cm_add_mirr_gps_select
[Entry Fields]
Mirror Group Name CAA_MG
New Mirror Group Name []
* Site Name SITE_A SITE_B +
Non HyperSwap Disk [hdisk22:872ba55b-b512> +
* HyperSwap Disk [hdisk100:af87d5be-0cc> +
Associated Storage System(s) STG_A STG_B +
HyperSwap Enabled +
Consistency Group yes
Unplanned HyperSwap Timeout (in sec) [60] #
HyperSwap Priority High
Re-sync Action Manual +
The only step left to activate the new CAA HyperSwap disk is to verify and synchronize the
cluster. During this step, the disk repository is changed to a HyperSwap-enabled disk, as
shown in Example 8-144. The operation logs are in the clxd.log.
Node satsspc4.austin.ibm.com
Node UUID = c5b7249c-8eda-11e3-9fc8-001a64b94abd
Number of disks discovered = 1
hdisk99:
State : UP
uDid : 352037354e52353731433930310052f416e907210790003IBMfcp
uUid : af87d5be-0acc-a561-ae19-311fca3ed3f7
Site uUid : 51735173-5173-5173-5173-517351735173
Type : REPDISK
Node satsspc2.austin.ibm.com
Node UUID = c5b723f2-8eda-11e3-9fc8-001a64b94abd
Number of disks discovered = 1
hdisk97:
State : UP
uDid : 352037354e52353731433930310052f416e907210790003IBMfcp
uUid : af87d5be-0acc-a561-ae19-311fca3ed3f7
Site uUid : 51735173-5173-5173-5173-517351735173
Type : REPDISK
Node r6r4m52.austin.ibm.com
Node UUID = c5b72334-8eda-11e3-9fc8-001a64b94abd
Number of disks discovered = 1
hdisk94:
State : UP
uDid : 352037354e52353731433930310052f416e907210790003IBMfcp
uUid : af87d5be-0acc-a561-ae19-311fca3ed3f7
Site uUid : 51735173-5173-5173-5173-517351735173
Type : REPDISK
You can easily revert to a non-HyperSwap disk by using the standard procedure for CAA
repository disk replacement:
1. Add a new repository disk (use either the smitty cm_add_repository_disk or the clmgr
add repository <disk> command). The disk should meet CAA repository disk
requirements.
2. Replace the repository disk (smitty cl_replace_repository_nm or clmgr replace repository <new_repository>), as illustrated after this list. For more clmgr options, use the clmgr contextual help.
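For example, with a disk name from this scenario (illustrative values only), the clmgr sequence might be:
# clmgr add repository hdisk22
# clmgr replace repository hdisk22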
The disks used for ASM configuration and their storage membership are shown in Table 8-7.
The hdisks marked in blue in the preceding table remain in their positions during migration.
The LSS membership of each volume is also indicated in blue.
We follow the configuration steps in 8.21, Online storage migration: Oracle RAC in a HyperSwap configuration on page 322, and use Swingbench to load the database that we configured in the Oracle RAC environment.
We also use the Enterprise Manager Console to observe how the various storage migration steps are reflected in our test environment.
We start by verifying the Oracle RAC resources status as shown in Example 8-145.
We also start the Swingbench test with the configuration, as shown in Figure 8-15.
We validate the disk PPRC states, and the path groups IDs as shown in Example 8-146.
Example 8-146 Validating the PPRC states and the path groups ID
root@r6r4m51:/work> dsh /work/"asm_disks_n.sh" |dshbak -c
HOSTS -------------------------------------------------------------------------
r6r4m51.austin.ibm.com
HOSTS -------------------------------------------------------------------------
r6r4m52.austin.ibm.com
-------------------------------------------------------------------------------
hdisk59 Active 0(s) 1 5005076309ffc5d5 5005076308ffc6d4
hdisk98 Active 0(s) 1 5005076309ffc5d5 5005076308ffc6d4
hdisk99 Active 0(s) 1 5005076309ffc5d5 5005076308ffc6d4
hdisk101 Active 0(s) 1 5005076309ffc5d5 5005076308ffc6d4
hdisk102 Active 0(s) 1 5005076309ffc5d5 5005076308ffc6d4
HOSTS -------------------------------------------------------------------------
satsspc4.austin.ibm.com
-------------------------------------------------------------------------------
hdisk98 Active 0(s) 1 5005076309ffc5d5 5005076308ffc6d4
hdisk99 Active 0(s) 1 5005076309ffc5d5 5005076308ffc6d4
hdisk101 Active 0(s) 1 5005076309ffc5d5 5005076308ffc6d4
hdisk103 Active 0(s) 1 5005076309ffc5d5 5005076308ffc6d4
hdisk104 Active 0(s) 1 5005076309ffc5d5 5005076308ffc6d4
HOSTS -------------------------------------------------------------------------
satsspc2.austin.ibm.com
-------------------------------------------------------------------------------
hdisk101 Active 0(s) 1 5005076309ffc5d5 5005076308ffc6d4
hdisk102 Active 0(s) 1 5005076309ffc5d5 5005076308ffc6d4
hdisk103 Active 0(s) 1 5005076309ffc5d5 5005076308ffc6d4
hdisk105 Active 0(s) 1 5005076309ffc5d5 5005076308ffc6d4
hdisk106 Active 0(s) 1 5005076309ffc5d5 5005076308ffc6d4
For the disks whose source is in Storage A, we swap the disks to Storage B. We validate the operation in the clxd.log and again issue the command for path and state validation.
The swap operation log is shown in Example 8-147. It marks the start time of the migration operation.
HOSTS -------------------------------------------------------------------------
r6r4m51.austin.ibm.com
-------------------------------------------------------------------------------
hdisk49 Active 1(s) 0 5005076308ffc6d4 5005076309ffc5d5
hdisk50 Active 1(s) 0 5005076308ffc6d4 5005076309ffc5d5
hdisk97 Active 1(s) 0 5005076308ffc6d4 5005076309ffc5d5
hdisk99 Active 1(s) 0 5005076308ffc6d4 5005076309ffc5d5
hdisk100 Active 1(s) 0 5005076308ffc6d4 5005076309ffc5d5
HOSTS -------------------------------------------------------------------------
satsspc4.austin.ibm.com
-------------------------------------------------------------------------------
hdisk98 Active 1(s) 0 5005076308ffc6d4 5005076309ffc5d5
hdisk99 Active 1(s) 0 5005076308ffc6d4 5005076309ffc5d5
hdisk101 Active 1(s) 0 5005076308ffc6d4 5005076309ffc5d5
hdisk103 Active 1(s) 0 5005076308ffc6d4 5005076309ffc5d5
hdisk104 Active 1(s) 0 5005076308ffc6d4 5005076309ffc5d5
HOSTS -------------------------------------------------------------------------
satsspc2.austin.ibm.com
-------------------------------------------------------------------------------
hdisk101 Active 1(s) 0 5005076308ffc6d4 5005076309ffc5d5
hdisk102 Active 1(s) 0 5005076308ffc6d4 5005076309ffc5d5
hdisk103 Active 1(s) 0 5005076308ffc6d4 5005076309ffc5d5
hdisk105 Active 1(s) 0 5005076308ffc6d4 5005076309ffc5d5
hdisk106 Active 1(s) 0 5005076308ffc6d4 5005076309ffc5d5
Now, we proceed to stop the HACMP services by bringing the resource group to an
Unmanaged state, as shown in Example 8-148 on page 328.
We must maintain the hdisk number for all HyperSwap disks, even if we remove the Storage A disks from the configuration. We use the chdev command with the -U flag to update the disk attribute to revert_disk, as shown in Example 8-149. In this way, the hdisk number is associated with the disk from the secondary storage (Storage B in this example).
We change the disk attributes for all HyperSwap disks that are part of the storage migration. If there are non-HyperSwap disks that also need to be migrated, they must be accounted for at the beginning.
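A hedged illustration of this attribute change (the disk name is from our environment; apply it to each HyperSwap disk that is being migrated):
# chdev -l hdisk100 -a san_rep_cfg=revert_disk -U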
After this operation, the disks are seen on the system as source on Storage B without a path
or configured disk in Storage A, as shown in Example 8-150. A path group ID of -1 indicates
that there are no paths configured from this initiator to the indicated LUN in the PPRC pair.
HOSTS -------------------------------------------------------------------------
r6r4m51.austin.ibm.com
-------------------------------------------------------------------------------
hdisk49 Active 1(s) -1 5005076308ffc6d4 5005076309ffc5d5
hdisk50 Active 1(s) -1 5005076308ffc6d4 5005076309ffc5d5
hdisk97 Active 1(s) -1 5005076308ffc6d4 5005076309ffc5d5
hdisk99 Active 1(s) -1 5005076308ffc6d4 5005076309ffc5d5
hdisk100 Active 1(s) -1 5005076308ffc6d4 5005076309ffc5d5
HOSTS -------------------------------------------------------------------------
satsspc4.austin.ibm.com
-------------------------------------------------------------------------------
hdisk98 Active 1(s) -1 5005076308ffc6d4 5005076309ffc5d5
hdisk99 Active 1(s) -1 5005076308ffc6d4 5005076309ffc5d5
hdisk101 Active 1(s) -1 5005076308ffc6d4 5005076309ffc5d5
hdisk103 Active 1(s) -1 5005076308ffc6d4 5005076309ffc5d5
hdisk104 Active 1(s) -1 5005076308ffc6d4 5005076309ffc5d5
HOSTS -------------------------------------------------------------------------
satsspc2.austin.ibm.com
-------------------------------------------------------------------------------
hdisk101 Active 1(s) -1 5005076308ffc6d4 5005076309ffc5d5
hdisk102 Active 1(s) -1 5005076308ffc6d4 5005076309ffc5d5
hdisk103 Active 1(s) -1 5005076308ffc6d4 5005076309ffc5d5
hdisk105 Active 1(s) -1 5005076308ffc6d4 5005076309ffc5d5
hdisk106 Active 1(s) -1 5005076308ffc6d4 5005076309ffc5d5
The next step is to remove the PPRC relationships, as shown in Example 8-151.
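Removing a Metro Mirror pair is done with the DSCLI rmpprc command. The storage image and volume pair IDs below are placeholders, not the actual values from our configuration:
dscli> rmpprc -remotedev <remote_storage_image_ID> -quiet <source_volume_ID>:<target_volume_ID>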
We create the PPRC relationships for all volume pairs, now with the new storage, as shown in
Example 8-152.
Example 8-152 Establishing PPRC for volume pairs with the new storage
dscli> mkpprc -remotedev IBM.2107-75TL771 -type mmir 7f01:a204 7f02:a205 9f01:2f02 e799:3700 e798:3701
ea01:2e01
Date/Time: February 19, 2014 3:37:09 AM CST IBM DSCLI Version: 6.6.0.305 DS: IBM.2107-75LY981
CMUC00153I mkpprc: Remote Mirror and Copy volume pair relationship 7F01:A204 successfully created.
CMUC00153I mkpprc: Remote Mirror and Copy volume pair relationship 7F02:A205 successfully created.
CMUC00153I mkpprc: Remote Mirror and Copy volume pair relationship 9F01:2F02 successfully created.
CMUC00153I mkpprc: Remote Mirror and Copy volume pair relationship E799:3700 successfully created.
CMUC00153I mkpprc: Remote Mirror and Copy volume pair relationship E798:3701 successfully created.
CMUC00153I mkpprc: Remote Mirror and Copy volume pair relationship EA01:2E01 successfully created.
Before configuring disks to be HyperSwap-capable, we must wait while the disks are copied
to the new storage system. You can monitor the process by using the lspprc command at the
storage level, as shown in Example 8-153.
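As an illustration of this kind of check (volume IDs are placeholders), the DSCLI lspprc command reports the pair state, which moves from Copy Pending to Full Duplex when the initial copy finishes:
dscli> lspprc -l <source_volume_ID>:<target_volume_ID>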
When the lspprc command indicates that the disks are in Full Duplex state, we proceed with
the next configuration steps and run cfgmgr on all nodes. We verify the copy status, as shown
in Example 8-154.
We validate that the new disk attributes match the desired values (reserve_policy, rw_timeout).
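For example (the disk name is illustrative), the current values can be listed with:
# lsattr -El hdisk100 -a reserve_policy -a rw_timeout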
We start the disk configurations on all nodes by updating the san_rep_cfg disk attributes, as
shown in Example 8-155.
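A hedged sketch of the san_rep_cfg update that makes a disk HyperSwap-capable (run it for each migrated disk; the disk name is illustrative):
# chdev -l hdisk100 -a san_rep_cfg=migrate_disk -U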
The disk configuration after updating the disk attributes is shown in Example 8-156.
Manufacturer..................IBM
Machine Type and Model........2107900
ROS Level and ID..............2E393330
Serial Number.................75TL771A
Device Specific.(Z7)..........A204
Device Specific.(Z0)..........000005329F101002
Device Specific.(Z1)..........204
Device Specific.(Z2)..........075
Unique Device Identifier......200B75TL771A20407210790003IBMfcp
Logical Subsystem ID..........0xa2
Volume Identifier.............0x04
Subsystem Identifier(SS ID)...0xFFA2
Control Unit Sequence Number..00000TL771
Storage Subsystem WWNN........500507630affc16b
Logical Unit Number ID........40a2400400000000
Manufacturer..................IBM
Machine Type and Model........2107900
ROS Level and ID..............2E313336
Serial Number.................75LY9817
Device Specific.(Z7)..........7F01
Device Specific.(Z0)..........000005329F101002
We validate the disk configurations and add the new storage definition in PowerHA
SystemMirror configuration, as shown in Example 8-157.
[Entry Fields]
* Storage System Name [STG_C]
* Site Association SITE_A +
* Vendor Specific Identifier IBM.2107-00000TL771 +
* WWNN 500507630AFFC16B +
We start the cluster services without bringing up the resource groups, as shown in
Example 8-158.
[Entry Fields]
* Start now, on system restart or both now +
Start Cluster Services on these nodes [satsspc4,r6r4m51,sats> +
* Manage Resource Groups Manually +
BROADCAST message at startup? true +
Startup Cluster Information Daemon? true +
Ignore verification errors? false +
Automatically correct errors found during Interactively +
cluster start?
We modify the mirror group and the resource groups, re-adding all hdisks to the configurations. We verify and synchronize the cluster configuration and validate the operation log (in this example, /var/hacmp/clverify/clverify.log).
We start bringing the resource group online, as shown in Example 8-159 on page 333.
[Entry Fields]
Resource Group to Bring Online ORARG
Node on Which to Bring Resource Group Online All_Nodes_in_Group
.............................<<snippet>>.....................................
The operation is shown with a failed status (Example 8-160), but the resource group has actually been brought online.
[TOP]
Attempting to bring group ORARG online on node ORARG:NONE:satsspc2.
Attempting to bring group ORARG online on node r6r4m51.
Attempting to bring group ORARG online on node ORARG:NONE:satsspc4.
Attempting to bring group ORARG online on node ORARG:NONE:r6r4m52.
No HACMPnode class found with name = ORARG:NONE:satsspc2
Usage: clRMupdate operation [ object ] [ script_name ] [ reference ]
Failed to queue resource group movement event in the cluster manager.
No HACMPnode class found with name = ORARG:NONE:satsspc4
No HACMPnode class found with name = ORARG:NONE:r6r4m52
Usage: clRMupdate operation [ object ] [ script_name ] [ reference ]
Failed to queue resource group movement event in the cluster manager.
Usage: clRMupdate operation [ object ] [ script_name ] [ reference ]
Failed to queue resource group movement event in the cluster manager.
Waiting for the cluster to process the resource group movement request....
root@r6r4m51:/work> clRGinfo
-----------------------------------------------------------------------------
Group Name State Node
We validate the paths for the HyperSwap disks, as shown in Example 8-161.
[Entry Fields]
* Mirror Group(s) ORA_MG +
* Operation Show active path +
...............................................................................
COMMAND STATUS
r6r4m51: MG_NAME:ACTIVE_SITE:SECONDARY_SITE:STORAGE_SYSTEM_ON_ACTIVE_SITE
r6r4m51: ORA_MG:SITE_B:SITE_A:STG_B
r6r4m52: MG_NAME:ACTIVE_SITE:SECONDARY_SITE:STORAGE_SYSTEM_ON_ACTIVE_SITE
r6r4m52: ORA_MG:SITE_B:SITE_A:STG_B
satsspc4: MG_NAME:ACTIVE_SITE:SECONDARY_SITE:STORAGE_SYSTEM_ON_ACTIVE_SITE
satsspc4: ORA_MG:SITE_B:SITE_A:STG_B
satsspc2: MG_NAME:ACTIVE_SITE:SECONDARY_SITE:STORAGE_SYSTEM_ON_ACTIVE_SITE
satsspc2: ORA_MG:SITE_B:SITE_A:STG_B
We swap the disks in Storage C in Site A and validate the swap operation log, as shown in
Example 8-162.
HOSTS -------------------------------------------------------------------------
r6r4m51.austin.ibm.com
-------------------------------------------------------------------------------
hdisk49 Active 0(s) 1 500507630affc16b 5005076308ffc6d4
hdisk50 Active 0(s) 1 500507630affc16b 5005076308ffc6d4
hdisk97 Active 0(s) 1 500507630affc16b 5005076308ffc6d4
hdisk99 Active 0(s) 1 500507630affc16b 5005076308ffc6d4
hdisk100 Active 0(s) 1 500507630affc16b 5005076308ffc6d4
HOSTS -------------------------------------------------------------------------
satsspc2.austin.ibm.com
-------------------------------------------------------------------------------
hdisk101 Active 0(s) 1 500507630affc16b 5005076308ffc6d4
hdisk102 Active 0(s) 1 500507630affc16b 5005076308ffc6d4
hdisk103 Active 0(s) 1 500507630affc16b 5005076308ffc6d4
hdisk105 Active 0(s) 1 500507630affc16b 5005076308ffc6d4
hdisk106 Active 0(s) 1 500507630affc16b 5005076308ffc6d4
HOSTS -------------------------------------------------------------------------
satsspc4.austin.ibm.com
-------------------------------------------------------------------------------
hdisk98 Active 0(s) 1 500507630affc16b 5005076308ffc6d4
hdisk99 Active 0(s) 1 500507630affc16b 5005076308ffc6d4
hdisk101 Active 0(s) 1 500507630affc16b 5005076308ffc6d4
hdisk103 Active 0(s) 1 500507630affc16b 5005076308ffc6d4
hdisk104 Active 0(s) 1 500507630affc16b 5005076308ffc6d4
Manufacturer..................IBM
Machine Type and Model........2107900
ROS Level and ID..............2E313336
Serial Number.................75LY9817
Device Specific.(Z7)..........7F01
Device Specific.(Z0)..........000005329F101002
Device Specific.(Z1)..........F01
Device Specific.(Z2)..........075
Unique Device Identifier......200B75LY9817F0107210790003IBMfcp
Logical Subsystem ID..........0x7f
Volume Identifier.............0x01
Subsystem Identifier(SS ID)...0xFF7F
Control Unit Sequence Number..00000LY981
Storage Subsystem WWNN........5005076308ffc6d4
Logical Unit Number ID........407f400100000000
Figure 8-16 Graphical display of the cluster behavior using the Enterprise Manager Console
At 3:37 AM, an event was observed that decreased the load activity. This can easily be
associated with the disk reconfiguration operation that was underway at that time. The results
of this event are similar to a planned HyperSwap and removal of a PPRC relationship. The
decrease in the database load activity is directly related to the copy operations, as shown in
Example 8-154 on page 330. The database load reverts to the original start value after
the copy operations have completed, as reflected by the Full Duplex state of the disk pairs.
You can also configure the kernel extension to create debug logs by updating the /etc/syslog.conf file. Complete the following steps:
1. In the /etc/syslog.conf file, add the line shown in Example 8-164 (an illustrative entry follows the note below).
Note: The debug logs are also logged in the console. For unplanned operations, all events
appear in the /var/hacmp/xd/log/syslog.phake file.
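Based on the log file name that is used throughout this chapter, the syslog.conf entry is expected to resemble the following line (treat it as an assumption); the target file must exist, and syslogd must be refreshed afterward:
kern.debug /var/hacmp/xd/log/syslog.phake rotate size 500k files 7
# touch /var/hacmp/xd/log/syslog.phake
# refresh -s syslogd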
HyperSwap configurations use kernel extensions. Therefore, you can view error or warning
messages from the kernel extensions by using the errpt command, as shown in
Example 8-165.
After reading this chapter, you will understand how to integrate RBAC into a PowerHA
SystemMirror environment from scratch.
Federated security is a centralized tool that addresses Lightweight Directory Access Protocol
(LDAP), role-based access control (RBAC), and Encrypted File System (EFS) integration into
cluster management.
Through the federated security cluster, users can manage roles and the encryption of data
across the cluster.
In this book, we focus on the IBM Tivoli Directory Server software. PowerHA SystemMirror includes an option to configure the LDAP server on the cluster nodes; at least two cluster nodes are required for a peer-to-peer replicated LDAP server setup. You can find the detailed steps for this configuration in 9.3.1, Peer-to-peer replicated LDAP server scenario on page 341. Depending on your environment, you can also configure the LDAP server on a node outside of the cluster.
For an external LDAP server, the cluster nodes need to be configured only as LDAP clients. For the detailed steps for this configuration, see 9.3.2, External LDAP server scenario on page 345.
9.2.1 Components
LDAP enables centralized security authentication and provides common access to user and group information across the cluster.
9.2.2 Planning
Before you can use the features of federated security, you must plan for its implementation in
your environment.
In the examples in the following sections of this chapter, we use a two-node cluster to illustrate the setup. The environment must meet the following requirements:
The AIX operating system must be at one of the following technology levels:
IBM AIX 6.1 with Technology Level 7 or later
IBM AIX 7.1 with Technology Level 1 or later
PowerHA SystemMirror Version 7.1.1 or later
IBM Tivoli Directory Server 6.2 or later
Note: IBM Tivoli Directory Server is included with AIX base media.
Task #1 start
Description: Enable IOCP
Estimated time 1 second(s)
Task #1 end
...
...
Task #46 start
Description: Updating global profile registry
Estimated time 3 second(s)
Task #46 end
# /usr/local/bin/db2ls
Install the Tivoli Directory Server server and client on two cluster nodes
The Tivoli Directory Server server and client installation steps are shown in Example 9-3.
Example 9-3 Tivoli Directory Server server and client file sets installation
Install idsLicense in the /license directory from the AIX Expansion DVD.
# /license/idsLicense
International Program License Agreement
Press Enter to continue viewing the license agreement, or, Enter "1" to accept
the agreement, "2" to decline it or "99" to go back to the previous screen,
"3" Print.
1
Install the GSKit on the LDAP server and all of the cluster nodes
The GSKit installation steps are shown in Example 9-2 on page 343.
Install the Tivoli Directory Server server and client on the LDAP server
The Tivoli Directory Server server and client installation steps are shown in Example 9-3 on
page 343.
Install the Tivoli Directory Server client on all of the cluster nodes
The Tivoli Directory Server client installation steps are shown in Example 9-6.
Press Enter to continue viewing the license agreement, or, Enter "1" to accept
the agreement, "2" to decline it or "99" to go back to the previous screen,
"3" Print.
1
Configure the LDAP server and client on all of the cluster nodes
Steps for configuring the LDAP server and client are shown in Example 9-10.
To define the LDAP server on a cluster node, you can use C-SPOC or the command line:
# /usr/es/sbin/cluster/cspoc/cl_ldap_server_existing -h a3 -a cn=admin -w adminpwd
-d cn=aixdata,o=ibm -p 636 -S /newkeys/serverkey.kdb -W serverpwd
These roles can be assigned to the user to provide restricted access to the cluster functions,
based on the role.
User management is in the PowerHA SystemMirror Cluster Single Point of Control (C-SPOC), which is shown in Figure 9-1. To reach user management, enter smitty sysmirror and select System Management (C-SPOC) → Security and Users → Users in an PowerHA SystemMirror cluster.
To create a user, you can set the authentication and registry mode to either LOCAL(FILES) or
LDAP, as shown in Figure 9-2 on page 351.
+--------------------------------------------------------------------------+
| Select an Authentication and registry mode |
| |
| Move cursor to desired item and press Enter. |
| |
| LOCAL(FILES) |
| LDAP |
| |
| F1=Help F2=Refresh F3=Cancel |
| F8=Image F10=Exit Enter=Do |
F1| /=Find n=Find Next |
F9+--------------------------------------------------------------------------+
You can assign the PowerHA SystemMirror RBAC roles to the new user as shown in
Figure 9-3 on page 352.
In this section, we create four non-root users and assign these different RBAC roles to them:
haOp - ha_op
haAdmin - ha_admin
haView - ha_view
haMon - ha_mon
We use the following four examples to illustrate how the four RBAC roles can be used for
some PowerHA SystemMirror functions.
Example 9-11 Moving cluster resource group by a non-root user with ha_op role
# lsuser haOp
haOp id=208 pgrp=staff groups=staff home=/home/haOp shell=/usr/bin/ksh login=true
su=true rlogin=true telnet=true daemon=true admin=false sugroups=ALL admgroups=
tpath=nosak ttys=ALL expires=0 auth1=SYSTEM auth2=NONE umask=22 registry=LDAP
SYSTEM=LDAP logintimes= loginretries=0 pwdwarntime=0 account_locked=false minage=0
maxage=0 maxexpired=-1 minalpha=0 minloweralpha=0 minupperalpha=0 minother=0
mindigit=0 minspecialchar=0 mindiff=0 maxrepeats=8 minlen=0 histexpire=0
histsize=0 pwdchecks= dictionlist= default_roles= fsize=2097151 cpu=-1 data=262144
stack=65536 core=2097151 rss=65536 nofiles=2000 roles=ha_op
# su - haOp
$ whoami
To move a cluster resource group, enter smitty sysmirror and select System Management (C-SPOC) → Resource Group and Applications → Move Resource Groups to Another Node, and then select the resource group and destination node. The task completes successfully with the result as shown in Figure 9-4.
COMMAND STATUS
[TOP]
Attempting to move resource group rg04 to node lpar0204.
Waiting for the cluster to process the resource group movement request....
Example 9-12 Creating a cluster snapshot by a non-root user with ha_admin role
# lsuser haAdmin
haAdmin id=207 pgrp=staff groups=staff home=/home/haAdmin shell=/usr/bin/ksh
login=true su=true rlogin=true telnet=true daemon=true admin=false sugroups=ALL
admgroups= tpath=nosak ttys=ALL expires=0 auth1=SYSTEM auth2=NONE umask=22
registry=LDAP SYSTEM=LDAP logintimes= loginretries=0 pwdwarntime=0
account_locked=false minage=0 maxage=0 maxexpired=-1 minalpha=0 minloweralpha=0
minupperalpha=0 minother=0 mindigit=0 minspecialchar=0 mindiff=0 maxrepeats=8
minlen=0 histexpire=0 histsize=0 pwdchecks= dictionlist= default_roles=
fsize=2097151 cpu=-1 data=262144 stack=65536 core=2097151 rss=65536 nofiles=2000
roles=ha_admin
To create a cluster snapshot, enter smitty sysmirror and select Cluster Nodes and Networks → Manage the Cluster → Snapshot Configuration → Create a Cluster Snapshot of the Cluster Configuration, as shown in Figure 9-5.
[Entry Fields]
* Cluster Snapshot Name [testCluster] /
Custom Defined Snapshot Methods [] +
* Cluster Snapshot Description [To test ha_admin role]
Example 9-13 Reading the hacmp.out file by a non-root user with the ha_view role
# lsuser haView
haView id=210 pgrp=staff groups=staff home=/home/haView shell=/usr/bin/ksh
login=true su=true rlogin=true telnet=true daemon=true admin=false sugroups=ALL
admgroups= tpath=nosak ttys=ALL expires=0 auth1=SYSTEM auth2=NONE umask=22
registry=LDAP SYSTEM=LDAP logintimes= loginretries=0 pwdwarntime=0
account_locked=false minage=0 maxage=0 maxexpired=-1 minalpha=0 minloweralpha=0
minupperalpha=0 minother=0 mindigit=0 minspecialchar=0 mindiff=0 maxrepeats=8
minlen=0 histexpire=0 histsize=0 pwdchecks= dictionlist= default_roles=
fsize=2097151 cpu=-1 data=262144 stack=65536 core=2097151 rss=65536 nofiles=2000
roles=ha_view
# su - haView
$ whoami
haView
$ swrole ha_view
haView's Password:
$ rolelist -e
ha_view
$ pvi /var/hacmp/log/hacmp.out
HACMP: Additional messages will be logged here as the cluster events are run
Note: You cannot use the vi editor or cat command to read or write a privileged file. You
can use only the pvi editor to do so.
Example 9-14 Monitoring RG information using clRGinfo by a non-root user with the ha_mon role
# lsuser haMon
haMon id=209 pgrp=staff groups=staff home=/home/haMon shell=/usr/bin/ksh
login=true su=true rlogin=true telnet=true daemon=true admin=false sugroups=ALL
admgroups= tpath=nosak ttys=ALL expires=0 auth1=SYSTEM auth2=NONE umask=22
registry=LDAP SYSTEM=LDAP logintimes= loginretries=0 pwdwarntime=0
account_locked=false minage=0 maxage=0 maxexpired=-1 minalpha=0 minloweralpha=0
minupperalpha=0 minother=0 mindigit=0 minspecialchar=0 mindiff=0 maxrepeats=8
minlen=0 histexpire=0 histsize=0 pwdchecks= dictionlist= default_roles=
fsize=2097151 cpu=-1 data=262144 stack=65536 core=2097151 rss=65536 nofiles=2000
roles=ha_mon
# su - haMon
$ whoami
haMon
$ swrole ha_mon
haMon's Password:
$ rolelist -e
ha_mon
$ /usr/es/sbin/cluster/utilities/clRGinfo
-----------------------------------------------------------------------------
Group Name State Node
-----------------------------------------------------------------------------
In this section, we use the clRGinfo example to illustrate the customization. To enable a
non-root user to run clRGinfo, complete the following steps:
1. Check whether Enhanced RBAC is enabled by running the following command:
lsattr -El sys0 -a enhanced_RBAC
True means that it is enabled. If it is not, enable it by running this command:
chdev -l sys0 -a enhanced_RBAC=true
2. Create a user-defined authorization hierarchy:
mkauth dfltmsg='IBM custom' ibm
mkauth dfltmsg='IBM custom application' ibm.app
mkauth dfltmsg='IBM custom application execute' ibm.app.exec
3. Assume that the command is not listed in /etc/security/privcmds. If you want to find out
what privileges are necessary to run the command, use tracepriv, as Example 9-15
shows. Otherwise, skip this step.
Example 9-15 Using tracepriv to find the necessary privileges to run a command
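A hedged illustration of such a tracepriv invocation (assuming clRGinfo is the command being profiled; the privileges that the command attempted to use are printed when it completes):
# tracepriv -ef /usr/es/sbin/cluster/utilities/clRGinfo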
Add the command to the privileged command database by using this command:
# setsecattr -c euid=0 accessauths=ibm.app.exec
innateprivs=PV_DAC_GID,PV_NET_CNTL,PV_NET_PORT
/usr/es/sbin/cluster/utilities/clsnapshot
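To make the new authorization usable by a non-root user, it generally also must be placed in a role, assigned to the user, and loaded into the kernel security tables. A hedged sketch follows (the role name is hypothetical); the user then activates the role with swrole, as shown in the earlier examples:
# mkrole authorizations=ibm.app.exec ibm_app_role
# chuser roles=ibm_app_role haMon
# setkst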
Now, a non-root user can run clRGinfo as the example in Example 9-16 shows.
Note: If it is done incorrectly, changing the host name can lead to multiple nodes having
the same host name. That could cause confusion in the TCP/IP networking in the
environment.
Before looking into this solution, check with your application specialist about whether a
dynamic host name is really needed. Most applications can be configured not to be host
name-dependent.
Older versions of IBM Systems Director, SAP, or Oracle applications might have a host name
dependency requirement and require that the host name is acquired by the backup system.
For information about the latest versions and requirements of those applications, check the
following websites:
http://www.ibm.com/systems/director/
http://www.sap.com
http://www.oracle.com
If a middleware product needs the host name to be changed when a failover is happening, the
most common method of accomplishing this host name change is to use the IBM AIX
hostname command in the start script for the middleware. Also, it is necessary to restore the
host name to its original name when the application is being stopped in the stop script to
avoid multiple nodes having the same host name accidentally.
There are two supported solutions in IBM PowerHA 7.1.3: temporary and permanent host name changes. Which of these two solutions works for you depends on your application.
The main question here is: How does the application get the host name information? For
details, see 10.5, Temporary host name change on page 370 and 10.6, Permanent host
name change on page 372.
If you are using a PowerHA version before 7.1.3, check whether the solution described in
10.7, Changing the host name in earlier PowerHA 7.1 versions on page 375, might be an
option for you.
AIX provides various commands and APIs to work with these two types of host names:
Interfaces that set or get the permanent host name. This ODM attribute is read directly
using the ODM-related primitives:
lsattr -El inet0 | grep hostname
odmget -q "attribute=hostname" CuAt
The host name can also be permanently changed by using the SMIT panel.
Interfaces that set or get the host name temporarily:
hostname
uname -n or uname -a
hostid
The function gethostname()
If your application is using scripts to set or get the host name temporarily, that is easy to
determine by searching for one of the options listed above. If your application is based on
binaries, the AIX gethostname() function is probably used. But because you have only a
binary file, you must test it to find out.
There are two ways to change the host name of an AIX system:
By using the command line. See 10.3.1, Using the command line to change the host
name on page 361. This method gives you the most flexibility.
By using the System Management Interface Tool (SMIT). See 10.3.2, Using SMIT to
change the host name information on page 365.
Be sure to read 10.3.3, Cluster Aware AIX (CAA) dependencies on page 367 for details
about the effects of the different commands on the CAA.
hostname command
If only the hostname command is used, the type of dynamic host name that you need is
temporary host name change. For details on how to set this up, see 10.5, Temporary host
name change on page 370.
Using the hostname <name> command changes the host name in the running environment.
The hostname kernel variable is changed to the new name. Therefore, if you use the hostname command without any options or arguments, you get the name that you specified for the <name> variable.
The AIX gethostname() function also reads from the hostname kernel variable. Therefore, it also returns the name that you specified for <name>.
None of the other values that report the host name change. Example 10-1 shows this
behavior. The first part shows the output of the different commands before using the hostname
<name> command, and the second half shows the output after the change.
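As an illustration only (using the host names from this chapter, not the actual Example 10-1 output), the behavior looks like this:
root@asterix(/)# hostname
asterix
root@asterix(/)# hostname paris
root@asterix(/)# hostname
paris
root@asterix(/)# uname -n
asterix
root@asterix(/)# lsattr -El inet0 | grep hostname
hostname asterix Host Name True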
uname command
If the uname -n or uname -a command is used, then the temporary host name change is the better choice. For details on how to set this up, see 10.5, Temporary host name change on page 370.
Using the uname -S <name> command changes the uname information in the running
environment. The kernel variable utsname is changed to the new name. Therefore, if you use
uname -n or -a, you get the name that you specified for the <name> variable.
None of the other values that report the host name change. Example 10-2 on page 363
shows this behavior. The first part shows the output of the different commands before using
the uname -S <name> command, and the second part shows the output after the change.
hostid command
The hostid command returns a hex value of the IP label that is normally associated with the
host name.
If the hostid command is used, then temporary hostname change is appropriate. For details
on how to set this up, see 10.5, Temporary host name change on page 370.
Using the hostid <name> command changes the hostid information in the running
environment. Keep in mind that the name that you used for <name> must be a resolvable
name. You can use the IP address instead. If you use the hostid command without any
options or arguments, you get the hex value for the specified information under <name>. To get
readable information, you can use either the host or ping command:
host $(hostid)
or
ping $(hostid)
None of the other values that report the host name change. Example 10-3 on page 364
shows this behavior. The first part shows the output of the different commands before using
the hostid <name> command, and the second half shows the output after the change.
Example 10-3 hostid command
root@asterix(/)# hostname
asterix
root@asterix(/)# uname -n
asterix
root@asterix(/)# lsattr -El inet0 | grep hostname
hostname asterix Host Name True
root@asterix(/)# hostid
0xac1e77cd
root@asterix(/)# host $(hostid)
asterix is 172.30.119.205, Aliases: pokbc.lpar0103
root@asterix(/)#
root@asterix(/)# hostid paris
root@asterix(/)# hostname
asterix
root@asterix(/)# uname -n
asterix
root@asterix(/)# lsattr -El inet0 | grep hostname
hostname asterix Host Name True
root@asterix(/)# hostid
0xac1e77ef
root@asterix(/)# host $(hostid)
paris is 172.30.119.239, Aliases: test-svc1
root@asterix(/)#
Using the chdev -l inet0 -a hostname=<name> command changes two components. Like
using the hostname command, it changes the hostname kernel variable. It also changes the
host name information in the CuAt ODM class. Therefore, if you use the lsattr -El inet0 |
grep hostname, you get the name you specified for <name>. In this case, you get the same
result when you use the hostname command.
Important: Using the chdev command makes the change persistent across a reboot, so this change can potentially create problems.
None of the other values that report the host name change. Example 10-4 on page 365 shows this behavior. The first part shows the output of the different commands before using the
chdev -l inet0 -a hostname=<name> command, and the second half shows the output we get
after the change.
gethostname function
The gethostname() C function gets its value from the running hostname kernel variable.
Therefore, if you use the hostname or the chdev commands to change the host name, the
function returns the new host name.
Note: The examples for this section are illustrated in hostname command on page 362,
and odmget and lsattr commands on page 364.
Set Hostname
[Entry Fields]
* HOSTNAME (symbolic name of your machine) [obelix]
It is important to keep in mind that this step makes several changes to your system. When you
run your changes, the system performs the chdev and hostid commands, so most of the host
name-related information gets updated in one action. The only exception is the uname
information. To get the uname-related utsname kernel variable updated also, you have two
options: You can reboot the system or use the uname -S $(hostname) command.
Example 10-5 shows the information from our test systems. For this example, we used the
value listed in Figure 10-1. The first part shows the output of the different commands before
using SMIT, and the second half shows the output after the change.
Attention: Never use smitty mktcpip on an existing environment only to change the host
name.
From a host name change point of view, you should know about the commands described in
10.3.1, Using the command line to change the host name on page 361, because they affect
the CAA. The following list gives you a brief summary:
uname -S <name> The uname information is ignored by CAA.
Depending on your environment, there are different sequences, which are explained in these
sections:
New system setup
Adding and configuring PowerHA in an existing environment on page 369
Note: Keep in mind that some of the tasks listed here must be done on all cluster nodes.
Before starting with these steps, make sure that you have the scripts that manage your host
name takeover available. An example of what we used is listed in 10.9, PowerHA hostname
change script on page 378.
1. Install AIX and the PowerHA components that you need for your environment.
2. Configure your AIX environment by defining all necessary variables, user IDs, group IDs,
TCP/IP settings, and disks.
3. Verify that you have all necessary PowerHA TCP/IP addresses defined in your /etc/hosts
file, and make sure that you have all shared volume groups (VGs) known to all cluster
nodes.
4. Add your boot IP addresses to the /etc/cluster/rhosts file. That file must have the same
content on all cluster nodes.
5. Configure all your boot or base addresses in all your cluster nodes.
6. Check that the host name is equal to one of your IP-Labels used for the boot address(es).
7. If you already know your storage configuration details, you can configure the storage at this step. If not, just continue with the next step.
8. Start configuring PowerHA:
a. Configure the Power Cluster name and nodes.
b. Configure your repository disk.
c. Synchronize your cluster.
d. Define the application script to set the host name.
e. Add your service IP address.
f. Configure a resource group with your service IP only.
g. Synchronize your cluster.
h. Start only one cluster node, for instance, your primary node.
This makes the host name and your service address available to install the application.
9. Configure your VGs and file systems and mount them if not already done as part of step 7.
10.Install your applications. Depending on your application, you might need to varyon and
mount your application-related VGs first.
11.Stop your application and stop the resource group or move the resource group to your
next system.
12.Activate the resource group on your backup system (still IP and maybe VG only) if not
already done as part of step 11.
13.Install your application on the backup system.
14.Stop your application and your resource group.
15.Add your application start/stop scripts to your resource group. Check that your resource
group now contains all components that are necessary for starting your application.
16.Synchronize your PowerHA cluster.
17.Continue with 10.5, Temporary host name change on page 370 or 10.6, Permanent host
name change on page 372 and start testing.
6. Now you might need to migrate data from your local VG to a shared VG.
If moving data from the local VG to the shared VG is necessary, the safest way is to use
the following commands:
cd <source_dir>
find . | backup -if - | (cd <target_dir>; restore -xdqf -)
umount or delete <source_dir>
14.Install your application.
15.Stop your application and your resource group.
16.Add your application start/stop scripts to your resource group. Make sure that your
resource group now contains all components that are necessary to start your application.
17.Synchronize your PowerHA cluster.
18.Continue with either Temporary host name change or 10.6, Permanent host name
change on page 372, and then start testing.
If you are already using host name takeover with an earlier PowerHA version and you are
planning for migration to PowerHA 7.1.3, you must check your existing host name change
scripts first. As explained in more detail in section 10.8, Migrating a host name takeover
environment on page 376, these scripts should not contain a chdev -l inet0 command.
To get the temporary host name takeover to work, follow the description in section 10.4,
Initial setup and configuration on page 367. Also check that the start/stop scripts handle the
change of the host name.
Note: A goal of the temporary host name takeover is that this change does not persist
across a reboot.
The following section shows example output from our test system. Before starting the
resource group, check some of the system settings. Example 10-6 shows that our initial host
name is asterix and that the CAA node names are asterix and obelix.
Example 10-6 Check hostname information before starting the resource group
root@asterix(/)# hostname; uname -n ; host $(hostid); lsattr -El inet0 | grep host
asterix
asterix
asterix is 129.40.119.205, Aliases: pokbc.lpar0103
hostname asterix Host Name True
root@asterix(/)#lscluster -m
Calling node query for all nodes...
Node query number of nodes examined: 2
----------------------------------------------------------------------------
Next, start PowerHA or the resource group. Wait until the cluster is back in a stable state.
Then, use the commands shown in Example 10-6 on page 370.
Example 10-7 shows the output after that. As expected, the host name reported by the hostname, uname -n, and host $(hostid) commands changes. The information in CAA and CuAt does not change.
----------------------------------------------------------------------------
Now, when you move the resource group to the backup system, you see that the system
asterix gets back its original host name. Also, the host name of the backup system is now
showing the host name paris rather than obelix.
Note: This action restores the previous host name in the /etc/hosts file and appends it to the new entry, as this syntax shows:
x.x.x.x <new host name> <old host name>
Note: Issue the smitty chinet command from the console to avoid losing the connection if
you are logged in through the old IP address.
11. Optional: To change the node name in PowerHA, complete the following steps on one of
the cluster nodes (assuming the same node as the one in step 9).
a. Update the new node name with smitty cm_manage_nodes.
b. Update only the new node name. Do not select the communication path again.
12.Update /etc/cluster/rhosts to the new boot IP addresses. Then stop and restart clcomd:
# stopsrc -s clcomd
# startsrc -s clcomd
13.Verify and synchronize the cluster configuration from node 1.
14.Start cluster services.
Note: To minimize downtime, the previous steps can be adapted so that the cluster services are not stopped, but this still requires two short downtime periods during the resource group movement. Follow the actions in step 2 on page 235 through step 8 on page 235, and then follow these steps:
Note: It is not allowed to change the node name in PowerHA when the node is active.
However, you can change it later after bringing down the cluster service. To change the
node name in PowerHA, complete the following steps on one of the cluster nodes:
Update the new node name with smitty cm_manage_nodes.
Update only the new node name, do not select communication path again at this step.
Then, verify and synchronize the cluster configuration from the same node.
If the stop script is run during an unplanned resource group movement, such as the result of a source node network failure, the host name change behavior and actions are similar to the planned resource group movement. However, if it is a node failure, the stop script is not run, so the host name of the failed node does not change back to its original value but remains the service IP label after restart. In this case, after the failed node is up again, you must manually change its host name by using smitty mkhostname. However, you cannot synchronize the cluster configuration from the failed node, because PowerHA SystemMirror does not allow synchronization from an inactive node to an active node. You must manually update the COMMUNICATION_PATH in the ODM of the node that the application is now running on. The commands are shown in Example 10-8.
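A hedged sketch of this update follows (the HACMPnode stanza layout and the temporary file name are our assumptions; adjust the node name and value to your environment):
# odmget -q "name = <node_name> and object = COMMUNICATION_PATH" HACMPnode > /tmp/commpath
(edit /tmp/commpath and set the value field to the new IP address)
# odmdelete -o HACMPnode -q "name = <node_name> and object = COMMUNICATION_PATH"
# odmadd /tmp/commpath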
Then, you can synchronize the cluster configuration from the node to the failed node and start
the cluster service on the failed node again.
When you stop the cluster service on the source node with the option to move the resource
group, it behaves like the planned resource group movement. So you must synchronize the
cluster configuration from the source node after the movement.
You can then start the cluster service on the source node again. Then, move the resource
group back according to the planned resource group movement.
Note: Generally, after the cluster is configured, you should not need to change the host
name of any cluster nodes.
To change the host name of a cluster node, you must first remove the Cluster Aware AIX
(CAA) cluster definition, update PowerHA SystemMirror and the AIX operating system
configurations, and then synchronize the changes to re-create the CAA cluster with the new
host name.
To change the host name for a cluster node in PowerHA SystemMirror 7.1.2 or earlier
versions, complete the following steps:
1. Stop the cluster services on all nodes by using the Bring Resource Group Offline option.
2. To remove the CAA cluster, complete the following steps on all nodes:
a. Get the name of the CAA cluster:
# lscluster -i | grep Name
b. Get the name of the repository disk:
# lspv | grep caavg_private
# clusterconf -ru <repository disk>
c. CAA_FORCE_ENABLED=1 ODMDIR=/etc/objrepos /usr/sbin/rmcluster -f -n <CAA
cluster name> -v
d. CAA_FORCE_ENABLED=1 ODMDIR=/etc/objrepos /usr/sbin/rmcluster -f -r
<repository disk> -v
e. Reboot the node to clean up the CAA repository information.
3. To update the AIX operating system configuration, complete the following steps on all the
nodes with the new host name:
a. Change the /etc/hosts file for each node in the cluster with the new host name. If your
environment is using a DNS, you must update the DNS with the new host name.
b. Change the /etc/cluster/rhosts file on all cluster nodes.
c. Run smitty mktcpip to change the host name and IP address.
d. Stop and restart clcomd:
# stopsrc -s clcomd; startsrc -s clcomd
4. To update the PowerHA SystemMirror configuration, complete the following steps on one
of the cluster nodes:
a. Update the communication path with smitty cm_manage_nodes. Select only the new
communication path. Do not update the new node name at this step.
b. Update the boot IP label. Edit /tmp/tmp1 with the new host name in the ip_label field and the new IP address in the corresponding identifier field:
# odmget HACMPadapter > /tmp/tmp1
# odmdelete -o HACMPadapter
# odmadd /tmp/tmp1
c. Discover the network interfaces and disks:
# smitty cm_cluster_nodes_networks
d. Verify and synchronize the cluster configuration. This process creates the CAA cluster
configuration with the updated host name.
5. Optional: To change the node name in PowerHA, complete the following steps on one of
the cluster nodes:
a. Update the new node name by using smitty cm_manage_nodes. Update only the new node name; do not select the communication path again at this step.
b. Verify and synchronize the cluster configuration.
6. Start the cluster services.
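For example, the cluster services can be started from the command line with the clmgr command; this is a sketch only, and smitty clstart achieves the same result interactively:
# clmgr online node <nodename> WHEN=now
Repeat the command for each node, or use clmgr online cluster WHEN=now to start the services on all nodes in one operation.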
Now that you know that your scripts use the chdev command, as shown in Example 10-9, you need to test whether the chdev command is really needed.
It is rare for an application to check the content of the AIX CuAt ODM class. Therefore, in most cases it is not necessary to use the chdev command.
If you have a test environment or a sufficiently long maintenance window, an easy way to test this is to replace the chdev command in our example with hostname <name>. Example 10-10 shows the change that we made compared to the part shown in Example 10-9.
If your application still works with the modification shown in Example 10-10 on page 377, you
can use the temporary host name takeover.
If your application does not work, you have one of the rare cases, so you must use the
permanent host name takeover option.
#VERBOSE_LOGGING=high
[[ "$VERBOSE_LOGGING" == "high" ]] && set -x
# Variables
Task="$1"
SystemA="asterix"
SystemB="obelix"
ServiceHostname="paris"
PathUtils="/usr/es/sbin/cluster/utilities"
ActualHAnodename=$(${PathUtils}/get_local_nodename)
case $Task in
start) hostname $ServiceHostname;
hostid $ServiceHostname;
uname -S $ServiceHostname;
RC=0;;
stop) hostname $ActualHAnodename;
hostid $ActualHAnodename;
uname -S $ActualHAnodename;
RC=0;;
*) echo "Unknown Argument used";
RC=1;;
esac
exit $RC
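As a brief usage illustration, assume that the script is saved as /usr/local/ha/hostname_takeover.sh (a placeholder path) and is called from the PowerHA application server start and stop scripts with the corresponding argument:
# /usr/local/ha/hostname_takeover.sh start     (sets the host name to the service name paris)
# /usr/local/ha/hostname_takeover.sh stop      (restores the local PowerHA node name)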
PowerHA provides commands such as cldump and clstat for monitoring the status of the
cluster. There are also IBM Tivoli file sets that provide support for existing version 5
monitoring environments. There is no specific cluster monitoring function for Tivoli Monitoring
version 6 or other OEM enterprise monitoring products. For more information, see 11.3.1,
IBM Tivoli Monitoring agent for UNIX logs on page 390.
The SNMP protocol is central to obtaining the status of the cluster. The SNMP protocol is
used by network management software and systems for monitoring network applications and
devices for conditions that warrant administrative attention. The SNMP protocol is composed
of a database and a set of data objects. The set of data objects forms a Management
Information Base (MIB). The standard SNMP agent is the snmpd daemon. A SMUX (SNMP
Multiplexing protocol) subagent allows vendors to add product-specific MIB information.
The clstrmgr daemon in PowerHA acts as a SMUX subagent. The SMUX peer function,
which is in clstrmgrES, maintains cluster status information for the PowerHA MIB. When the
clstrmgrES starts, it registers with the SNMP daemon, snmpd, and continually updates the
MIB with cluster status in real time. PowerHA implements a private MIB branch that is
maintained by a SMUX peer subagent to SNMP that is contained in the clstrmgrES daemon,
as shown in Figure 11-1.
Figure 11-1 PowerHA SNMP structure: a network management application communicates with the snmpd management agent, which serves the AIX public MIB (/etc/mib.defs) and the PowerHA private MIB (hacmp.defs); the clstrmgr daemon acts as the SNMP subagent that maintains the private MIB, and monitoring is done through the clinfo daemon, clstat, and custom scripts
1 Application monitoring is a feature of PowerHA that aids the cluster in determining whether the application is alive and well. Further information about application monitoring is beyond the scope of this chapter.
The PowerHA private MIB branch descends from the root of the MIB tree as follows: iso(1), identified organization(3), department of defense(6), internet(1), private(4), enterprise(1), ibm(2), ibmAgents(3), aix(1), aixRISC6000(2), risc6000agents(1), risc6000clsmuxpd(5).
The resulting OID for the PowerHA cluster branch is therefore 1.3.6.1.4.1.2.3.1.2.1.5.1. The data held within this MIB can be retrieved by using the snmpinfo command, as shown in Example 11-1. Individual elements, such as the cluster state and cluster substate, can be retrieved as shown in Example 11-2.
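As a rough illustration of the kind of queries that Example 11-1 and Example 11-2 contain (the community name public is a placeholder for your own community name):
# snmpinfo -c public -v -m dump -o /usr/es/sbin/cluster/hacmp.defs cluster
# snmpinfo -c public -v -m get -o /usr/es/sbin/cluster/hacmp.defs clusterState.0
# snmpinfo -c public -v -m get -o /usr/es/sbin/cluster/hacmp.defs clusterSubState.0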
Note: The -v flag translates the numbered MIB branch path to the readable variable name.
In Example 11-2, the cluster has a state of 2 and a substate of 32. To determine the meaning
of these values, see the /usr/es/sbin/cluster/hacmp.my file, which contains a description of
each HACMP MIB variable (Example 11-3 on page 382).
clusterSubState OBJECT-TYPE
SYNTAX INTEGER { unstable(16), error(64),
stable(32), unknown(8), reconfig(128),
notconfigured(256), notsynced(512) }
ACCESS read-only
STATUS mandatory
DESCRIPTION
"The cluster substate"
You can conclude from Example 11-3 that the cluster status is UP and STABLE. This is the
mechanism that clinfo/clstat uses to display the cluster status.
The clstat utility uses clinfo library routines (via the clinfo daemon) to display all node,
interface, and resource group information for a selected cluster. The cldump does likewise, as
a one-time command, by interrogating the private MIB directly within the cluster node. Both
rely solely on the SNMP protocol and the mechanism described above.
The examples that follow are templates that have been written for customer environments and
can be customized. The scripts are included in Appendix B, Custom monitoring scripts on
page 415.
PowerHA version 7 uses the Cluster Aware AIX (CAA) infrastructure for heartbeat functions
across all IP and SAN-based interfaces. With versions 7.1.1 and 7.1.2, heartbeats across IP
interfaces are via a special IP multicast (class D) address. In certain environments,
multicasting is disabled within the Ethernet switch infrastructure; in others, multicast
communications might not be allowed by the network team as a corporate policy. As such,
starting with version 7.1.3, the administrator can switch to unicast for heartbeats. This is
similar to previous versions that used Reliable Scalable Cluster Technology (RSCT).
From a status perspective, be sure that you know whether IP communications are multicast or
unicast. If you are using multicasting and multicasting is disabled within your switch
environment, the IP interfaces appear up to AIX but down to CAA. This is a particularly bad
situation. Query HA -c reports the communication method (multicast or unicast) and the active
status, from a CAA perspective, of all IP interfaces by using the lscluster -m command.
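One quick way to confirm the heartbeat mode from the command line is to query the cluster attributes with clmgr; the HEARTBEAT_TYPE attribute name is an assumption based on PowerHA 7.1.3 and is not shown elsewhere in this chapter:
# clmgr query cluster | grep -i heartbeat
The per-interface status, from the CAA perspective, still comes from the lscluster -m command, as described above.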
SAN and repository disk communications are a way of providing a non-IP-based network. In
previous releases, the communication was handled via RSCT topology services (topsvcs)
with heartbeats over disk. Now it is handled by CAA, and clstat no longer provides the status
of this non-IP heartbeat communication. It is critical that this status is known and active in a
running cluster. Effective with AIX 7.1 TL3, Query HA -c also provides this status via the clras
command. See Example 11-4.
In addition to reporting the status of the cluster manager (clstrmgr) and the resource groups by default, Query HA can show the status of network interfaces, non-IP disk heartbeat networks (in versions 5 and 6), online volume groups, running events, application monitoring, and CAA IP/SAN/disk communication (in version 7). It uses the flag options that are shown in Example 11-5 on page 384.
Usage: qha [-n] [-N] [-v] [-l] [-e] [-m] [-1] [-c]
-n displays network interfaces
-N displays network interfaces + nonIP heartbeat disk
-v shows online VGs
-l logs entries to /tmp/qha.out
-e shows running event
-m shows appmon status
-1 single iteration
-c shows CAA IP/SAN/Disk Status (AIX7.1 TL3 min.)
Example 11-6 shows qha -nevmc running and displaying the network interfaces, as well as running events, online volume groups, application monitors, and the CAA communication status.
Figure 11-4 shows qha -nvm running with a failed application monitor.
To set up Query HA, copy the script into any directory in root's PATH, for example, /usr/sbin. Also, qha can be modified to send SNMP traps to a monitoring agent upon a state change. To enable this feature, invoke qha with the -l flag and edit the script at the specific point shown in Example 11-7.
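As a rough illustration of that point in the script, the state-change branch contains a commented snmptrap call (the community and agent host are placeholders) that can be uncommented and adapted:
if [[ ! "$OO" = "$wLINE_three" ]]; then
# State change detected: log it and, optionally, send a trap
snmptrap -c <community> -h <snmp agent> -m "qha: cluster state change: $wLINE"
echo "$wLINE" >> $LOGGING
fi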
The core of the code is based on the previous qha example and amended accordingly. It is recommended that qha_rmc be invoked from cron.
Figure 11-6 on page 387 shows a qha_rmc snapshot, run manually from the command line, with the output from our test clusters.
To set up qha_rmc, copy the script to a suitable location in the user's PATH. Make sure that unprompted SSH access is configured for each cluster node. To create a cluster definition file, use the file format shown in Example 11-8 on page 388:
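Inferred from the way the script parses the file (lines that start with cluster: are split on colons), each entry has the form cluster:<cluster name>:<space-separated node list>; the cluster and node names below are placeholders:
cluster:clusterA:nodeA1 nodeA2
cluster:clusterB:nodeB1 nodeB2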
Now, edit the script and adjust the global variables as appropriate, as shown in Example 11-9.
Depending on the number of clusters to be monitored, you might have to adjust the SLEEPTIME
variable in the global variables at the start of the script.
liveHA is invoked from the command line (as shown in Example 11-10) and produces both
text and CGI outputs over SSH (in the same operation), in a way that is similar to qha_rmc, as
shown in 11.2.2, Custom example 2: Remote multi-cluster status monitor (qha_rmc) on
page 386. liveHA runs from any OS that supports Korn Shell. In addition to clstat, liveHA
shows the active node being queried, the internal cluster manager status, and the status of
the CAA SAN communications.
Figure 11-8 shows the SMIT screen while monitoring a remote cluster via SSH/SNMP.
#cat clhosts
mhoracle1
mhoracle2
Note: This section requires an understanding of IBM Tivoli Monitoring v6.1.x or later and
the concept of the IBM Tivoli Monitoring agent for UNIX logs.
The /var/hacmp/adm/cluster.log file is the main PowerHA SystemMirror log file. PowerHA
SystemMirror error messages and messages about PowerHA SystemMirror-related events
are appended to this log with the time and date when they occurred.
Note: See the IBM Tivoli Monitoring Information Center for the detailed installation and
configuration guide:
http://ibm.co/UOQxNf
Note: ul is the agent code for the monitoring agent for UNIX logs.
You can ensure that the required application support is installed, as Example 11-13 on
page 392 shows.
Install the Tivoli Monitoring agent for UNIX logs on the cluster nodes
Follow these steps to install the IBM Tivoli Monitoring agent on the PowerHA cluster nodes:
1. Install the Tivoli Monitoring agent for UNIX logs (ul) in all the nodes of the PowerHA
cluster.
2. Configure the ul agent to establish connectivity to the monitoring server.
3. Ensure the installation of the agent, as shown in Example 11-14.
Note: $CANDLEHOME refers to the directory where the IBM Tivoli Monitoring components are
installed. Typically, it is this path: /opt/IBM/ITM
IBM Tivoli Monitoring v6.1 and later supports a type of agent called IBM Tivoli Universal
Agent, which is a generic agent of IBM Tivoli Monitoring. In the next sections, monitoring
PowerHA SNMP traps through the Tivoli Universal Agent is explained.
Note: This section requires an understanding of IBM Tivoli Monitoring v6.1.x or later and
the concept of the IBM Tivoli Universal Agent.
The IBM Tivoli Universal Agent extends the performance and availability management
capabilities of IBM Tivoli Monitoring to applications and operating systems not covered by
other IBM Tivoli Monitoring agents. It gives you a single point to manage all of your enterprise
resources and protects your investment in applications and resources.
This scenario is based on using the SNMP Data Provider. It brings Simple Network Management Protocol (SNMP) management capability to IBM Tivoli Monitoring, which enables you to integrate network management with systems and applications management. This includes network discovery and trap monitoring.
Through the SNMP Data Provider, the Universal Agent can monitor any industry standard
MIB or any MIB that you supply. Tivoli Monitoring creates Universal Agent applications for you
by converting the MIBs into data definition metafiles. You can then monitor any MIB variable
as an attribute and monitor any SNMP traps that are sent to the data provider.
The cluster manager maintains cluster status information in a special PowerHA SystemMirror
MIB (/usr/es/sbin/cluster/hacmp.my). When the cluster manager starts on a cluster node, it
registers with the SNMP snmpd daemon, and then continually gathers cluster information.
The cluster manager maintains an updated topology of the cluster in the PowerHA
SystemMirror MIB as it tracks events and the resulting states of the cluster.
Important: The default hacmp.my that is installed with the PowerHA cluster file sets for
V7.1.3 has errors that are corrected with the installation of PowerHA V7.1.3 SP1. See
Appendix B, Custom monitoring scripts on page 415 for the file to use in the earlier
versions of PowerHA.
Note: See the IBM Tivoli Monitoring Information Center for the detailed installation and
configuration guide:
http://ibm.co/UOQxNf
2. A typical /etc/snmpdv3.conf that receives PowerHA SNMP traps has the entries as shown
in Example 11-16.
DEFAULT_SECURITY no-access - -
Note: MyCommunity is the community name used in Example 11-16 on page 396. You may replace the community name with your own community name or leave the default community name, public.
In Example 11-16 on page 396, the target server, 1.1.1.1, is the server where the IBM
Tivoli Universal Agent is installed, as explained in Install the IBM Tivoli Universal Agent
on page 397.
4. Wait for a few seconds for the following line to appear in the
/var/hacmp/log/clstrmgr.debug file:
"smux_simple_open ok, try smux_register()"
5. Ensure the correctness of the SNMP configuration by running the cldump or the clstat
command.
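For example, both of the following commands should return cluster state information rather than an SNMP error when the configuration is correct; the -o flag of clstat prints a single report and exits:
# cldump
# clstat -o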
Note: $CANDLEHOME refers to the directory where IBM Tivoli Monitoring components are
installed, which is typically: /opt/IBM/ITM
2. Ensure that the SNMP data provider is enabled through the following line in um.config:
KUMA_STARTUP_DP='ASFS,SNMP'
3. Define the IBM Tivoli Universal Agent application by building the appropriate data
definition metafile.
If you are well-versed in Universal Agent data definition control statements, you may use
the default hacmp.my (/usr/es/sbin/cluster/hacmp.my) to build up the Universal Agent
metafile manually.
Alternatively, you may use MibUtility, which is available from OPAL, to convert the MIB file (/usr/es/sbin/cluster/hacmp.my) into an IBM Tivoli Monitoring Universal Agent
application. Append the generated trapcnfg_* file to TRAPCNFG file, the location of
which is defined in the $KUM_WORK_PATH environment variable.
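For example, assuming that the generated file is named trapcnfg_PowerHA (the actual name varies) and that the TRAPCNFG file resides in the directory named by $KUM_WORK_PATH:
# cat trapcnfg_PowerHA >> $KUM_WORK_PATH/TRAPCNFG
You might need to restart the Universal Agent so that it rereads the trap configuration.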
Note: The PowerHA.mdl metafile and the trapcnfg trap file are included in Appendix B,
Custom monitoring scripts on page 415 for your reference.
You can now proceed to create appropriate situations for automated event monitoring and
subsequent event escalation.
11.5.1 SNMP v1
Simple Network Management Protocol (SNMP) version 3 is the default version that is used in
the latest releases of the AIX operating system. However, you can use SNMP version 1 and
configure it with the /etc/snmpd.conf file for PowerHA cluster trap monitoring if you prefer.
Additionally, you can use the configuration file as shown in Example 11-20 to integrate
PowerHA cluster traps with the Tivoli Universal Agent as described in section 11.4, PowerHA
cluster SNMP trap monitoring on page 394.
community MyCommunity
#community private 127.0.0.1 255.255.255.255 readWrite
#community system 127.0.0.1 255.255.255.255 readWrite 1.17.2
In Example 11-20 on page 399, 2.2.2.2 is the SNMP Manager that is capable of monitoring
SNMP v3 traps.
hdisk2 is defined as the repository disk, so it is assigned to the caavg_private volume group, and hdisk5 is defined as the backup_repository. Example A-1 shows the configuration.
HACMPsircol:
name = "sas_itso_cl_sircol"
id = 1
uuid = "0"
ip_address = ""
repository = "00f70c9976cc355b"
backup_repository = "00f70c9976cc3613"
In the configuration shown in Figure A-1 on page 403, all single failures of a component, a node, or a whole data center can be covered by IBM PowerHA 7.1 with the standard procedures. In a rolling disaster or a multi-component outage, a situation might occur where PowerHA is not able to restart the service with the standard procedures.
Figure A-1 Two data centers, each with a cluster node (rootvg and data volume groups on local SAN storage), SAN switches connected between the sites for the SAN heartbeat, the repository disk on SAN Storage 1, and the backup repository on SAN Storage 2
An example of an outage where multiple components are involved is described in Figure A-2
on page 405. Datacenter 1 completely fails due to a power outage caused by the network
provider. Cluster node 1 fails, and cluster node 2 loses the repository disk. Example A-2
shows the entries logged in /var/hacmp/log/hacmp.out.
Description
OPERATOR NOTIFICATION
User Causes
ERRLOGGER COMMAND
Recommended Actions
REVIEW DETAILED DATA
Detail Data
MESSAGE FROM ERRLOGGER COMMAND
INFORMATION: Invoked rep_disk_notify event with PID 15204572 +++
In the failure scenario where Datacenter 1 has failed completely (Figure A-2 on page 405),
PowerHA fails over the workload to node 2 in Datacenter 2 and allows the business to
continue. In this case, the cluster operates in the restricted mode, without any repository disk.
It is expected that an administrator notices the repository disk failure or unavailability and
uses the repository disk replacement menus to enable a new repository disk for the cluster.
However, assume that Datacenter 1 failed and, for some reason, when the workload is failing
over or even after it has failed over, node 2 reboots (intentionally or otherwise) without
recreating a new repository. After reboot in that situation, node 2 would not have any
repository disk to start the cluster. This is related to the missing repository disk hosted in
Datacenter 1. Then, it becomes necessary that a repository disk recovery process be initiated
to re-create the repository disk and allow node 2 to start the cluster and workload.
Figure A-2 Outage of Datacenter 1: node 1 and SAN Storage 1 (with the active repository disk) are lost; node 2 in Datacenter 2 keeps its rootvg, data volume groups, and the backup repository on SAN Storage 2
In releases before PowerHA 7.1.3, a new CAA cluster definition that contains only the available node must be set up before PowerHA, and therefore the service, can be started.
Starting with PowerHA 7.1.3, a new process is available that re-creates the previous cluster
configuration on a new disk. See Table A-1 for caavg_recreate support that is available with
different versions of PowerHA.
< PowerHA 7.1.2     Special procedure during outage     Please contact your support center.
Note: The APAR file sets must be installed before the outage occurs to use the new
process.
The following section describes the steps required to start the service on the remaining node.
root@b2:/> clRGinfo
Cluster IPC error: The cluster manager on node b2 is in ST_INIT or NOT_CONFIGURED
state and cannot process the IPC request.
HACMPsircol:
name = "sas_itso_cl_sircol"
id = 1
uuid = "0"
In this state, the new support in the clmgr command is able to re-create the repository disk on
a new disk. The new disk must fulfill the same requirements as the old one and cannot be
part of another volume group in the LPAR. The clmgr command (shown in Example A-6) can
be issued to re-create the repository disk on hdisk5.
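A minimal sketch of that call, run on the surviving node b2 and assuming the clmgr replace repository syntax of PowerHA 7.1.3:
root@b2:/> clmgr replace repository hdisk5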
Note: The replacement of the repository disk might take a while, depending on the cluster
and LPAR configuration.
Subsequently, the repository disk is available and the CAA cluster starts on cluster node b2,
as Example A-7 shows.
Example A-7 Repository disk is changed to hdisk5 and the CAA cluster is available
root@b2:/> lspv
hdisk0 00f6f5d056bafee2 rootvg active
hdisk1 00f6f5d076cce945 None
hdisk3 00f70c9976cc35af sasapp_vg
hdisk4 00f70c9976cc35e2 sasapp_vg
hdisk5 00f70c9976cc3613 caavg_private active
hdisk6 00f70c9976cc3646 None
root@b2:/> lscluster -i
Network/Storage Interface Query
Node b2
Node UUID = 573ae868-52ef-11e3-abd9-7a40c9ce2704
Number of interfaces discovered = 3
Interface number 1, en0
IFNET type = 6 (IFT_ETHER)
NDD type = 7 (NDD_ISO88023)
MAC address length = 6
MAC address = EE:AF:09:B0:26:02
Smoothed RTT across interface = 0
Mean deviation in network RTT across interface = 0
Probe interval for interface = 990 ms
IFNET flags for interface = 0x1E084863
NDD flags for interface = 0x0021081B
Interface state = UP
Number of regular addresses configured on interface = 1
IPv4 ADDRESS: 192.168.100.85 broadcast 192.168.103.255 netmask
255.255.252.0
Node a2
Node UUID = 573ae80e-52ef-11e3-abd9-7a40c9ce2704
Number of interfaces discovered = 0
PowerHA can now be started to make the service available again by using smitty clstart
from the smitty menu or from the command line, using clmgr online node b2 WHEN=now.
Note: Keep in mind that, at this point, you have neither fixed the whole problem nor
synchronized the cluster. After the power is up again in Datacenter 1, node 1 of the cluster
will start with the old repository disk. You must clean up this situation before starting the
service on cluster node 1.
Error description
In the case of a simultaneous node and repository disk failure (for example, when a data
center fails), it might be necessary to replace the repository disk before all nodes are up
again.
A node that is down while the repository disk is replaced continues to access the original
repository disk after reboot.
If the original repository disk is available again, the CAA cluster services start using this disk.
The node remains in the DOWN state. The lscluster -m output is shown in Example A-8.
-----------------------------------------------------
Node name: ha2clA
Cluster shorthand id for node: 2
UUID for node: 1ac309e2-d7ed-11e2-91ce-46fc4000a002
State of node: UP
...
Points of contact for node: 2
------------------------------------------
Interface State Protocol Status
------------------------------------------
en0 UP IPv4 none
en1 UP IPv4 none
To force a previously failed node to use the new repository disk, run these commands on the
affected node:
$ export CAA_FORCE_ENABLED=true
$ clusterconf -fu
Use the lscluster -c command to verify that the CAA cluster services are inactive.
Wait up to 10 minutes for the node to join the CAA cluster again, using the new repository
disk.
Execute the lscluster -c and lscluster -m commands to verify that the CAA has restarted.
Before restarting PowerHA on the affected node, the PowerHA configuration needs to be
synchronized. The synchronization needs to be started at a node that was up while the
repository disk was replaced. Select smitty sysmirror > Cluster Nodes and Networks > Verify and Synchronize Cluster Configuration.
If there are multiple nodes available to do so and PowerHA is not up and running on all of
them, choose an active node to start the synchronization.
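For example, from an active node, the verification and synchronization can also be run from the command line; this is a sketch that uses the clmgr sync action:
root@b2:/> clmgr sync cluster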
The sequence to correct the repository disk mismatch for the cluster nodes is described in the
workflow depicted in Figure A-3.
root@a2:/> lscluster -m
Calling node query for all nodes...
Node query number of nodes examined: 2
Node name: a2
Cluster shorthand id for node: 2
UUID for node: 573ae80e-52ef-11e3-abd9-7a40c9ce2704
State of node: DOWN NODE_LOCAL
Smoothed rtt to node: 0
Mean Deviation in network rtt to node: 0
Number of clusters node is a member in: 1
CLUSTER NAME SHID UUID
sas_itso_cl 0 5741f7c0-52ef-11e3-abd9-7a40c9ce2704
SITE NAME SHID UUID
LOCAL 1 51735173-5173-5173-5173-517351735173
Node name: b2
Cluster shorthand id for node: 3
UUID for node: 573ae868-52ef-11e3-abd9-7a40c9ce2704
State of node: UP
Smoothed rtt to node: 25
Mean Deviation in network rtt to node: 18
Number of clusters node is a member in: 1
CLUSTER NAME SHID UUID
sas_itso_cl 0 5741f7c0-52ef-11e3-abd9-7a40c9ce2704
SITE NAME SHID UUID
LOCAL 1 51735173-5173-5173-5173-517351735173
Example A-10 shows node a2 still pointing to the old repository disk.
root@a2:/> echo $?
3
root@a2:/> lscluster -c
lscluster: Cluster services are not active.
If you know which of the disks is the new repository disk, you can issue the command clusterconf -r <hdiskx>, as shown in Example A-12. If no command is issued, the node waits up to 600 seconds before it automatically joins the cluster.
Example A-12 Issuing the command to use the repository disk if known
root@a2:/> clusterconf -r hdisk5
root@a2:/>
Example A-13 shows the cluster configuration after the node joins the cluster again.
Example A-13 Cluster configuration after the node joins the cluster again
root@a2:/> lscluster -m
Calling node query for all nodes...
Node query number of nodes examined: 2
Node name: a2
Cluster shorthand id for node: 2
UUID for node: 573ae80e-52ef-11e3-abd9-7a40c9ce2704
State of node: UP NODE_LOCAL
Smoothed rtt to node: 0
Mean Deviation in network rtt to node: 0
Number of clusters node is a member in: 1
CLUSTER NAME SHID UUID
sas_itso_cl 0 5741f7c0-52ef-11e3-abd9-7a40c9ce2704
SITE NAME SHID UUID
LOCAL 1 51735173-5173-5173-5173-517351735173
----------------------------------------------------------------------------
Node name: b2
Cluster shorthand id for node: 3
UUID for node: 573ae868-52ef-11e3-abd9-7a40c9ce2704
State of node: UP
Smoothed rtt to node: 11
Mean Deviation in network rtt to node: 8
Number of clusters node is a member in: 1
CLUSTER NAME SHID UUID
sas_itso_cl 0 5741f7c0-52ef-11e3-abd9-7a40c9ce2704
SITE NAME SHID UUID
LOCAL 1 51735173-5173-5173-5173-517351735173
The ODM still has the old entries that need to be corrected as shown in Example A-14.
HACMPsircol:
name = "sas_itso_cl_sircol"
id = 1
uuid = "0"
ip_address = ""
repository = "00f70c9976cc355b"
backup_repository = "00f70c9976cc3613"
root@a2:/> lspv
hdisk0 00f70c99540419ff rootvg active
hdisk1 00f70c9975f30ff1 None
hdisk2 00f70c9976cc355b None
hdisk3 00f70c9976cc35af sasapp_vg
hdisk4 00f70c9976cc35e2 sasapp_vg
hdisk5 00f70c9976cc3613 caavg_private active
hdisk6 00f70c9976cc3646 None
Before restarting PowerHA on the affected node, the PowerHA configuration needs to be
synchronized. The synchronization needs to be started at the node that was up while the
repository disk was replaced. Switch to node b2, and sync the cluster as shown in
Example A-15.
Example A-16 shows how node b2 discovers the old repository disk.
Example A-16 Run configuration manager (cfgmgr) on node b2 to discover old repository disk
root@b2:/> lspv
hdisk0 00f6f5d056bafee2 rootvg active
hdisk1 00f6f5d076cce945 None
hdisk3 00f70c9976cc35af sasapp_vg concurrent
hdisk4 00f70c9976cc35e2 sasapp_vg concurrent
hdisk5 00f70c9976cc3613 caavg_private active
hdisk6 00f70c9976cc3646 None
root@b2:/> cfgmgr
root@b2:/> lspv
hdisk0 00f6f5d056bafee2 rootvg active
hdisk1 00f6f5d076cce945 None
hdisk2 00f70c9976cc355b None
hdisk3 00f70c9976cc35af sasapp_vg concurrent
hdisk4 00f70c9976cc35e2 sasapp_vg concurrent
hdisk5 00f70c9976cc3613 caavg_private active
hdisk6 00f70c9976cc3646 None
# qha can be freely distributed. If you have any questions or would like to see any enhancements/updates, please email [email protected]
# VARS
export PATH=$PATH:/usr/es/sbin/cluster/utilities
VERSION=`lslpp -L |grep -i cluster.es.server.rte |awk '{print $2}'| sed 's/\.//g'`
CLUSTER=`odmget HACMPcluster | grep -v node |grep name | awk '{print $3}' |sed
"s:\"::g"`
UTILDIR=/usr/es/sbin/cluster/utilities
# clrsh dir in v7 must be /usr/sbin; in previous versions it's /usr/es/sbin/cluster/utilities.
# Don't forget also that the rhost file for >v7 is /etc/cluster/rhosts
if [[ `lslpp -L |grep -i cluster.es.server.rte |awk '{print $2}' | cut -d'.' -f1`
-ge 7 ]]; then
CDIR=/usr/sbin
else
CDIR=$UTILDIR
fi
OUTFILE=/tmp/.qha.$$
LOGGING=/tmp/qha.out.$$
ADFILE=/tmp/.ad.$$
HACMPOUT=`/usr/bin/odmget -q name="hacmp.out" HACMPlogs | fgrep value | sed
's/.*=\ "\(.*\)"$/\1\/hacmp.out/'`
COMMcmd="$CDIR/clrsh"
REFRESH=0
usage()
{
echo "qha version 9.06"
echo "Usage: qha [-n] [-N] [-v] [-l] [-e] [-m] [-1] [-c]"
echo "\t\t-n displays network interfaces\n\t\t-N displays network
interfaces + nonIP heartbeat disk\n\t\t-v shows online VGs\n\t\t-l logs entries to
/tmp/qha.out\n\t\t-e shows running event\n\t\t-m shows appmon status\n\t\t-1
single iteration\n\t\t-c shows CAA SAN/Disk Status (AIX7.1 TL3 min.)"
}
if [ $HBOD = "TRUE" ]; then # Code for v6 and below only. To be deleted soon.
# Process Heartbeat on Disk networks (Bill Millers code)
VER=`echo $VERSION | cut -c 1`
if [[ $VER = "7" ]]; then
print "[HBOD option not supported]" >> $OUTFILE
fi
HBODs=$($COMMcmd $HANODE "$UTILDIR/cllsif" | grep diskhb | grep -w $HANODE | awk
'{print $8}')
for i in $(print $HBODs)
do
APVID=$($COMMcmd $HANODE "lspv" | grep -w $i | awk '{print $2}' | cut -c 13-)
AHBOD=$($COMMcmd $HANODE lssrc -ls topsvcs | grep -w r$i | awk '{print $4}')
if [ $AHBOD ]
then
printf "\n\t%-13s %-10s" $i"("$APVID")" [activeHBOD]
else
printf "\n\t%-13s %-10s" $i [inactiveHBOD]
fi
done
fi
}
function work
{
HANODE=$1; CNT=$2 NET=$3 VGP=$4
#clrsh $HANODE date > /dev/null 2>&1 || ping -w 1 -c1 $HANODE > /dev/null 2>&1
$COMMcmd $HANODE date > /dev/null 2>&1
if [ $? -eq 0 ]; then
EVENT="";
CLSTRMGR=`$COMMcmd $HANODE lssrc -ls clstrmgrES | grep -i state | sed 's/Current
state: //g'`
if [[ $CLSTRMGR != ST_STABLE && $CLSTRMGR != ST_INIT && $SHOWEVENT = TRUE ]];
then
EVENT=$($COMMcmd $HANODE cat $HACMPOUT | grep "EVENT START" |tail -1 | awk
'{print $6}')
# Main
NETWORK="FALSE"; VG="FALSE"; HBOD="FALSE"; LOG=false; APPMONSTAT="FALSE"; STOP=0;
CAA=FALSE; REMOTE="FALSE";
# Get Vars
while getopts :nNvlem1c ARGs
do
case $ARGs in
n) # -n show interface info
NETWORK="TRUE";;
N) # -N show interface info and activeHBOD
NETWORK="TRUE"; HBOD="TRUE";;
v) # -v show ONLINE VG info
VG="TRUE";;
l) # -l log to /tmp/qha.out
LOG="TRUE";;
e) # -e show running events if cluster is unstable
SHOWEVENT="TRUE";;
m) # -m show status of monitor app servers if present
APPMONSTAT="TRUE";;
1) # -1 exit after first iteration
STOP=1;;
c) # CAA SAN / DISK Comms
CAA=TRUE;;
\?) printf "\nNot a valid option\n\n" ; usage ; exit ;;
esac
done
OO=""
trap "rm $OUTFILE; exit 0" 1 2 12 9 15
while true
do
COUNT=0
print "\\033[H\\033[2J\t\tCluster: $CLUSTER ($VERSION)" > $OUTFILE
echo "\t\t$(date +%T" "%d%b%y)" >> $OUTFILE
if [[ $REMOTE = "TRUE" ]]; then
Fstr=`cat $CLHOSTS |grep -v "^#"`
else
Fstr=`odmget HACMPnode |grep name |sort -u | awk '{print $3}' |sed "s:\"::g"`
fi
for MAC in `echo $Fstr`
cat $OUTFILE
if [ $LOG = "TRUE" ]; then
wLINE=$(cat $OUTFILE |sed s'/^.*Cluster://g' | awk '{print " "$0}' |tr -s
'[:space:]' '[ *]' | awk '{print $0}')
wLINE_three=$(echo $wLINE | awk '{for(i=4;i<=NF;++i) printf("%s ", $i) }')
if [[ ! "$OO" = "$wLINE_three" ]]; then
# Note, there's been a state change, so write to the log
# Alternatively, do something additional, for example: send an snmp trap alert, using the snmptrap command. For example:
# snmptrap -c <community> -h <snmp agent> -m "appropriate message"
echo "$wLINE" >> $LOGGING
fi
OO="$wLINE_three"
fi
if [[ STOP -eq 1 ]]; then
exit
fi
sleep $REFRESH
done
# qha_remote can be freely distributed. If you have any questions or would like to see any enhancements/updates, please email [email protected]
# VARS
export PATH=$PATH:/usr/es/sbin/cluster/utilities
CLHOSTS="/alex/clhosts"
UTILDIR=/usr/es/sbin/cluster/utilities
# clrsh dir in v7 must be /usr/sbin; in previous versions it's /usr/es/sbin/cluster/utilities.
# Don't forget also that the rhost file for >v7 is /etc/cluster/rhosts
OUTFILE=/tmp/.qha.$$
usage()
{
echo "Usage: qha [-n] [-N] [-v] [-l] [-e] [-m] [-1] [-c]"
echo "\t\t-n displays network interfaces\n\t\t-N displays network
interfaces + nonIP heartbeat disk\n\t\t-v shows online VGs\n\t\t-l logs entries to
/tmp/qha.out\n\t\t-e shows running event\n\t\t-m shows appmon status\n\t\t-1
single iteration\n\t\t-c shows CAA SAN/Disk Status (AIX7.1 TL3 min.)"
}
function adapters
{
i=1
j=1
cat $ADFILE | while read line
do
en[i]=`echo $line | awk '{print $1}'`
name[i]=`echo $line | awk '{print $2}'`
if [ i -eq 1 ];then printf " ${en[1]} "; fi
if [[ ${en[i]} = ${en[j]} ]]
then
printf "${name[i]} "
else
printf "\n${en[i]} ${name[i]} "
fi
let i=i+1
let j=i-1
done
rm $ADFILE
if [ $HBOD = "TRUE" ]; then # Code for v6 and below only. To be deleted soon.
# Process Heartbeat on Disk networks (Bill Millers code)
VER=`echo $VERSION | cut -c 1`
if [[ $VER = "7" ]]; then
print "[HBOD option not supported]" >> $OUTFILE
fi
HBODs=$($COMMcmd $HANODE "$UTILDIR/cllsif" | grep diskhb | grep -w $HANODE | awk
'{print $8}')
for i in $(print $HBODs)
do
APVID=$($COMMcmd $HANODE "lspv" | grep -w $i | awk '{print $2}' | cut -c 13-)
AHBOD=$($COMMcmd $HANODE lssrc -ls topsvcs | grep -w r$i | awk '{print $4}')
if [ $AHBOD ]
then
printf "\n\t%-13s %-10s" $i"("$APVID")" [activeHBOD]
else
printf "\n\t%-13s %-10s" $i [inactiveHBOD]
fi
done
fi
function initialise
{
if [[ -n $CLUSTER ]]; then return; fi
echo "Initialising..."
HANODE=$1;
$COMMcmd $HANODE date > /dev/null 2>&1
if [ $? -eq 0 ]; then
CLUSTER=`$COMMcmd $HANODE odmget HACMPcluster | grep -v node |grep name | awk
'{print $3}' |sed "s:\"::g"`
VERSION=`$COMMcmd $HANODE lslpp -L |grep -i cluster.es.server.rte |awk '{print
$2}'| sed 's/\.//g'`
fi
}
function work
{
HANODE=$1; CNT=$2 NET=$3 VGP=$4
#clrsh $HANODE date > /dev/null 2>&1 || ping -w 1 -c1 $HANODE > /dev/null 2>&1
$COMMcmd $HANODE date > /dev/null 2>&1
if [ $? -eq 0 ]; then
EVENT="";
CLSTRMGR=`$COMMcmd $HANODE lssrc -ls clstrmgrES | grep -i state | sed 's/Current
state: //g'`
if [[ $CLSTRMGR != ST_STABLE && $CLSTRMGR != ST_INIT && $SHOWEVENT = TRUE ]];
then
EVENT=$($COMMcmd $HANODE cat $HACMPOUT | grep "EVENT START" |tail -1 | awk
'{print $6}')
printf "\n%-8s %-7s %-15s\n" $HANODE iState: "$CLSTRMGR [$EVENT]"
else
printf "\n%-8s %-7s %-15s\n" $HANODE iState: "$CLSTRMGR"
fi
# RG status
if [[ $APPMONSTAT = "TRUE" ]]; then
$COMMcmd $HANODE "
$UTILDIR/clfindres -s 2>/dev/null | grep ONLINE | grep $HANODE | awk -F':'
'{print \$1}' | while read RG
do
$UTILDIR/clfindres -m "\$RG" 2>/dev/null | egrep -v '(---|Group Name)' | sed
's/ */ /g' | sed '/^$/d' | awk '{printf}'
echo ""
done
" | awk '{printf " "$1"\t"$2" ("; for (i=4; i <= NF; i++) printf FS$i; print
")" }' | sed 's/( /(/g'
else
$COMMcmd $HANODE $UTILDIR/clfindres -s 2>/dev/null |grep -v OFFLINE | while read
A
do
if [[ "`echo $A | awk -F: '{print $3}'`" == "$HANODE" ]];
then
echo $A | awk -F: '{printf " %-18.16s %-10.12s %-1.20s\n", $1, $2,
$9}'
fi
COUNT=0; OO=""
trap "rm $OUTFILE; exit 0" 1 2 12 9 15
while true
do
Fstr=`cat $CLHOSTS |grep -v "^#"`
if [[ COUNT -eq 0 ]]; then
for MAC in `echo $Fstr`; do
initialise $MAC
done
fi
print "\\033[H\\033[2J\t\tCluster: $CLUSTER ($VERSION)" > $OUTFILE
echo "\t\t$(date +%T" "%d%b%y)" >> $OUTFILE
for MAC in `echo $Fstr`
do
let COUNT=COUNT+1
work $MAC $COUNT $NETWORK $VG $HBOD
done >> $OUTFILE
cat $OUTFILE
if [ $LOG = "TRUE" ]; then
wLINE=$(cat $OUTFILE |sed s'/^.*Cluster://g' | awk '{print " "$0}' |tr -s
'[:space:]' '[ *]' | awk '{print $0}')
wLINE_three=$(echo $wLINE | awk '{for(i=4;i<=NF;++i) printf("%s ", $i) }')
if [[ ! "$OO" = "$wLINE_three" ]]; then
# Note, there's been a state change, so write to the log
# Alternatively, do something additional, for example: send an snmp trap alert, using the snmptrap command. For example:
# snmptrap -c <community> -h <snmp agent> -m "appropriate message"
usage()
{
echo "\nUsage: qhar\n"
}
echo "</body>"
echo "</html>"
.
wq
EOF
chmod 755 $CGIFILE
}
function work
{
lssrc -ls clstrmgrES |grep -i state |sed 's:Current:$HANODE:g' |sed "s:state: :g"
/usr/es/sbin/cluster/utilities/clfindres -s 2>/dev/null |grep -v OFFLINE
# Main
format_cgi
rm $OUTFILE*
for clusternumber in `cat $CLUSTERfile | grep "^cluster:" | cut -f2 -d:`
do
NODES=`grep "^cluster:$clusternumber:" $CLUSTERfile | cut -f3 -d:`
echo "\t\tCluster: $clusternumber " >> $OUTFILE.$clusternumber
for MAC in $NODES
do
work $MAC
done >> $OUTFILE.$clusternumber &
done
sleep $SLEEPTIME
# Got to wait for jobs to be completed; this time may have to be tuned depending on the no. of clusters
cat $OUTFILE*
# add the outfiles to the cgi
ONE=TRUE
for f in $OUTFILE*
do
# sed the file # hack as aix/sed does not have -i
cat $f | sed 's:ONLINE:<font color="#00FF00">ONLINE<font color="#000000">:g' \
| sed 's:ST_STABLE:<font color="#00FF00">UP \& STABLE<font color="#000000">:g' \
| sed 's:ERROR:<font color="#FF0000">ERROR!<font color="#000000">:g' \
| sed 's:ST_RP_FAILED:<font color="#FF0000">SCRIPT FAILURE!<font
color="#000000">:g' \
| sed 's:ST_INIT:<font color="#2B65EC">NODE DOWN<font color="#000000">:g' \
| sed 's:SECONDARY:<font color="#2B65EC">SECONDARY<font color="#000000">:g' \
| sed 's:ST_JOINING:<font color="#2B65EC">NODE JOINING<font color="#000000">:g' \
| sed 's:ST_VOTING:<font color="#2B65EC">CLUSTER VOTING<font color="#000000">:g' \
usage()
{
printf "Usage: $PROGNAME [-n] [-1] [-i]\n"
printf "\t-n Omit Network info\n"
printf "\t-1 Display 1 report rather than loop\n"
printf "\t-i Displays the internal state of cluster manager\n"
printf "\t-c Displays the state of SAN Communications\n"
printf "Note: By default unprompted ssh must be configured from\n"
printf " the client monitor to each cluster node\n"
exit 1
}
###############################################################################
#
# Global VARs
#
###############################################################################
#******ONLY alter the code below this line, if you want to change******
#********************this behaviour of this script*********************
INTERNAL=0
PROGNAME=$(basename ${0})
#HA_DIR="$(cl_get_path)"
###############################################################################
#
# Name: format_cgi
#
# Create the cgi (on the fly!)
#
###############################################################################
format_cgi()
{
if [ -f $CGIFILE ]; then rm $CGIFILE; fi
touch $CGIFILE
ex -s $CGIFILE <<EOF
a
#!/usr/bin/ksh
print "Content-type: text/html\n";
###############################################################################
#
# Name: print_address_info
#
# Prints the address information for the node and network given in the
# environment
#
###############################################################################
print_address_info()
{
[[ "$VERBOSE_LOGGING" = "high" ]] && set -x
if [[ "$address_net_id" = "$net_id" ]]
then
active_node=$(echo "$ADDRESS_MIB_FUNC" | grep -w
"$ADDRESS_ACTIVE_NODE.$node_id.$address" | cut -f3 -d" ")
if [[ "$active_node" = $node_id ]]
then
address_label=$(echo "$ADDRESS_MIB_FUNC" | grep -w
"$ADDRESS_LABEL.$node_id.$address" | cut -f2 -d\")
address_state=$(echo "$ADDRESS_MIB_FUNC" | grep -w
"$ADDRESS_STATE.$node_id.$address" | cut -f3 -d" ")
printf "\t%-15s %-20s " $address $address_label
case $address_state in
2)
printf "UP\n"
;;
4)
printf "DOWN\n"
;;
*)
done
}
###############################################################################
#
# Name: print_rg_info
#
# Prints the online RG status info.
#
###############################################################################
print_rg_info()
{
i=1;
RGONSTAT=`echo "$CLUSTER_MIB" | grep -w "$node_name" |egrep -w
"(ONLINE|ERROR|ACQUIRING|RELEASING)" | while read A
do
if [ i -eq 1 ];then printf "\n\tResource Group(s) active on
$node_name:\n"; fi
echo "$A" | awk -F: '{printf "\t %-15s %-10s %-10s\n", $1, $2, $9}'
let i=i+1
done`
#if [ $i -gt 1 ]; then printf "$RGONSTAT\n"; fi
echo $RGONSTAT > /dev/null 2>&1
#echo $RGONSTAT | grep ONLINE > /dev/null 2>&1
#printf "$RGONSTAT\n"
if [ $? -eq 0 ]
then
printf "$RGONSTAT\n"
fi
}
###############################################################################
#
# Name: print_network_info
#
# Prints the network information for the node given in the environment
#
###############################################################################
print_network_info()
{
[[ "$VERBOSE_LOGGING" = "high" ]] && set -x
PRINT_IP_ADDRESS="true"
print_address_info
done
}
###############################################################################
#
# Name: print_node_info
#
# Prints the node information for each node found in the MIB
#
###############################################################################
print_node_info()
{
NODE_ID_COUNTER=0
let NODE_ID_COUNTER=NODE_ID_COUNTER+1
echo ""
printf "Node : $formatted_node_name State: " "$formatted_node_name"
if [ INTERNAL -eq 1 ]; then
internal_state=`ssh $SSHparams $USER@$node_name lssrc -ls clstrmgrES
2>/dev/null |grep -i state |awk '{print $3}'`
finternal_state=`echo "($internal_state)"`
fi
case $node_state in
2)
printf "UP $finternal_state\n"
;;
4)
printf "DOWN $finternal_state\n"
;;
32)
printf "JOINING $finternal_state\n"
;;
64)
printf "LEAVING $finternal_state\n"
;;
esac
let cluster_num_nodes=cluster_num_nodes-1
done
###############################################################################
#
# Name: print_cluster_info
#
# Prints the cluster information for the cluster found in the MIB of which
# this node is a member.
#
###############################################################################
print_cluster_info ()
{
HANODE=$1
case $cluster_state in
2)
cs="UP"
;;
4)
cs="DOWN"
;;
esac
case $cluster_substate in
4)
css="DOWN"
;;
8)
css="UNKNOWN"
;;
16)
css="UNSTABLE"
;;
2 | 32)
css="STABLE"
;;
64)
css="ERROR"
;;
128)
css="RECONFIG"
print_node_info
echo "\n"
###############################################################################
# Main
###############################################################################
format_cgi
while true
do
for NODE in `cat $CLHOSTS |grep -v "^#"`
do
SUCCESS=1
while [ SUCCESS -eq 1 ]
do
#ping -w 1 -c1 $NODE > /dev/null 2>&1
ssh $SSHparams ${USER}@${NODE} date > /dev/null 2>&1
if [ $? -eq 0 ]; then
# get the snmp info
CLUSTER_MIB=`ssh $SSHparams $USER@$NODE "snmpinfo -c $SNMPCOMM -m dump -o
/usr/es/sbin/cluster/hacmp.defs cluster
snmpinfo -c $SNMPCOMM -m dump -o /usr/es/sbin/cluster/hacmp.defs network
snmpinfo -c $SNMPCOMM -m dump -o /usr/es/sbin/cluster/hacmp.defs node
snmpinfo -c $SNMPCOMM -m dump -o /usr/es/sbin/cluster/hacmp.defs address
/usr/es/sbin/cluster/utilities/clfindres -s 2> /dev/null"`
# is there any snmp info?
snmpinfocheck=`echo $CLUSTER_MIB |grep $CLUSTER_BRANCH`
exit 0
---------------------------------------------------------------------------
--
--
-- C) The root of the RISC6000CLSMUXPD-MIB is as follows:
--
ibm OBJECT IDENTIFIER ::= { enterprises 2 }
ibmAgents OBJECT IDENTIFIER ::= { ibm 3 }
aix OBJECT IDENTIFIER ::= { ibmAgents 1 }
aixRISC6000 OBJECT IDENTIFIER ::= { aix 2 }
risc6000agents OBJECT IDENTIFIER ::= { aixRISC6000 1 }
risc6000clsmuxpd OBJECT IDENTIFIER ::= { risc6000agents 5 }
--
cluster OBJECT IDENTIFIER ::= { risc6000clsmuxpd 1 }
node OBJECT IDENTIFIER ::= { risc6000clsmuxpd 2 }
address OBJECT IDENTIFIER ::= { risc6000clsmuxpd 3 }
network OBJECT IDENTIFIER ::= { risc6000clsmuxpd 4 }
--
clstrmgr OBJECT IDENTIFIER ::= { risc6000clsmuxpd 5 }
cllockd OBJECT IDENTIFIER ::= { risc6000clsmuxpd 6 }
clinfo OBJECT IDENTIFIER ::= { risc6000clsmuxpd 7 }
--
application OBJECT IDENTIFIER ::= { risc6000clsmuxpd 8 }
--
clsmuxpd OBJECT IDENTIFIER ::= { risc6000clsmuxpd 9 }
event OBJECT IDENTIFIER ::= { risc6000clsmuxpd 10 }
--
resmanager OBJECT IDENTIFIER ::= { risc6000clsmuxpd 11 }
site OBJECT IDENTIFIER ::= { risc6000clsmuxpd 12 }
--
address6 OBJECT IDENTIFIER ::= { risc6000clsmuxpd 13 }
--
-- II. The Cluster Group
--
-- A) clusterId
-- This field is read from the HACMP for AIX object repository.
--
clusterId OBJECT-TYPE
SYNTAX INTEGER
trapClusterState TRAP-TYPE
ENTERPRISE risc6000clsmuxpd
VARIABLES { clusterState, clusterId, clusterNodeId }
DESCRIPTION
"Fires whenever the cluster changes state."
::= 10
--
-- E) clusterPrimary
-- This field is returned by the clstrmgr.
-- Status is deprecated as lock manager is no longer supported.
--
--
clusterPrimary OBJECT-TYPE
SYNTAX INTEGER
--
-- F) clusterLastChange
-- This field is a integer string returned by the gettimeofday()
-- library call and is updated if any cluster, node,
-- or address information changes.
--
clusterLastChange OBJECT-TYPE
SYNTAX INTEGER
ACCESS read-only
STATUS mandatory
DESCRIPTION
"Time in seconds of last change in this cluster."
::= { cluster 6 }
--
-- G) clusterGmtOffset
-- This field is a integer string returned by the gettimeofday()
-- library call and is updated if any cluster, node,
-- or address information changes.
--
clusterGmtOffset OBJECT-TYPE
SYNTAX INTEGER
ACCESS read-only
STATUS mandatory
DESCRIPTION
"Seconds west of GMT for the time of last change in this cluster."
::= { cluster 7 }
--
-- H) clusterSubState
-- This field is returned by the clstrmgr.
--
--
clusterSubState OBJECT-TYPE
SYNTAX INTEGER { unstable(16), error(64),
stable(32), unknown(8), reconfig(128),
notconfigured(256), notsynced(512) }
ACCESS read-only
STATUS mandatory
DESCRIPTION
"The cluster substate"
::= { cluster 8 }
trapClusterSubState TRAP-TYPE
ENTERPRISE risc6000clsmuxpd
VARIABLES { clusterSubState, clusterId, clusterNodeId }
DESCRIPTION
"Fires whenever the cluster changes substate."
::= 11
--
trapNewPrimary TRAP-TYPE
ENTERPRISE risc6000clsmuxpd
VARIABLES { clusterPrimary, clusterId, clusterNodeId }
DESCRIPTION
"Fires whenever the primary node changes."
::= 15
--
-- K) clusterNumNodes
-- This field is returned by the clstrmgr.
--
--
clusterNumNodes OBJECT-TYPE
SYNTAX INTEGER
ACCESS read-only
STATUS mandatory
DESCRIPTION
"The number of nodes in the cluster"
::= { cluster 11 }
--
-- L) clusterNodeId
-- This field is read from the HACMP for AIX object repository.
--
clusterNodeId OBJECT-TYPE
SYNTAX INTEGER
ACCESS read-only
STATUS mandatory
DESCRIPTION
"The ID of the local node"
::= { cluster 12 }
--
-- M) clusterNumSites
-- This field is returned by the clstrmgr.
--
--
-- III. The node group
--
-- A) The node table
-- This is a variable length table which is indexed by
-- the node Id.
--
nodeTable OBJECT-TYPE
SYNTAX SEQUENCE OF NodeEntry
ACCESS not-accessible
STATUS mandatory
DESCRIPTION
"A series of Node descriptions"
::= { node 1 }
--
nodeEntry OBJECT-TYPE
SYNTAX NodeEntry
ACCESS not-accessible
STATUS mandatory
INDEX { nodeId }
::= { nodeTable 1 }
--
NodeEntry ::= SEQUENCE {
nodeId INTEGER,
nodeState INTEGER,
nodeNumIf INTEGER,
nodeName DisplayString,
nodeSite DisplayString
}
--
-- B) nodeId
-- This field is read from the HACMP for AIX object repository.
--
nodeId OBJECT-TYPE
SYNTAX INTEGER
ACCESS read-only
STATUS mandatory
DESCRIPTION
"The ID of the Node"
::= { nodeEntry 1 }
--
-- C) nodeState
-- This row is returned by the clstrmgr.
--
trapNodeStateTRAP-TYPE
ENTERPRISErisc6000clsmuxpd
VARIABLES{ nodeState, clusterId, clusterNodeId }
DESCRIPTION
"Fires whenever a node changes state."
::= 12
--
-- D) nodeNumIf
-- This row is returned by the clstrmgr.
--
--
nodeNumIfOBJECT-TYPE
SYNTAXINTEGER
ACCESSread-only
STATUSmandatory
DESCRIPTION
"The number of network interfaces in this node"
::= { nodeEntry 3 }
--
-- E) nodeName
-- This row is returned by the clstrmgr.
--
--
nodeNameOBJECT-TYPE
SYNTAXDisplayString
ACCESSread-only
STATUSmandatory
DESCRIPTION
"The name of this node"
::= { nodeEntry 4 }
nodeSiteOBJECT-TYPE
SYNTAXDisplayString
ACCESSread-only
STATUSmandatory
DESCRIPTION
"The site associated with this node"
::= { nodeEntry 5 }
--
--
-- The site group
--
-- A) The site table
trapSiteStateTRAP-TYPE
ENTERPRISErisc6000clsmuxpd
VARIABLES{ siteState, clusterId, siteId }
DESCRIPTION
"Fires whenever a site changes state."
::= 18
--
--
--
--
-- H) addrActiveNode
-- This field is returned from the Cluster Manager.
--
--
--
-- V. The network group
--
-- A) The network table
-- This is a variable length table index by node Id
-- and network Id.
--
netTableOBJECT-TYPE
SYNTAXSEQUENCE OF NetEntry
ACCESSnot-accessible
STATUSmandatory
DESCRIPTION
"A series of Network descriptions"
::= { network 1 }
--
netEntryOBJECT-TYPE
SYNTAXNetEntry
ACCESSnot-accessible
STATUSmandatory
INDEX{ netNodeId, netId }
::= { netTable 1 }
--
NetEntry::= SEQUENCE {
netNodeIdINTEGER,
netId INTEGER,
netNameDisplayString,
netAttributeINTEGER,
netStateINTEGER,
netODMidINTEGER,
netTypeDisplayString,
netFamily INTEGER
}
--
trapNetworkStateTRAP-TYPE
ENTERPRISErisc6000clsmuxpd
VARIABLES{ netState, clusterId, clusterNodeId }
DESCRIPTION
"Fires whenever a network changes state."
::= 13
--
-- G) netODMid
-- This field is read from the HACMP for AIX object repository.
--
netODMidOBJECT-TYPE
SYNTAXINTEGER
ACCESSread-only
STATUSmandatory
DESCRIPTION
"The ODM id of the network"
::= { netEntry 6 }
--
-- H) netType
-- This field is read from the HACMP for AIX object repository.
-- It indicates the physical type of the network: ethernet, token
-- ring, ATM, etc.
--
netTypeOBJECT-TYPE
SYNTAXDisplayString
ACCESSread-only
STATUSmandatory
DESCRIPTION
"The physical network type, e.g. ethernet"
::= { netEntry 7 }
--
-- I) netFamily
-- This field is read from the HACMP for AIX object repository.
-- It indicates if the HACMP-network is a INET/INET6/Hybrid network.
--
netFamily OBJECT-TYPE
SYNTAX INTEGER { unknown(0), clinet(1), clinet6(2), clhybrid(3) }
ACCESS read-only
STATUS mandatory
DESCRIPTION
"Family of the network."
::= { netEntry 8 }
--
--
--
--
-- VI. The Cluster Manager (clstrmgr) group
--
--
--
-- VIII. The Client Information Daemon (clinfo) group
--
-- A) The clinfo table
-- This is a variable length table which is indexed by
-- the node Id.
--
clinfoTableOBJECT-TYPE
SYNTAXSEQUENCE OF ClinfoEntry
ACCESSnot-accessible
STATUSmandatory
DESCRIPTION
"A series of clinfo process entries"
::= { clinfo 1 }
--
clinfoEntryOBJECT-TYPE
SYNTAXClinfoEntry
ACCESSnot-accessible
STATUSmandatory
INDEX{ clinfoNodeId }
::= { clinfoTable 1 }
--
ClinfoEntry::= SEQUENCE {
clinfoNodeIdINTEGER,
clinfoVersionDisplayString,
clinfoStatusINTEGER
}
--
-- B) clinfoNodeId
-- This field is the cluster node id running the clinfo daemon.
--
clinfoNodeIdOBJECT-TYPE
SYNTAXINTEGER
ACCESSread-only
--
-- E) appVersion
-- This field is passed to the cl_registerwithclsmuxpd() routine.
--
appVersionOBJECT-TYPE
SYNTAXDisplayString
ACCESSread-only
STATUSmandatory
DESCRIPTION
"The version of the application"
::= { appEntry 4 }
--
-- trapAppState
-- This fires whenever the state of the application changes.
-- Note that this is based on application's socket connection
-- with the clsmuxpd daemon: when the socket is active, the
-- application is considered up; otherwise, it is down.
--
trapAppStateTRAP-TYPE
ENTERPRISErisc6000clsmuxpd
VARIABLES{ appName, clusterId, clusterNodeId }
DESCRIPTION
"Fires whenever an application is added or deleted."
::= 16
--
--
-- X. The Resource Group
-- Contains information about cluster resources and resource groups.
--
-- A) The Resource Group Table
--
resGroupTableOBJECT-TYPE
SYNTAXSEQUENCE OF ResGroupEntry
ACCESSnot-accessible
STATUSmandatory
DESCRIPTION
"A series of Resource Group descriptions"
::= { resmanager 1 }
--
resGroupEntryOBJECT-TYPE
SYNTAXResGroupEntry
ACCESSnot-accessible
STATUSmandatory
DESCRIPTION
"Individual Resource Group description"
INDEX { resGroupId }
::= { resGroupTable 1 }
--
ResGroupEntry::= SEQUENCE {
resGroupIdINTEGER,
resGroupNameDisplayString,
resGroupPolicyINTEGER,
resGroupUserPolicyNameDisplayString,
--
-- B) Resource Group Id
--
--
resGroupIdOBJECT-TYPE
SYNTAXINTEGER
ACCESSread-only
STATUSmandatory
DESCRIPTION
"The ID of the Resource Group"
::= { resGroupEntry 1 }
trapRGAddTRAP-TYPE
ENTERPRISErisc6000clsmuxpd
VARIABLES{ resGroupId }
DESCRIPTION
"Fires whenever a resource group is added."
::= 20
trapRGDelTRAP-TYPE
ENTERPRISErisc6000clsmuxpd
VARIABLES{ resGroupId }
DESCRIPTION
"Fires whenever a resource group is deleted."
::= 21
--
-- C) Resource Group Name
--
--
resGroupNameOBJECT-TYPE
SYNTAXDisplayString
ACCESSread-only
STATUSmandatory
DESCRIPTION
"The name of the Resource Group"
::= { resGroupEntry 2 }
--
-- D) Resource Group Policy
--
--
resGroupPolicyOBJECT-TYPE
SYNTAXINTEGER {
cascading(1),
rotating(2),
concurrent(3),
userdefined(4),
custom(5)
}
ACCESSread-only
--
-- E) Resource Group User-Defined Policy Name
--
--
resGroupUserPolicyNameOBJECT-TYPE
SYNTAXDisplayString
ACCESSread-only
STATUSmandatory
DESCRIPTION
"The name of the user-defined policy"
::= { resGroupEntry 4 }
--
-- F) Number of Resources in a Resource Group
--
--
resGroupNumResourcesOBJECT-TYPE
SYNTAXINTEGER
ACCESSread-only
STATUSmandatory
DESCRIPTION
"The number of resources defined in the group"
::= { resGroupEntry 5 }
--
-- G) Number of Participating Nodes in a Resource Group
--
--
resGroupNumNodesOBJECT-TYPE
SYNTAXINTEGER
ACCESSread-only
STATUSmandatory
DESCRIPTION
"The number of participating nodes in the group"
::= { resGroupEntry 6 }
trapRGChangeTRAP-TYPE
ENTERPRISErisc6000clsmuxpd
VARIABLES{ resGroupId, resGroupPolicy,
resGroupNumResources, resGroupNumNodes }
DESCRIPTION
"Fires whenever the policy, number of nodes,
or the number of resources of a resource
group is changed."
::= 22
--
-- H) Resource Group's Startup Policy
--
--
--
-- I) Resource Group's Fallover Policy
--
--
resGroupFalloverPolicyOBJECT-TYPE
SYNTAXINTEGER
ACCESSread-only
STATUSmandatory
DESCRIPTION
"The Resource Group's Fallover Policy
This can have the following values:
Fallover To Next Priority Node On the List - 5
Fallover Using DNP - 6
Bring Offline - 7"
::= { resGroupEntry 8 }
--
-- J) Resource Group's Fallback Policy
--
--
resGroupFallbackPolicyOBJECT-TYPE
SYNTAXINTEGER
ACCESSread-only
STATUSmandatory
DESCRIPTION
"The Resource Group's Fallback Policy
Fallback to Higher Priority Node in the List - 8
Never Fallback - 9"
::= { resGroupEntry 9 }
--
--
-- XI. The Resources
--
-- A) The Resource Table
--
--
resTableOBJECT-TYPE
SYNTAXSEQUENCE OF ResEntry
ACCESSnot-accessible
STATUSmandatory
--
-- C) Resource Id
-- This is stored in the hacmp configuration.
--
resourceIdOBJECT-TYPE
SYNTAXINTEGER
ACCESSread-only
STATUSmandatory
DESCRIPTION
"The ID of the Resource"
::= { resEntry 2 }
--
-- D) Resource Name
-- User supplied name, e.g. "Ora_vg1" or "app_serv1"
--
resourceNameOBJECT-TYPE
SYNTAXDisplayString
ACCESSread-only
STATUSmandatory
DESCRIPTION
"The name of this resource"
::= { resEntry 3 }
ACCESSread-only
STATUSmandatory
DESCRIPTION
"The Type of the Resource"
::= { resEntry 4 }
--
-- XII. The Resource Group Node State
--
-- A) The Resource Group Node State Table
-- The participating nodes and the current location of a given resource
-- group are determined and maintained via this table and indexed by
-- resource group ID and node ID.
--
--
resGroupNodeTableOBJECT-TYPE
SYNTAXSEQUENCE OF ResGroupNodeEntry
ACCESSnot-accessible
STATUSmandatory
DESCRIPTION
"A series of resource group and associated node state descriptions"
::= { resmanager 3 }
--
resGroupNodeEntryOBJECT-TYPE
SYNTAXResGroupNodeEntry
ACCESSnot-accessible
STATUSmandatory
DESCRIPTION
"Individual resource group/node state descriptions"
INDEX { resGroupNodeGroupId, resGroupNodeId }
::= { resGroupNodeTable 1 }
--
ResGroupNodeEntry::= SEQUENCE {
resGroupNodeGroupIdINTEGER,
resGroupNodeIdINTEGER,
--
-- B) The Resource Group Id
-- Cluster wide unique id assigned by hacmp.
--
--
resGroupNodeGroupId OBJECT-TYPE
SYNTAXINTEGER
ACCESSread-only
STATUSmandatory
DESCRIPTION
"The ID of the resource group"
::= { resGroupNodeEntry 1 }
--
-- C) The Participating Node Id
-- Node id of each node in the group.
--
--
resGroupNodeIdOBJECT-TYPE
SYNTAXINTEGER
ACCESSread-only
STATUSmandatory
DESCRIPTION
"Node ID of node located within resource group"
::= { resGroupNodeEntry 2 }
--
-- D) The Resource Group Node State
-- State of the group on each node participating in the group.
--
--
resGroupNodeStateOBJECT-TYPE
SYNTAXINTEGER {
online(2),
offline(4),
unknown(8),
acquiring(16),
releasing(32),
error(64),
onlineSec (256),
acquiringSec (1024),
releasingSec (4096),
errorsec (16384),
offlineDueToFallover (65536),
offlineDueToParentOff (131072),
offlineDueToLackOfNode (262144),
unmanaged(524288),
unmanagedSec(1048576)
-- offlineDueToNodeForcedDown(2097152)
trapRGState TRAP-TYPE
ENTERPRISErisc6000clsmuxpd
VARIABLES{ resGroupNodeGroupId, resGroupNodeId,
resGroupNodeState, clusterId, clusterNodeId }
DESCRIPTION
"Fires whenever a resource group changes
state on a particular node."
::= 23
--
--
-- XIII. The clsmuxpd group
-- Various statistics maintained by the smux peer daemon.
--
-- A) clsmuxpdGets
-- Incremented on each get request.
--
clsmuxpdGetsOBJECT-TYPE
SYNTAXCounter
ACCESSread-only
STATUSmandatory
DESCRIPTION
"Number of get requests received"
::= { clsmuxpd 1 }
--
-- B) clsmuxpdGetNexts
-- Incremented on each get-next request.
--
clsmuxpdGetNextsOBJECT-TYPE
SYNTAXCounter
ACCESSread-only
STATUSmandatory
DESCRIPTION
"Number of get-next requests received"
::= { clsmuxpd 2 }
--
-- C) clsmuxpdSets
-- Incremented on each set request.
-- Note that the smux does not currently support set requests.
--
clsmuxpdSetsOBJECT-TYPE
SYNTAXCounter
ACCESSread-only
STATUSmandatory
DESCRIPTION
"Number of set requests received"
::= { clsmuxpd 3 }
--
-- D) clsmuxpdTraps
--
-- State Event traps
--
trapSwapAdapterTRAP-TYPE
ENTERPRISErisc6000clsmuxpd
VARIABLES{ nodeName, clusterName, addrLabel, eventCount }
DESCRIPTION
"Specified node generated swap adapter event"
::= 64
trapSwapAdapterComplete TRAP-TYPE
ENTERPRISE risc6000clsmuxpd
VARIABLES { nodeName, clusterName, netName, eventCount }
DESCRIPTION
"Specified node generated swap adapter complete event"
::= 65
trapJoinNetwork TRAP-TYPE
ENTERPRISE risc6000clsmuxpd
VARIABLES { nodeName, clusterName, netName, eventCount }
DESCRIPTION
"Specified node has joined the network"
::= 66
trapFailNetwork TRAP-TYPE
ENTERPRISE risc6000clsmuxpd
VARIABLES { nodeName, clusterName, netName, eventCount }
DESCRIPTION
"Specified node generated fail network event"
::= 67
trapJoinNetworkComplete TRAP-TYPE
ENTERPRISE risc6000clsmuxpd
VARIABLES { nodeName, clusterName, netName, eventCount }
DESCRIPTION
"Specified node generated join network complete event"
::= 68
trapFailNetworkComplete TRAP-TYPE
ENTERPRISE risc6000clsmuxpd
VARIABLES { nodeName, clusterName, netName, eventCount }
DESCRIPTION
"Specified node generated fail network complete event"
::= 69
trapFailNode TRAP-TYPE
ENTERPRISE risc6000clsmuxpd
VARIABLES { nodeName, clusterName, netName, eventCount }
DESCRIPTION
"Specified node generated fail node event"
::= 71
trapJoinNodeComplete TRAP-TYPE
ENTERPRISE risc6000clsmuxpd
VARIABLES { nodeName, clusterName, netName, eventCount }
DESCRIPTION
"Specified node generated join node complete event"
::= 72
trapFailNodeComplete TRAP-TYPE
ENTERPRISE risc6000clsmuxpd
VARIABLES { nodeName, clusterName, netName, eventCount }
DESCRIPTION
"Specified node generated fail node complete event"
::= 73
trapJoinStandby TRAP-TYPE
ENTERPRISE risc6000clsmuxpd
VARIABLES { nodeName, clusterName, netName, eventCount }
DESCRIPTION
"Specified node generated join standby event"
::= 74
trapFailStandby TRAP-TYPE
ENTERPRISE risc6000clsmuxpd
VARIABLES { nodeName, clusterName, netName, eventCount }
DESCRIPTION
"Specified node has failed standby adapter"
::= 75
trapEventNewPrimary TRAP-TYPE
ENTERPRISE risc6000clsmuxpd
VARIABLES { nodeName, clusterName, clusterPrimaryNodeName, eventCount }
DESCRIPTION
"Specified node has become the new primary"
::= 76
trapClusterUnstable TRAP-TYPE
ENTERPRISE risc6000clsmuxpd
VARIABLES { nodeName, clusterName, netName, eventCount }
DESCRIPTION
"Specified node generated cluster unstable event"
::= 77
trapClusterStable TRAP-TYPE
ENTERPRISE risc6000clsmuxpd
VARIABLES { nodeName, clusterName, netName, eventCount }
DESCRIPTION
"Specified node generated cluster stable event"
::= 78
trapConfigStart TRAP-TYPE
ENTERPRISE risc6000clsmuxpd
VARIABLES { nodeName, clusterName, netName, eventCount }
DESCRIPTION
"Configuration procedure has started for specified node"
::= 79
trapConfigComplete TRAP-TYPE
ENTERPRISE risc6000clsmuxpd
VARIABLES { nodeName, clusterName, netName, eventCount }
DESCRIPTION
"Configuration procedure has completed for specified node"
::= 80
trapClusterConfigTooLong TRAP-TYPE
ENTERPRISE risc6000clsmuxpd
VARIABLES { nodeName, clusterName, netName, eventCount }
DESCRIPTION
"Specified node has been in configuration too long"
::= 81
--
-- Note that this event is no longer used and this trap will never occur.
--
trapClusterUnstableTooLong TRAP-TYPE
ENTERPRISE risc6000clsmuxpd
VARIABLES { nodeName, clusterName, netName, eventCount }
DESCRIPTION
"Specified node has been unstable too long"
::= 82
trapEventError TRAP-TYPE
ENTERPRISE risc6000clsmuxpd
VARIABLES { nodeName, clusterName, netName, eventCount }
DESCRIPTION
"Specified node generated an event error"
::= 83
trapDareTopology TRAP-TYPE
ENTERPRISE risc6000clsmuxpd
VARIABLES { nodeName, clusterName, netName, eventCount }
DESCRIPTION
"Dynamic reconfiguration event for topology has been issued"
::= 84
trapDareTopologyStart TRAP-TYPE
trapDareTopologyComplete TRAP-TYPE
ENTERPRISE risc6000clsmuxpd
VARIABLES { nodeName, clusterName, netName, eventCount }
DESCRIPTION
"Dynamic reconfiguration event for topology has completed"
::= 86
trapDareResource TRAP-TYPE
ENTERPRISE risc6000clsmuxpd
VARIABLES { nodeName, clusterName, netName, eventCount }
DESCRIPTION
"Dynamic reconfiguration event for resource has been issued"
::= 87
trapDareResourceRelease TRAP-TYPE
ENTERPRISE risc6000clsmuxpd
VARIABLES { nodeName, clusterName, netName, eventCount }
DESCRIPTION
"Dynamic reconfiguration event for resource has been released"
::= 88
trapDareResourceAcquire TRAP-TYPE
ENTERPRISE risc6000clsmuxpd
VARIABLES { nodeName, clusterName, netName, eventCount }
DESCRIPTION
"Dynamic reconfiguration event for resource has been acquired"
::= 89
trapDareResourceComplete TRAP-TYPE
ENTERPRISE risc6000clsmuxpd
VARIABLES { nodeName, clusterName, netName, eventCount }
DESCRIPTION
"Dynamic reconfiguration event for resource has completed"
::= 90
trapFailInterface TRAP-TYPE
ENTERPRISE risc6000clsmuxpd
VARIABLES { nodeName, clusterName, netName, eventCount }
DESCRIPTION
"Interface has failed on the event node"
::= 91
trapJoinInterface TRAP-TYPE
ENTERPRISE risc6000clsmuxpd
VARIABLES { nodeName, clusterName, netName, eventCount }
DESCRIPTION
"Interface has joined on the event node"
::= 92
trapServerRestart TRAP-TYPE
ENTERPRISE risc6000clsmuxpd
VARIABLES { nodeName, clusterName, netName, eventCount }
DESCRIPTION
"Server has been restarted on the event node"
::= 94
trapServerRestartComplete TRAP-TYPE
ENTERPRISE risc6000clsmuxpd
VARIABLES { nodeName, clusterName, netName, eventCount }
DESCRIPTION
"Server restart is complete on the event node"
::= 95
trapServerDown TRAP-TYPE
ENTERPRISE risc6000clsmuxpd
VARIABLES { nodeName, clusterName, netName, eventCount }
DESCRIPTION
"Server has failed on the event node"
::= 96
trapServerDownComplete TRAP-TYPE
ENTERPRISE risc6000clsmuxpd
VARIABLES { nodeName, clusterName, netName, eventCount }
DESCRIPTION
"Server has failed on the event node"
::= 97
trapSiteDown TRAP-TYPE
ENTERPRISE risc6000clsmuxpd
VARIABLES { nodeName, clusterName, siteName, eventCount }
DESCRIPTION
"Site failed"
::= 98
trapSiteDownComplete TRAP-TYPE
ENTERPRISE risc6000clsmuxpd
VARIABLES { nodeName, clusterName, siteName, eventCount }
DESCRIPTION
"Site failure complete on the event site"
::= 99
trapSiteUp TRAP-TYPE
ENTERPRISE risc6000clsmuxpd
VARIABLES { nodeName, clusterName, siteName, eventCount }
DESCRIPTION
"Site is now up"
::= 100
trapSiteMerge TRAP-TYPE
ENTERPRISE risc6000clsmuxpd
VARIABLES { nodeName, clusterName, siteName, eventCount }
DESCRIPTION
"Site has merged with the active site"
::= 102
trapSiteMergeComplete TRAP-TYPE
ENTERPRISE risc6000clsmuxpd
VARIABLES { nodeName, clusterName, siteName, eventCount }
DESCRIPTION
"Site merge is complete on the event site"
::= 103
trapSiteIsolation TRAP-TYPE
ENTERPRISE risc6000clsmuxpd
VARIABLES { nodeName, clusterName, siteName, eventCount }
DESCRIPTION
"Site is isolated"
::= 104
trapSiteIsolationComplete TRAP-TYPE
ENTERPRISE risc6000clsmuxpd
VARIABLES { nodeName, clusterName, siteName, eventCount }
DESCRIPTION
"Site isolation is complete on the event site"
::= 105
trapClusterNotify TRAP-TYPE
ENTERPRISE risc6000clsmuxpd
VARIABLES { nodeName, clusterName, siteName, eventCount }
DESCRIPTION
"Cluster Notify event has occurred on event node"
::= 106
trapResourceStateChange TRAP-TYPE
ENTERPRISE risc6000clsmuxpd
VARIABLES { nodeName, clusterName, siteName, eventCount }
DESCRIPTION
"Resource State Change event has occurred on event node"
::= 107
trapResourceStateChangeComplete TRAP-TYPE
ENTERPRISE risc6000clsmuxpd
VARIABLES { nodeName, clusterName, siteName, eventCount }
DESCRIPTION
"Resource State Change Complete event has occurred on event node"
::= 108
trapExternalResourceStateChange TRAP-TYPE
ENTERPRISE risc6000clsmuxpd
VARIABLES { nodeName, clusterName, siteName, eventCount }
DESCRIPTION
"External Resource State Change event has occurred on event node"
::= 109
trapExternalResourceStateChangeComplete TRAP-TYPE
ENTERPRISE risc6000clsmuxpd
VARIABLES { nodeName, clusterName, siteName, eventCount }
DESCRIPTION
"External Resource State Change Complete event has occurred on event node"
::= 110
--
-- XV. The Resource Group Dependency Configuration
-- Contains information about cluster resource group dependencies.
--
-- A) The Resource Group Dependency Table
--
resGroupDependencyTable OBJECT-TYPE
SYNTAX SEQUENCE OF ResGroupDependencyEntry
ACCESS not-accessible
STATUS mandatory
DESCRIPTION
"A series of Resource Group Dependency descriptions"
::= { resmanager 4 }
--
resGroupDependencyEntry OBJECT-TYPE
SYNTAX ResGroupDependencyEntry
ACCESS not-accessible
STATUS mandatory
DESCRIPTION
"Individual Resource Group Dependency description"
INDEX { resGroupDependencyId }
::= { resGroupDependencyTable 1 }
--
ResGroupDependencyEntry ::= SEQUENCE {
resGroupDependencyId INTEGER,
resGroupNameParent DisplayString,
resGroupNameChild DisplayString,
resGroupDependencyType DisplayString,
resGroupDependencyTypeInt INTEGER
}
--
-- B) Resource Group Dependency Id
resGroupDependencyId OBJECT-TYPE
SYNTAX INTEGER
ACCESS read-only
STATUS mandatory
trapRGDepAdd TRAP-TYPE
ENTERPRISE risc6000clsmuxpd
VARIABLES { resGroupDependencyId }
DESCRIPTION
"Fires when a new resource group dependency is added."
::= 30
trapRGDepDel TRAP-TYPE
ENTERPRISE risc6000clsmuxpd
VARIABLES { resGroupDependencyId }
DESCRIPTION
"Fires when a resource group dependency is deleted."
::= 31
trapRGDepChange TRAP-TYPE
ENTERPRISE risc6000clsmuxpd
VARIABLES { resGroupDependencyId, resGroupNameParent,
resGroupNameChild, resGroupDependencyType,
resGroupDependencyTypeInt }
DESCRIPTION
"Fires when a resource group dependency is changed."
::= 32
--
-- C) Resource Group Name Parent
--
--
resGroupNameParent OBJECT-TYPE
SYNTAX DisplayString
ACCESS read-only
STATUS mandatory
DESCRIPTION
"The name of the Parent Resource Group"
::= { resGroupDependencyEntry 2 }
--
-- D) Resource Group Name Child
--
--
resGroupNameChild OBJECT-TYPE
SYNTAX DisplayString
ACCESS read-only
STATUS mandatory
DESCRIPTION
"The name of the Child Resource Group"
::= { resGroupDependencyEntry 3 }
--
-- E) Resource Group Dependency Type
--
--
-- F) Resource Group Dependency Policy
--
--
resGroupDependencyTypeInt OBJECT-TYPE
SYNTAX INTEGER {
globalOnline(0)
}
ACCESS read-only
STATUS mandatory
DESCRIPTION
"The type of the Resource Group Dependency"
::= { resGroupDependencyEntry 5 }
--
--
-- XVI. The address6 group
--
-- A) The address6 table
-- This is a variable length table which is indexed by
-- the node Id, inet_type, octet_count, ip address (in octet form) and
-- prefix length.
--
addr6Table OBJECT-TYPE
SYNTAX SEQUENCE OF Addr6Entry
ACCESS not-accessible
STATUS mandatory
DESCRIPTION
"A series of IPv4/v6 address descriptions"
::= { address6 1 }
--
addr6Entry OBJECT-TYPE
SYNTAX Addr6Entry
ACCESS not-accessible
STATUS mandatory
INDEX { addr6NodeId, addr6InetType, addr6OctetCount, addr6Address,
addr6PrefixLength }
::= { addr6Table 1 }
--
Addr6Entry ::= SEQUENCE {
addr6NodeId INTEGER,
addr6InetType INTEGER,
trapAddressState TRAP-TYPE
ENTERPRISE risc6000clsmuxpd
VARIABLES { addr6State, addr6NetId, clusterId, clusterNodeId }
DESCRIPTION
"Fires whenever a address changes state."
::= 14
trapAdapterSwap TRAP-TYPE
ENTERPRISE risc6000clsmuxpd
VARIABLES { addr6State, clusterId, clusterNodeId }
DESCRIPTION
"Fires whenever a address swap occurs."
::= 17
--
-- K) addr6ActiveNode
-- This field is returned from the Cluster Manager.
--
addr6ActiveNode OBJECT-TYPE
SYNTAX INTEGER
ACCESS read-only
STATUS mandatory
DESCRIPTION
"The ID of the Node on which this IP address is active"
::= { addr6Entry 10 }
--
-- L) oldAddr6ActiveNode
-- This field is returned from the Cluster Manager.
--
oldAddr6ActiveNode OBJECT-TYPE
SYNTAX INTEGER
ACCESS not-accessible
STATUS mandatory
DESCRIPTION
"The ID of the Node on which this IP address was previously
active"
::= { addr6Entry 11 }
trapAddressTakeover TRAP-TYPE
ENTERPRISE risc6000clsmuxpd
VARIABLES { addr6ActiveNode, oldAddr6ActiveNode,
clusterId, clusterNodeId }
DESCRIPTION
"Fires whenever IP address takeover occurs."
::= 19
END
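The MIB shown above is what the cluster's SNMP agent exposes, so before building any external monitoring on top of it, it is worth confirming that the agent actually answers for these objects on a cluster node. The following commands are a minimal sketch, not a definitive procedure: they assume the default hacmp.defs object definition file, the default SNMP community, and that cluster services are active so that the values are populated.
# Walk the PowerHA (risc6000clsmuxpd) subtree through the local SNMP agent
snmpinfo -m dump -v -o /usr/es/sbin/cluster/hacmp.defs cluster
# Query a single object from the MIB, for example the cluster state
snmpinfo -m get -v -o /usr/es/sbin/cluster/hacmp.defs clusterState.0
If these queries return no data, correct the SNMP configuration on the node before attempting any MIB-to-MDL conversion.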
You can use externally available tools to convert the MIB into the Tivoli Monitoring MDL file. MibUtility, which is available on OPAL, is a commonly used tool for this conversion. Alternatively, you can write your own definition file based on your understanding of the MIB.
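Either approach starts from the MIB source that is shipped with the PowerHA filesets. The following sketch simply locates that file and counts its trap definitions so that you can cross-check whatever the conversion tool produces; the hacmp.my path and the fileset name shown here are the usual defaults and might differ on your system.
# PowerHA ships its MIB source as hacmp.my (default location assumed here)
ls -l /usr/es/sbin/cluster/hacmp.my
# If the file is elsewhere, list the fileset contents to find it
lslpp -f cluster.es.server.rte | grep hacmp.my
# Count the TRAP-TYPE definitions to compare against the generated MDL
grep -c TRAP-TYPE /usr/es/sbin/cluster/hacmp.my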
Example B-6 shows a sample data definition metafile (PowerHA.mdl) that may be loaded into
Tivoli Monitoring for PowerHA SNMP monitoring.
* IBM Corp.
* This file was created by the IBM Tivoli Monitoring Agent Builder
* Version 6.1.0
* ----------------------------------------------------------------
//SNMP TEXT
//ATTRIBUTES
Agent_Info D 128 0.0 @Identifies the SNMP host name and community names for agents
to query.
Agent_Name D 64 KEY 0.0 @Identifies the SNMP host name relating to a particular
sample of data.
risc6000clsmuxpd {1.3.6.1.4.1.2.3.1.2.1.5}
trapClusterSubState {1.3.6.1.4.1.2.3.1.2.1.5} 6 11 A 1 0 "Status Events"
SDESC
Fires whenever the cluster changes substate.
EDESC
trapClusterStable {1.3.6.1.4.1.2.3.1.2.1.5} 6 78 A 1 0 "Status Events"
SDESC
Specified node generated cluster stable event
EDESC
trapFailNetworkComplete {1.3.6.1.4.1.2.3.1.2.1.5} 6 69 A 1 0 "Status Events"
SDESC
Specified node generated fail network complete event
EDESC
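After the definition file is loaded into the monitoring agent, you can verify the trap flow end to end by generating a harmless cluster event and watching for the corresponding trap, such as trapClusterStable (trap 78), at the manager. The following is a sketch only: rg_app and node2 are placeholder names, and the utilities are assumed to be in the default /usr/es/sbin/cluster/utilities directory.
# Show the current resource group states (these map to resGroupNodeState in the MIB)
/usr/es/sbin/cluster/utilities/clRGinfo
# Move a test resource group to another node; the resulting cluster events
# should cause state-change and cluster stable traps to be sent
/usr/es/sbin/cluster/utilities/clRGmove -g rg_app -n node2 -m
# Confirm the new location after the move completes
/usr/es/sbin/cluster/utilities/clRGinfo rg_app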
Related publications
The publications listed in this section are considered particularly suitable for a more detailed
discussion of the topics covered in this book.
IBM Redbooks
The following IBM Redbooks publications provide additional information about the topic in this
document. Note that some publications referenced in this list might be available in softcopy
only.
IBM PowerHA SystemMirror 7.1.2 Enterprise Edition for AIX, SG24-8106
IBM PowerHA SystemMirror Standard Edition 7.1.1 for AIX Update, SG24-8030
Deploying PowerHA Solution with AIX HyperSwap, REDP-4954
You can search for, view, download or order these documents and other Redbooks,
Redpapers, Web Docs, draft and additional materials, at the following website:
ibm.com/redbooks
Other publications
These publications are also relevant as further information sources:
RSCT Version 3.1.2.0 Administration Guide, SA22-7889
Online resources
These websites are also relevant as further information sources:
PowerHA SystemMirror Concepts
http://public.boulder.ibm.com/infocenter/aix/v7r1/topic/com.ibm.aix.powerha.concepts/hacmpconcepts_pdf.pdf
PowerHA SystemMirror system management C-SPOC
http://public.boulder.ibm.com/infocenter/aix/v7r1/topic/com.ibm.aix.powerha.concepts/ha_concepts_install_config_manage.htm
HyperSwap for PowerHA SystemMirror in the IBM Knowledge Center
http://publib.boulder.ibm.com/infocenter/aix/v6r1/index.jsp?topic=%2Fcom.ibm.aix.powerha.pprc%2Fha_hyperswap_main.htm
IBM PowerHA SystemMirror HyperSwap with Metro Mirror
https://www.ibm.com/developerworks/aix/library/au-aix-hyper-swap/#!
Outlines the latest PowerHA enhancements
Describes clustering with unicast communications
Includes migration scenarios
This IBM Redbooks publication for IBM Power Systems with IBM PowerHA SystemMirror Standard and Enterprise Editions (hardware, software, practices, reference architectures, and tools) documents a well-defined deployment model within an IBM Power Systems environment. It guides you through a planned foundation for a dynamic infrastructure for your enterprise applications.
This information is for technical consultants, technical support staff, IT architects, and IT specialists who are responsible for providing high availability and support for the IBM PowerHA SystemMirror Standard and Enterprise Editions on IBM POWER systems.
IBM Redbooks are developed by the IBM International Technical Support Organization. Experts from IBM, Customers and Partners from around the world create timely technical information based on realistic scenarios. Specific recommendations are provided to help you implement IT solutions more effectively in your environment.