
IBM Power Systems Technical University

October 18-22, 2010 - Las Vegas, NV

Session Title:
Designing a PowerHA SystemMirror for AIX High Availability Solution

Session ID:

HA17(AIX)

Speaker Name: Michael Herrera

© 2010 IBM Corporation

Best Practices for Designing a PowerHA SystemMirror for AIX High Availability Solution
Michael Herrera ([email protected]) Advanced Technical Skills (ATS) Certified IT Specialist


Agenda

Common Misconceptions & Mistakes Infrastructure Considerations Differences in 7.1 Virtualization & PowerHA SystemMirror Licensing Scenarios Cluster Management & Testing Summary

HACMP is now PowerHA SystemMirror for AIX!


HA & DR solutions from IBM for your mission-critical AIX applications

Current Release: 7.1.0.X


Available on: AIX 6.1 TL06 & 7.1

Packaging Changes:
Standard Edition - Local Availability
Enterprise Edition - Local & Disaster Recovery

Licensing Changes:
Small, Medium, Large Server Class

Product Lifecycle:
  Version                       Release Date    End of Support Date
  HACMP 5.4.1                   Nov 6, 2007     Sept, 2011
  PowerHA 5.5.0                 Nov 14, 2008    N/A
  PowerHA SystemMirror 6.1.0    Oct 20, 2009    N/A
  PowerHA SystemMirror 7.1.0    Sept 10, 2010   N/A
4

* These dates are subject to change per Announcement Flash

PowerHA SystemMirror Minimum Requirements


PowerHA SystemMirror 7.1 (latest service pack: 7.1.0.1, Sep)
- AIX 7.1
- AIX 6.1 TL6 SP1

PowerHA SystemMirror 6.1 (latest service pack: 6.1.0.2, May 21)
- AIX 7.1
- AIX 6.1 TL2 with RSCT 2.5.4.0
- AIX 5.3 TL9 with RSCT 2.4.12.0

PowerHA 5.5 (latest service pack: 5.5.0.6, June 7)
- AIX 7.1
- AIX 6.1 TL2 SP1 with APAR IZ31208 and RSCT 2.5.2.0 (Async GLVM: APARs IZ31205 and IZ31207)
- AIX 5L 5.3 TL9 with RSCT 2.4.10.0

HACMP 5.4.1 (latest service pack: 5.4.1.8, May 13)
- AIX 6.1 with RSCT 2.5.0.0 or higher
- AIX 5.3 TL4 with RSCT 2.4.5 (IY84920) or higher
- AIX 5.2 TL8 with RSCT 2.3.9 (IY84921) or higher

Common Misconceptions
"PowerHA SystemMirror is an out-of-the-box solution"
- Application start / stop scripts require scripting & testing
- Application monitors will also require scripting & testing

"PowerHA SystemMirror is installed, so we are completely protected"
- Consider all single points of failure, e.g. SAN, LAN, I/O drawers, etc.

"Heartbeats go over a dedicated link"
- All interfaces defined to the cluster will pass heartbeats (IP & non-IP)
- CAA definitely changes this behavior

"With clustering I need two of everything, hence idle resources"

Fact: Clustering will highlight what you are & are NOT doing right in your environment
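Since the start / stop scripts are the part you write yourself, here is a minimal sketch of what a start script might look like (the paths, the appuser account, and the app_server command are hypothetical examples, not from this presentation):

# cat /usr/local/hascripts/app_start.sh
#!/bin/ksh
# Start the application as its service user and capture output for later debugging
su - appuser -c "/opt/app/bin/app_server start" >> /var/hacmp/log/app_start.log 2>&1
# Exit 0 here; let an application monitor decide whether the application really came up
exit 0

A matching stop script should likewise return 0 on success, since a non-zero return code from either script can be treated as an event failure by the cluster manager.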
6

Common Mistakes Beyond Base Cluster Functionality


- Down / missing serial networks
- EtherChannel links down
- ERRORs in verification
- Inconsistent AIX levels
- Down-level cluster filesets
- Fallback policy not set to the desired behavior
- Missing filesets
- Missing custom disk methods
- SAN not built in a robust fashion
- Bootlist issues
- Dump devices: insufficient size, mirrored, lacking a secondary
- Lack of education / experience: not knowing the expected fallover behaviors, lack of application monitoring, not knowing what to monitor or check (CLI, logs)
- Poor change controls: not propagating changes appropriately, no change history
- I/O pacing enabled (old values)
- HBA levels at GA code
- Fibre Channel tunable settings not enabled (see the chdev sketch after this list)
- Interim fixes not loaded on all cluster nodes

Solutions: IBM Training / Redbooks / Proofs of Concept / ATS Health-check Reviews
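The Fibre Channel tunables are set per fscsi device with chdev; a hedged example of the commonly recommended fast-fail / dynamic-tracking settings (device names are examples - confirm the values with your storage vendor and repeat for every fscsi device on every cluster node):

# chdev -l fscsi0 -a fc_err_recov=fast_fail -a dyntrk=yes -P
# chdev -l fscsi1 -a fc_err_recov=fast_fail -a dyntrk=yes -P

The -P flag defers the change until the next reboot (or device reconfiguration), which is useful when the adapters are busy.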

Identify & Eliminate Points of Failure

- LAN infrastructure: redundant switches
- SAN infrastructure: redundant fabric
- Application availability: application monitoring, availability reports

Infrastructure Considerations
[Diagram: Site A and Site B connected by DWDM with all links running through one pipe; each site has its own LAN and SAN, Node A at Site A and Node B at Site B, and 50GB LUNs mirrored across the sites in SITEAMETROVG.]

Important:
Identify & Eliminate Single Points of Failure! 9

Infrastructure Considerations
[Diagram: the same two sites with redundancy designed in: an XD_rs232 and an XD_ip network across the WAN, net_ether_0 on the LAN, and two disk heartbeat networks - ECM VG diskhb_vg1 (hdisk2/hdisk3, PVID 000fe4111f25a1d1) and ECM VG diskhb_vg2 (hdisk3/hdisk4, PVID 000fe4112f998235) on dedicated 1GB LUNs - in addition to the 50GB SITEAMETROVG data LUNs mirrored across the DWDM-connected SANs.]

Important:
Identify Single Points of Failure & design the solution around them 10

Infrastructure Considerations
Power redundancy - real customer scenarios:
- I/O drawers
- SCSI backplane
- SAN HBAs
- Virtualized environments
- Application fallover protection

Example 1: two nodes sharing an I/O drawer (the drawer becomes a shared single point of failure)
Example 2: application failure with no application monitoring - the box remains up, so no cluster fallover occurs

Moral of the story: high availability goes beyond just installing the cluster software
11

PowerHA SystemMirror 7.1: Topology management


Heartbeating differences from earlier cluster releases:

[Diagram: in PowerHA SystemMirror 6.1 and earlier, LPARs 1-4 are connected by RSCT subnet heartbeat rings plus point-to-point disk heartbeat networks (diskhb_net1 through diskhb_net4); in PowerHA SystemMirror 7.1, the same LPARs communicate over multicast.]

PowerHA SM 6.1 & earlier:
- RSCT-based heartbeating (Leader, Successor, Mayor, etc.)
- Strict subnet rules
- No heartbeating over HBAs
- Multiple disk heartbeat networks: point-to-point only, each network requires a LUN in an ECM VG

PowerHA SM 7.1 with CAA:
- Kernel-based cluster message handling
- Multicast-based protocol
- Uses the network & SAN as needed; discovers and uses as many adapters as possible
- All monitors are implemented at low levels of the AIX kernel & are largely insensitive to system load
- Single repository disk used to heartbeat & store information

12

Transition of PowerHA Topology IP Networks


[Diagram: net_ether_0 in 6.1 and below - each node has two interfaces (en0/en1), each with a base address on its own non-routable subnet (192.168.100.x and 192.168.101.x), plus a persistent IP (9.19.51.10 / 9.19.51.11) and service IPs (9.19.51.20 / 9.19.51.21) on the routable subnet, with RSCT heartbeat rings per subnet over the shared VLAN.]

Traditional heartbeating rules no longer apply. However, route striping is still a potential issue: when two interfaces have routable IPs on the same subnet, AIX will send half of the traffic out of each interface.

Methods to circumvent this (see the sketch below): Link Aggregation / EtherChannel, or virtualized interfaces backed by dual VIO Servers.
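On AIX, the link-aggregation approach means combining two physical ports into a single EtherChannel pseudo-adapter so the node presents only one routable interface; a hedged sketch (adapter names and the 802.3ad mode are examples - most administrators drive this through smitty etherchannel):

# mkdev -c adapter -s pseudo -t ibm_ech -a adapter_names=ent0,ent1 -a mode=8023ad

The command creates a new entX adapter; the base, persistent, and service IPs then live on the corresponding enX interface, so there is only one route per subnet and striping cannot occur.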
[Diagram: the 7.1 equivalent - a single en2 interface per node (an aggregated or virtualized adapter built on ent0/ent1) carries the base address (9.19.51.10 / 9.19.51.11) and the service IPs (9.19.51.20 / 9.19.51.21) on the routable subnet.]

13

PowerHA SM 7.1: Additional Heartbeating Differences


Heartbeating:
- Self-tuning Failure Detection Rate (FDR)
- All interfaces are used, even if they are not in cluster networks

[Diagram: the same virtualized two-node layout; an additional en3 interface that is not part of any cluster network is still monitored alongside en2, which carries the base and service IPs.]

Serial networks removed:
- No more rs232 support
- No more traditional disk heartbeating over an ECM VG
- No more slow takeover with the disk heartbeat device as the last device on selective takeover

Critical Volume Groups:
- Replace Multi-Node Disk Heartbeating (MNDHB), e.g. the Oracle RAC three-disk volume group holding the voting files
- Unlike MNDHB, no more general use
- Migration is a manual operation and a customer responsibility
- Any concurrent-access volume group can be marked as Critical

14

CAA Cluster Aware AIX


Enabling tighter Integration with PowerHA SystemMirror

What is it:
- A set of services/tools embedded in AIX to help manage a cluster of AIX nodes and/or help run cluster software on AIX
- IBM cluster products (including RSCT, PowerHA, and the VIOS) will use and/or call CAA services/tools
- CAA services can assist in the management and monitoring of an arbitrary set of nodes and/or a third-party cluster
- CAA does not form a cluster by itself; it is a tool set. There is no notion of quorum (if 20 nodes of a 21-node cluster are down, CAA still runs on the remaining node)
- CAA does not eject nodes from a cluster. CAA provides tools to fence a node, but never fences a node itself and will continue to run on a fenced node
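Once a CAA cluster has been created, its state can be inspected directly from AIX with the lscluster command; a quick sketch of the queries that are typically useful (output omitted here):

# lscluster -c     (cluster configuration, including the multicast address)
# lscluster -m     (node membership and state)
# lscluster -i     (network and SAN interfaces being monitored)
# lscluster -d     (cluster storage, including the repository disk)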

Major benefits:
- Enhanced health management (integrated health monitoring)
- Cluster-wide device naming
15

Cluster Aware AIX Exploiters


RSCT consumers: DB2, IBM Director, TSA, HMC, IBM Storage, HPC, PowerHA SystemMirror, VIOS

[Diagram: legacy RSCT on legacy AIX (bundled resource managers, group services, messaging API / cluster messaging, resource manager services, monitoring API / cluster monitoring, cluster admin UI, cluster configuration repository) compared with RSCT on Cluster Aware AIX, where the messaging, monitoring, and configuration-repository layers are redesigned to use the CAA cluster repository, CAA APIs and UIs, cluster monitoring, cluster messaging, and cluster events.]

- RSCT and Cluster Aware AIX together provide the foundation of strategic Power Systems software
- RSCT-CAA integration enables compatibility with a diverse set of dependent IBM products
- RSCT integration with CAA extends simplified cluster management, along with optimized and robust cluster monitoring, failure detection, and recovery, to RSCT exploiters on Power / AIX

16

Cluster Aware AIX: Central Repository Disk


The central repository disk contrasts with previous releases and aids in:
- Global device naming
- Inter-node synchronization
- Centrally managed configuration
- Heartbeating

PowerHA SystemMirror 6.1 & prior: each host keeps its own copy of the HA ODM, kept in step by cluster synchronization.
PowerHA SystemMirror 7.1 & CAA: all hosts share the central repository disk.

Direction:
- In the first release, support is confined to shared storage
- Will eventually evolve into a general AIX device rename interface
- Future direction is to enable cluster-wide storage policy settings
- The PowerHA ODM will eventually also move entirely to the repository disk

PowerHA SystemMirror will continue to run if the central repository disk goes away; however, no changes may take place within the cluster.
17

Multi Channel Health Management Out of the Box


Hardened Environments with new communication protocol

Faster detection & more efficient communication

[Diagram: LPAR 1 and LPAR 2 exchange heartbeats and reliable messaging over three channels - the network as the first line of defense, the SAN as the second, and the repository disk as the third.]

Highlights:
- RSCT Topology Services is no longer used for cluster heartbeating
- All customers now have multiple communication paths by default
18

Basic Cluster vs. Advanced Cluster Features


Basic cluster:
- Network topology
- Resource group(s): IPs, VGs, application server
- Application monitoring, pager events

Advanced cluster:
- Multiple networks, crossover connections
- Virtualized resources
- Multiple resource groups, mutual takeover
- Custom resource groups, adaptive fallover
- NFS cross-mounts
- File collections
- Dependencies: parent / child, location, start after, stop after
- Smart Assists
- Multiple sites: cross-site LVM configurations, storage replication, IP replication
- Application monitoring, pager events
- DLPAR integration: grow the LPAR on fallover
- Director management, WebSMIT management
- Dynamic node priority

19

PowerHA SystemMirror: Fallover Possibilities


Cluster Scalable to 32 nodes

One to one

One to any

Any to one

Any to any

20

Methods to Circumvent Unused Resources


- Resource Group A: Node A, Node B - shared IP, VG(s) & filesystems, App 1
- Resource Group B: Node A, Node B - shared IP, VG(s) & filesystems, App 2
- Resource Group C: Node B, Node A - shared IP, VG(s) & filesystems, App 3
- Resource Group D: Node B, Node A - shared IP, VG(s) & filesystems, App 4

Mutual takeover with resource group dependencies keeps both nodes hosting work instead of leaving one idle.

[Diagram: Node A on Frame 1 and Node B on Frame 2, each with its own rootvg and either dedicated NIC/HBA adapters or virtualized adapters through dual VIO Servers, sharing SAN LUNs such as oracle_vg1 from the storage subsystem.]

21

Power Virtualization & PowerHA SystemMirror

PowerVM features used with the cluster:
- LPAR / DLPAR
- Micro-partitioning & shared processor pools
- Virtual I/O Server: virtual Ethernet, virtual SCSI, virtual fibre
- Live Partition Mobility
- Active Memory Sharing
- WPAR (AIX 6.1)

[Diagram: a PowerHA cluster of HA_node 1 and HA_node 2 running alongside other LPARs (X, Y, Z) on two frames; each frame has dual VIO Servers (VIO1/VIO2 A and VIO1/VIO2 B) providing virtual Ethernet (en0) and virtual fibre (vfc0/vfc1) connections to the LAN and SAN, with rootvg and data volumes on an external storage enclosure.]

22

PowerHA SystemMirror Virtualization Considerations


Ethernet virtualization:
- The topology should look the same as an environment using link aggregation
- Version 7.1 no longer uses the netmon.cf file
- As a best practice, dual VIO Servers with an SEA fallover backend are recommended

Storage virtualization:
- Both methods of virtualizing storage are supported: VSCSI and virtual fibre (NPIV)
- In DR implementations leveraging disk replication, consider the implications of using either option

Benefits of virtualization:
- Maximize utilization of resources
- Fewer PCI slots & physical adapters
- Foundation for advanced functions like Live Partition Mobility
- Migrations to newer Power hardware are simplified

* Live Partition Mobility & PowerHA SM complement each other: maintenance (non-reactive) vs. high availability (reactive)

Chapter 2.4, PowerVM Virtualization Considerations

23

Virtual Ethernet & PowerHA SystemMirror


No Link Aggregation / Same Frame

[Diagram: SEA fallover within one frame - each VIO Server bridges a physical port (ent0) and a trunked virtual adapter (ent2, PVID 10) into a Shared Ethernet Adapter (ent4), with a control-channel virtual adapter (ent5, PVID 99) between the two VIO Servers; PowerHA LPARs 1 and 2 each use a single virtual en0 through the hypervisor, and each VIOS uplinks to its own Ethernet switch.]

This is a diagram of the configuration required for SEA fallover across VIO Servers. Note that Ethernet traffic will not be load balanced across the VIO Servers. The lower trunk priority on the ent2 virtual adapter would designate the primary VIO Server to use.
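On the VIO Server, an SEA with fallover like the one in this diagram is typically created with mkvdev; a hedged sketch using the adapter names shown (verify attribute names and PVIDs against your own VIOS level and configuration):

$ mkvdev -sea ent0 -vadapter ent2 -default ent2 -defaultid 10 -attr ha_mode=auto ctl_chan=ent5

ha_mode=auto enables SEA failover between the two VIO Servers and ctl_chan names the control-channel adapter; the same command is run on both VIOS partitions, with the primary determined by the lower trunk priority on ent2.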
24

Virtual Ethernet & PowerHA SystemMirror


Independent Frames & Link Aggregation
[Diagram: the same SEA fallover design spread across two independent frames - on each frame, each VIO Server backs its Shared Ethernet Adapter (ent4) with a link aggregation (ent3) of two physical ports (ent0/ent1) instead of a single adapter, plus the trunked virtual adapter (ent2) and control channel (ent5); PowerHA LPAR 1 runs on Frame 1 and PowerHA LPAR 2 on Frame 2, each with a single virtual en0.]

25

PowerHA SystemMirror 6.1 & Below


[Diagram: a PowerHA 6.1-and-below cluster on virtualized interfaces - net_ether_0 carries the base addresses (9.19.51.10 / 9.19.51.11) and service IPs (9.19.51.20 / 9.19.51.21) on a single en0 per node, with Topology Services heartbeating over the IP network and a serial_net_0 non-IP network between Node 1 on Frame 1 and Node 2 on Frame 2; each frame uses dual VIO Servers with SEA fallover and link aggregation as shown on the previous slides.]

* Netmon.cf file used for single adapter networks
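With single-adapter (virtualized) networks, releases before 7.1 rely on /usr/es/sbin/cluster/netmon.cf to decide whether an interface is genuinely up; a hedged sketch of the "!REQD" format IBM documented for VIO client LPARs (the target addresses are examples - they should be devices outside the frame that answer pings, such as the default gateway):

# cat /usr/es/sbin/cluster/netmon.cf
!REQD en0 9.19.51.1
!REQD en0 9.19.51.2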

26

PowerHA SystemMirror 7.1 - Topology


All nodes are monitored:
- Cluster Aware AIX tells you which nodes are in the cluster and provides information on those nodes, including their state
- A special gossip protocol is used over the multicast address to determine node information and implement scalable, reliable multicast
- No traditional heartbeat mechanism is employed; gossip packets travel over all interfaces, including storage

Differences:
- RSCT Topology Services is no longer used for heartbeat monitoring
- Subnet requirements no longer need to be followed
- The netmon.cf file is no longer required or used
- All interfaces are used for monitoring, even if they are not in an HA network (this may be tunable in a future release)
- IGMP snooping must be enabled on the switches
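Because the protocol is multicast based, it is worth confirming that multicast traffic actually flows between the nodes before building the cluster; AIX levels that include CAA ship an mping test utility for this. A hedged sketch (the multicast address is an example, and the exact flags can vary by level, so check mping's usage output):

# mping -v -r -a 228.19.51.20      (start this first on one node, as the receiver)
# mping -v -s -a 228.19.51.20      (then run this on the other node, as the sender)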
27

VSCSI Mapping vs. NPIV (virtual fiber)


[Diagram: VSCSI vs. NPIV on two frames - each cluster node is served by dual VIO Servers; rootvg and a vscsi_vg reach the client through VSCSI mappings (LUNs mapped to vhost adapters on the VIOS, seen via vscsi0/vscsi1 with MPIO in the client), while an npiv_vg reaches the client through NPIV virtual fibre (vfchost adapters on the VIOS, seen as fcs0/fcs1 in the client), all from the same storage subsystem.]

28
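On the VIO Server the two mapping styles are created and inspected with different commands; a hedged sketch (the adapter and disk names are generic examples, not taken from the diagram):

$ mkvdev -vdev hdisk2 -vadapter vhost0     (VSCSI: map a LUN to the client's vscsi adapter)
$ vfcmap -vadapter vfchost0 -fcp fcs0      (NPIV: bind the client's virtual FC adapter to a physical port)
$ lsmap -all                               (list VSCSI mappings)
$ lsmap -all -npiv                         (list NPIV mappings)

With NPIV the client LPAR logs into the SAN with its own WWPNs and sees the LUNs directly, which is one reason it is often preferred when disk replication is part of the design.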

Live Partition Mobility Support with IBM PowerHA


How does it all work?

Considerations:
- This is a planned move
- It assumes that all resources are virtualized through VIO (storage & Ethernet connections)
- PowerHA should only experience a minor disruption to the heartbeats during a move
- IVE / HEA virtual Ethernet is not supported for LPM
- VSCSI & NPIV virtual fibre mappings are supported

[Diagram: a PowerHA node is moved live between Frame 1 and Frame 2; both frames have dual VIO Servers and share rootvg and datavg on SAN storage, so after the move both cluster nodes may temporarily reside on the same frame.]

The two solutions complement each other by providing the ability to perform non-disruptive maintenance while retaining the ability to fall over in the event of a system or application outage.

29

PowerHA and LPM Feature Comparison


Capabilities compared (the original table has one column per product - PowerHA SystemMirror and Live Partition Mobility - with a check mark where the capability applies):
- Live OS/App move between physical frames*
- Server workload management**
- Energy management**
- Hardware maintenance
- Software maintenance
- Automated failover upon system failure (OS or HW)
- Automated failover upon HW failure
- Automated failover upon application failure
- Automated failover upon VG access loss
- Automated failover upon any specified AIX error (via customized error notification of an error report entry)

* ~2 seconds of total interruption time
** Requires free system resources on the target system

30

PowerHA SystemMirror: DLPAR Value Proposition


Pros:
- Automated action on acquisition of resources (bound to the PowerHA application server)
- HMC verification: checking for connectivity to the HMC
- Ability to grow the LPAR on fallover
- Save $ on PowerHA SM licensing (thin standby node)

Cons:
- Requires connectivity to the HMC
- Potentially slower fallover
- Lacks the ability to grow the LPAR on the fly

System specs: 32-way (2.3 GHz) Squad-H+, 256 GB of memory

Results:
- 120 GB DLPAR add took 1 min 55 sec
- 246 GB DLPAR add took 4 min 25 sec
- At 30% busy running an artificial load, the add took 4 min 36 sec

[Diagram: LPAR A hosts the application server and its DLPAR CPU count, LPAR B is the backup at its minimal CPU count; both nodes communicate with the HMC over ssh.]

31

DLPAR Licensing Scenario


How does it all work?

[Diagram: two Power7 740 16-way systems hosting four clusters. System A runs Oracle DB (1 CPU), Banner DB (1 CPU), and two standby LPARs (1 CPU each); System B runs PeopleSoft (1 CPU), Financial DB (1 CPU), and two standby LPARs (1 CPU each); other workloads on the pair include TSM (2 CPU) and an AIX print server (2 CPU), and each system has roughly 10 CPUs of spare capacity. On fallover, the standby acquires +1 or +2 CPUs via DLPAR along with the application.]

Applications and their requirements:
  Application                       CPU    Memory
  Production Oracle DB               2     16 GB
  Production PeopleSoft              2      8 GB
  AIX Print Server                   2      4 GB
  Banner Financial DB                3     32 GB
  Production Financial DB            3     32 GB
  Tivoli Storage Manager 5.5.2.0     2      8 GB

32

Environment: PowerHA App Server Definitions


Application server definitions (examples): one application server defined with Min 1 / Desired 2 CPUs, another with Min 1 / Desired 3 CPUs.

[Diagram: the same System A / System B layout, with the production LPARs (Oracle DB, Banner DB, PeopleSoft, Financial DB) and the standby LPARs all sized at 1 CPU.]

The actual application requirements are stored in the PowerHA SystemMirror definitions and enforced during the acquisition or release of application server resources.

During the acquisition of resources at cluster start-up, the host will ssh to the pre-defined HMC(s) to perform the DLPAR operation automatically, so the +1 or +2 CPUs arrive along with the application (see the sketch below).
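Under the covers this is an ssh call to the HMC command line; a hedged illustration of the kind of chhwres operation involved (the HMC user, managed-system name, and LPAR name are hypothetical):

# ssh hscroot@hmc1 "chhwres -m SystemA -r proc -o a -p oracle_lpar --procs 2"    (add two dedicated CPUs)
# ssh hscroot@hmc1 "chhwres -m SystemA -r proc -o r -p oracle_lpar --procs 2"    (release them again)

PowerHA drives these operations itself once the HMC and LPAR names are defined to the cluster; the commands above are only shown to make the mechanism concrete.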

33

Environment: DLPAR Resource Processing Flow


Processing flow:
1. Activate the LPARs - the LPAR profiles define, for example, Min 1 / Desired 1 / Max 2 and Min 1 / Desired 1 / Max 3 CPUs
2. Start PowerHA - the application server requirements are read (for example Min 1 / Desired 2 / Max 2 and Min 1 / Desired 3 / Max 3) and the additional CPUs are acquired via DLPAR through the HMC
3. Release resources on fallover or rg_move - the +1 / +2 CPUs are released on the source node and acquired on the takeover node along with the application
4. Release resources when stopping the cluster without takeover

Take-aways:
- CPU allocations follow the application server wherever it is hosted (this model allows you to lower the HA license count)
- DLPAR resources only get processed during the acquisition or release of cluster resources
- PowerHA 6.1+ provides micro-partitioning support and the ability to also alter virtual processor counts
- DLPAR resources can come from free CPUs in the shared processor pool or from CoD resources

34

PowerHA SystemMirror: DLPAR Value Proposition


Environment using the dedicated CPU model (no DLPAR):
- System A: Oracle DB 2 CPU, Banner DB 3 CPU, plus 2 CPU and 3 CPU standby LPARs; System B: PeopleSoft 2 CPU, Financial DB 3 CPU, plus 2 CPU and 3 CPU standby LPARs
- PowerHA license counts: Cluster 1: 4 CPUs, Cluster 2: 6 CPUs, Cluster 3: 4 CPUs, Cluster 4: 6 CPUs - Total: 20 licenses

Environment using the DLPAR model:
- Production and standby LPARs all run at 1 CPU; +1 or +2 CPUs are acquired via DLPAR along with the application on fallover
- PowerHA license counts: Cluster 1: 3 CPUs, Cluster 2: 4 CPUs, Cluster 3: 3 CPUs, Cluster 4: 4 CPUs - Total: 14 licenses

35

PowerHA SystemMirror: DLPAR Modified Model


Environment using the DLPAR model (same as the previous slide):
- PowerHA license counts: Cluster 1: 3 CPUs, Cluster 2: 4 CPUs, Cluster 3: 3 CPUs, Cluster 4: 4 CPUs - Total: 14 licenses

Environment using the modified DLPAR model:
- Both production LPARs on each system are consolidated into one LPAR, with control separated by resource groups; +4 CPUs are acquired via DLPAR along with the applications on fallover
- PowerHA license counts: Cluster 1: 6 CPUs, Cluster 2: 6 CPUs - Total: 12 licenses

36

Data Protection with PowerHA SM 7.1 & CAA


- Enhanced Concurrent Mode (ECM) volume groups are now required; ECM VGs were introduced in version 5.1 (fast disk takeover, fast failure detection, disk heartbeating)
- Disk fencing in CAA: fencing is automatic and transparent and cannot be turned off; the fence group is created by cl_vg_fence_init, called from node_up
- CAA storage framework fencing support: the ability to specify the level of disk access allowed by the device driver - Read/Write, Read Only, No Access (I/O is held until timeout), or Fast Failure

37

Data Protection with PowerHA SM 7.1 & CAA


ECM volume groups and the newly added protection:
- LVM Enhanced Concurrent Mode VGs in passive mode prevent writes to logical volume or volume group devices and prevent filesystems from being mounted or any change requiring access to them
- CAA fencing prevents writes to the disk itself (e.g. dd, which runs below the LVM level)

[Diagram: Node A has datavg (an ECM VG) varied on in active mode with /data, /data/app, and /data/db mounted read/write, and CAA fencing set to read/write; Node B holds the same VG in passive mode with read-only CAA fencing. In the event of a failure on Node B, its access to the shared LUNs is set to No Access and all I/Os fail.]

38

Management Console: WebSMIT vs. IBM Director


CLI & SMIT sysmirror panels still the most common management interfaces WebSMIT Available since HACMP 5.2 Required web server to run on host until HACMP 5.5 (Gateway server) Did not fall in line with look and feel of other IBM offerings

IBM Systems Director Plug-in New for PowerHA SystemMirror 7.1 Only for management of 7.1 & above Same look and feel as IBM suite of products Will leverage existing Director implementation Uses clvt & clmgr CLI behind the covers

39

WebSMIT Gateway Model: One-to-Many (6.1 & Below)


WebSMIT converted from a one-to-one architecture to one-to-many

[Diagram: multiple WebSMIT users (User_1 through User_4) access multiple clusters (Cluster_A, Cluster_B, Cluster_C) through one standalone WebSMIT server.]

40

WebSMIT Screenshot: Associations Tab

41

PowerHA SystemMirror Cluster Management


New GUI User Interface for Version 7.1 Clusters

Three-tier architecture provides scalability:
- User interface: web-based interface and command-line interface
- Management server: the IBM Systems Director server is the central point of control, supported on AIX, Linux, and Windows; it provides the agent manager, discovery of clusters and resources, and secure communication with the agents
- Director agent: automatically installed on AIX 7.1 & AIX 6.1 TL06, alongside PowerHA on each cluster node

42

PowerHA SystemMirror Director Integration


Accessing the SystemMirror Plug-ins

43

IBM Systems Director: Monitoring Status of Clusters


Accessing the SystemMirror Plug-ins

44

PowerHA SystemMirror Configuration Wizards


Wizards

45

PowerHA SystemMirror Smart Assistant Enhancements


Deploy HA Policy for Popular Middleware

46

PowerHA SystemMirror Detailed Views


SystemMirror Management View

47

IBM Director: Management Dashboard

48

Do you know about clvt & clmgr ?


- clmgr was announced in PowerHA SM 7.1
- clvt has been available since HACMP 5.4.1 for the Smart Assists
- clmgr is hard-linked to clvt
- Originally clmgr was intended for the Director team & rapidly evolved into a major, unintended, informal line item; it allows deviation from clvt without breaking the Smart Assists
- From this release forward, only clmgr is supported for customer use; clvt is strictly for use by the Smart Assists

New Command Line Infrastructure


Ease of management:
- Start / stop / move resources
- Start cluster services on all nodes
- Verify & sync the cluster
- Move a resource group
49

# clmgr online cluster
# clmgr sync cluster
# clmgr move rg appAgroup node=node2
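A few more clmgr forms that may be useful day to day (a hedged sketch - verb and object aliases can vary slightly by service pack, so check clmgr's built-in help for your level):

# clmgr query cluster        (show cluster-wide settings and state)
# clmgr offline cluster      (stop cluster services on all nodes)
# clmgr -h                   (list the available actions and object classes)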

Do you know about clcmd in CAA ?


Allows commands to be run across all cluster nodes
# lslpp -w /usr/sbin/clcmd
  File                        Fileset                      Type
  ----------------------------------------------------------------------------
  /usr/sbin/clcmd             bos.cluster.rte              File

# clcmd lssrc -g caa
-------------------------------
NODE mutiny.dfw.ibm.com
-------------------------------
Subsystem         Group            PID            Status
 clcomd           caa              9502848        active
 cld              caa              10551448       active
 clconfd          caa              10092716       active
 solid            caa              7143642        active
 solidhac         caa              7340248        active
-------------------------------
NODE munited.dfw.ibm.com
-------------------------------
Subsystem         Group            PID            Status
 cld              caa              4390916        active
 clcomd           caa              4587668        active
 clconfd          caa              6357196        active
 solidhac         caa              6094862        active
 solid            caa              6553698        active

# clcmd lspv
-------------------------------
NODE mutiny.dfw.ibm.com
-------------------------------
hdisk0            0004a99c161a7e45      rootvg           active
caa_private0      0004a99cd90dba78      caavg_private    active
hdisk2            0004a99c3b06bf99      None
hdisk3            0004a99c3b076c86      None
hdisk4            0004a99c3b076ce3      None
hdisk5            0004a99c3b076d2d      None
-------------------------------
NODE munited.dfw.ibm.com
-------------------------------
hdisk0            0004a99c15ecf25d      rootvg           active
caa_private0      0004a99cd90dba78      caavg_private    active
hdisk2            0004a99c3b06bf99      None
hdisk3            0004a99c3b076c86      None
hdisk4            0004a99c3b076ce3      None
hdisk5            0004a99c3b076d2d      None

50

PowerHA SystemMirror: Sample Application Monitor

# cat /usr/local/hascripts/ora_monitor.sh
#!/bin/ksh
ps -ef | grep ora_pmon_hatest | grep -v grep
51
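A custom monitor must exit 0 when the application is healthy and non-zero when it is not; a hedged variant of the same check that makes the exit status explicit (the process name is the one used on the slide):

#!/bin/ksh
# Return 0 if the Oracle PMON process for the hatest instance is running, 1 otherwise
if ps -ef | grep ora_pmon_hatest | grep -v grep > /dev/null 2>&1
then
    exit 0
else
    exit 1
fi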

PowerHA SystemMirror: Pager Events


HACMPpager:
        methodname = "Herrera_notify"
        desc = "Lab Systems Pager Event"
        nodename = "connor kaitlyn"
        dialnum = "[email protected]"
        filename = "/usr/es/sbin/cluster/samples/pager/sample.txt"
        eventname = "acquire_takeover_addr config_too_long event_error node_down_complete node_up_complete"
        retrycnt = 3
        timeout = 45

# cat /usr/es/sbin/cluster/samples/pager/sample.txt
Node %n: Event %e occurred at %d, object = %o

Action taken: halted node connor

Sample email:
From: root 09/01/2009
Subject: HACMP Node kaitlyn: Event acquire_takeover_addr occurred at Tue Sep 1 16:29:36 2009, object =

Attention: sendmail must be working and reachable through the firewall to receive notifications
52

PowerHA SystemMirror Tunables


AIX I/O pacing (high & low watermarks):
- Typically only enable this if recommended after a performance evaluation
- The historical values of 33 & 24 have been updated to 513 & 256 on AIX 5.3 and to 8193 & 4096 on AIX 6.1
- http://publib.boulder.ibm.com/infocenter/pseries/v5r3/index.jsp?topic=/com.ibm.aix.prftungd/doc/prftungd/disk_io_pacing.htm

syncd setting:
- Default value of 60; the recommended change is to 10

Failure Detection Rate (FDR) - only for version 6.1 & below:
- The Normal setting should suffice in most environments (note that it can be tuned further)
- Remember to enable FFD (fast failure detection) when using disk heartbeating
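Both the I/O pacing watermarks and the syncd interval are AIX-level settings rather than PowerHA settings; a hedged sketch using the AIX 6.1 values quoted above:

# chdev -l sys0 -a maxpout=8193 -a minpout=4096     (set the I/O pacing high / low watermarks)
# lsattr -El sys0 -a maxpout -a minpout             (verify the current values)

The syncd interval is the argument passed to /usr/sbin/syncd, commonly started from /sbin/rc.boot, so lowering it from 60 to 10 means editing that invocation and letting it take effect at the next boot.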

Pre & Post Custom EVENTs


Entry points for notifications or actions required before phases in a takeover

53

PowerHA SystemMirror: Testing Best Practices


Test application scripts and application monitors thoroughly:
- Common problems include edits to scripts that are called from within other scripts

Test fallovers in all directions:
- This confirms the start & stop scripts in both locations

Test the cluster with:
- LPARs within the same frame
- Virtual resources

Utilize available tools:
- Cluster Test Tool
- When testing upgrades, alternate disk install is your friend

Best practice: testing should be the foundation for your documentation, in the event that someone who is not PowerHA-savvy is on hand when a failure occurs.
54

How to be successful with PowerHA SystemMirror


Strict change controls:
- An available test environment
- Testing of changes

Leverage C-SPOC functions:
- Create / remove / change VGs, LVs, and filesystems
- User administration

Know what to look for:
- cluster.log / hacmp.out / clstrmgr.debug log files
- /var/hacmp/log/clutils.log - summary of the nightly verification
- /var/hacmp/clverify/clverify.log - detailed verification output

munited /# cltopinfo -m
Interface Name    Adapter Address    Total Missed Heartbeats    Current Missed Heartbeats
------------------------------------------------------------------------------------------
en0               192.168.1.103      0                          0
rhdisk1           255.255.10.0       1                          1
Cluster Services Uptime: 30 days 0 hours 31 minutes

55

Summary
- Review your infrastructure for potential single points of failure
- Be aware of the potential pitfalls listed in the common mistakes slide
- Leverage features like file collections, application monitoring, and pager notification events
- Keep up with feature changes in each release: new dependencies & fallover behaviors
- Virtualizing POWER7 or POWER6 environments is the foundation for Live Partition Mobility; NPIV-capable adapters can help simplify the configuration & management
- WebSMIT & IBM Director are the available GUI front ends; the cluster release will determine which one to use
56

Learn More About PowerHA SystemMirror


PowerHA SystemMirror IBM Portal

Popular Topics: * Frequently Asked Questions * Customer References * Documentation * White Papers

http://www-03.ibm.com/systems/power/software/availability/aix/index.html
(or Google "PowerHA SystemMirror" and click "I'm Feeling Lucky")

57

Questions?

Thank you for your time!


58

Additional Resources
New - Disaster Recovery Redbook
SG24-7841 - Exploiting PowerHA SystemMirror Enterprise Edition for AIX
http://www.redbooks.ibm.com/abstracts/sg247841.html?Open

New - RedGuide: High Availability and Disaster Recovery Planning: Next-Generation Solutions for Multi-server IBM Power Systems Environments
http://www.redbooks.ibm.com/abstracts/redp4669.html?Open

Online Documentation
http://www-03.ibm.com/systems/p/library/hacmp_docs.html

PowerHA SystemMirror Marketing Page


http://www-03.ibm.com/systems/p/ha/

PowerHA SystemMirror Wiki Page


http://www-941.ibm.com/collaboration/wiki/display/WikiPtype/High+Availability

PowerHA SystemMirror (HACMP) Redbooks


http://www.redbooks.ibm.com/cgi-bin/searchsite.cgi?query=hacmp

59
