HA Two Node GPFS Cluster With Tie-Breaker Disk - Sysadmin Continuous Improvement
The physical server architecture, shown in the following figure, remains the same:
– two CentOS servers
– two shared disks between the servers
(https://sysadminci.files.wordpress.com/2015/05/gpfs01.jpg)
The mmlscluster command output shows that only the first GPFS node has been assigned the manager and quorum roles. In order to enable high availability, both servers must have these two roles.
[root@gpfs01 ~]# /usr/lpp/mmfs/bin/mmlscluster

GPFS cluster information
========================
  GPFS cluster name:         gpfs01
  GPFS cluster id:           14526312809412325839
  GPFS UID domain:           gpfs01
  Remote shell command:      /usr/bin/ssh
  Remote file copy command:  /usr/bin/scp
  Repository type:           CCR

 Node  Daemon node name  IP address    Admin node name  Designation
   1   gpfs01            172.17.0.101  gpfs01           quorum-manager
   2   gpfs02            172.17.0.102  gpfs02
The filesystem fs_gpfs01 is composed of two Network Shared Disks (NSDs). In this post I'll show how to configure these two disks as tie-breaker disks in order to enable high availability.
[root@gpfs01 ~]# /usr/lpp/mmfs/bin/mmlsnsd -a

 File system   Disk name   NSD servers
 fs_gpfs01     mynsd1      (directly attached)
 fs_gpfs01     mynsd2      (directly attached)
Indeed, like many other cluster software products, GPFS requires that the majority of quorum nodes be online in order to use the filesystem and avoid a split-brain situation.
In this case the cluster is composed of an even number of nodes, so one or more tie-breaker disks must be defined.
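The current quorum status can be checked at any time with the mmgetstate command; the line below is a minimal sketch (run from any cluster node, and the exact output columns may vary between GPFS releases):

# Show the GPFS daemon state of every node; -L adds the quorum,
# nodes up and total nodes columns
[root@gpfs01 ~]# /usr/lpp/mmfs/bin/mmgetstate -a -L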
More details about the GPFS reliability configuration can be found in this document: http://www-03.ibm.com/systems/resources/configure-gpfs-for-reliability.pdf
As described above, I assign the manager and quorum roles to node gpfs02 and verify the change using the mmlscluster command.
[root@gpfs01 ~]# mmchnode --manager -N gpfs02
Thu May 7 22:11:20 CEST 2015: mmchnode: Processing node gpfs02
mmchnode: Propagating the cluster configuration data to all
  affected nodes. This is an asynchronous process.

[root@gpfs01 ~]# mmchnode --quorum -N gpfs02
Thu May 7 22:11:20 CEST 2015: mmchnode: Processing node gpfs02
mmchnode: Propagating the cluster configuration data to all
  affected nodes. This is an asynchronous process.

[root@gpfs01 ~]# /usr/lpp/mmfs/bin/mmlscluster

GPFS cluster information
========================
  GPFS cluster name:         gpfs01
  GPFS cluster id:           14526312809412325839
  GPFS UID domain:           gpfs01
  Remote shell command:      /usr/bin/ssh
  Remote file copy command:  /usr/bin/scp
  Repository type:           CCR

 Node  Daemon node name  IP address    Admin node name  Designation
   1   gpfs01            172.17.0.101  gpfs01           quorum-manager
   2   gpfs02            172.17.0.102  gpfs02           quorum-manager
I configure both NSDs as tie-breaker disks and verify the setting using the mmlsconfig command.
[root@gpfs01 ~]# mmchconfig tiebreakerDisks="mynsd1;mynsd2"

[root@gpfs01 ~]# mmlsconfig
Configuration data for cluster gpfs01:

clusterName gpfs01
clusterId 14526312809412325839
autoload no
dmapiFileHandleSize 32
minReleaseLevel 4.1.0.4
ccrEnabled yes
tiebreakerDisks mynsd1;mynsd2
adminMode central

File systems in cluster gpfs01:

/dev/fs_gpfs01
Now the GPFS HA configuration is complete. I can shut down one node and verify that the other node can still read and write the GPFS filesystem.
[root@gpfs01 ~]# mmmount /fs_gpfs01 -a
Thu May 7 22:22:42 CEST 2015: mmmount: Mounting file systems ...

[root@gpfs02 ~]# ssh gpfs01 shutdown -h now
[root@gpfs02 ~]# cd /fs_gpfs01/
[root@gpfs02 fs_gpfs01]# ls -latr
dr-xr-xr-x   2 root root    8192 Jan  1  1970 .snapshots
[root@gpfs02 fs_gpfs01]# ls -latr
total 1285
dr-xr-xr-x   2 root root    8192 Jan  1  1970 .snapshots
drwxr-xr-x   2 root root  262144 May  7 21:49 .
-rw-r--r--   1 root root 1048576 May  7 21:50 test1M
dr-xr-xr-x. 24 root root    4096 May  7 21:55 ..
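To confirm that the surviving node can also write to the filesystem, a simple check is to create a new file from gpfs02 while gpfs01 is down; the commands below are a minimal sketch (the file name test1M-failover is only an illustrative choice):

# Write a new 1 MB file from gpfs02 and list it (illustrative file name)
[root@gpfs02 fs_gpfs01]# dd if=/dev/zero of=/fs_gpfs01/test1M-failover bs=1M count=1
[root@gpfs02 fs_gpfs01]# ls -l /fs_gpfs01/test1M-failover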
Furthermore, the log in /var/log/messages provides more details about this event. The log below, grabbed on node gpfs02 when I shut down node gpfs01, shows that node gpfs02 detected the failure of node gpfs01 and has been elected cluster manager.
# /var/log/messages
...
May 7 22:25:29 gpfs02 mmfs: [E] CCR: failed to connect to node 172.17.0.
May 7 22:25:39 gpfs02 mmfs: [E] CCR: failed to connect to node 172.17.0.
May 7 22:25:39 gpfs02 mmfs: [E] Node 172.17.0.101 (gpfs01) is being expe
May 7 22:25:39 gpfs02 mmfs: [N] This node (172.17.0.102 (gpfs02)) is now
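The new cluster manager can also be confirmed with the mmlsmgr command; a minimal sketch run on the surviving node (the output line shown is illustrative):

# Display the node currently holding the cluster manager role
[root@gpfs02 ~]# /usr/lpp/mmfs/bin/mmlsmgr -c
Cluster manager node: 172.17.0.102 (gpfs02)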