
Sysadmin continuous improvement

Useful tips and tools for system administrators

HA two node GPFS cluster with tie-breaker disk


May 23, 2015 (updated February 29, 2016) · giovannibattistasciortino · cluster, linux
In a previous post (https://sysadminci.wordpress.com/2015/05/09/install-and-configure-gpfs-4-1-filesystem-on-linux-centos-6-6/) I described how to configure a GPFS cluster filesystem (a filesystem that can be mounted by two or more servers simultaneously).
This article describes the changes required to enable a high-availability configuration for a GPFS cluster filesystem. With this configuration each node can still write and read the filesystem while the other node is down.

The physical server architecture, shown in the following figure, remains the same:
– two CentOS servers
– two disks shared between the servers

(https://sysadminci.files.wordpress.com/2015/05/gpfs01.jpg)

The mmlscluster output shows that only the first GPFS node has been assigned the manager and quorum roles. To enable high availability, both servers must hold these two roles.
[root@gpfs01 ~]# /usr/lpp/mmfs/bin/mmlscluster

GPFS cluster information
========================
  GPFS cluster name:         gpfs01
  GPFS cluster id:           14526312809412325839
  GPFS UID domain:           gpfs01
  Remote shell command:      /usr/bin/ssh
  Remote file copy command:  /usr/bin/scp
  Repository type:           CCR

 Node  Daemon node name  IP address    Admin node name  Designation
--------------------------------------------------------------------
   1   gpfs01            172.17.0.101  gpfs01           quorum-manager
   2   gpfs02            172.17.0.102  gpfs02

The filesystem fs_gpfs01 consists of two network shared disks (NSDs). In this post I'll show how to configure these two disks as tie-breaker disks in order to enable high availability.

[root@gpfs01 ~]# /usr/lpp/mmfs/bin/mmlsnsd -a

 File system   Disk name   NSD servers
--------------------------------------------------------------------------
 fs_gpfs01     mynsd1      (directly attached)
 fs_gpfs01     mynsd2      (directly attached)

Like many other cluster software products, GPFS requires that a majority of the quorum nodes be online before the filesystem can be used, in order to avoid split brain.
In this case the cluster consists of an even number of nodes, so one or more tie-breaker disks must be defined.
More details about GPFS reliability configuration can be found in this document: http://www-03.ibm.com/systems/resources/configure-gpfs-for-reliability.pdf
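
With tie-breaker disks defined, a two node cluster stays quorate as long as the surviving quorum node can still access a majority of the tie-breaker disks. If you want to inspect the quorum state of the cluster at any point, a quick way (just an extra hint, the steps below don't depend on it) is the mmgetstate command:

# Show the GPFS daemon state of every node; -L adds quorum details
[root@gpfs01 ~]# /usr/lpp/mmfs/bin/mmgetstate -aL

The quorum column in its output shows how many quorum nodes are currently needed for an active cluster, so it is easy to see whether the tie-breaker configuration has taken effect.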

As described above, I assign the manager and quorum roles to node gpfs02 and verify the result with the mmlscluster command.
[root@gpfs01 ~]# mmchnode --manager -N gpfs02
Thu May 7 22:11:20 CEST 2015: mmchnode: Processing node gpfs02
mmchnode: Propagating the cluster configuration data to all
  affected nodes. This is an asynchronous process.

[root@gpfs01 ~]# mmchnode --quorum -N gpfs02
Thu May 7 22:11:20 CEST 2015: mmchnode: Processing node gpfs02
mmchnode: Propagating the cluster configuration data to all
  affected nodes. This is an asynchronous process.

[root@gpfs01 ~]# /usr/lpp/mmfs/bin/mmlscluster

GPFS cluster information
========================
  GPFS cluster name:         gpfs01
  GPFS cluster id:           14526312809412325839
  GPFS UID domain:           gpfs01
  Remote shell command:      /usr/bin/ssh
  Remote file copy command:  /usr/bin/scp
  Repository type:           CCR

 Node  Daemon node name  IP address    Admin node name  Designation
--------------------------------------------------------------------
   1   gpfs01            172.17.0.101  gpfs01           quorum-manager
   2   gpfs02            172.17.0.102  gpfs02           quorum-manager

I configure both NSDs as tie-breaker disks and verify the change with the mmlsconfig command.

[root@gpfs01 ~]# mmchconfig tiebreakerDisks="mynsd1;mynsd2"

[root@gpfs01 ~]# mmlsconfig
Configuration data for cluster gpfs01:
--------------------------------------
clusterName gpfs01
clusterId 14526312809412325839
autoload no
dmapiFileHandleSize 32
minReleaseLevel 4.1.0.4
ccrEnabled yes
tiebreakerDisks mynsd1;mynsd2
adminMode central

File systems in cluster gpfs01:
-------------------------------
/dev/fs_gpfs01

Now the GPFS HA configuration is complete. I can shut down one node and verify that the other node can still write and read the GPFS filesystem.
[root@gpfs01 ~]# mmmount /fs_gpfs01 -a
Thu May 7 22:22:42 CEST 2015: mmmount: Mounting file systems ...

[root@gpfs02 ~]# ssh gpfs01 shutdown -h now
[root@gpfs02 ~]# cd /fs_gpfs01/
[root@gpfs02 fs_gpfs01]# ls -latr
dr-xr-xr-x   2 root root    8192 Jan  1  1970 .snapshots
[root@gpfs02 fs_gpfs01]# ls -latr
total 1285
dr-xr-xr-x   2 root root    8192 Jan  1  1970 .snapshots
drwxr-xr-x   2 root root  262144 May  7 21:49 .
-rw-r--r--   1 root root 1048576 May  7 21:50 test1M
dr-xr-xr-x. 24 root root    4096 May  7 21:55 ..
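
The test1M file in the listing appears to have been created earlier (its timestamp pre-dates the shutdown). To explicitly exercise a write on the surviving node, a minimal check could look like the following sketch of my own (the file name test_ha is arbitrary):

# On the surviving node: write a new 1 MB file, then read it back
[root@gpfs02 fs_gpfs01]# dd if=/dev/zero of=/fs_gpfs01/test_ha bs=1M count=1
[root@gpfs02 fs_gpfs01]# md5sum /fs_gpfs01/test_ha
[root@gpfs02 fs_gpfs01]# ls -l /fs_gpfs01/test_ha

If both commands succeed while gpfs01 is down, the filesystem is fully usable from the single remaining node.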

Furthermore, the log in /var/log/messages provides more details about this event. The log below, captured on node gpfs02 when I shut down node gpfs01 (the lines are truncated as captured), shows that gpfs02 detected the failure of gpfs01 and was elected cluster manager.

# /var/log/messages
...
May 7 22:25:29 gpfs02 mmfs: [E] CCR: failed to connect to node 172.17.0.
May 7 22:25:39 gpfs02 mmfs: [E] CCR: failed to connect to node 172.17.0.
May 7 22:25:39 gpfs02 mmfs: [E] Node 172.17.0.101 (gpfs01) is being expe
May 7 22:25:39 gpfs02 mmfs: [N] This node (172.17.0.102 (gpfs02)) is now
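
Besides the syslog, GPFS itself can tell you which node currently holds the manager role. The following command (an extra check, not part of the original log capture) should print the manager node of each filesystem together with the cluster manager:

# Show the current file system manager(s) and the cluster manager node
[root@gpfs02 ~]# /usr/lpp/mmfs/bin/mmlsmgr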