
Amogh Cluster Admin Guide


For Aeronautical Development Agency - ADA


Table of Contents

1. System architecture and configuration
   1.1 System Configuration
       1.1.1 Compute nodes
       1.1.2 High Memory Compute nodes
       1.1.3 Processor architecture
       1.1.4 Operating System
       1.1.5 Network infrastructure
2. IP Address and Partition details
   2.1 Master Node
   2.2 Login Nodes
   2.3 Compute Nodes
   2.4 Mellanox IB and GigE Switch IP
   2.5 Software details
3. Cluster Manager (Ganana Cluster Toolkit)
   3.1 High Availability
4. InfiniBand Switch (SB7800 Series – Switch-IB™ 2 EDR 100Gb/s InfiniBand Smart Switches)
5. GigE switch (Dell EMC Networking X1052)
6. Intel Parallel Studio XE 2018/2019
7. Start-up and Shutdown procedure
   7.1 Start-up Procedure
   7.2 Shutdown Procedure


Introduction

This document summarizes the HPC solution, including the architectural diagram, configuration, and management of the AMOGH HPC cluster implemented at ADA, Bangalore.
The supercomputer AMOGH is a cluster of Dell server models from Dell India Private Ltd: PowerEdge R640 for the master and login nodes, and Dell PowerEdge C6420 with Intel Xeon Gold 6138 2.0 GHz processors for the compute nodes. The system was implemented by Locuz Enterprise Solutions Ltd. together with the partner companies Dell and DDN.


1. System architecture and configuration

The total setup comprises 256 nodes of Dell PowerEdge C6420 and Dell PowerEdge R640 hardware, covering master nodes, login nodes, high-memory compute nodes, and standard-memory compute nodes. Two login nodes are used by users to log in and submit jobs. Two master nodes are configured in high-availability active/passive mode. The DDN Lustre parallel file system is configured on DDN storage with 6 servers. The networking consists of 7 Dell GigE switches for OS communication and hardware management, and 16 Mellanox EDR InfiniBand switches configured for MPI job communication and to provide the Lustre file system on all compute nodes.


1.1 System Configuration


The AMOGH production system is based on Dell servers with a total peak CPU performance of 655 TFlops. The cluster consists of 246 compute nodes and 10 high-memory nodes connected via a high-speed InfiniBand EDR 100 Gbps network, and uses a Lustre storage system, deployed by DDN, with a throughput of around 10 GBps.

The compute nodes differ in their architecture; the list of compute nodes by architecture is given below.


1.1.1 Compute nodes


 246 nodes
 9840 cores
 2 x Intel Xeon Gold 6138, 20-core, 2.0 GHz processors per node
 192 GB of physical memory per node
 Total 47.232TB RAM

1.1.2 High Memory Compute nodes


 10 nodes
 400 cores
 2 x Intel Xeon Gold 6138, 20-core, 2.0 GHz processors per node
 384 GB of physical memory per node
 Total 3.84 TB RAM

1.1.3 Processor architecture


All compute nodes contain the Intel Xeon Gold 6138 processor. The processor architecture is as follows:
 20 cores in each processor
 Speed: 2.0 GHz, up to 3.7 GHz using Turbo Boost Technology
 Cache: 27.5 MB L3 per processor

1.1.4 Operating System


The operating system on AMOGH is Linux –
Red Hat Enterprise Linux Server release 7.4 (Maipo)
Kernel version: 3.10.0-693.17.1.el7

1.1.5 Network infrastructure


The AMOGH environment is interconnected by Ethernet and InfiniBand.

 Gigabit Ethernet ( Dell EMC Networking X1052)

A total of 7 Gigabit Ethernet switches are used for OS communication and hardware management. For Gigabit Ethernet, no additional modules or libraries are needed. All Ethernet switches are interconnected with each other over 1G Ethernet links.

 InfiniBand (SB7800 Series – Switch-IB 2 EDR 100Gbps Infiniband Smart Switches)

A total of 16 InfiniBand switches are configured in a 2:1 fat-tree topology. EDR InfiniBand is a high-performance switched fabric characterized by high throughput and low latency.


Logical Connectivity

 4 spine switches
 12 leaf switches
 2:1 fat-tree topology


2. IP Address and Partition details:

2.1 Master Node


Hostname                                 ada001
Cluster Private IP (bridge0 with eth2)   10.1.2.3/23
Master1 LAN IP (eth0)                    172.16.100.61/16
Ganana IP (Floating)                     10.1.2.31/23
Management IP                            10.1.129.3/23
InfiniBand IP                            10.3.2.3/23

Hostname                                 ada002
Cluster Private IP (bridge0 with eth2)   10.1.2.4/23
Master2 LAN IP (eth0)                    172.16.100.62/16
Management IP                            10.1.129.4/23
InfiniBand IP                            10.3.2.4/23

Master Nodes Partition detail (RAID 5 + 1 hot spare):

Partition name   Size
/boot            1 GB
/                1.3 TB
Swap             64 GB
/var             600 GB
/tmp             300 GB
/shared          4.4 TB (iSCSI storage in Active/Passive for HA)

Network Mount Points: -

 5 TB NFS From 10.3.2.63:/home1/application mounted on “/app”


 217 TB NFS From 10.3.2.64:/home1/users mounted on “/adahome”


 253 TB Lustre from 10.3.2.59@o2ib,10.3.2.60@o2ib1:10.3.2.61@o2ib,10.3.2.62@o2ib1:/scratch mounted on /scratch
 253 TB Lustre from 10.3.2.59@o2ib,10.3.2.60@o2ib1:10.3.2.61@o2ib,10.3.2.62@o2ib1:/archive mounted on /archive
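For reference, a manual Lustre client mount of the scratch file system built from the NIDs above would look roughly as follows; this is normally handled automatically at boot, and the flock option shown here is an assumption rather than a recorded site setting:

mount -t lustre -o flock 10.3.2.59@o2ib,10.3.2.60@o2ib1:10.3.2.61@o2ib,10.3.2.62@o2ib1:/scratch /scratch   # run as root
lfs df -h /scratch   # verify capacity and that all OSTs are visible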

2.2 Login Nodes


Hostname                     ada003
Cluster Private IP (eth2)    10.1.2.5/16
Login node1 LAN IP (eth1)    172.16.100.63/16
Management IP                10.1.129.5/16
InfiniBand IP                10.3.2.5/16

Hostname                     ada004
Cluster Private IP (eth2)    10.1.2.6/16
Login node2 LAN IP (eth0)    172.16.100.64/16
Management IP                10.1.129.6/16
InfiniBand IP                10.3.2.6/16

Network Mount Points: -

 5 TB NFS From 10.3.2.63:/home1/application mounted on “/app”


 217 TB NFS From 10.3.2.64:/home1/users mounted on “/adahome”
 253 TB Lustre from 10.3.2.59@o2ib,10.3.2.60@o2ib1:10.3.2.61@o2ib,10.3.2.62@o2ib1:/scratch mounted on /scratch

Login Nodes Partition detail (RAID 5 + 1 hot spare):

Partition name   Size

/boot            1 GB
/                1.3 TB
Swap             64 GB
/var             600 GB
/tmp             300 GB
/shared          4.4 TB (iSCSI storage in Active/Passive for HA)

2.3 Compute Nodes


Hostname        Cluster IP (eth0)      Mgmt IP                 InfiniBand IP
hpc000-hpc255   10.1.128.1-10.1.2.2    172.22.2.1-10.1.129.2   10.3.1.1-10.3.2.2

(Note – Management IP is configured in shared mode in the BIOS for all compute nodes.)
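Because the BMC shares a LOM port in this mode, the management network settings can be read in-band from any compute node with ipmitool; this is a generic check, and LAN channel 1 is an assumption that may differ per platform:

ipmitool lan print 1 | grep -E 'IP Address|MAC Address'   # shows the shared-NIC management IP of the node's BMC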

Compute Nodes Partition detail:

Partition name   Size
/boot            1 GB
/var             400 GB
/tmp             100 GB
/opt             400 GB
Swap             64 GB
/adahome         232 TB NFS from 10.3.2.64:/home1/users
/scratch         270 TB Lustre from 10.3.2.59@o2ib,10.3.2.60@o2ib1:10.3.2.61@o2ib,10.3.2.62@o2ib1:/scratch


2.4 Mellanox IB and GigE Switch IP


Mellanox ETH IP

IB HA (mgmt0) 10.1.129.40

IB Management 1 10.1.129.41

IB Management 2 10.1.129.42

Name Management IP
Gig switch1 (Gsw01) 10.1.130.1
Gig switch2 (Gsw02) 10.1.130.2
Gig switch3 (Gsw03) 10.1.130.3
Gig switch4 (Gsw04) 10.1.130.4
Gig switch5 (Gsw05) 10.1.130.5
Gig switch6 (Gsw06) 10.1.130.6
Gig switch7 (Gsw07) 10.1.130.7

2.5 Software details:


Operating System RHEL 7.4 / kernel 3.10.0-693.el7.x86_64

Cluster Management Toolkit Ganana Cluster Toolkit Ver 2.x

Scheduler Altair PBSPro Ver 19.x

Compilers GNU, Intel Cluster Studio 2018, 2019

Libraries GNU, Intel Cluster Studio 2018, 2019

MPI Intel MPI, OpenMPI

OFED Mellanox OFED - 4.5-1.0.1.0


3. Cluster Manager – (Ganana Cluster Toolkit)


Ganana Cluster Manager contains tools and applications to facilitate the installation, administration, and monitoring of a cluster.

Ganana HPC Cluster Manager makes it easier for administrators to build a Linux-based HPC cluster and to manage it on any x64 hardware. Its web-based portal provides a flexible, feature-rich interface for administrators to interact with their HPC cluster or grid in a natural and powerful way. With a click of a button it standardizes building and managing compute node images, management and monitoring packages, middleware software, and post-installation activities. It is useful in all kinds of HPC environments, offering advanced features and automating small repeated tasks to save a large amount of time.

Ganana Cluster Toolkit Features


The following key services for cluster operations are always running on both head nodes:

ganana – the cluster manager daemon, running in active/passive mode across the two master nodes.
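Assuming the daemon is installed as a systemd unit of the same name and managed as a Pacemaker resource (both assumptions), its state can be checked quickly on the masters:

systemctl status ganana                 # the daemon should be active on the active head node (assumed unit name)
pcs status resources | grep -i ganana   # if ganana is managed as a Pacemaker resource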

Figure – Browser login to Ganana web interface (IP https://)


Figure – AMOGH Cluster Overview

Figure – Ganana Version and License


Figure – Cluster management

Figure – Network Details


Figure – OS images


Figure – categories of profiles

Figure – Nodes Information


Figure – Resource utilization monitoring of cluster


3.1 High Availability


Why Have High Availability?
In a cluster with a single head node, the head node is a single point of failure for the entire cluster.
It is often unacceptable that the failure of a single machine can disrupt the daily operations of a
cluster.
HA Concepts
Primary, Secondary, Active, Passive
Naming: In a cluster with an HA setup, one of the head nodes is named the primary head node
and the other head node is named the secondary head node.
Mode: Under normal operation, out of the two head nodes one will be in active mode, whereas
the other node will be in passive mode.
The difference between active and passive is that the active head takes the lead in cluster-related
activity, while the passive follows it.


Figure – HA storage stack: iSCSI storage, multipath, HA-LVM, and PCS

Ganana High Availability

Ganana high availability is configured as a Red Hat High Availability two-node cluster using Pacemaker.

Shared Storage
Almost any HA setup involves some form of shared storage between the two head nodes, to preserve state after a failover.

iSCSI block storage is used in active/passive mode to share data between the two master nodes.

Web Interface



Shared Block Device Name: /dev/mapper/vg_shared-lv_shared

The /shared directory is the common directory containing the Ganana cluster configuration required on both nodes for HA.
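A quick way to confirm that the shared storage is visible and mounted on the active head node follows; the multipath check assumes device-mapper-multipath is in use for the iSCSI paths, and the volume group name is inferred from the device-mapper name above:

multipath -ll    # list iSCSI multipath devices (assumes dm-multipath)
lvs vg_shared    # the shared logical volume should be visible
df -h /shared    # /shared should be mounted from /dev/mapper/vg_shared-lv_shared on the active master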

To keep Ganana services as available as possible by eliminating bottlenecks and single points of failure, Red Hat High Availability is configured as a two-node cluster using pcs.

High Availability Status
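The overall HA state can be inspected from either master node with the pcs command-line tools; the exact output layout depends on the pcs version:

pcs status             # cluster, node, and resource summary
pcs status resources   # resource state only
pcs cluster status     # corosync/pacemaker daemon status per node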


Authentication

The NIS server is configured with primary and secondary server roles to meet the high-availability requirement.

NIS Server:
10.1.2.20   Floating IP address
10.1.2.3    Primary server
10.1.2.4    Secondary server
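To confirm which NIS server a given node is currently bound to (ypwhich is also used in the start-up checks in section 7.1):

ypwhich                  # prints the NIS server this node is bound to
ypcat passwd | head -3   # sanity check that NIS maps are being served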


4. InfiniBand Switch: (SB7800 Series – Switch-IB™ 2 EDR 100Gb/s InfiniBand Smart Switches)

InfiniBand is a special type of networking fabric with very low latency compared to standard Ethernet-based networks. It enables larger-scale MPI jobs that span several nodes.

The Mellanox EDR 100Gb/s InfiniBand switch provides the highest-performing fabric solution in a 1U form factor, delivering up to 7.2 Tb/s of non-blocking bandwidth with 90 ns port-to-port latency.

Managed Switch:   SB7800 InfiniBand EDR 100 Gbps switch
Unmanaged Switch: SB7890 InfiniBand EDR 100 Gbps switch
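With Mellanox OFED installed on the nodes (see the software table in section 2.5), basic fabric health can be verified from any compute or login node; this is a generic sketch rather than a site-specific procedure:

ibstat        # local HCA state: the port should be Active/LinkUp at 100 Gb/s (EDR)
ibswitches    # enumerate the switches visible in the fabric
ibdiagnet     # full fabric diagnostic; reports are written under /var/tmp/ibdiagnet2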


Unified Fabric Manager - Mellanox Technologies

For monitoring and identifying issues in the fabric topology, Mellanox UFM (Unified Fabric Manager) is configured on ada004.
UFM features:

 Measuring fabric utilization and trends
 Identifying and analyzing congestion and bottlenecks
 Efficient and centralized management of a large number of devices

[root@ada004 ~]# ufm-launch-gui

To log in and manage the InfiniBand switches using a web browser:


OpenSM HA is configured on the 2 managed switches for the fat-tree topology:

HA:      https://10.1.129.40
Switch1: https://10.1.129.41
Switch2: https://10.1.129.42

Figure – Subnet Manager (SM) running on the IB switch
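From any InfiniBand-attached node, the currently active subnet manager can be identified with the standard infiniband-diags tools; this is a generic check:

sminfo                      # LID, GUID, priority, and state of the master subnet manager
ibstat | grep -i 'SM lid'   # SM LID as seen from the local HCA port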


5. GigE switch: (Dell EMC Networking X1052)

A total of 7 Dell EMC Networking X1052 Ethernet switches are set up and configured in a bus topology.
Specification:
 48 x 1 GbE ports per switch
 4 x 10 GbE SFP+ ports per switch

All 7 switches are Dell Networking X1052 models.

To log in and manage the 1G switches, use a web browser:

Gsw01 10.1.130.1
Gsw02 10.1.130.2
Gsw03 10.1.130.3
Gsw04 10.1.130.4
Gsw05 10.1.130.5
Gsw06 10.1.130.6
Gsw07 10.1.130.7


Document: https://www.dell.com/support/home/in/en/inbsd1/product-support/product/networking-x1000-series/docs


6. Intel Parallel Studio XE 2018/2019


Introduction

Intel® Parallel Studio XE Cluster Edition for Windows* and Linux* OS accelerates parallel software development on cluster systems based on Intel® 64 architecture, as well as on Intel® Many Integrated Core Architecture (Intel® MIC Architecture) on Linux* OS. For Intel MIC Architecture, only the Intel® Xeon Phi™ coprocessor (codename Knights Corner) is supported. Intel Parallel Studio XE Cluster Edition provides a software tools environment for hybrid parallel programming (message passing and threading): it supports application development using the Intel® MPI Library together with optimized parallel libraries, performance analysis tools, and benchmarks, saving software developers time and improving performance on distributed computing systems.

Intel Parallel Studio XE Cluster Edition for Linux* OS and Windows* OS supports critical parts of the message-passing interface (MPI) application development process, including:

 Compiler support through Intel® C++ Compiler XE and Intel® Fortran Compiler XE. Both compilers for Windows* and Linux* OS support Intel MIC Architecture, and Intel C++ Compiler XE additionally supports offload to Intel® Graphics Technology.
 Intel® MPI Library 5.1 Update 1, which implements the Message Passing Interface 3.0 standard (MPI-3.0), enables multiple interconnect solutions with a single implementation, and supports Intel MIC Architecture on Linux* OS.
 Intel® Trace Analyzer and Collector 9.0 Update 1: the Intel Trace Collector provides low-overhead, event-based tracing of cluster applications (performance data, statistics, and multithreaded events on Intel 64 and Intel MIC Architecture); the Intel Trace Analyzer provides visual analysis of the activities gathered by the Trace Collector; and a message-checking component detects errors with data types, buffers, communicators, point-to-point messages, collective operations, deadlocks, and data corruption.
 Application tuning with optimized mathematical library functions from the Intel® Math Kernel Library (Intel® MKL), including ScaLAPACK* solvers and cluster DFTs (Discrete Fourier Transforms); Intel MKL for Linux* OS supports Intel MIC Architecture.
 Intel® MPI Benchmarks, which make it easy to gather performance information about a cluster system.

Intel Parallel Studio XE Cluster Edition for Linux* includes the following components:

 Intel® Composer Compiler XE


 Intel® Trace Analyzer and Collector
 Intel® MPI Library
 Intel® MPI Benchmarks


 Intel MKL
 Intel IPP
 Intel VTune Analyzer

$ module load <modulename>
$ icc -v -> check icc version
$ which icc -> check icc path
$ which mpirun -> check mpirun path

Intel MPI

MPI (Message Passing Interface) is the de facto standard for parallelization on distributed-memory parallel systems: multiple processes explicitly exchange data and coordinate their workflow. MPI specifies the interface but not the implementation, so there are many implementations available, for PCs as well as for supercomputers, both freely available and commercial ones tuned for particular target platforms. MPI has a huge number of calls, although it is possible to write meaningful MPI applications using only about ten of them.

Intel MPI is a commercial implementation based on MPICH2, a public-domain implementation of the MPI-2 standard provided by the Mathematics and Computer Science Division of Argonne National Laboratory.

The compiler drivers mpifc, mpiifort, mpiicc, mpiicpc, mpicc and mpicxx, and the MPI launcher mpiexec, are included in the search path. There are two sets of compiler drivers: mpiifort, mpiicc and mpiicpc are the drivers for the Intel compilers, while mpifc, mpicc and mpicxx are the drivers for GCC (GNU Compiler Collection).
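As an illustration, a small MPI program could be compiled with the Intel drivers and launched as follows; this is a minimal sketch, and the module name, source file, and rank count are placeholders rather than site values:

module load intel                     # placeholder module name; check module avail for the exact one
mpiicc -O2 -o hello_mpi hello_mpi.c   # compile with the Intel C compiler driver
mpirun -np 40 ./hello_mpi             # run 40 ranks, e.g. one full compute node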
System-specific environment

There is a wide range of software packages targeted at different scientific domains installed on AMOGH. These packages are accessible through the modules environment.
The basic command to use is module:

module                      (no arguments) print usage instructions
module avail (or av)        list available software modules
module whatis               as above, with brief descriptions
module load <modulename>    add a module to your environment
module unload <modulename>  remove a module
module purge                remove all modules

The modules loaded into the user’s environment can be seen with:


$ module list

To check available modules:
$ module avail

To use the MPICH implementation built with GCC:
$ module add mpich/ge/gcc

To use an application, the correct module needs to be loaded in the current working shell or in the PBS Pro job submission script; a minimal example script is shown below.
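A minimal PBS Pro submission script of this kind might look as follows; the job name, queue name, module name, and resource request are placeholders, not recorded site values:

#!/bin/bash
#PBS -N mpi_test
#PBS -l select=2:ncpus=40:mpiprocs=40   # two C6420 nodes, 40 cores each (placeholder request)
#PBS -l walltime=01:00:00
#PBS -q workq                           # placeholder queue name
cd $PBS_O_WORKDIR
module load intel                       # placeholder module name for the Intel toolchain
mpirun -np 80 ./hello_mpi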


7. Start up and Shutdown procedure

7.1 Start-up Procedure


Chronological sequence to initiate “Cluster Start-up”

############# Start the Cluster ############

1. All PDUs, switches, and DDN storage should be powered on without any error before powering on the master node.

2. Power on master node ada001 manually; wait 5 minutes for it to power on properly and 2 additional minutes to allow the Lustre and NFS file systems to mount automatically.

3. Check pcs status on the ada001 node.

4. Check that all license servers are up and running (lmgrd and FlexLM).

5. Power on the secondary master node ada002; wait 5 minutes for it to power on properly and 2 additional minutes to allow the Lustre and NFS file systems to mount automatically.

6. Check pcs status on the ada001 node; both nodes should be in standby mode and the resources should be in the disabled state.

7. Un-standby ada001 and ada002 with a time difference of 2 minutes by executing the following on ada001:

pcs cluster unstandby ada001-eth2; sleep 120; pcs cluster unstandby ada002-eth2

8. Verify the Ganana status by browsing to the login page.

9. Power on all login nodes and compute nodes with racadm or manually, and wait until all nodes have booted properly.

10. Verify the Lustre and NFS mounts and the NIS binding on the compute and login nodes using the commands below:

# clush -a "df -Th | grep -i lustre" | dshbak -c

# clush -a "df -Th | grep -i nfs" | dshbak -c

# clush -a ypwhich | dshbak -c
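Once all nodes are up, the scheduler state can also be verified from a master or login node; this is a generic PBS Pro check, not part of the original procedure:

qstat -B       # the PBS server should be reported as Active
pbsnodes -l    # lists nodes that are down or offline; empty output means all nodes are usable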


7.2 Shutdown Procedure


Chronological sequence to initiate “Cluster Shutdown”

1. Kill all running PBS jobs, if any, using the command below for each job ID:

qdel -W force <job id>

2. Kill all user sessions:

pdsh -w hpc[000-255] killall -u <username>

pdsh -w ada[001-004] killall -u <username>

3. Shut down PBS on all nodes:

pdsh -w hpc[000-255] /etc/init.d/pbs stop

4. Unmount Lustre and unload its modules on all nodes:

pdsh -w hpc[000-255] "umount /scratch; lustre_rmmod"

pdsh -w ada[001-004] "umount /scratch; lustre_rmmod"

5. Power off all compute nodes:

pdsh -w hpc[000-255] poweroff

6. Power off all login nodes, except the ada001 and ada002 nodes:

pdsh -w ada[003-004] poweroff

7. Perform the standby action for ada002 and ada001 with a gap of 1 minute between them.
8. Stop the PBS service on ada002 and ada001 with a gap of 1 minute between them.
9. Sync all pending I/O operations on the system.
10. Drop the server caches by issuing: echo 3 > /proc/sys/vm/drop_caches
11. Kill all PIDs linked to the mounted file systems.
12. Unmount the NFS and Lustre file systems.
13. Remove the Lustre modules with lustre_rmmod.
14. Power off ada002 and then ada001.
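A minimal sketch of steps 7-14 as shell commands run on the master nodes is shown below; the PBS service name, mount points, and resource handling are assumptions and should be adapted to the actual site configuration:

pcs cluster standby ada002-eth2; sleep 60; pcs cluster standby ada001-eth2   # step 7: stagger the two masters (run on ada001)
systemctl stop pbs                          # step 8: assumed PBS Pro service name; repeat on the other master after ~1 minute
sync                                        # step 9: flush pending I/O
echo 3 > /proc/sys/vm/drop_caches           # step 10: drop page/dentry/inode caches
fuser -km /scratch /archive /adahome /app   # step 11: kill processes using the mounted file systems (destructive; use with care)
umount /scratch /archive /adahome /app      # step 12: unmount the Lustre and NFS file systems
lustre_rmmod                                # step 13: unload the Lustre kernel modules
poweroff                                    # step 14: first on ada002, then on ada001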
