
Clustering Software

© 2010, Centre for Development of Advanced Computing, Pune
Agenda

 Definition
 Software stack
 Software stack components
 Component support
 Cluster tools
 xCAT
 KUSU
 OSCAR
 Rocks

Clustering tool

 A clustering tool allows a group of linked computers to work together so closely that in many respects they form a single computer.

Software stack

Software stack components

 Schedulers
 PBS
 SGE
 LSF
 MOAB
 Slurm
 OS/Kernel space
 CentOS
 Libraries
 OFED
 OpenMPI
 LAPACK, ATLAS
 Compilers & Debuggers
 Intel
 AMD
 PGI
 TotalView
 Lustre
 OSS/MDS
 Sam-QFS

Software stack components cont.

 Verification Modules
 Iozone3
 IOR
 NetPIPE
 HPCC Benchmarking
 IOKit
 LNET Self Test
 Adapters
 Ethernet
 Mellanox IB
 SATA
 Fibre Channel
 Graphics/VIZ
 GPGPU

Software stack components cont.

 HW platform
 SW RAID
 Constellation
 Galaxy/blade/etc. technologies
 Switch
 Drivers including OFED
 Node
 ILOM
 IPMI-based service processors
 Power on/off
 BIOS integration

Component Support

Components grouped by who supports them:

 Open software
 CentOS (RHEL), GNU Compiler Collection, IPMItool, OpenSM, LNET Self Test, IOKit, verification modules (Iozone3, IOR, NetPIPE, HPCC Benchmark Suite)
 Product support
 Server OEMs
 3rd-party ISVs
 TotalView, PGI Compiler, Intel Compiler, LSF, PBS, MOAB (Cluster Resources)
 Community
 SLURM, Ganglia, OpenMPI, OneSIS, CFEngine, FreeIPMI, pdsh, OFED, Conman, Powerman

Some Cluster tools

 xCAT
 KUSU
 OSCAR
 Rocks
 OpenHPC

xCAT

 xCAT (Extreme Cluster Administration Toolkit) is open-source distributed computing management software used for the deployment and administration of clusters. It can (see the sketch after this list):
 create and manage diskless clusters
 install and manage many Linux cluster machines in parallel
 set up a high-performance computing software stack, including software for batch job submission, parallel libraries, and other software that is useful on a cluster
 clone and image Linux and Windows machines
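For illustration only, a minimal sketch of scripting common xCAT operations from the management node with Python's subprocess module. The node range "compute01-compute04" and the osimage name are hypothetical, and the sketch assumes the standard xCAT commands (rpower, nodeset) are installed and on the PATH.

```python
import subprocess

def xcat(*args):
    """Run an xCAT command on the management node and return its output."""
    result = subprocess.run(args, capture_output=True, text=True, check=True)
    return result.stdout

# Hypothetical node range and OS image name -- adjust for the actual site.
nodes = "compute01-compute04"
image = "centos-x86_64-netboot-compute"

# Query the current power state of the nodes.
print(xcat("rpower", nodes, "stat"))

# Associate the nodes with an OS image, then power-cycle them so they
# network-boot into the installer.
xcat("nodeset", nodes, "osimage=" + image)
xcat("rpower", nodes, "boot")
```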

Features

xCAT has specific features designed to take advantage of IBM hardware, including:
 Remote Power Control
 Remote POST/BIOS console
 Serial over LAN functions
 Hardware alerts and vitals provided via SNMP and EMAIL
 Inventory and hardware management

OSCAR

 OSCAR (Open Source Cluster Application Resources) is a snapshot of the best-known methods for building, programming, and using clusters. It consists of a fully integrated and easy-to-install software bundle designed for high-performance cluster computing.

Rocks

 Rocks Cluster Distribution is a Linux distribution intended for high-performance computing clusters.
 Based on the Red Hat Linux distribution.
 Cluster on a DVD (or a set of CDs)
 Goals of Rocks
 Make Clusters Easy
 Reduce dependence on system administrators by reducing system
administration
 Enable scientists to concentrate on science rather than on computing.

Rolls

 Alpha
 Area51
 Base
 Bio
 Condor
 Ganglia
 Grid
 hpc
 Java
 kernel

Architecture

Installing Rocks

• Rocks provides for installation of a cluster with minimum user interaction
• Steps involved
• Boot from CD or DVD
• Answer a few questions
• Get favorite beverage
• You’re on your way to a full-fledged cluster in 2 hours or less.
Innards of Automatic Installs

• Rocks uses Red Hat’s Kickstart mechanism for automated installs


• Kickstart file generated and given to the installer
• Installer follows instructions in Kickstart file.
• Generation of Kickstart file done automatically by Rocks scripts
The Head Node

• Users log in, submit jobs, compile code, etc.


• Uses two Ethernet interfaces
• one public, one private for compute nodes
• Normally has lots of disk space
(system partitions < 30 GB)
• Provides many system services
• NFS, DHCP, DNS, MySQL, HTTP, 411, firewall, etc.
• Cluster configuration
Compute Nodes

• Basic compute workhorse


• Lots of memory (if lucky)
• Minimal storage requirements
• Single Ethernet connection for private LAN
• Disposable
• OS easily re-installed from head node
• Nodes can be heterogeneous
NFS in ROCKS

• User accounts are served over NFS


• Works for small clusters (< 128 nodes)
• Will not work for large clusters (> 1024 nodes)
• NAS tends to work better
• Applications are not served over NFS
• /usr/local does not exist
• All software is installed locally (/opt)
411 Secure Information Service

• Provides NIS-like functionality


• Securely distributes password files, user and group configuration files, and the like, using public-key cryptography to protect file content.
• Uses HTTP to distribute the files
• Scalable, secure, and low-latency
411 Architecture

1. Client nodes listen on the IP broadcast address for “411 alert” messages from the head node.
2. Nodes then pull the file from the head node via HTTP after some delay to avoid flooding the master with requests.
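A minimal sketch of the pull-with-random-delay pattern described above, written in Python for illustration; it is not the real 411 client, the head-node URL and file name are hypothetical, and the real service also protects file content with public-key cryptography, which this sketch omits.

```python
import random
import time
import urllib.request

# Hypothetical head-node URL and file name; the real 411 file set and
# paths are site-specific, and real 411 files are cryptographically
# protected (omitted here).
HEAD_NODE_URL = "http://10.1.1.1/411.d/"
FILENAME = "etc.passwd"

def pull_file(name):
    """Fetch one distributed file from the head node over HTTP, after a
    short random delay so that many nodes reacting to the same alert do
    not all hit the master at once."""
    time.sleep(random.uniform(0, 30))
    with urllib.request.urlopen(HEAD_NODE_URL + name) as response:
        return response.read()

content = pull_file(FILENAME)
print("retrieved", len(content), "bytes")
```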
Ganglia Monitoring

• Ganglia is a scalable distributed monitoring system for high-performance computing systems such as clusters and grids.

• It leverages widely used technologies such as XML for data representation, XDR for compact, portable data transport, and RRDtool for data storage and visualization.

• It uses carefully engineered data structures and algorithms to achieve very low per-node overheads and high concurrency.

• Provides a heartbeat to determine compute node availability.
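As an illustration of consuming Ganglia's XML representation, a small Python sketch that connects to a gmond daemon and prints each host's one-minute load. It assumes gmond is listening on its default TCP port 8649; the head-node hostname used here is a placeholder.

```python
import socket
import xml.etree.ElementTree as ET

# gmond dumps its current cluster state as XML to any client that connects
# to its TCP port (8649 by default).  The hostname below is a placeholder.
GMOND_HOST, GMOND_PORT = "headnode", 8649

def read_gmond_xml(host, port):
    """Read the full XML document that gmond writes on connect."""
    chunks = []
    with socket.create_connection((host, port)) as sock:
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)

root = ET.fromstring(read_gmond_xml(GMOND_HOST, GMOND_PORT))
for host in root.iter("HOST"):
    metric = host.find("METRIC[@NAME='load_one']")
    value = metric.get("VAL") if metric is not None else "n/a"
    print(host.get("NAME"), "load_one =", value)
```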


Cluster Status with Ganglia
Sun Grid Engine (SGE)

• SGE is resource management software

• Accepts jobs submitted by users

• Schedules them for execution on appropriate systems based on resource management policies

• Users can submit hundreds of jobs without worrying where they will run

• Supports serial as well as parallel jobs
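By way of example, a hedged Python sketch that submits a job script to SGE through qsub; the script name, job name, and the parallel-environment name "mpi" are illustrative, and the sketch assumes qsub is on the PATH.

```python
import subprocess

def submit(script, name, slots=1):
    """Submit a job script to SGE via qsub and return qsub's reply
    (e.g. 'Your job 42 ("myjob") has been submitted')."""
    cmd = ["qsub", "-N", name, "-cwd"]   # -cwd: run in the submit directory
    if slots > 1:
        # The parallel-environment name 'mpi' is illustrative; sites
        # define their own PE names.
        cmd += ["-pe", "mpi", str(slots)]
    cmd.append(script)
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout

# Hypothetical job script for an 8-slot parallel job.
print(submit("run_simulation.sh", "myjob", slots=8))
```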


Accessing the Cluster

Access the cluster via an SSH client

• PuTTY
• SSH Secure Shell
• X-Win32
• F-Secure
To transfer data to the cluster, use either scp or sftp.
Windows users can download and use WinSCP (http://winscp.net).
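For scripted transfers, a minimal sketch using the third-party Paramiko library (not mentioned in the slides); the hostname, username, and file paths are hypothetical, and key-based authentication is assumed to be set up.

```python
import paramiko

# Hypothetical cluster head node and account.
HOST, USER = "cluster.example.org", "student"

client = paramiko.SSHClient()
client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
client.connect(HOST, username=USER)   # uses keys from ~/.ssh by default

# Copy an input file to the cluster and fetch results back over SFTP.
sftp = client.open_sftp()
sftp.put("input.dat", "/home/student/input.dat")
sftp.get("/home/student/results.out", "results.out")
sftp.close()
client.close()
```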
Thank you
