Clustering
What is Clustering?
Clustering is the use of multiple computers, typically PCs or UNIX workstations, multiple storage
devices, and redundant interconnections, to form what appears to users as a single highly available
system. Cluster computing can be used for load balancing as well as for high availability. It is used
as a relatively low-cost form of parallel processing machine for scientific and other applications
that lend themselves to parallel operations.
Computer cluster technology puts clusters of systems together to provide better system reliability
and performance. Cluster server systems connect a group of servers together in order to jointly
provide processing service for the clients in the network.
Cluster operating systems divide the tasks amongst the available servers. Clusters of systems or
workstations, on the other hand, connect a group of systems together to jointly share a critically
demanding computational task. Theoretically, a cluster operating system should provide seamless
optimization in every case.
At the present time, cluster server and workstation systems are mostly used in High Availability
applications and in scientific applications such as numerical computations.
Advantages of clustering
• High performance
• Large capacity
• High availability
• Incremental growth
Applications of Clustering
• Scientific computing
• Making movies
• Commercial servers (web/database/etc)
Although clustering can be performed on various operating systems such as Windows, Macintosh and
Solaris, Linux offers advantages of its own. The procedure described here is based on the concept
of a Beowulf cluster, built first using LAM and then using the publicly available OSCAR software
package.
Index:
• About Linux
• File structure of Linux
• NFS
• NIS
• RPM
• PVM
• PBS
• MPI
• SSH
• IP address
• Cluster Components
• Building Linux Cluster using LAM
• Open source package, e.g., OSCAR
• Appendix
I) About Linux
Linux is an open-source operating system similar to Unix. It has a reputation as a very secure and
efficient system. It is most commonly used to run network servers and has recently started to
make inroads into Microsoft's dominant desktop business. It is available for a wide variety of
computing devices, from embedded systems to huge multiprocessors, and for different processor
families such as x86, PowerPC, ARM, Alpha, SPARC and MIPS. It should be remembered that Linux is
essentially the OS kernel developed by Linus Torvalds and is distinct from the commonly available
distributions such as Red Hat, Caldera, etc. (these are the Linux kernel plus GPLed software).
$>cp filenames directorypath
puts copies of all the files listed into the directory named. Contrast this with the mv command,
which moves or renames a file.
$>ln filename linkname
creates another name (a link) for an existing file. See the online man pages for many other
ways to use ln.
$>mkdir directoryname
makes a subdirectory called directoryname.
$>mv file1 file2
changes the name of file1 to file2. If the second argument is a directory, the file is moved to
that directory. One can also specify that the file have a new name in the directory ‘direc’:
$>mv file1 direc/file2
would move file1 to directory direc and give it the name file2 in that directory.
$>rm filename
removes (deletes) the file named filename. The –r option removes a directory together with all
of its contents.
$>rmdir directoryname
removes the subdirectory named directoryname (if it is empty of files). To remove a
directory and all files in that directory, either remove the files first and then remove the
directory, or use the rm –r option described above.
ls -a [directory]
lists all files, including files whose names start with a period.
ls -c [directory]
lists files sorted by time of last status change.
ls -l [directory]
lists files in long form: links, owner, size, date and time of last change.
ls -p [directory]
subdirectories are indicated by /.
ls -r [directory]
reverses the listing order.
ls -s [directory]
gives the sizes of files in blocks.
ls -C [directory]
lists files in columns using full screen width.
ls -R [directory]
recursively lists files in the current directory and all subdirectories.
2. File Transfer (using ftp)
$>put filename
uploads the named file to the remote host.
$>mput pattern*
uploads all local files matching the pattern.
$>get filename
downloads the named file from the remote host.
$>mget pattern*
downloads all remote files matching the pattern.
$>mkdir directoryname
creates a directory on the remote host.
$>lcd directorypath
changes the working directory on the local machine.
$>cd directorypath
changes the working directory on the remote host.
$>bye
closes the ftp session.
II) File structure of Linux
Data and programs are stored in files, which are organized in directories. In simple terms, a
directory is just a file that contains other files (or directories). The part of the hard disk where one
is authorized to save data is called the home directory; normally all data is saved in files and
directories under the home directory. The symbol ~ can also be used for the home directory.
The directory structure of Linux is a tree with directories inside directories, several levels deep.
The tree starts at what is called the root directory, / (slash).
The following is a list of the main directories, the branches of the tree.
III) Network File System (NFS)
NFS is a distributed file system that enables users to access files and directories located on remote
computers and treat those files and directories as if they were local. NFS is independent of
machine types, operating systems, and network architectures through the use of remote procedure
calls (RPC).
IV) Network Information Service (NIS)
NIS is a distributed database that provides a repository for storing information about hosts, users,
and mailboxes in the UNIX environment; this information must be available to all machines on the
network. It was originally developed by Sun Microsystems and called YP (Yellow Pages). NIS is
used to identify and locate objects and resources that are accessible on a network.
V) Red Hat Package Manager (RPM)
Linux software is generally distributed as RPMs. RPM stands for Red Hat Package Manager. Red Hat
Linux uses the RPM technology for software installation and upgrades. Using RPM, either from the
shell prompt or through Gnome-RPM, is a safe and convenient way to upgrade or install software.
VI) Parallel Virtual Machine (PVM)
PVM (Parallel Virtual Machine) is a software package that permits a heterogeneous collection of
Unix and/or Windows computers hooked together by a network to be used as a single large parallel
computer. The individual computers may be shared- or local-memory multiprocessors, vector
supercomputers, specialized graphics engines, or scalar workstations, interconnected by a variety
of networks such as Ethernet or FDDI.
VII) Portable Batch System (PBS)
OpenPBS is the original version of the Portable Batch System, a flexible batch queuing system
required for scheduling jobs. It operates in networked, multi-platform UNIX environments.
OpenPBS consists of three primary components: a job server (pbs_server), which handles basic
queuing services such as creating and modifying a batch job and placing a job into execution when
it is scheduled to be run; an executor (pbs_mom), the daemon that actually runs jobs; and a job
scheduler (pbs_sched), another daemon. pbs_server and pbs_sched run only on the front-end node
(server node), while pbs_mom runs on every node of the cluster that can run jobs, including the
front-end node.
VIII) Message Passing Interface (MPI)
MPI is a widely accepted standard for communication among nodes that run a parallel program on
a distributed-memory system. The standard defines the interface for a set of functions that can be
used to pass messages between processes on the same computer or on different computers. MPI
can be used to program shared-memory or distributed-memory computers. In essence, MPI is a
library of routines that can be called from Fortran and C programs. There are a large number of
implementations of MPI; two open-source versions are MPICH and LAM.
IX) Secure Shell (SSH)
SSH is a packet-based binary protocol that provides encrypted connections to remote hosts or
servers. Secure Shell is a program used to log into another computer over a network, to execute
commands on a remote machine, and to move files from one machine to another. It provides strong
authentication and secure communications over insecure channels. It is a replacement for rlogin,
rsh, rcp, rdist, telnet, and ftp.
X) IP Address
An IP (Internet Protocol) address is an identifier for a computer or device on a TCP/IP network.
Networks using the TCP/IP protocol route messages based on the IP address of the destination.
The format of an IP address is a 32-bit numeric address written as four numbers separated by
periods, where each number can be zero to 255. For example, 10.90.78.45 could be an IP address.
Within an isolated network, you can assign IP addresses at random as long as each one is unique.
However, connecting a private network to the Internet requires using registered IP addresses
(called Internet addresses) to avoid duplicates.
XI) Cluster Components
The cluster consists of four major parts: 1) Network, 2) Compute nodes, 3) Master server,
4) Gateway. Each part has a specific function that is needed for the hardware to perform its
function.
1. Network: provides the communication fabric between all the machines in the cluster.
2. Nodes: the compute nodes that carry out the actual computation.
3. Server: the master node that manages the compute nodes and schedules jobs.
4. Gateway: acts as a bridge between the cluster's private network and the outside world.
XII) Building a Linux Cluster using Local Area Multicomputer (LAM)
Introduction
LAM is a high-quality open source implementation of the Message Passing Interface specification.
It is a development system for heterogeneous computers on a network which can be used to solve
compute intensive problems. LAM is an effective way for fast client-to-client communication and
is portable to all UNIX machines. It includes standard support for SUN (SunOS and Solaris), SGI
IRIX, IBM AIX, DEC OSF/1, HPUX, and LINUX.
Requirements
HARDWARE:
• Ethernet switch for physical connection between the nodes.
• CPUs (Central Processing Units) depending on the number of nodes to be clustered.
• Monitor
• Network Cables
• LAN (local area network) card
• Optional- back up power supply, racks for computers
SOFTWARE:
• LINUX OS - http://www.linuxiso.org/
• LAM package - http://www.lam-mpi.org/7.0/download.ph
Installation Procedure
Once Linux has been installed (preferably the same version) on all the nodes that need to be
clustered, rsh needs to be configured on all nodes so that one can connect from any node to any
other node in the cluster without a password. The following steps need to be performed:
1. Log in as root and add the same username on all the nodes, preferably with the same password.
In this document we assume that the username is “try”.
2. Type “setup” at the command prompt and click on system services. Make sure the following are
checked: rsh, rlogin, rexec, nfs.
3. Edit the file /etc/hosts. It contains a list of IP addresses and the hostnames of the nodes that need
to be clustered,
[root@root]$ vi /etc/hosts
The file should have the following format:
# IP address Hostname alias
10.96.6.1 scfbio01 node1
10.96.6.2 scfbio02 node2
...
(The word “alias” refers to the different nodes to be clustered, and “Hostname” can be checked
by running the following command on each node: $ hostname)
5. Create/Edit the file /home/try/.rhosts to allow trusted access to given host/user combinations
without the need for a password.
[root@root]$ vi /home/try/.rhosts
7. Edit the file /etc/securetty to enable system services like rsh, rlogin and rexec,
[root@root ]$ vi /etc/securetty
8. Configuration of the nodes is now complete. Connectivity between the different nodes can be
checked by switching to user “try” on node1 and connecting to the other nodes with rsh.
Installing and booting LAM
1. Check whether LAM (preferably the same version) exists on all the systems,
[root@root]$ lam
2. If it does not exist, download the latest version of LAM (rpm package) from the following link:
http://www.lam-mpi.org/7.0/download.ph
3. Create a hostfile in the “try” directory on the node where MPI programs need to be run
subsequently (we assume it to be node1). It provides a listing of the machines to be booted in an
MPI session.
[try@node1]$ vi hostfile
4. Verify that the cluster is bootable; the recon tool tests whether LAM can be started on every
node listed in the hostfile:
[try@node1]$ recon -v hostfile
If the comment “WooHoo” appears on the screen, the lamboot command in the next step will
execute successfully.
5. Booting LAM: lamboot starts LAM on the specified cluster to initiate an MPI session.
[try@node1]$ lamboot -v hostfile
At the end, if the comment “topology done” appears on the screen, this command has executed
successfully on the different nodes.
1. The compilers for C, C++ and Fortran programs are mpicc, mpiCC (or mpic++) and mpif77
respectively. These compiler wrappers include all the relevant files and directories required for
running MPI programs.
Three examples of MPI programs are listed below to demonstrate basic concepts of parallel
computing:
(i) Parallel Program to calculate the sum of squares of the first N numbers
/* sumN.c : sums the squares of 1..N in parallel; N is given on the command line */
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <math.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int node, size;
    int Total = 0;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &node);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int length = atoi(argv[1]);
    int perNode = length / size;
    if (perNode == 0) perNode = 1;

    if (node == 0)
    {
        printf("The maximum input number is %5d \n", length);
        int tag;
        int Sum = 0;
        /* node 0 handles the last block of numbers itself */
        int startPosition = (size - 1) * perNode + 1;
        for (; startPosition <= length; startPosition++)
        {
            Total += (int) pow(startPosition, 2.0);
        }
        Sum = Total;
        /* collect the partial sums from all the other nodes */
        for (tag = 1; tag < size; tag++)
        {
            MPI_Recv(&Total, 1, MPI_INT, tag, tag, MPI_COMM_WORLD, &status);
            Sum += Total;
        }
        printf("The sum of squares of 1 - %d is %5d \n", length, Sum);
    }
    else
    {
        int startPosition = (node - 1) * perNode + 1;
        int endPosition = node * perNode;
        if (startPosition > length)
        {
            startPosition = length + 1;
            endPosition = length + 1;
        }
        if (endPosition > length)
        {
            endPosition = length;
        }
        /* each remaining node sums the squares in its own block */
        for (; startPosition <= endPosition; startPosition++)
            Total += (int) pow(startPosition, 2.0);
        MPI_Send(&Total, 1, MPI_INT, 0, node, MPI_COMM_WORLD);
    }
    MPI_Finalize();
    return 0;
}
/* the program ends here */
Compile the program with the MPI compiler wrapper:
[try@node1]$ mpicc sumN.c -o sumN.exe
This compilation will generate an executable. The executable file, “sumN.exe” in this case, should
be copied to the try directory on all the nodes. The program is then run with:
[try@node1]$ mpirun -np 6 sumN.exe
where np is the number of processors on which the program needs to run, 6 in this case. The
maximum number N is passed as a command-line argument.
(ii) Parallel Program to count number of base pairs in a given sequence and implement
Chargaff's rule of base pairing – Listed in Appendix
(iii) Parallel Program to calculate number of times the alphabet "e" appears in the given text –
Listed in Appendix
2. If another MPI program needs to be executed, the following command can be used. It will
remove all the user processes and messages without rebooting.
[try@node1]$ lamclean –v
3. When LAM is no longer needed, lamhalt command removes all traces of LAM session from the
network.
[try@node1]$ lamhalt
4. In case one or more LAM nodes crash, i.e., lamhalt hangs, the following command needs to be
run:
[try@node1]$ lamwipe -v hostfile
This will kill all the processes running on the hosts mentioned in hostfile.
XIII) OSCAR - Open Source Cluster Application Resource
The OSCAR (Open Source Cluster Application Resource) software package is a high-performance
computing (HPC) toolkit that simplifies the complex tasks required to install a cluster. Its
advantage is that several HPC-related packages, such as the MPI implementation LAM, PVM
(Parallel Virtual Machine), PBS (Portable Batch System), etc., are installed by default and need not
be installed separately. It can be downloaded from the link given below:
http://oscar.openclustergroup.org/download
Supported Distributions
The following is a list of supported Linux distributions for the OSCAR-4.1 package:
• Red Hat Linux 9 (x86)
• Red Hat Enterprise Linux 3 (x86, ia64)
• Fedora Core 3 (x86)
• Fedora Core 2 (x86)
• Mandriva Linux 10.0 (x86)
Each individual machine of a cluster is referred to as a node. In an OSCAR cluster there are two
types of nodes: server and client. A server is responsible for serving the requests of client nodes,
whereas a client is dedicated to computation. An OSCAR cluster consists of one server node and
one or more client nodes, where all the client nodes must have homogeneous hardware. Our
cluster is named Linster, a name derived from the term Linux Cluster.
Configuration of Linster
• 16 node cluster.
• 1 Server node and 15 Client nodes.
Steps involved in installing the OSCAR distribution on the server are as follows:
• Install Red Hat Linux on the server node and partition the disk accordingly. (We are using
Red Hat Enterprise Linux 3.)
• Download the OSCAR distribution package from http://oscar.sourceforge.net/
• Go to the OSCAR directory and configure and install the package:
# cd /root/oscar-4.1
# ./configure
# make install
# source /etc/profile
• Copy the Red Hat installation RPMs to /tftpboot/rpm:
# cp /mnt/cdrom/RedHat/RPMS/*.rpm /tftpboot/rpm
• Run the cluster installation wizard:
# cd $OSCAR_HOME
# ./install_cluster <device>
Substitute the device name (e.g., eth1) of the server’s private network Ethernet adapter. While
running this command, the OSCAR installation wizard will appear on the screen. Work through the
steps of the wizard according to its instructions.
The first four steps of the wizard are handled by OSCAR itself. In the 5th step, define the number
of clients to be added to the cluster, assign an individual IP address to each client, and collect the
MAC addresses of all the nodes. In the 6th step, the clients are network-booted so that they
synchronize with the server, and Linux is installed on each client node with the same configuration
as that of the server. Run “Complete Cluster Setup” and “Test Cluster Setup” to verify the cluster.
With this, clustering is complete in a very simple and easy manner.
Advantages of OSCAR
The main benefit of using this package is that there is no need to configure or install the different
file systems or services like NFS, NIS, PBS etc which otherwise will have to be install separately,
which saves the user’s time. Job can be fired on the server node and the server manages to
distribute it to Client nodes. Any number of clients can be added or removed at any instance, even
after the setup is done.
Conclusion
The prime concern for making this documentation is to introduce a general idea of utilizing the
available resources by making the users aware of the knowledge and concepts of Linux which
otherwise would not be explored due to lack of proper training and guidance. Also, while doing
clustering one would be well versed with Linux operating system, which is considered as a bit
complex and much more technical than other operating systems.
Appendix
I) Parallel Program to count the number of base pairs in a given sequence and implement
Chargaff's rule of base pairing (see Section XII)
#include <stdio.h>
#include <string.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    /* example sequence; replace with the sequence of interest */
    char Seq[] = "ATGCATGCTTAAGGCC";
    int BP[4] = {0, 0, 0, 0};      /* counts of A, T, G, C in one block */
    int DNABase[4] = {0, 0, 0, 0}; /* totals gathered on node 0 */
    int node, size, count;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &node);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int length = strlen(Seq);
    int perNode = length / size;

    if (node == 0)
    {
        printf("%s \n", Seq);
        printf("The length of the sequence is %5d \n", length);
        int tag;
        /* node 0 counts the bases in the last block itself */
        int startPosition = (size - 1) * perNode;
        for (; startPosition < length; startPosition++)
            if (Seq[startPosition] == 'A')
                BP[0]++;
            else if (Seq[startPosition] == 'T')
                BP[1]++;
            else if (Seq[startPosition] == 'G')
                BP[2]++;
            else if (Seq[startPosition] == 'C')
                BP[3]++;
        for (count = 0; count < 4; count++)
            DNABase[count] = BP[count];
        /* collect the partial counts from all the other nodes */
        for (tag = 1; tag < size; tag++)
        {
            MPI_Recv(&BP[0], 4, MPI_INT, tag, tag, MPI_COMM_WORLD, &status);
            for (count = 0; count < 4; count++)
                DNABase[count] += BP[count];
        }
        printf("The number of A's in the given sequence is %5d \n", DNABase[0]);
        printf("The number of T's in the given sequence is %5d \n", DNABase[1]);
        printf("The number of G's in the given sequence is %5d \n", DNABase[2]);
        printf("The number of C's in the given sequence is %5d \n", DNABase[3]);
        /* Chargaff's rule pairs A with T and G with C */
        printf("A+T = %5d, G+C = %5d \n",
               DNABase[0] + DNABase[1], DNABase[2] + DNABase[3]);
    }
    else
    {
        /* each remaining node counts the bases in its own block */
        int startPosition = (node - 1) * perNode;
        int endPosition = node * perNode;
        for (; startPosition < endPosition; startPosition++)
            if (Seq[startPosition] == 'A')
                BP[0]++;
            else if (Seq[startPosition] == 'T')
                BP[1]++;
            else if (Seq[startPosition] == 'G')
                BP[2]++;
            else if (Seq[startPosition] == 'C')
                BP[3]++;
        MPI_Send(&BP[0], 4, MPI_INT, 0, node, MPI_COMM_WORLD);
    }
    MPI_Finalize();
    return 0;
}
II) Parallel Program to calculate the number of times the alphabet "e" appears in a given text
(see Section XII)
#include <stdio.h>
#include <string.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    /* example text; replace with the text of interest */
    char Text[] = "parallel processing on a linux cluster is effective";
    int node, size;
    int Total = 0;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &node);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int length = strlen(Text);
    int perNode = length / size;

    if (node == 0)
    {
        printf("%s \n", Text);
        printf("The length of the text is %5d \n", length);
        int tag;
        int Sum = 0;
        /* node 0 counts the e's in the last block itself */
        int startPosition = (size - 1) * perNode;
        for (; startPosition < length; startPosition++)
            if (Text[startPosition] == 'e')
                Total++;
        printf("The number of e's counted = %6d by the given node = %5d \n", Total, node);
        Sum = Total;
        /* collect the partial counts from all the other nodes */
        for (tag = 1; tag < size; tag++)
        {
            MPI_Recv(&Total, 1, MPI_INT, tag, tag, MPI_COMM_WORLD, &status);
            Sum += Total;
        }
        printf("The number of e's in the given text is %5d \n", Sum);
    }
    else
    {
        /* each remaining node counts the e's in its own block */
        int startPosition = (node - 1) * perNode;
        int endPosition = node * perNode;
        for (; startPosition < endPosition; startPosition++)
            if (Text[startPosition] == 'e')
                Total++;
        MPI_Send(&Total, 1, MPI_INT, 0, node, MPI_COMM_WORLD);
    }
    MPI_Finalize();
    return 0;
}