Building a simple Beowulf cluster with Ubuntu
The Beowulf cluster article on Wikipedia describes the Beowulf cluster as follows:

“A Beowulf cluster is a group of what are normally identical, commercially available computers, which are running a Free and Open Source Software (FOSS), Unix-like operating system, such as BSD, GNU/Linux, or Solaris. They are networked into a small TCP/IP LAN, and have libraries and programs installed which allow processing to be shared among them.” – Wikipedia, Beowulf cluster, 28 February 2011.

This means a Beowulf cluster can be easily built with “off the shelf” computers running GNU/Linux in a simple home network. So building a Beowulf-like cluster is within reach if you already have a small TCP/IP LAN at home with desktop computers running Ubuntu Linux (or any other Linux distribution).

There are many ways to install and configure a cluster. There is OSCAR(1),
which allows any user, regardless of experience, to easily install a Beowulf
type cluster on supported Linux distributions. It installs and configures all
required software according to user input.

There is also the NPACI Rocks toolkit(2), which incorporates the latest Red
Hat distribution and cluster-specific software. Rocks addresses the difficulties
of deploying manageable clusters. Rocks makes clusters easy to deploy,
manage, upgrade and scale.

Both of the aforementioned toolkits for deploying clusters were made to be easy to use and require minimal expertise from the user. But the purpose of this tutorial is to explain how to manually build a Beowulf-like cluster. Basically, the toolkits mentioned above do most of the installing and configuring for you, rendering the learning experience moot. So it would not make much sense to use any of these toolkits if you want to learn the basics of how a cluster works. This tutorial therefore explains how to manually build a cluster, by manually installing and configuring the required tools. In this tutorial I assume that you have some basic knowledge of Linux-based operating systems and know your way around the command line. I have tried, however, to make this as easy as possible to follow. Keep in mind that this is new territory for me as well and there’s a good chance that this tutorial shows methods that may not be the best.

I myself started off with the clustering tutorial from SCFBio, which gives a great explanation of how to build a simple Beowulf cluster.(3) It describes the prerequisites for building a Beowulf cluster and why these are needed.

Contents
What’s a Beowulf Cluster, exactly?
Building a virtual Beowulf Cluster
Building the actual cluster
Configuring the Nodes
Add the nodes to the hosts file
Defining a user for running MPI jobs
Install and setup the Network File System
Setup passwordless SSH for communication between nodes
Setting up the process manager
Setting up Hydra
Setting up MPD
Running jobs on the cluster
Running MPICH2 example applications on the cluster
Running bioinformatics tools on the cluster
Credits
References

What’s a Beowulf Cluster, exactly?


[Figure: the typical setup of a Beowulf cluster]

The definition I cited before is not very complete. The book “Engineering a Beowulf-style Compute Cluster”(4) by Robert G. Brown gives a more detailed answer to this question (if you’re serious about this, this book is a must read). According to this book, there is an accepted definition of a Beowulf cluster. This book describes the true Beowulf as a cluster of computers interconnected with a network with the following characteristics:

1. The nodes are dedicated to the Beowulf cluster.
2. The network on which the nodes reside is dedicated to the Beowulf cluster.
3. The nodes are Mass Market Commercial-Off-The-Shelf (M2COTS) computers.
4. The network is also a COTS entity.
5. The nodes all run open source software.
6. The resulting cluster is used for High Performance Computing (HPC).

Building a virtual Beowulf Cluster


It is not a bad idea to start by building a virtual cluster using virtualization software like VirtualBox. I simply used my laptop running Ubuntu as the master node, and created two virtual compute nodes running Ubuntu Server Edition in VirtualBox. The virtual cluster allows you to build and test the cluster without the need for extra hardware. However, this method is only meant for testing and is not suited if you want increased performance.

When it comes to configuring the nodes for the cluster, building a virtual cluster is practically the same as building a cluster with actual machines. The difference is that you don’t have to worry about the hardware as much. You do have to properly configure the virtual network interfaces of the virtual nodes. They need to be configured in a way that the master node (i.e. the computer on which the virtual nodes are running) has network access to the virtual nodes, and vice versa.
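How you do this depends on your setup; as one possible sketch (the VM names “node1”/“node2” and the host interface “eth0” are just examples, and the VMs must be powered off when you change this), a bridged adapter can be attached to each virtual node with VBoxManage:

$ VBoxManage modifyvm "node1" --nic1 bridged --bridgeadapter1 eth0
$ VBoxManage modifyvm "node2" --nic1 bridged --bridgeadapter1 eth0

With bridged networking the virtual nodes appear on the same LAN as the host, so the master node and the virtual nodes can reach each other directly; a host-only network is an alternative if you prefer to keep the virtual cluster isolated.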

Building the actual cluster


It is good practice to first build and test a virtual cluster as described above. If you have some spare computers and network parts lying around, you can use those to build the actual cluster. The nodes (the computers that are part of the cluster) and the network hardware are the usual kind available to the general public (Beowulf requirements 3 and 4). In this tutorial we’ll use the Ubuntu operating system to power the machines and open source software to allow for distributed parallel computing (Beowulf requirement 5). We’ll test the cluster with cluster-specific versions of bioinformatics tools that perform some sort of heavy calculations (Beowulf requirement 6).

The cluster consists of the following hardware parts:

Network
Server / Head / Master Node (common names for the same machine)
Compute Nodes
Gateway

All nodes (including the master node) run the following software:

GNU/Linux OS
Ubuntu Server Edition. This tutorial was updated for Ubuntu
12.10.
Network File System (NFS)
Secure Shell (SSH)
Message Passing Interface (MPI)
MPICH, a high performance and widely portable implementation
of the Message Passing Interface (MPI) standard.(5)

I will not focus on setting up the network (parts) in this tutorial. I assume that
all nodes are part of the same private network and that they are properly
connected.

Configuring the Nodes


Some configurations need to be made to the nodes. I’ll walk you through them one by one.

Add the nodes to the hosts file

It is easier if the nodes can be accessed with their host name rather than their
IP address. It will also make things a lot easier later on. To do this, add the
nodes to the hosts file of all nodes.(8) (9) All nodes should have a static local
IP address set. I won’t go into details here as this is outside the scope of this
tutorial. For this tutorial I assume that all nodes are already properly
configured to have a static local IP address.
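For reference only, on the Ubuntu Server 12.10 release used in this tutorial a static address is typically set in /etc/network/interfaces; a minimal sketch for one compute node, assuming the interface is eth0 and the same 192.168.1.x addressing used below, would be:

auto eth0
iface eth0 inet static
    address 192.168.1.7
    netmask 255.255.255.0
    gateway 192.168.1.1

Adjust the address for each node and restart networking (or reboot) for the change to take effect.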

Edit the hosts file ( sudo vim /etc/hosts ) like below and remember that
you need to do this for all nodes,

127.0.0.1 localhost
192.168.1.6 master
192.168.1.7 node1
192.168.1.8 node2
192.168.1.9 node3

Make sure it doesn’t look like this:

127.0.0.1 localhost
127.0.1.1 master
192.168.1.7 node1
192.168.1.8 node2
192.168.1.9 node3

nor like this:

127.0.0.1 localhost
127.0.1.1 master
192.168.1.6 master
192.168.1.7 node1
192.168.1.8 node2
192.168.1.9 node3

Otherwise other nodes will try to connect to localhost when trying to reach
the master node.

Once saved, you can use the host names to connect to the other nodes,

$ ping -c 3 master
PING master (192.168.1.6) 56(84) bytes of data.
64 bytes from master (192.168.1.6): icmp_req=1 ttl=64 time=0.606 ms
64 bytes from master (192.168.1.6): icmp_req=2 ttl=64 time=0.552 ms
64 bytes from master (192.168.1.6): icmp_req=3 ttl=64 time=0.549 ms

--- master ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 1999ms
rtt min/avg/max/mdev = 0.549/0.569/0.606/0.026 ms

Try this between different nodes; from each node, ping the others by host name. You should get a response similar to the above.

In this tutorial, master is used as the master node. Once the cluster has been set up, the master node will be used to start (spawn) jobs on the cluster. The compute nodes are node1 to node3 and will thus execute the jobs.

Defining a user for running MPI jobs

Several tutorials explain that all nodes need a separate user for running MPI
jobs.(8) (9) (6) I haven’t found a clear explanation to why this is necessary,
but there could be several reasons:

1. There’s no need to remember different user names and passwords if all nodes use the same username and password.
2. MPICH2 can use SSH for communication between nodes.
Passwordless login with the use of authorized keys only works if the
username matches the one set for passwordless login. You don’t have
to worry about this if all nodes use the same username.
3. The NFS directory can be made accessible for the MPI users only. The
MPI users all need to have the same user ID for this to work.
4. The separate user might require special permissions.

The command below creates a new user with username “mpiuser” and user
ID 999. Giving a user ID below 1000 prevents the user from showing up in
the login screen for desktop versions of Ubuntu. It is important that all MPI
users have the same username and user ID. The user IDs for the MPI users
need to be the same because we give access to the MPI user on the NFS
directory later. Permissions on NFS directories are checked with user IDs.
Create the user like this,

$ sudo adduser mpiuser --uid 999

You may use a different user ID (as long as it is the same for all MPI users).
Enter a password for the user when prompted. It’s recommended to give the
same password on all nodes so you have to remember just one password. The
above command should also create a new directory /home/mpiuser . This is
the home directory for user mpiuser and we will use it to execute jobs on
the cluster.
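To verify that the user was created with the intended user ID, you can run the following on each node; it should report uid 999 (or whichever ID you chose) everywhere:

$ id mpiuser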

Install and setup the Network File System

Files and programs used for MPI jobs (jobs that are run in parallel on the
cluster) need to be available to all nodes, so we give all nodes access to a part
of the file system on the master node. Network File System (NFS) enables
you to mount part of a remote file system so you can access it as if it is a local
directory. To install NFS, run the following command on the master node:

master:~$ sudo apt-get install nfs-kernel-server

And in order to make it possible to mount a Network File System on the compute nodes, the nfs-common package needs to be installed on all compute nodes:

$ sudo apt-get install nfs-common

We will use NFS to share the MPI user’s home directory (i.e.
/home/mpiuser ) with the compute nodes. It is important that this directory
is owned by the MPI user so that all MPI users can access this directory. But
since we created this home directory with the adduser command earlier, it
is already owned by the MPI user,

master:~$ ls -l /home/ | grep mpiuser
drwxr-xr-x 7 mpiuser mpiuser 4096 May 11 15:47 mpiuser

If you use a different directory that is not currently owned by the MPI user,
you must change its ownership as follows,

master:~$ sudo chown mpiuser:mpiuser /path/to/shared/dir

Now we share the /home/mpiuser directory of the master node with all
other nodes. For this the file /etc/exports on the master node needs to be
edited. Add the following line to this file,

/home/mpiuser *(rw,sync,no_subtree_check)

You can read the man page to learn more about the exports file ( man
exports ). After the first install you may need to restart the NFS daemon:

master:~$ sudo service nfs-kernel-server restart

This also exports the directories listed in /etc/exports . In the future when the /etc/exports file is modified, you need to run the following command to export the directories listed in /etc/exports :

master:~$ sudo exportfs -a

The /home/mpiuser directory should now be shared through NFS. In order to test this, you can run the following command from a compute node:

$ showmount -e master

In this case this should print the path /home/mpiuser . All data files and
programs that will be used for running an MPI job must be placed in this
directory on the master node. The other nodes will then be able to access
these files through NFS.

Ubuntu ships with the UFW firewall (it may or may not be enabled on your installation). When enabled, the firewall will block access when a client tries to access an NFS shared directory. So you need to add a rule with UFW (a tool for managing the firewall) to allow access from a specific subnet. If the IP addresses in your network have the format 192.168.1.* , then 192.168.1.0 is the subnet. Run the following command to allow incoming access from a specific subnet,

master:~$ sudo ufw allow from 192.168.1.0/24

You need to run this on the master node and replace “192.168.1.0” by the
subnet for your network.
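You can check whether the firewall is active and whether the new rule is in place with,

master:~$ sudo ufw status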

You should then be able to mount master:/home/mpiuser on the compute nodes. Run the following commands to test this,

node1:~$ sudo mount master:/home/mpiuser /home/mpiuser
node2:~$ sudo mount master:/home/mpiuser /home/mpiuser
node3:~$ sudo mount master:/home/mpiuser /home/mpiuser

If this fails or hangs, restart the compute node and try again. If the above
command runs without a problem, you should test whether /home/mpiuser
on any compute node actually has the content from /home/mpiuser of the
master node. You can test this by creating a file in master:/home/mpiuser
and check if that same file appears in node*:/home/mpiuser (where
node* is any compute node).

If mounting the NFS shared directory works, we can make it so that the
master:/home/mpiuser directory is automatically mounted when the
compute nodes are booted. For this the file /etc/fstab needs to be edited.


Add the following line to the fstab file of all compute nodes,

master:/home/mpiuser /home/mpiuser nfs

Again, read the man page of fstab if you want to know the details ( man
fstab ). Reboot the compute nodes and list the contents of the
/home/mpiuser directory on each compute node to check if you have access
to the data on the master node,

$ ls /home/mpiuser

This should list the files from the /home/mpiuser directory of the master node. If it doesn’t show up immediately, wait a few seconds and try again. It might
take some time for the system to initialize the connection with the master
node.
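If the mount does not come up automatically at boot, you can try a more explicit fstab line; this is just the standard fstab form with the default mount options spelled out, not something specific to this tutorial:

master:/home/mpiuser /home/mpiuser nfs defaults 0 0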

Setup passwordless SSH for communication between nodes

For the cluster to work, the master node needs to be able to communicate
with the compute nodes, and vice versa.(8) Secure Shell (SSH) is usually
used for secure remote access between computers. By setting up passwordless
SSH between the nodes, the master node is able to run commands on the
compute nodes. This is needed to run the MPI daemons on the available
compute nodes.

First install the SSH server on all nodes:

$ sudo apt-get install ssh

Now we need to generate an SSH key for all MPI users on all nodes. The
SSH key is by default created in the user’s home directory. Remember that in
our case the MPI user’s home directory (i.e. /home/mpiuser ) is actually the
same directory for all nodes: /home/mpiuser on the master node. So if we
generate an SSH key for the MPI user on one of the nodes, all nodes will
automatically have an SSH key. Let’s generate an SSH key for the MPI user
on the master node (but any node should be fine),

$ su mpiuser
$ ssh-keygen

When asked for a passphrase, leave it empty (hence passwordless SSH).
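If you prefer, the same key can be generated non-interactively with an empty passphrase (these are standard ssh-keygen options):

$ ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa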

When done, all nodes should have an SSH key (the same key actually). The master node needs to be able to automatically login to the compute nodes. To enable this, the public SSH key of the master node needs to be added to the list of authorized keys (this is usually the file ~/.ssh/authorized_keys ) of all compute nodes. But this is easy, since all SSH key data is stored in one location: /home/mpiuser/.ssh/ on the master node. So instead of having to
copy master’s public SSH key to all compute nodes separately, we just have
to copy it to master’s own authorized_keys file. There is a command to
push the public SSH key of the currently logged in user to another computer.
Run the following commands on the master node as user “mpiuser”,

mpiuser@master:~$ ssh-copy-id localhost

Master’s own public SSH key should now be copied to /home/mpiuser/.ssh/authorized_keys . But since /home/mpiuser/ (and everything under it) is shared with all nodes via NFS, all nodes should now have master’s public SSH key in their list of authorized keys. This means that
we should now be able to login on the compute nodes from the master node
without having to enter a password,

mpiuser@master:~$ ssh node1


mpiuser@node1:~$ echo $HOSTNAME
node1

You should now be logged in on node1 via SSH. Make sure you’re able to
login to the other nodes as well.

Setting up the process manager

In this section I’ll walk you through the installation of MPICH and
configuring the process manager. The process manager is needed to spawn
and manage parallel jobs on the cluster. The MPICH wiki explains this
nicely:

“Process managers are basically external (typically distributed) agents that spawn and manage parallel jobs. These process
managers communicate with MPICH processes using a
predefined interface called as PMI (process management
interface). Since the interface is (informally) standardized within
MPICH and its derivatives, you can use any process manager
from MPICH or its derivatives with any MPI application built
with MPICH or any of its derivatives, as long as they follow the
same wire protocol.” – Frequently Asked Questions - Mpich.

The process manager is included with the MPICH package, so start by installing MPICH on all nodes with,

$ sudo apt-get install mpich2


MPD was the traditional default process manager for MPICH up to the 1.2.x release series. Starting with the 1.3.x series, Hydra is the default process manager.(10) So depending on the version of MPICH you are using, you should either use MPD or Hydra for process management. You can check the MPICH version by running mpich2version in the terminal. Then follow either the steps for MPD or Hydra in the following subsections.

Setting up Hydra

This section explains how to configure the Hydra process manager and is for users of the MPICH 1.3.x series and up. In order to set up Hydra, we need to create one file on the master node. This file contains all the host names of the compute nodes.(11) You can create this file anywhere you want, but for simplicity we create it in the MPI user’s home directory,

mpiuser@master:~$ cd ~
mpiuser@master:~$ touch hosts

In order to be able to send out jobs to the other nodes in the network, add the
host names of all compute nodes to the hosts file,

node1
node2
node3

You may choose to include master in this file, which would mean that the
master node would also act as a compute node. The hosts file only needs to
be present on the node that will be used to start jobs on the cluster, usually the
master node. But because the home directory is shared among all nodes, all
nodes will have the hosts file. For more details about setting up Hydra see
this page: Using the Hydra Process Manager.
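The Hydra host file can also state how many processes to place on each node by appending a colon and a number to the host name (see the Hydra documentation referenced above). For example, assuming node3 had more cores than the other nodes:

node1:2
node2:2
node3:4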

Setting up MPD

This section explains how to configure the MPD process manager and is for users of the MPICH 1.2.x series and below. Before we can start any parallel jobs
with MPD, we need to create two files in the home directory of the MPI user.
Make sure you’re logged in as the MPI user and create the following two files
in the home directory,

mpiuser@master:~$ cd ~
mpiuser@master:~$ touch mpd.hosts
mpiuser@master:~$ touch .mpd.conf

In order to be able to send out jobs to the other nodes in the network, add the host names of all compute nodes to the mpd.hosts file,

node1
node2
node3

You may choose to include master in this file, which would mean that the
master node would also act as a compute node. The mpd.hosts file only
needs to be present on the node that will be used to start jobs on the cluster,
usually the master node. But because the home directory is shared among all
nodes, all nodes will have the mpd.hosts file.

The configuration file .mpd.conf (mind the dot at the beginning of the file
name) must be accessible to the MPI user only (in fact, MPD refuses to work
if you don’t do this),

mpiuser@master:~$ chmod 600 .mpd.conf

Then add a line with a secret passphrase to the configuration file,

secretword=random_text_here

The secretword can be set to any random passphrase. You may want to use a random password generator to generate a passphrase.
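For example, the following writes the line with a randomly generated passphrase in one go (this uses the standard openssl rand command; any password generator will do):

mpiuser@master:~$ echo "secretword=$(openssl rand -hex 16)" > ~/.mpd.conf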

All nodes need to have the .mpd.conf file in the home directory of
mpiuser with the same passphrase. But this is automatically the case since
/home/mpiuser is shared through NFS.

The nodes should now be configured correctly. Run the following command on the master node to start the mpd daemon on all nodes,

mpiuser@master:~$ mpdboot -n 3

Replace “3” by the number of compute nodes in your cluster. If this was
successful, all nodes should now be running the mpd daemon. Run the
following command to check if all nodes entered the ring (and are thus
running the mpd daemon),

mpiuser@master:~$ mpdtrace -l

This command should display a list of all nodes that entered the ring. Nodes
listed here are running the mpd daemon and are ready to accept MPI jobs.


This means that your cluster is now set up and ready to rock!

Running jobs on the cluster


Running MPICH2 example applications on the cluster

The MPICH2 package comes with a few example applications that you can
run on your cluster. To obtain these examples, download the MPICH2 source
package from the MPICH website and extract the archive to a directory. The directory where you extracted the MPICH2 package should contain an “examples” directory. This directory contains the source code of the example applications. You need to compile these yourself.

$ sudo apt-get build-dep mpich2
$ wget http://www.mpich.org/static/downloads/1.4.1/mpich2-1.4.1.tar.gz
$ tar -xvzf mpich2-1.4.1.tar.gz
$ cd mpich2-1.4.1/
$ ./configure
$ make
$ cd examples/

The example application cpi is compiled by default, so you can find the
executable in the “examples” directory. Optionally you can build the other
examples as well,

$ make hellow
$ make pmandel
...

Once compiled, place the executables of the examples somewhere inside the
/home/mpiuser directory on the master node. It’s common practice to place
executables in a “bin” directory, so create the directory /home/mpiuser/bin
and place the executables in this directory. The executables should now be
available on all nodes.
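For example, still inside the examples directory of the extracted MPICH2 source, and assuming you have write access to /home/mpiuser (run the commands as mpiuser or prefix them with sudo):

$ mkdir /home/mpiuser/bin
$ cp cpi /home/mpiuser/bin/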

We’re going to run an MPI job using the example application cpi . Make
sure you’re logged in as the MPI user on the master node,

$ su mpiuser

And run the job like this,

When using MPD:


mpiuser@master:~$ mpiexec -n 3 /home/mpiuser/bin/cpi

When using Hydra:

mpiuser@master:~$ mpiexec -f hosts -n 3 /home/mpiuser/bin/cpi

Replace “3” by the number of nodes on which you want to run the job. When
using Hydra, the -f switch should point to the file containing the host
names. When using MPD, it’s important that you use the absolute path to the
executable in the above command, because only then MPD knows where to
look for the executable on the compute nodes. The absolute path used should
thus be correct for all nodes. But since /home/mpiuser is the NFS shared
directory, all nodes have access to this path and the files within it.

The example application cpi is useful for testing because it shows on which
nodes each sub process is running and how long it took to run the job. This
application is however not useful to test performance because this is a very
small application which takes only a few milliseconds to run. As a matter of
fact, I don’t think it actually computes pi. If you look at the source, you’ll
find that the value of pi is hard coded into the program.

Running bioinformatics tools on the cluster

By running actual bioinformatics tools you can give your cluster a more
realistic test run. There are several parallel implementations of bioinformatics
tools that are based on MPI. There are two that I currently know of:

mpiBLAST, a parallel implementation of NCBI BLAST.
ClustalW-MPI, a parallel implementation of Clustal-W.

It would be nice to test mpiBLAST, but because of a compilation issue, I was not able to do so. After some asking around at the mpiBLAST-Users mailing list, I got an answer:

“That problem is caused by a change in GCC version 4.4.X. We don’t have a fix to give out for the issue as yet, but switching to 4.3.X or lower should solve the issue for the time being.”(7)

Basically, I’m using a newer version of the GCC compiler which fails to build mpiBLAST. In order to compile it, I’d have to use an older version. But instructing mpicc to use GCC 4.3 instead requires that MPICH2 be compiled with GCC 4.3. Instead of going through that trouble, I’ve decided to give ClustalW-MPI a try instead.

The MPI implementation of ClustalW is fairly outdated, but it’s good enough to perform a test run on your cluster. Download the source from the website, extract the package, and compile the source. Copy the resulting executable to the /home/mpiuser/bin directory on the master node. Use for example Entrez to search for some DNA/protein sequences and put these in a single FASTA file (the NCBI website can do that for you). Create several FASTA files with multiple sequences to test with. Copy the multi-sequence FASTA files to a data directory inside the shared directory (e.g. /home/mpiuser/data ).
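For example (the FASTA file name here is just a placeholder for whatever sequences you downloaded):

mpiuser@master:~$ mkdir /home/mpiuser/data
mpiuser@master:~$ cp my_sequences.fasta /home/mpiuser/data/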
Then run a job like this,

When using MPD:

mpiuser@master:~$ mpiexec -n 3 /home/mpiuser/bin/clustalw-mpi /home/mpiuse

When using Hydra:

mpiuser@master:~$ mpiexec -f hosts -n 3 /home/mpiuser/bin/clustalw-mpi /ho

and let the cluster do the work. Again, notice that we must use absolute paths.
You can check if the nodes are actually doing anything by logging into the
nodes ( ssh node* ) and running the top command. This should display a
list of running processes with the processes using the most CPU at the top. In this case, you should see the clustalw-mpi process somewhere near the top.

Credits
Thanks to Reza Azimi for mentioning the nfs-common package.

References
1. OpenClusterGroup. OSCAR.
2. Philip M. Papadopoulos, Mason J. Katz, and Greg Bruno. NPACI
Rocks: Tools and Techniques for Easily Deploying Manageable Linux
Clusters. October 2001, Cluster 2001: IEEE International Conference
on Cluster Computing.
3. Supercomputing Facility for Bioinformatics & Computational Biology,
IIT Delhi. Clustering Tutorial.
4. Robert G. Brown. Engineering a Beowulf-style Compute Cluster. 2004.
Duke University Physics Department.
5. Pavan Balaji, et al. MPICH2 User’s Guide, Version 1.3.2. 2011.
Mathematics and Computer Science Division Argonne National
Laboratory.
6. Kerry D. Wong. A Simple Beowulf Cluster.
7. mpiBLAST-Users: unimplemented: inlining failed in call to ‘int
fprintf(FILE*, const char*, …)’
8. Ubuntu Wiki. Setting Up an MPICH2 Cluster in Ubuntu.


9. Linux.com. Building a Beowulf Cluster in just 13 steps.


10. wiki.mpich.org. Frequently Asked Questions - Mpich.
11. wiki.mpich.org. Using the Hydra Process Manager - Mpich.

Tags: Linux, Ubuntu, cluster, bioinformatics

This page was last updated on Wednesday September 11, 2013 at 21:10:26 CEST.

Except where otherwise noted, content on this site is licensed under a Creative Commons
Attribution-ShareAlike 3.0 License.
