Parallel and Cluster Computing
Introduction
What is cluster computing?
Classification of Cluster Computing
Technologies:
Beowulf cluster
Construction of Beowulf Cluster
The use of cluster computing in Bioinformatics & Parallel Computing
Folding@Home Project
High performance clusters (HPC)
a 256-processor Sun cluster.
Build Your Own Cluster!
Mainly in parallel: split the problem into smaller tasks that are
executed concurrently
Why?
Absolute physical limits of hardware components
Economic reasons – more complex = more expensive
Performance limits – doubling the frequency does not double the performance
Large applications – demand too much memory & time
Advantages:
Increasing speed & optimizing resources utilization
Disadvantages:
Complex programming models – difficult development
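The idea of splitting a problem into smaller tasks that are executed concurrently can be sketched in a few lines. This is only an illustration, not part of the original slides: the function names are hypothetical, and a thread pool is used as a stand-in for cluster nodes (in CPython, threads show the structure of the approach but do not speed up CPU-bound work).

```python
from concurrent.futures import ThreadPoolExecutor


def sum_of_squares(chunk):
    """The small task that each worker runs on its own share of the data."""
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(data, workers=4):
    """Split the problem, run the pieces concurrently, combine the results."""
    # Worker i takes every `workers`-th element starting at index i.
    chunks = [data[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_sums = list(pool.map(sum_of_squares, chunks))
    return sum(partial_sums)


if __name__ == "__main__":
    print(parallel_sum_of_squares(list(range(10)), workers=3))
```

The same three steps (split, compute, combine) carry over to a real cluster; there the extra programming effort listed under "Disadvantages" goes into moving the chunks and partial results between machines.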
Several applications of parallel processing:
Architectures of Parallel Computer:
PVP (Parallel Vector Processor)
SMP (Symmetric Multiprocessor)
MPP (Massively Parallel Processor)
COW (Cluster of Workstations)
DSM (Distributed Shared Memory)
Towards Inexpensive Supercomputing: Cluster Computing is the
Commodity Supercomputing 58.8%
A computer cluster is a group of linked computers, working together closely
so that in many respects they form a single computer. The components of a
cluster are commonly, but not always, connected to each other through fast
local area networks. Clusters are usually deployed to improve performance
and/or availability over that provided by a single computer, while typically
being much more cost-effective than single computers of comparable speed or
availability.
Master (also called the service node or front node): used to interact with users and manage the
cluster
Nodes: a group of computers (computing nodes) (keyboard, mouse, floppy,
video…)
Communication between nodes runs over an interconnect network platform (Ethernet,
Myrinet…)
In order for the master and node computers to communicate, some sort of message
passing control structure is required. MPI (Message Passing Interface) is the most
commonly used such control.
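The master/node exchange described above can be sketched without a real cluster. The example below does not use MPI: it assumes in-process queues and threads as stand-ins for the interconnect and the computing nodes, and every name in it is hypothetical. It shows only the pattern of sending work messages and receiving result messages.

```python
import queue
import threading


def node(rank, inbox, outbox):
    """Computing node: receive work messages and send result messages back."""
    while True:
        message = inbox.get()
        if message is None:  # shutdown message from the master
            break
        task_id, value = message
        outbox.put((task_id, rank, value * value))


def master(values, n_nodes=3):
    """Master: distribute tasks to the nodes and collect their replies."""
    inboxes = [queue.Queue() for _ in range(n_nodes)]
    replies = queue.Queue()
    threads = [
        threading.Thread(target=node, args=(rank, inboxes[rank], replies))
        for rank in range(n_nodes)
    ]
    for thread in threads:
        thread.start()
    # Round-robin distribution: task i goes to node i mod n_nodes.
    for task_id, value in enumerate(values):
        inboxes[task_id % n_nodes].put((task_id, value))
    for inbox in inboxes:
        inbox.put(None)
    for thread in threads:
        thread.join()
    # All nodes have finished, so every reply is already in the queue.
    results = [None] * len(values)
    while not replies.empty():
        task_id, _rank, result = replies.get()
        results[task_id] = result
    return results


if __name__ == "__main__":
    print(master([1, 2, 3, 4, 5]))
```

In an MPI program the queues would be replaced by send and receive calls between ranks, with the master usually running as rank 0.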
To construct a Beowulf cluster there are four distinct but interrelated areas of
consideration:
Brief Technical Parameters:
Three of the most commonly used operating systems include kernel-level support
for parallel programming:
Windows NT/2000
Mainly used to build a High Availability Cluster or an NLB (Network Load
Balancing) Cluster, providing services such as database, file/print, web, and
streaming media. Supports 2-4 SMP or 32 processors. Hardly used to build a
scientific computing cluster.
Red Hat Linux
The most used OS for a Beowulf Cluster.
Provides high performance and scalability, high reliability, and low cost
(freely available and uses inexpensive commodity hardware).
Sun Solaris
Uses expensive and unpopular hardware.
Network Platform
Some design considerations for the interconnect network are:
• Maximum bandwidth
• Minimum latency
Example: Fast Ethernet (100 Mbps): low cost, minimum latency of 80 µs
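Why both bandwidth and latency matter can be seen with a rough estimate. The sketch below assumes the common simplified model in which sending a message costs a fixed latency plus its size divided by the bandwidth. The Fast Ethernet figures (100 Mbps, 80 µs) are the ones quoted on this slide; the function name and message sizes are illustrative.

```python
FAST_ETHERNET_BPS = 100e6       # 100 Mbps, figure quoted on the slide
FAST_ETHERNET_LATENCY_S = 80e-6  # 80 microseconds, figure quoted on the slide


def transfer_time(message_bytes, bandwidth_bps, latency_s):
    """Estimated one-way time in seconds: latency + size / bandwidth."""
    return latency_s + (message_bytes * 8) / bandwidth_bps


if __name__ == "__main__":
    for size in (100, 1_000_000):
        seconds = transfer_time(size, FAST_ETHERNET_BPS, FAST_ETHERNET_LATENCY_S)
        print(f"{size} bytes: {seconds * 1e6:.0f} microseconds")
```

Under this model a 100-byte message is dominated by latency, while a 1 MB message is dominated by bandwidth, so programs that exchange many small messages benefit most from a low-latency interconnect.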
Parallel Environment
Two of the most commonly used Parallel Interface Libraries:
o PVM (Parallel Virtual Machine)
o MPI (Message Passing Interface)
Parallel interface libraries provide a group of communication interfaces
that support message passing. Users can call these libraries directly from their
Fortran and C programs.
What is Bioinformatics:
Also called “biomedical computing”. The application of computer
science and technology to problems in the biomolecular sciences.
Cluster Uses:
The Beowulf cluster computing design has been used by parallel
processing computer systems projects to build a powerful computer
that can assist in bioinformatics research and data analysis.
In bioinformatics, clusters are used to run DNA string matching
algorithms or protein folding applications. They also run a
computer algorithm known as BLAST (Basic Local Alignment
Search Tool) to analyze massive sets of DNA sequences for
bioinformatics research.
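A very small example of DNA string matching is sketched below. It is a naive exact-match search, not BLAST (which uses heuristic local alignment), and the sequence names and data are made up for illustration. Each sequence is searched independently of the others, which is why this kind of work distributes well across cluster nodes.

```python
def count_matches(sequence, pattern):
    """Count exact occurrences of pattern in sequence, overlaps included."""
    if not pattern:
        return 0
    count = 0
    position = sequence.find(pattern)
    while position != -1:
        count += 1
        # Move one base forward so overlapping matches are also found.
        position = sequence.find(pattern, position + 1)
    return count


def search_all(sequences, pattern):
    """Search every named sequence; each entry is an independent task."""
    return {name: count_matches(seq, pattern) for name, seq in sequences.items()}


if __name__ == "__main__":
    sample = {"seq1": "ACGTACGT", "seq2": "TTTT"}  # made-up sequences
    print(search_all(sample, "ACG"))
```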
For bioinformatics, MPICH2 is used. It is an
implementation of MPI that was specifically designed
for use with cluster computing systems and parallel
processing. It is an open source set of libraries for
various high-level programming languages that give
programmers tools to easily control how large problems
are broken apart and distributed to the various computers
in a cluster.
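How a large problem is broken apart and distributed can be illustrated with a block decomposition, one common way to give each computer a contiguous share of the data. The sketch below is plain Python and does not call MPICH2; the function names are hypothetical, and "rank" simply numbers the computers from 0.

```python
def block_range(n_items, n_ranks, rank):
    """Return the (start, end) slice of the items owned by the given rank."""
    base, extra = divmod(n_items, n_ranks)
    # The first `extra` ranks each take one additional item.
    start = rank * base + min(rank, extra)
    end = start + base + (1 if rank < extra else 0)
    return start, end


def scatter(data, n_ranks):
    """Break the data apart into one contiguous block per rank."""
    return [data[slice(*block_range(len(data), n_ranks, r))] for r in range(n_ranks)]


def gather(blocks):
    """Reassemble the blocks in rank order."""
    return [item for block in blocks for item in block]


if __name__ == "__main__":
    print([block_range(10, 3, r) for r in range(3)])
```

With 10 items and 3 ranks the blocks have sizes 4, 3 and 3, so no computer holds more than one item above the average.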
What is protein folding and how is folding linked to disease?
Proteins are biology's workhorses -- its "nanomachines." Before proteins can
carry out these important functions, they assemble themselves, or "fold." The
process of protein folding, while critical and fundamental to virtually all of
biology, in many ways remains a mystery.
When proteins do not fold correctly, diseases can result:
Alzheimer's, Mad Cow
How?
Folding@home is a distributed computing project -- people from throughout
the world download and run software to band together to make one of the
largest supercomputers in the world. On each computer, Folding@home uses
novel computational methods coupled with distributed computing to simulate
problems.
The results get back to the main server because your computer automatically
uploads them each time it finishes a work unit, and
downloads a new job at that time.
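The work-unit cycle described above (download a job, compute, upload the result, download the next job) can be sketched as a simple loop. Everything here is hypothetical: `WorkServer` is an in-memory stand-in rather than the Folding@home server or its protocol, and `simulate` is a placeholder for the actual folding computation.

```python
from collections import deque


class WorkServer:
    """Hypothetical in-memory stand-in for the project's main server."""

    def __init__(self, work_units):
        self.pending = deque(work_units)
        self.results = {}

    def download(self):
        """Hand out the next work unit, or None when none are left."""
        return self.pending.popleft() if self.pending else None

    def upload(self, unit_id, result):
        """Store the result a client sends back."""
        self.results[unit_id] = result


def simulate(payload):
    """Placeholder computation standing in for the folding simulation."""
    return sum(payload)


def run_client(server):
    """Fetch, compute and upload until the server has no more work."""
    finished = 0
    unit = server.download()
    while unit is not None:
        unit_id, payload = unit
        server.upload(unit_id, simulate(payload))
        finished += 1
        unit = server.download()  # a new job is fetched right after uploading
    return finished


if __name__ == "__main__":
    server = WorkServer([(1, [1, 2]), (2, [3, 4, 5])])
    print(run_client(server), server.results)
```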
Brief Architectural information:
Academically:
1000-node Beowulf cluster system
Used for genetic algorithm research by John Koza, Stanford
University
http://www.pssclabs.com/products_powerwulf.asp
Which parallel environments are used in building clusters?
Two of the most commonly used Parallel Interface Libraries:
PVM (Parallel Virtual Machine)
MPI (Message Passing Interface)
http://www.wikipedia.org/
http://folding.stanford.edu/
http://www.pssclabs.com/
http://www.genetic-programming.com
http://www.lam-mpi.org/mpi/mpi_top10.php