Parallel Matlab: Laboratory For Computational Cell Biology

The document discusses parallelizing Matlab code. It describes the current serial processing used in the lab and two options for parallelization - the Matlab Distributed Computing Toolbox and MatlabMPI. Both allow chopping jobs into tasks that can run across multiple nodes. The toolbox uses a job manager and requires installation, while MatlabMPI is open source and can run jobs across specified machines. The document provides examples and compares the advantages and disadvantages of each.

Parallel Matlab

Laboratory for Computational Cell Biology


Overview

● The present situation in the lab
● What types of computation can easily be parallelized?
● Mathworks Distributed Computing Toolbox
● MatlabMPI
The Present Situation

● Many algorithms we use are running into computational limits on a single workstation. On average, processing a movie takes 2 to 4 hours.
● Using dedicated 64-bit compute servers has relieved things a little, but processing is still done in a serial fashion.
● Many algorithms could potentially benefit from parallel processing.
Programs Suitable for Parallelization

● For the kind of 'embarrassingly parallel' processing jobs we are talking about here, programs whose data can easily be chopped up into pieces are the easiest to port. Examples:
– Edge detection on separate frames of a movie
– Image segmentation
– Pure numerical functions that run in parallel on different data sets (e.g. Monte Carlo simulations)
● Execution time >> Communication time
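As an illustration of "chopping up" the data, here is a minimal sketch in plain Matlab that splits the frames of a movie into contiguous chunks, one per task. The values of nFrames and nTasks are made up for the example, and the sketch assumes nTasks is not larger than nFrames.

```matlab
% Sketch: split nFrames movie frames into nTasks contiguous chunks.
% nFrames and nTasks are illustrative values, not taken from the lab's data.
% Assumes nTasks <= nFrames, so that no chunk is empty.
nFrames = 10;
nTasks  = 3;

% Chunk boundaries: frames edges(k)+1 .. edges(k+1) belong to task k.
edges = round(linspace(0, nFrames, nTasks + 1));

chunks = cell(nTasks, 1);
for k = 1:nTasks
    chunks{k} = (edges(k) + 1):edges(k + 1);
end

% Each chunk could now be handed to one task or one compute node.
for k = 1:nTasks
    fprintf('Task %d: frames %d to %d\n', k, chunks{k}(1), chunks{k}(end));
end
```

Each chunk should be large enough that processing it takes much longer than shipping it to a node; otherwise communication dominates.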
Parallel Matlab Toolkits

● Matlab Distributed Computing Toolbox – a Mathworks implementation of a job submission engine
● MatlabMPI – an implementation based on the open Message Passing Interface standard
Matlab Distributed Computing Toolbox
● Components:
– Distributed Computing Engine (DCE):
● Jobmanager (1), Workers (many)
– DC Toolbox (client sessions)
– Shared file system
● Basic program flow:
– Find a jobmanager and create a jobmanager object
– Create a job
– Set file dependencies
– Chop the problem up into pieces and create tasks
– Submit the job
– Wait for and gather results
Example Program (add numbers)

% Find job manager
jm = findResource('jobmanager', 'name', 'myjobmanager1');

% Create job object; adding.m must be reachable by the workers
job = createJob(jm, 'FileDependencies', ...
                {'/public/disttoolbox/adding.m'});

% Create tasks: each calls adding with 1 output argument and 2 inputs
createTask(job, @adding, 1, {1,2});
createTask(job, @adding, 1, {3,4});
createTask(job, @adding, 1, {5,6});

% Submit the job and wait until all tasks have finished
submit(job);
waitForState(job, 'finished');

% Get the results
results = getAllOutputArguments(job);

% Do something with the results.
for i = 1 : size(results,1)
    disp(results{i});
end
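
The slides do not show adding.m itself. A minimal guess at what such a task function might look like, assuming it simply adds its two inputs. The body is hypothetical; only the function name and the one-output, two-input call shape come from the createTask calls in the example.

```matlab
function c = adding(a, b)
% ADDING  Hypothetical task function: return the sum of two inputs.
% Not the lab's actual adding.m, which is not shown in the slides.
c = a + b;
end
```

With this definition the three tasks in the example would return 3, 7 and 11.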
Evaluating a Function

● If you just want to quickly evaluate a function without going through the whole circus of setting up jobs and tasks, use 'dfeval':

results = dfeval(@sum, {[1 1] [2 2] [3 3]})
results =
    2
    4
    6
MatlabMPI (1)

● Components:
– MatlabMPI toolbox
– Shared file system (can be $HOME/matlab)

● Execute a MatlabMPI program:
– machines = {'lccbws001' 'lccbws002' 'lccbws003'};
– MatMPI_Delete_all;
– eval(MPI_Run('mpi_program', nr_of_nodes, machines));

● Example programs in:
– /usr/local/matlab/MatlabMPI
MatlabMPI (2)

● Basic program flow:
– Initialize MPI
– Create a communicator
– Get size and rank of the local node
– If rank = 0 // master node
    – Gather data
    – Send data to compute nodes
    – Probe for and receive results
– else // compute node
    – Receive data from master node
    – Process data and calculate results
    – Send results to master node
– End
– Finalize MPI
Example program (print hostname of compute nodes)

% Initialize MPI.
MPI_Init;

% Create communicator object.
comm = MPI_COMM_WORLD;

% Get size and rank for the node that the program is running on
comm_size = MPI_Comm_size(comm);
my_rank = MPI_Comm_rank(comm);

% Create a unique tag id for this message
tag = 1;

if my_rank == 0 % master node
    % Get all the strings
    for k = 1:comm_size-1
        message = MPI_Recv(k, tag, comm);
        disp(message);
    end
else % compute node
    % Send string to rank 0 node
    [status,result] = unix('hostname');
    message = ['Message from node ' num2str(my_rank) ': ' result];
    MPI_Send(0, tag, comm, message);
end

% Finalize Matlab MPI.
MPI_Finalize;
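
The hostname example only sends results back to the master. A sketch of the full flow from the MatlabMPI (2) slide, in which the master also sends data out, might look as follows. It assumes the program is launched with MPI_Run on at least two nodes; the random data, the column sums and the two tag values are made up for illustration.

```matlab
% Sketch of the master / compute-node flow with MatlabMPI.
% Assumption: started via MPI_Run on at least 2 nodes.
MPI_Init;
comm = MPI_COMM_WORLD;
comm_size = MPI_Comm_size(comm);
my_rank = MPI_Comm_rank(comm);

data_tag = 1;    % tag for master -> compute node messages
result_tag = 2;  % tag for compute node -> master messages

if my_rank == 0  % master node
    % Illustrative data: one column per compute node
    data = rand(100, comm_size - 1);
    for k = 1:comm_size-1
        MPI_Send(k, data_tag, comm, data(:, k));
    end
    % Collect one result from every compute node
    totals = zeros(1, comm_size - 1);
    for k = 1:comm_size-1
        totals(k) = MPI_Recv(k, result_tag, comm);
    end
    disp(totals);
else             % compute node
    column = MPI_Recv(0, data_tag, comm);
    MPI_Send(0, result_tag, comm, sum(column));
end

MPI_Finalize;
```

Using a separate tag for each direction keeps the data and result messages apart in the shared directory.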
Advantages

● Matlab DC Toolbox
– Fully integrated in Matlab by Mathworks itself.
– The jobmanager takes care of distributing jobs over the cluster; the user does not need to think about this.
– Fairly easy to use and to program tasks, although the user still has to take care of cutting the data / computation up into pieces.
– Easy to quickly evaluate a function in a parallel fashion.
● MatlabMPI
– Free
– Source code is available, which means we can extend functionality ourselves
– Complies with the open MPI standard, which means that Matlab code can be easily ported to other languages (C, C++, Fortran, etc.)
– Not necessary to start up separate workers on nodes before running a job: machines can be used in an ad hoc fashion and can be chosen at the time of execution.
Disadvantages (1)

● Matlab DC Toolbox
– Comes at a cost per node
– Workers have to be started on all compute nodes. If these crash (and that happened quite a number of times during experiments), they have to be restarted manually.
– Every file used in the program has to be identified and listed in the job's FileDependencies property. These files must be accessible from every node (shared filesystem). This includes m-files as well, since the remote Matlab sessions are not started under the username of the user who started the Matlab client session.
– Proprietary technology, which makes it hard to port code to another language.
Disadvantages (2)

● MatlabMPI
– Matlab/toolbox licenses are needed for every node that the computation runs on.
– The user has to supply a list of machine names on which the computation will run.
– More complex to program than the DC Toolbox (at least that was the first impression) because of the rank number scheme used.
– A shared file system between computation nodes is needed to store communication messages. Since we already have many of these shares, this is not a real concern.
– Before each call to MPI_Run the shared directory has to be cleaned up with a call to the function MatMPI_Delete_all. Again, this is not difficult, but it must not be forgotten.
– One has to be aware of potential deadlocks when waiting to receive messages. These can be avoided by using timeouts and the MPI_Probe call. In standard MPI non-blocking receives are available, but these have not (yet) been implemented in MatlabMPI.
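
A sketch of the timeout-plus-MPI_Probe idea from the last point: poll for a message instead of blocking in MPI_Recv. It assumes MPI_Init has been called, that comm, source and tag are already set, and that MPI_Probe returns empty arrays when no matching message is waiting (check 'help MPI_Probe'). The 60 second timeout and 0.5 second poll interval are arbitrary.

```matlab
% Sketch: wait for a message with a timeout instead of blocking forever.
% Assumes comm, source and tag exist and MPI_Init has been called.
timeout_s = 60;     % illustrative timeout in seconds
received = false;
t_start = tic;

while toc(t_start) < timeout_s
    % Assumption: MPI_Probe returns empty arrays if nothing is waiting.
    [msg_rank, msg_tag] = MPI_Probe(source, tag, comm);
    if ~isempty(msg_rank)
        % The message is already there, so this receive returns promptly.
        message = MPI_Recv(source, tag, comm);
        received = true;
        break;
    end
    pause(0.5);     % do not hammer the shared file system
end

if ~received
    warning('No message from rank %d within %d seconds.', source, timeout_s);
end
```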
Conclusion

● Our lab has a couple of algorithms that will be suitable for parallelization (to be identified with the group).
● Both the DC Toolbox and the MatlabMPI toolbox will be usable for experimentation.
● The MPI toolbox has the advantage that people can start using it right away, since nothing has to be installed. Type 'help MatlabMPI' for a list of functions that you can use.
● The DC Toolbox has to be purchased and installed on all machines first.
References

● http://www.mathworks.com/products/distribtb
● http://www.ll.mit.edu/MatlabMPI/
● http://www.mpi-forum.org/
