User's Guide: Parallel Computing Toolbox™ 5
How to Contact MathWorks
www.mathworks.com Web
comp.soft-sys.matlab Newsgroup
www.mathworks.com/contact_TS.html Technical Support
508-647-7000 (Phone)
508-647-7001 (Fax)
Trademarks
MATLAB and Simulink are registered trademarks of The MathWorks, Inc. See
www.mathworks.com/trademarks for a list of additional trademarks. Other product or brand
names may be trademarks or registered trademarks of their respective holders.
Patents
MathWorks products are protected by one or more U.S. patents. Please see
www.mathworks.com/patents for more information.
Revision History
November 2004 Online only New for Version 1.0 (Release 14SP1+)
March 2005 Online only Revised for Version 1.0.1 (Release 14SP2)
September 2005 Online only Revised for Version 1.0.2 (Release 14SP3)
November 2005 Online only Revised for Version 2.0 (Release 14SP3+)
March 2006 Online only Revised for Version 2.0.1 (Release 2006a)
September 2006 Online only Revised for Version 3.0 (Release 2006b)
March 2007 Online only Revised for Version 3.1 (Release 2007a)
September 2007 Online only Revised for Version 3.2 (Release 2007b)
March 2008 Online only Revised for Version 3.3 (Release 2008a)
October 2008 Online only Revised for Version 4.0 (Release 2008b)
March 2009 Online only Revised for Version 4.1 (Release 2009a)
September 2009 Online only Revised for Version 4.2 (Release 2009b)
March 2010 Online only Revised for Version 4.3 (Release 2010a)
September 2010 Online only Revised for Version 5.0 (Release 2010b)
Contents
Getting Started
1
Product Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-2
Limitations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-9
Using Objects in parfor Loops . . . . . . . . . . . . . . . . . . . . . . . 2-13
Performance Considerations . . . . . . . . . . . . . . . . . . . . . . . . . 2-13
Compatibility with Earlier Versions of MATLAB
Software . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-14
Interactive Parallel Computation with pmode
4
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-2
Troubleshooting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-19
Connectivity Testing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-19
Hostname Resolution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-19
Socket Connections . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-19
Indexing into a Codistributed Array . . . . . . . . . . . . . . . . . . 5-15
2-Dimensional Distribution . . . . . . . . . . . . . . . . . . . . . . . . . 5-17
Programming Overview
6
Product Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-2
Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-2
Toolbox and Server Components . . . . . . . . . . . . . . . . . . . . . 6-3
Running Tasks That Call Simulink Software . . . . . . . . . . . 6-30
Using the pause Function . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-30
Transmitting Large Amounts of Data . . . . . . . . . . . . . . . . . 6-30
Interrupting a Job . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-30
Speeding Up a Job . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-31
Programming Distributed Jobs
8
Using a Local Scheduler . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-2
Creating and Running Jobs with a Local Scheduler . . . . . . 8-2
Local Scheduler Behavior . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-7
Using the Generic Scheduler Interface . . . . . . . . . . . . . . 9-8
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-8
Coding in the Client . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-8
GPU Computing
10
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-2
Capabilities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-2
Requirements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-2
Demos . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-3
Providing C Prototype Input . . . . . . . . . . . . . . . . . . . . . . . . 10-21
Complete Kernel Workflow . . . . . . . . . . . . . . . . . . . . . . . . . . 10-23
Object Reference
11
Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-2
Schedulers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-2
Jobs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-3
Tasks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-3
Function Reference
13
Parallel Code Execution . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13-2
Parallel Code on a MATLAB Pool . . . . . . . . . . . . . . . . . . . . 13-2
Configuration, Input, and Output . . . . . . . . . . . . . . . . . . . . 13-2
Interactive Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13-3
Utilities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13-11
Property Reference
15
Job Manager . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-2
Schedulers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-3
Jobs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-5
Tasks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-7
Workers .......................................... 15-8
Glossary
Index
1
Getting Started
Product Overview
Parallel Computing Toolbox™ software allows you to offload work from one
MATLAB® session (the client) to other MATLAB sessions, called workers.
You can use multiple workers to take advantage of parallel processing. You
can use a local worker to keep your MATLAB client session free for interactive
work, or with MATLAB® Distributed Computing Server™ you can take
advantage of another computer’s speed.
Several MathWorks products now offer built-in support for the parallel
computing products, without requiring extra coding. For the current
list of these products and their parallel functionality, see
http://www.mathworks.com/products/parallel-computing/builtin-parallel-suppo
Typical Use Cases
Batch Jobs
When working interactively in a MATLAB session, you can offload work to
a MATLAB worker session to run as a batch job. The command to perform
this job is asynchronous, which means that your client MATLAB session is
not blocked, and you can continue your own interactive session while the
MATLAB worker is busy evaluating your code. The MATLAB worker can run
either on the same machine as the client, or if using MATLAB Distributed
Computing Server, on a remote cluster machine.
Introduction to Parallel Solutions
1 Suppose your code includes a loop to create a sine wave and plot the
waveform:
for i=1:1024
A(i) = sin(i*2*pi/1024);
end
plot(A)
2 To interactively run code that contains a parallel loop, you first open a
MATLAB pool. This reserves a collection of MATLAB worker sessions
to run your loop iterations. The MATLAB pool can consist of MATLAB
sessions running on your local machine or on a remote cluster:

matlabpool open local 3
3 With the MATLAB pool reserved, you can modify your code to run your loop
in parallel by using a parfor statement:
parfor i=1:1024
A(i) = sin(i*2*pi/1024);
end
plot(A)
The only difference in this loop is the keyword parfor instead of for.
After the loop runs, the results look the same as those generated from
the previous for-loop.
[Figure: a parfor loop issued on the MATLAB client distributes iterations to MATLAB workers]
4 When you are finished with your code, close the MATLAB pool and release
the workers:
matlabpool close
The examples in this section run on three local workers. With parallel
configurations, you can control how many workers run your loops, and
whether the workers are local or remote. For more information on parallel
configurations, see “Programming with User Configurations” on page 6-16.
edit mywave
for i=1:1024
A(i) = sin(i*2*pi/1024);
end
4 Use the batch command in the MATLAB Command Window to run your
script on a separate MATLAB worker:
job = batch('mywave')
[Figure: the batch command sends the job from the MATLAB client to a MATLAB worker]
5 The batch command does not block MATLAB, so you must wait for the job
to finish before you can retrieve and view its results:
wait(job)
6 The load command transfers variables from the workspace of the worker to
the workspace of the client, where you can view the results:
load(job, 'A')
plot(A)
destroy(job)
edit mywave
parfor i=1:1024
A(i) = sin(i*2*pi/1024);
end
4 Run the script in MATLAB with the batch command as before, but indicate
that the script should use a MATLAB pool for the parallel loop:

job = batch('mywave', 'matlabpool', 3)
This command specifies that three workers (in addition to the one running
the batch script) are to evaluate the loop iterations. Therefore, this example
uses a total of four local workers, including the one worker running the
batch script.
[Figure: the batch command sends the job to one MATLAB worker, which runs the parfor loop on a pool of additional workers]
wait(job)
load(job, 'A')
plot(A)
The results look the same as before; however, there are two important
differences in execution:
• The work of defining the parfor-loop and accumulating its results is
offloaded to another MATLAB session (batch).
• The loop iterations are distributed from one MATLAB worker to another
set of workers running simultaneously (matlabpool and parfor), so the
loop might run faster than having only one worker execute it.
destroy(job)
Distributed Arrays
The workers in a MATLAB pool communicate with each other, so you can
distribute an array among the labs. Each lab contains part of the array, and
all the labs are aware of which portion of the array each lab has.
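As a minimal sketch of this idea (assuming local workers are available; codistributed and getLocalPart are the distributed-array functions this guide describes later):

```matlab
matlabpool open 3                    % reserve three labs
spmd
    D = codistributed(magic(6));     % distribute a 6-by-6 array across the labs
    localD = getLocalPart(D);        % each lab holds only its own portion
    disp(size(localD))
end
```

Inside the spmd block, operations on D are carried out cooperatively by the labs, each working on its local portion.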
When you are finished and have no further need of data from the labs, you can
close the MATLAB pool. Data on the labs does not persist from one instance
of a MATLAB pool to another.
matlabpool close
Composites
Following an spmd statement, in the client context, the values from the
block are accessible, even though the data is actually stored on the labs. On
the client, these variables are called Composite objects. Each element of a
composite is a symbol referencing the value (data) on a lab in the pool. Note
that because a variable might not be defined on every lab, a Composite might
have undefined elements.
Continuing with the example from above, on the client, the Composite R has
one element for each lab:

X = R{3};   % Retrieve the value of R from lab 3 into X on the client.
The line above retrieves the data from lab 3 to assign the value of X. The
following code sends data to lab 3:
X = X + 2;
R{3} = X; % Send the value of X from the client to lab 3.
If the MATLAB pool remains open between spmd statements and the same
labs are used, the data on each lab persists from one spmd statement to
another.
spmd
R = R + labindex % Use values of R from previous spmd.
end
A typical use case for spmd is to run the same code on a number of labs, each
of which accesses a different set of data. For example:
spmd
INP = load(['somedatafile' num2str(labindex) '.mat']);
RES = somefun(INP)
end
Then the values of RES on the labs are accessible from the client as RES{1}
from lab 1, RES{2} from lab 2, etc.
When you are finished with all spmd execution and have no further need of
data from the labs, you can close the MATLAB pool.
matlabpool close
Although data persists on the labs from one spmd block to another as long as
the MATLAB pool remains open, data does not persist from one instance of
a MATLAB pool to another.
For more information about using distributed arrays, spmd, and Composites,
see Chapter 3, “Single Program Multiple Data (spmd)”.
To determine which version of the toolbox is installed, enter the ver
command at the MATLAB prompt:

ver
When you enter this command, MATLAB displays information about the
version of MATLAB you are running, including a list of all toolboxes installed
on your system and their version numbers.
2
Introduction
The basic concept of a parfor-loop in MATLAB software is the same as the
standard MATLAB for-loop: MATLAB executes a series of statements (the
loop body) over a range of values. Part of the parfor body is executed on the
MATLAB client (where the parfor is issued) and part is executed in parallel
on MATLAB workers. The necessary data on which parfor operates is sent
from the client to workers, where most of the computation happens, and the
results are sent back to the client and pieced together.
Getting Started with parfor
You cannot use a parfor-loop when an iteration in your loop depends on the
results of other iterations. Each iteration must be independent of all others.
Since there is a communication cost involved in a parfor-loop, there might
be no advantage to using one when you have only a small number of simple
calculations. The examples in this section serve only to illustrate the behavior
of parfor-loops, not necessarily to demonstrate the applications best suited
to them.
To begin the examples of this section, allocate local MATLAB workers for
the evaluation of your loop iterations:
matlabpool
2 Parallel for-Loops (parfor)
Creating a parfor-Loop
The safest assumption about a parfor-loop is that each iteration of the
loop is evaluated by a different MATLAB worker. If you have a for-loop in
which all iterations are completely independent of each other, this loop is a
good candidate for a parfor-loop. Basically, if one iteration depends on the
results of another iteration, these iterations are not independent and cannot
be evaluated in parallel, so the loop does not lend itself easily to conversion
to a parfor-loop.
clear A                     clear A
for i = 1:8                 parfor i = 1:8
    A(i) = i;                   A(i) = i;
end                         end
A                           A
Notice that each element of A is equal to its index. The parfor-loop works
because each element depends only upon its iteration of the loop, and upon
no other iterations. for-loops that merely repeat such independent tasks are
ideally suited candidates for parfor-loops.
However, suppose you use a nonindexed variable inside the loop, or a variable
whose indexing does not depend on the loop variable i. Try these examples
and notice the values of d and i afterward:
clear A                     clear A
d = 0; i = 0;               d = 0; i = 0;
for i = 1:4                 parfor i = 1:4
    d = i*2;                    d = i*2;
    A(i) = d;                   A(i) = d;
end                         end
A                           A
d                           d
i                           i
Although the elements of A come out the same in both of these examples, the
value of d does not. In the for-loop above on the left, the iterations execute
in sequence, so afterward d has the value it held in the last iteration of the
loop. In the parfor-loop on the right, the iterations execute in parallel, not in
sequence, so it would be impossible to assign d a definitive value at the end
of the loop. This also applies to the loop variable, i. Therefore, parfor-loop
behavior is defined so that it does not affect the values d and i outside the
loop at all, and their values remain the same before and after the loop.
So, a parfor-loop requires that each iteration be independent of the other
iterations, and that all code that follows the parfor-loop not depend on the
loop iteration sequence.
Reduction Assignments
The next two examples show parfor-loops using reduction assignments. A
reduction is an accumulation across iterations of a loop. The example on the
left uses x to accumulate a sum across 10 iterations of the loop. The example
on the right generates a concatenated array, 1:10. In both of these examples,
the execution order of the iterations on the workers does not matter: while
the workers calculate individual results, the client properly accumulates or
assembles the final loop result.
x = 0;                      x2 = [];
parfor i = 1:10             n = 10;
    x = x + i;              parfor i = 1:n
end                             x2 = [x2, i];
x                           end
                            x2
If the loop iterations operate in random sequence, you might expect the
concatenation sequence in the example on the right to be nonconsecutive.
However, MATLAB recognizes the concatenation operation and yields
deterministic results.
By contrast, the following loop cannot be converted to a parfor-loop, because
each iteration depends on the results of the two preceding iterations:
f = zeros(1,50);
f(1) = 1;
f(2) = 2;
parfor n = 3:50
f(n) = f(n-1) + f(n-2);
end
When you are finished with your loop examples, clear your workspace and
close or release your pool of workers:
clear
matlabpool close
Displaying Output
When running a parfor-loop on a MATLAB pool, all command-line output
from the workers displays in the client Command Window, except output from
variable assignments. Because the workers are MATLAB sessions without
displays, any graphical output (for example, figure windows) from the pool
does not display at all.
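For example, text written with fprintf inside the loop is sent back and shown in the client Command Window (the order of the lines is not guaranteed, because the iterations run in parallel):

```matlab
parfor i = 1:4
    fprintf('finished iteration %d\n', i);   % appears in the client Command Window
end
```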
Programming Considerations
In this section...
“MATLAB Path” on page 2-8
“Error Handling” on page 2-8
“Limitations” on page 2-9
“Using Objects in parfor Loops” on page 2-13
“Performance Considerations” on page 2-13
“Compatibility with Earlier Versions of MATLAB Software” on page 2-14
MATLAB Path
All workers executing a parfor-loop must have the same MATLAB path
configuration as the client, so that they can execute any functions called in the
body of the loop. Therefore, whenever you use cd, addpath, or rmpath on the
client, it also executes on all the workers, if possible. For more information,
see the matlabpool reference page. When the workers are running on a
different platform than the client, use the function pctRunOnAll to properly
set the MATLAB path on all workers.
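For example, to add a directory to the search path on the client and on every worker in one step (the path shown is hypothetical):

```matlab
pctRunOnAll addpath /shared/project/code
```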
Error Handling
When an error occurs during the execution of a parfor-loop, all iterations
that are in progress are terminated, new ones are not initiated, and the loop
terminates.
Errors and warnings produced on workers are annotated with the worker ID
and displayed in the client’s Command Window in the order in which they
are received by the client MATLAB.
Limitations
Unambiguous Variable Names. If you use a name that MATLAB cannot
unambiguously distinguish as a variable inside a parfor-loop, at parse time
MATLAB assumes you are referencing a function; then at run time, if the
function cannot be found, MATLAB generates an error. In the following loop,
f(5) could be either a call to a function f or an indexing operation on a
variable f:
parfor i=1:n
...
a = f(5);
...
end
Transparency
The body of a parfor-loop must be transparent, meaning that all references to
variables must be “visible” (i.e., they occur in the text of the program).
In the following example, the reference to X inside the eval string is not
visible in the program text, so the loop violates transparency and generates
an error:
X = 5;
parfor ii = 1:4
eval('X');
end
MATLAB does successfully execute eval and evalc statements that appear in
functions called from the parfor body.
A related limitation concerns invoking function handles: inside a
parfor-loop, an expression such as B(ii) looks like an indexing operation
on a variable B, so to call a function handle within the loop, use feval:
B = @sin;
for ii = 1:100
A(ii) = B(ii);
end
B = @sin;
parfor ii = 1:100
A(ii) = feval(B, ii);
end
Nondistributable Functions
If you use a function that is not strictly computational in nature (e.g., input,
plot, keyboard) in a parfor-loop or in any function called by a parfor-loop,
the behavior of that function occurs on the worker. The results might include
hanging the worker process or having no visible effect at all.
Nested Functions
The body of a parfor-loop cannot make reference to a nested function.
However, it can call a nested function by means of a function handle.
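A sketch of this pattern (callNested and nestedFun are hypothetical names):

```matlab
function A = callNested
% The parfor body cannot refer to nestedFun by name,
% but it can call it through the function handle h.
A = zeros(1, 4);
h = @nestedFun;              % handle created outside the loop body
parfor i = 1:4
    A(i) = feval(h, i);      % call the nested function via its handle
end
    function y = nestedFun(x)
        y = x^2;
    end
end
```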
Nested parfor-Loops
The body of a parfor-loop cannot contain another parfor-loop. But it can call
a function that contains another parfor-loop.
For example, suppose you have the following pair of nested for-loops that
fill a 10-by-10 array:
for ii = 1:10
for jj = 1:10
A(ii, jj) = 10*ii + jj - 1;
end
end
disp(A)
You decide that you want to convert the outer loop into a parfor, so that each
parfor iteration can perform the inner for-loop. In principle, the nested loops
look like this.
parfor ii = 1:10
for jj = 1:10
A(ii, jj) = 10*ii + jj - 1;
end
end
However, as written, this cannot work, because other than the parfor index,
the index variables of a sliced array must be broadcast variables and they
cannot be varied inside a parfor iteration. That is, the variable jj as used for
indexing into A is not allowed to vary inside the parfor iteration. (For more
information, see “Classification of Variables” on page 2-15.)
The solution is to build each row in a temporary vector inside the parfor
iteration, and then assign the entire row to the sliced output array in a
single operation:
parfor ii = 1:10
v = zeros(1,10); % initialize row vector
for jj = 1:10
v(jj) = 10*ii + jj - 1;
end
A(ii, :) = v;
end
disp(A)
P-Code Scripts
You can call P-code script files from within a parfor-loop, but a P-code script
itself cannot contain a parfor-loop.
Performance Considerations
Slicing Arrays
If a variable is initialized before a parfor-loop, then used inside the
parfor-loop, it has to be passed to each MATLAB worker evaluating the loop
iterations. Only those variables used inside the loop are passed from the
client workspace. However, if all occurrences of the variable are indexed by
the loop variable, each worker receives only the part of the array it needs. For
more information, see “Where to Create Arrays” on page 2-30.
The past and current functionality of the parfor keyword is outlined in the
following table:
Advanced Topics
In this section...
“About Programming Notes” on page 2-15
“Classification of Variables” on page 2-15
“Improving Performance” on page 2-30
Classification of Variables
• “Overview” on page 2-15
• “Loop Variable” on page 2-16
• “Sliced Variables” on page 2-17
• “Broadcast Variables” on page 2-21
• “Reduction Variables” on page 2-21
• “Temporary Variables” on page 2-28
Overview
When a name in a parfor-loop is recognized as referring to a variable, it is
classified into one of the categories in the following table. A parfor-loop
generates an error if it contains any variables that cannot be uniquely
categorized, or if any variables violate their category restrictions.
Classification    Description
Loop              Serves as a loop index for arrays
Sliced            An array whose segments are operated on by different
                  iterations of the loop
Broadcast         A variable defined before the loop whose value is used
                  inside the loop, but never assigned inside the loop
Reduction         Accumulates a value across iterations of the loop,
                  regardless of iteration order
Temporary         Variable created inside the loop, but unlike sliced or
                  reduction variables, not available outside the loop
Loop Variable
The following restriction is required, because changing i in the parfor body
invalidates the assumptions MATLAB makes about communication between
the client and workers.
Required (static): Assignments to the loop variable are not allowed.
This example attempts to modify the value of the loop variable i in the body
of the loop, and thus is invalid:
parfor i = 1:n
i = i + 1;
a(i) = i;
end
Sliced Variables
A sliced variable is one whose value can be broken up into segments, or slices,
which are then operated on separately by workers and by the MATLAB client.
Each iteration of the loop works on a different slice of the array. Using sliced
variables is important because this type of variable can reduce communication
between the client and workers. Only those slices needed by a worker are sent
to it, and only when it starts working on a particular range of indices.
In the following example, a slice of A consists of a single element of that
array:
parfor i = 1:length(A)
B(i) = f(A(i));
end
Type of First-Level Indexing. For a sliced variable, the first level of indexing is
enclosed in either parentheses, (), or braces, {}.
This table lists the forms for the first level of indexing for arrays sliced and
not sliced.
After the first level, you can use any type of valid MATLAB indexing in the
second and further levels.
The variable A shown here on the left is not sliced; that shown on the right
is sliced:
A.q{i,12} A{i,12}.q
The variable A shown here on the left is not sliced because A is indexed by i
and i+1 in different places; that shown on the right is sliced:
The example above on the right shows some occurrences of a sliced variable
with first-level parenthesis indexing and with first-level brace indexing in the
same loop. This is acceptable.
Form of Indexing. Within the list of indices for a sliced variable, one of these
indices is of the form i, i+k, i-k, k+i, or k-i, where i is the loop variable and
k is a constant or a simple (nonindexed) broadcast variable; every other index
is a scalar constant, a simple broadcast variable, a colon, or end.
With i as the loop variable, the A variables shown here on the left are not
sliced; those on the right are sliced:
A(i+f(k),j,:,3) A(i+k,j,:,3)
A(i,20:30,end) A(i,:,end)
A(i,:,s.field1) A(i,:,k)
When you use other variables along with the loop variable to index an array,
you cannot set these variables inside the loop. In effect, such variables are
constant over the execution of the entire parfor statement. You cannot
combine the loop variable with itself to form an index expression.
In addition, a sliced array must maintain a constant shape. The variable A is
not sliced in either of the following examples:
A(i,:) = [];
A(end + 1) = i;
The reason A is not sliced in either case is because changing the shape of a
sliced array would violate assumptions governing communication between
the client and workers.
Sliced Input and Output Variables. All sliced variables have the
characteristics of being input or output. A sliced variable can sometimes have
both characteristics. MATLAB transmits sliced input variables from the client
to the workers, and sliced output variables from workers back to the client. If
a variable is both input and output, it is transmitted in both directions.
In this example, r is a sliced input variable and b is a sliced output variable:
a = 0;
z = 0;
r = rand(1,10);
parfor ii = 1:10
a = ii;
z = z + ii;
b(ii) = r(ii);
end
If every reference to an array element in an iteration follows its assignment
within that same iteration, the original contents of the array need not be
sent to the workers. In this loop, A is a sliced output (but not input)
variable, because each A(ii) is set before it is used:
parfor ii = 1:n
if someCondition
A(ii) = 32;
else
A(ii) = 17;
end
loop code that uses A(ii)
end
By contrast, in the next loop an element of A might be used without having
been set in that iteration, so A is both a sliced input and a sliced output
variable:
A = 1:10;
parfor ii = 1:10
if rand < 0.5
A(ii) = 0;
end
end
Broadcast Variables
A broadcast variable is any variable other than the loop variable or a sliced
variable that is not affected by an assignment inside the loop. At the start of
a parfor-loop, the values of any broadcast variables are sent to all workers.
Although this type of variable can be useful or even essential, broadcast
variables that are large can cause a lot of communication between client and
workers. In some cases it might be more efficient to use temporary variables
for this purpose, creating and assigning them inside the loop.
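A sketch of the trade-off (the sizes and computations are illustrative only):

```matlab
% Here B is a broadcast variable: its entire value is sent to every worker.
B = rand(1000);
parfor i = 1:8
    A(i) = trace(B) + i;     % B is read, but never assigned, inside the loop
end

% Alternative: create the data inside the loop as a temporary variable,
% so the large array is built on each worker instead of being transmitted.
parfor i = 1:8
    T = rand(1000);          % temporary, created on the worker
    A(i) = trace(T) + i;
end
```

The two loops compute different random results; the point is only where the large array is created.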
Reduction Variables
MATLAB supports an important exception, called reductions, to the rule that
loop iterations must be independent. A reduction variable accumulates a
value that depends on all the iterations together, but is independent of the
iteration order. MATLAB allows reduction variables in parfor-loops.
X = X + expr                X = expr + X
X = X - expr                See "Associativity in Reduction Assignments"
                            in "Further Considerations with Reduction
                            Variables" on page 2-23
X = X .* expr               X = expr .* X
X = X * expr                X = expr * X
X = X & expr                X = expr & X
X = X | expr                X = expr | X
X = [X, expr]               X = [expr, X]
X = [X; expr]               X = [expr; X]
X = {X, expr}               X = {expr, X}
X = {X; expr}               X = {expr; X}
X = min(X, expr)            X = min(expr, X)
X = max(X, expr)            X = max(expr, X)
If the loop were a regular for-loop, the variable X in each iteration would get
its value either before entering the loop or from the previous iteration of the
loop. However, this concept does not apply to parfor-loops:
Required (static): For any reduction variable, the same reduction function
or operation must be used in all reduction assignments for that variable.
The parfor-loop on the left is not valid because the reduction assignment uses
+ in one instance, and [,] in another. The parfor-loop on the right is valid:
The parfor-loop on the left below is not valid because the order of items in
the concatenation is not consistent throughout the loop. The parfor-loop
on the right is valid:
If f is a variable, then for all practical purposes its value at run time is
a function handle. However, this is not strictly required; as long as the
right-hand side can be evaluated, the resulting value is stored in X.
The parfor-loop below on the left will not execute correctly because the
statement f = @times causes f to be classified as a temporary variable and
therefore is cleared at the beginning of each iteration. The parfor on the
right is correct, because it does not assign to f inside the loop:
f = @(x,k)x * k;                f = @(x,k)x * k;
parfor i = 1:n                  parfor i = 1:n
    a = f(a,i);                     a = f(a,i);
    % loop body continued           % loop body continued
    f = @times; % Affects f     end
end
Note that the operators && and || are not listed in the table in “Reduction
Variables” on page 2-21. Except for && and ||, all the matrix operations of
MATLAB have a corresponding function f, such that u op v is equivalent
to f(u,v). For && and ||, such a function cannot be written because u&&v
and u||v might or might not evaluate v, but f(u,v) always evaluates v
before calling f. This is why && and || are excluded from the table of allowed
reduction assignments for a parfor-loop.
To be associative, the function f must satisfy the following for all a, b, and c:
f(a,f(b,c)) = f(f(a,b),c)
The classification rules for variables, including reduction variables, are purely
syntactic. They cannot determine whether the f you have supplied is truly
associative or not. Associativity is assumed, but if you violate this, different
executions of the loop might result in different answers.
For example, the statement on the left yields 1, while the statement on the
right returns 1 + eps:

(1 + eps/2) + eps/2             1 + (eps/2 + eps/2)
With the exception of the minus operator (-), all the special cases listed in the
table in “Reduction Variables” on page 2-21 have a corresponding (perhaps
approximately) associative function. MATLAB calculates the assignment
X = X - expr by using X = X + (-expr). (So, technically, the function for
calculating this reduction assignment is plus, not minus.) However, the
assignment X = expr - X cannot be written using an associative function,
which explains its exclusion from the table.
Commutativity in Reduction Assignments. A function f is commutative if
it satisfies the following for all a and b:
f(a,b) = f(b,a)
Recommended: Except in the cases of *, [,], [;], {,}, and {;}, the
function f of a reduction assignment should be commutative. If f is not
commutative, different executions of the loop might result in different
answers.
If a reduction function f has a known identity element e, that is, a value
satisfying the following for any a:

f(e,a) = a = f(a,e)

then MATLAB can use e to initialize the reduction. Examples of identity
elements for some functions are listed in this table:

Function          Identity Element
+                 0
* and .*          1
min               Inf
max               -Inf
[,] and [;]       []
There is no way to specify the identity element for a function. In these cases,
the behavior of parfor is a little less efficient than it is for functions with a
known identity element, but the results are correct.
function mc = comparemax(A, B)
% Custom reduction function for 2-element vector input
if A(1) >= B(1)
    mc = A; % Return the existing maximum and its iteration index
else
    mc = B; % Return the new maximum and its iteration index
end
Inside the loop, each iteration calls the reduction function (comparemax),
passing in a pair of 2-element vectors:
• The accumulated maximum and its iteration index (this is the reduction
variable, cummax)
• The data value of the current iteration and its index (the vector [dat i])
If the data value of the current iteration is greater than the maximum in
cummax, the function returns a vector of the new value and its iteration
number. Otherwise, the function returns the existing maximum and its
iteration number.
The code for the loop looks like the following, with each iteration calling the
reduction function comparemax to compare its own data [dat i] to that
already accumulated in cummax.
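A sketch of such a loop (the call to rand stands in for the actual computation that produces each iteration's data value):

```matlab
cummax = [0 0];                          % [maximum value, iteration index]
parfor ii = 1:100
    dat = rand();                        % stands in for the real computation
    cummax = comparemax(cummax, [dat ii]);
end
disp(cummax)
```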
Temporary Variables
A temporary variable is any variable that is the target of a direct, nonindexed
assignment, but is not a reduction variable. In the following parfor-loop, a
and d are temporary variables:
a = 0;
z = 0;
r = rand(1,10);
parfor i = 1:10
a = i; % Variable a is temporary
z = z + i;
if i <= 5
d = 2*a; % Variable d is temporary
end
end
MATLAB does not send temporary variables back to the client. A temporary
variable in the context of the parfor statement has no effect on a variable
with the same name that exists outside the loop, again in contrast to ordinary
for-loops.
Because temporary variables are cleared at the start of each iteration,
MATLAB can detect code in which an iteration might use a temporary
variable before setting it in that iteration; such code generates an error.
In the following example, b is a temporary variable (it is assigned inside the
loop), so testing b in the condition is an error:
b = true;
parfor i = 1:n
if b && some_condition(i)
do_something(i);
b = false;
end
...
end
s = 0;
parfor i = 1:n
s = s + f(i);
...
if (s > whatever)
...
end
end
If the only occurrences of s were the two in the first statement of the body, it
would be classified as a reduction variable. But in this example, s is not a
reduction variable because it has a use outside of reduction assignments in
the line s > whatever. Because s is the target of an assignment (in the first
statement), it is a temporary, so MATLAB issues an error about this fact, but
points out the possible connection with reduction.
Note that if you change parfor to for, the use of s outside the reduction
assignment relies on the iterations being performed in a particular order. The
point here is that in a parfor-loop, it matters that the loop “does not care”
about the value of a reduction variable as it goes along. It is only after the
loop that the reduction value becomes usable.
Improving Performance
Try the following examples running a matlabpool locally, and notice the
difference in execution time for each loop. First, open a local matlabpool:
matlabpool
Then enter the following examples. (If you are viewing this documentation in
the MATLAB help browser, highlight each segment of code below, right-click,
and select Evaluate Selection in the context menu to execute the block in
MATLAB. That way the time measurement will not include the time required
to paste or type.)
% First loop: M and R are created once, outside the parfor
tic;
n = 200;
M = magic(n);
R = rand(n);
parfor i = 1:n
    A(i) = sum(M(i,:).*R(n+1-i,:));
end
toc

% Second loop: M and R are created inside every iteration
tic;
n = 200;
parfor i = 1:n
    M = magic(n);
    R = rand(n);
    A(i) = sum(M(i,:).*R(n+1-i,:));
end
toc
3 Single Program Multiple Data (spmd)
Introduction
The single program multiple data (spmd) language construct allows seamless
interleaving of serial and parallel programming. The spmd statement lets
you define a block of code to run simultaneously on multiple labs. Variables
assigned inside the spmd statement on the labs allow direct access to their
values from the client by reference via Composite objects.
The “multiple data” aspect means that even though the spmd statement runs
identical code on all labs, each lab can have different, unique data for that
code. So multiple data sets can be accommodated by multiple labs.
Typical applications appropriate for spmd are those that run a program
simultaneously on multiple data sets and require communication or
synchronization between the labs. Some common cases are:
Using spmd Constructs
• Programs that take a long time to execute — spmd lets several labs compute
solutions simultaneously.
• Programs operating on large data sets — spmd lets the data be distributed
to multiple labs.
To begin the examples of this section, allocate local MATLAB labs for the
evaluation of your spmd statement:
matlabpool
If you do not want to use default settings, you can specify in the matlabpool
statement which configuration or how many labs to use. For example, to use
only three labs with your default configuration, type:
matlabpool 3
To use a configuration other than the default, specify its name:
matlabpool MyConfigName
To inquire whether you currently have a MATLAB pool open, type:
matlabpool size
This command returns a value indicating the number of labs in the current
pool. If the command returns 0, there is currently no pool open.
3 Single Program Multiple Data (spmd)
Note If there is no MATLAB pool open, an spmd statement runs locally in the
MATLAB client without any parallel execution, provided you have Parallel
Computing Toolbox software installed. In other words, it runs in your client
session as though it were a single lab.
When you are finished using a MATLAB pool, close it with the command:
matlabpool close
The general form of an spmd statement is:
spmd
<statements>
end
To execute on a specific number of labs, use the form:
spmd (n)
<statements>
end
This statement requires that n labs run the spmd code. n must be less than
or equal to the number of labs in the open MATLAB pool. If the pool is large
enough, but n labs are not available, the statement waits until enough labs
are available. If n is 0, the spmd statement uses no labs, and runs locally on
the client, the same as if there were not a pool currently open.
spmd (m, n)
<statements>
end
In this case, the spmd statement requires a minimum of m labs, and it uses
a maximum of n labs.
matlabpool
spmd (3)
R = rand(4,4);
end
matlabpool close
Note All subsequent examples in this chapter assume that a MATLAB pool is
open and remains open between sequences of spmd statements.
Unlike a parfor-loop, the labs used for an spmd statement each have a unique
value for labindex. This lets you specify code to be run on only certain labs,
or to customize execution, usually for the purpose of accessing unique data.
spmd (3)
if labindex==1
R = rand(9,9);
else
R = rand(4,4);
end
end
Load unique data on each lab according to labindex, and use the same
function on each lab to compute a result from the data:
spmd (3)
labdata = load(['datafile_' num2str(labindex) '.ascii'])
result = MyFunction(labdata)
end
The labs executing an spmd statement can control communications between
the labs, transfer data between them, and use codistributed arrays among
them. For a list of toolbox functions that
facilitate these capabilities, see the Function Reference sections “Interlab
Communication Within a Parallel Job” on page 13-9 and “Distributed and
Codistributed Arrays” on page 13-3.
spmd (3)
RR = rand(30, codistributor());
end
Each lab has a 30-by-10 segment of the codistributed array RR. For
more information about codistributed arrays, see Chapter 5, “Math with
Codistributed Arrays”.
Displaying Output
When running an spmd statement on a MATLAB pool, all command-line
output from the workers displays in the client Command Window. Because
the workers are MATLAB sessions without displays, any graphical output (for
example, figure windows) from the pool does not display at all.
Accessing Data with Composites
Introduction
Composite objects in the MATLAB client session let you directly access data
values on the labs. Most often, you assign these variables within spmd
statements. In their display and usage, Composites resemble cell arrays and
behave similarly. There are two ways to create Composites:
• Use the Composite function on the client. The entries are initially empty
until you set them or define the corresponding variables on the labs.
• Define variables on the labs inside an spmd statement. After the spmd
statement, those data values are accessible on the client as Composites.
On the client, a Composite has one element per lab. For example, suppose
you open a MATLAB pool of three local workers and run an spmd statement
on that pool:
spmd
    MM = magic(labindex+2);
end
MM{1}
     8     1     6
     3     5     7
     4     9     2
MM{2}
    16     2     3    13
     5    11    10     8
     9     7     6    12
     4    14    15     1
A variable might not be defined on every lab. For the labs on which a variable
is not defined, the corresponding Composite element has no value. Trying to
read that element throws an error.
spmd
if labindex > 1
HH = rand(4);
end
end
HH
Lab 1: No data
Lab 2: class = double, size = [4 4]
Lab 3: class = double, size = [4 4]
You can also set values of Composite elements from the client. This causes a
transfer of data, storing the value on the appropriate lab even though it is not
executed within an spmd statement:
MM{3} = eye(4);
Now when you do enter an spmd statement, the value of the variable MM on
lab 3 is as set:
spmd
if labindex == 3, MM, end
end
Lab 3:
MM =
1 0 0 0
0 1 0 0
0 0 1 0
0 0 0 1
Data transfers from lab to client when you explicitly assign a variable in the
client workspace using a Composite element:
M = MM{1}
     8     1     6
     3     5     7
     4     9     2
matlabpool close
The values are retained on the labs until the corresponding Composites are
cleared on the client, or until the MATLAB pool is closed. The following
example illustrates data value lifespan with spmd blocks, using a pool of four
workers:
spmd
AA = labindex; % Initial setting
end
AA(:) % Composite
[1]
[2]
[3]
[4]
spmd
AA = AA * 2; % Multiply existing value
end
AA(:) % Composite
[2]
[4]
[6]
[8]
clear AA % Clearing in client also clears on labs
matlabpool close
PP = Composite()
By default, this creates a Composite with an element for each lab in the
MATLAB pool. You can also create Composites on only a subset of the labs in
the pool. See the Composite reference page for more details. The elements of
the Composite can now be set as usual on the client, or as variables inside
an spmd statement. When you set an element of a Composite, the data is
immediately transferred to the appropriate lab:
for ii = 1:numel(PP)
PP{ii} = ii;
end
Distributing Arrays
In this section...
“Distributed Versus Codistributed Arrays” on page 3-12
“Creating Distributed Arrays” on page 3-12
“Creating Codistributed Arrays” on page 3-13
The first two of these techniques do not involve spmd in creating the array,
but you can see how spmd might be used to manipulate arrays created this
way. For example:
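A minimal sketch, with assumed array sizes and values: create a distributed array on the client with the distributed function, then operate on it inside spmd:

```matlab
W = ones(6, 6);
W = distributed(W);   % distribute W across the labs of the open MATLAB pool
spmd
    T = W * 2;        % each lab operates on its local portion of W
end
```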
Programming Considerations
In this section...
“MATLAB Path” on page 3-15
“Error Handling” on page 3-15
“Limitations” on page 3-15
MATLAB Path
All labs executing an spmd statement must have the same MATLAB path
configuration as the client, so that they can execute any functions called in
their common block of code. Therefore, whenever you use cd, addpath, or
rmpath on the client, it also executes on all the labs, if possible. For more
information, see the matlabpool reference page. When the labs are running
on a different platform than the client, use the function pctRunOnAll to
properly set the MATLAB path on all labs.
Error Handling
When an error occurs on a lab during the execution of an spmd statement, the
error is reported to the client. The client tries to interrupt execution on all
labs, and throws an error to the user.
Errors and warnings produced on labs are annotated with the lab ID and
displayed in the client’s Command Window in the order in which they are
received by the MATLAB client.
Limitations
Transparency
The body of an spmd statement must be transparent, meaning that all
references to variables must be “visible” (i.e., they occur in the text of the
program).
For example, because the variable X is referenced only inside an eval string,
it is not visible in the text of the spmd body, and the following code violates
transparency:
X = 5;
spmd
eval('X');
end
To clear a specific variable from a worker, clear its Composite from the client
workspace. Alternatively, you can free up most of the memory used by a
variable by setting its value to empty, presumably when it is no longer needed
in your spmd statement:
spmd
<statements....>
X = [];
end
MATLAB does successfully execute eval and evalc statements that appear in
functions called from the spmd body.
Nested Functions
Inside a function, the body of an spmd statement cannot make any direct
reference to a nested function. However, it can call a nested function by
means of a variable defined as a function handle to the nested function.
Because the spmd body executes on workers, variables that are updated by
nested functions called inside an spmd statement do not get updated in the
workspace of the outer function.
Anonymous Functions
The body of an spmd statement cannot define an anonymous function.
However, it can reference an anonymous function by means of a function
handle.
Nested parfor-Loops
The body of a parfor-loop cannot contain an spmd statement, and an spmd
statement cannot contain a parfor-loop.
4 Interactive Parallel Computation with pmode
Introduction
pmode lets you work interactively with a parallel job running simultaneously
on several labs. Commands you type at the pmode prompt in the Parallel
Command Window are executed on all labs at the same time. Each lab
executes the commands in its own workspace on its own variables.
The labs remain synchronized because each lab becomes idle when it
completes a command or statement, waiting until all the labs working on this
job have completed the same statement. Only when all the labs are idle do
they proceed together to the next pmode command.
In contrast to spmd, pmode provides a desktop with a display for each lab
running the job, where you can enter commands, see results, access each lab's
workspace, and so on. What pmode does not let you do is freely interleave
serial and parallel work, as spmd does. When you exit your pmode session, its
job is effectively destroyed, and all information and data on the labs are lost.
Starting another pmode session always begins from a clean state.
Getting Started with pmode
1 To start pmode, type the following at the MATLAB prompt:
pmode start local 4
This starts four local labs, creates a parallel job to run on those labs, and
opens the Parallel Command Window.
You can control where the command history appears. For this exercise, the
position is set by clicking Window > History Position > Above Prompt,
but you can set it according to your own preference.
2 To illustrate that commands at the pmode prompt are executed on all labs,
ask for help on a function.
3 Set a variable at the pmode prompt. Notice that the value is set on all
the labs.
P>> x = pi
4 A variable does not necessarily have the same value on every lab. The
labindex function returns the ID particular to each lab working on this
parallel job. In this example, the variable x exists with a different value in
the workspace of each lab.
P>> x = labindex
5 Return the total number of labs working on the current parallel job with
the numlabs function.
P>> segment = [1 2; 3 4; 5 6]
7 Assign a unique value to the array on each lab, dependent on the lab
number. With a different value on each lab, this is a variant array.
8 Until this point in the example, the variant arrays are independent, other
than having the same name. Use the codistributed.build function to
aggregate the array segments into a coherent array, distributed among
the labs.
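The call might look like the following sketch, which assumes the 3-by-2 segment variable from the earlier steps and a pool of four labs:

```matlab
P>> codist = codistributor1d(2, [2 2 2 2], [3 8])
P>> whole = codistributed.build(segment, codist)
```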
This combines four separate 3-by-2 arrays into one 3-by-8 codistributed
array. The codistributor1d object indicates that the array is distributed
along its second dimension (columns), with 2 columns on each of the four
labs. On each lab, segment provided the data for the local portion of the
whole array.
9 Now, when you operate on the codistributed array whole, each lab handles
the calculations on only its portion, or segment, of the array, not the whole
array.
10 Although the codistributed array allows for operations on its entirety, you
can use the getLocalPart function to access the portion of a codistributed
array on a particular lab.
11 If you need the entire array in one workspace, use the gather function.
Notice, however, that this gathers the entire array into the workspaces of
all the labs. See the gather reference page for the syntax to gather the
array into the workspace of only one lab.
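For example (a sketch, assuming the codistributed array is named whole as in the preceding steps):

```matlab
P>> combined = gather(whole)
```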
12 Because the labs ordinarily do not have displays, if you want to perform
any graphical tasks involving your data, such as plotting, you must do this
from the client workspace. Copy the array to the client workspace by typing
the following commands in the MATLAB (client) Command Window.
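A sketch of the transfer, assuming the gathered array is named combined and resides on lab 1:

```matlab
pmode lab2client combined 1
```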
whos combined
combined
When you start pmode with four local labs, a parallel job is created to run on
them. The first time you run pmode with this configuration, you get a tiled
display of the four labs.
[Figure: the Parallel Command Window, showing lab outputs in a tiled
arrangement, the command history above the prompt, the command line, and a
control to show commands in the lab output.]
Parallel Command Window
The Parallel Command Window offers much of the same functionality as the
MATLAB desktop, including command line, output, and command history.
When you select one or more lines in the command history and right-click,
you see the following context menu.
You have several options for how to arrange the tiles showing your lab
outputs. Usually, you will choose an arrangement that depends on the format
of your data. For example, the data displayed until this point in this section,
as in the previous figure, is distributed by columns. It might be convenient to
arrange the tiles side by side.
Alternatively, if the data is distributed by rows, you might want to stack the
lab tiles vertically. For the following figure, the data is reformatted with
the command
[Figure: selecting the vertical tile arrangement; tile sizes can be adjusted
by dragging.]
You can control the relative positions of the command window and the lab
output. The following figure shows how to set the output to display beside the
input, rather than above it.
[Figure: 1. Select tabbed display. 2. Select a tab. 3. Select the labs shown
in that tab.]
You can have multiple labs send their output to the same tile or tab. This
allows you to have fewer tiles or tabs than labs.
In this case, the window provides shading to help distinguish the outputs
from the various labs.
[Figure: output from multiple labs appearing in the same tab, distinguished
by shading.]
Running pmode on a Cluster
If you omit <config-name>, pmode uses the default configuration (see the
defaultParallelConfig reference page).
For details on all the command options, see the pmode reference page.
Plotting in pmode
Because the labs running a job in pmode are MATLAB sessions without
displays, they cannot create plots or other graphic outputs on your desktop.
1 Use the gather function to collect the entire array into the workspace of
one lab.
2 Transfer the whole array from any lab to the MATLAB client with pmode
lab2client.
Create a 1-by-100 codistributed array of 0s. With four labs, each lab has a
1-by-25 segment of the whole array.
P>> D = zeros(1,100,codistributor1d())
Use a for-loop over the distributed range to populate the array so that it
contains a sine wave. Each lab does one-fourth of the array.
Gather the array so that the whole array is contained in the workspace of
lab 1.
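A sketch of these two steps; the loop bounds follow the 1-by-100 array above, and the gathered variable name P is an assumption:

```matlab
P>> for i = drange(1:100)
        D(i) = sin(i*2*pi/100);
    end
P>> P = gather(D, 1);
```

The drange construct gives each lab the portion of the index range that corresponds to its local segment of D, so no interlab communication occurs inside the loop.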
Transfer the array from the workspace of lab 1 to the MATLAB client
workspace, then plot the array from the client. Note that both commands are
entered in the MATLAB (client) Command Window.
pmode lab2client P 1
plot(P)
This is not the only way to plot codistributed data. One alternative method,
especially useful when running noninteractive parallel jobs, is to plot the data
to a file, then view it from a later MATLAB session.
Displaying a GUI
The labs that run the tasks of a parallel job are MATLAB sessions without
displays. As a result, these labs cannot display graphical tools and so you
cannot do things like plotting from within pmode. The general approach to
accomplish something graphical is to transfer the data into the workspace
of the MATLAB client using the pmode lab2client command.
Troubleshooting
In this section...
“Connectivity Testing” on page 4-19
“Hostname Resolution” on page 4-19
“Socket Connections” on page 4-19
Connectivity Testing
For testing connectivity between the client machine and the machines of
your compute cluster, you can use Admin Center. For more information
about Admin Center, including how to start it and how to test connectivity,
see “Admin Center” in the MATLAB Distributed Computing Server
documentation.
Hostname Resolution
If a lab cannot resolve the hostname of the computer running the MATLAB
client, use pctconfig to change the hostname by which the client machine
advertises itself.
Socket Connections
If a lab cannot open a socket connection to the MATLAB client, try the
following:
5 Math with Codistributed Arrays
This chapter describes the distribution or partition of data across several labs,
and the functionality provided for operations on that data in spmd statements,
parallel jobs, and pmode.
Array Types
In this section...
“Introduction” on page 5-2
“Nondistributed Arrays” on page 5-2
“Codistributed Arrays” on page 5-4
Introduction
All built-in data types and data structures supported by MATLAB software
are also supported in the MATLAB parallel computing environment. This
includes arrays of any number of dimensions containing numeric, character,
logical values, cells, or structures; but not function handles or user-defined
objects. In addition to these basic building blocks, the MATLAB parallel
computing environment also offers different types of arrays.
Nondistributed Arrays
When you create a nondistributed array, MATLAB constructs a separate array
in the workspace of each lab and assigns a common variable to them. Any
operation performed on that variable affects all individual arrays assigned
to it. If you display from lab 1 the value assigned to this variable, all labs
respond by showing the array of that name that resides in their workspace.
Replicated Arrays
A replicated array resides in the workspaces of all labs, and its size and
content are identical on all labs. When you create the array, MATLAB assigns
it to the same variable on all labs. If you display in spmd the value assigned
to this variable, all labs respond by showing the same array.
Variant Arrays
A variant array also resides in the workspaces of all labs, but its content
differs on one or more labs. When you create the array, MATLAB assigns
a different value to the same variable on all labs. If you display the value
assigned to this variable, all labs respond by showing their version of the
array.
A replicated array can become a variant array when its value becomes unique
on each lab.
spmd
B = magic(3); %replicated on all labs
B = B + labindex; %now a variant array, different on each lab
end
Private Arrays
A private array is defined on one or more, but not all labs. You could create
this array by using the lab index in a conditional statement, as shown here:
spmd
if labindex >= 3, A = magic(3) + labindex - 1, end
end
LAB 1        LAB 2        LAB 3        LAB 4
A is         A is         10  3  8     11  4  9
undefined    undefined     5  7  9      6  8 10
                           6 11  4      7 12  5
Codistributed Arrays
With replicated and variant arrays, the full content of the array is stored
in the workspace of each lab. Codistributed arrays, on the other hand, are
partitioned into segments, with each segment residing in the workspace of a
different lab. Each lab has its own array segment to work with. Reducing the
size of the array that each lab has to store and process means a more efficient
use of memory and faster processing, especially for large data sets.
This example distributes a 3-by-10 replicated array A over four labs. The
resulting array D is also 3-by-10 in size, but only a segment of the full array
resides on each lab.
spmd
A = [11:20; 21:30; 31:40];
D = codistributed(A);
getLocalPart(D)
end
Working with Codistributed Arrays
For example, to distribute an 80-by-1000 array to four labs, you can partition
it either by columns, giving each lab an 80-by-250 segment, or by rows, with
each lab getting a 20-by-1000 segment. If the array dimension does not divide
evenly over the number of labs, MATLAB partitions it as evenly as possible.
spmd
    A = zeros(80, 1000);
    D = codistributed(A);
end
Lab 1: This lab stores D(:,1:250).
Lab 2: This lab stores D(:,251:500).
Lab 3: This lab stores D(:,501:750).
Lab 4: This lab stores D(:,751:1000).
Each lab has access to all segments of the array. Access to the local segment
is faster than to a remote segment, because the latter requires sending and
receiving data between labs and thus takes more time.
>> spmd
II = codistributed.eye(8)
end
Lab 1:
This lab stores II(:,1:2).
LocalPart: [8x2 double]
Codistributor: [1x1 codistributor1d]
Lab 2:
This lab stores II(:,3:4).
LocalPart: [8x2 double]
Codistributor: [1x1 codistributor1d]
Lab 3:
This lab stores II(:,5:6).
LocalPart: [8x2 double]
Codistributor: [1x1 codistributor1d]
Lab 4:
This lab stores II(:,7:8).
LocalPart: [8x2 double]
Codistributor: [1x1 codistributor1d]
To see the actual data in the local segment of the array, use the getLocalPart
function.
The next line uses the codistributed function to construct a single 4-by-8
matrix D that is distributed along the second dimension of the array:
spmd
D = codistributed(A);
getLocalPart(D)
end
Arrays A and D are the same size (4-by-8). Array A exists in its full size on
each lab, while only a segment of array D exists on each lab.
See the codistributed function reference page for syntax and usage
information.
This example creates a 4-by-250 variant array A on each of four labs and then
uses codistributed.build to combine these segments, creating a 16-by-250
codistributed array. Here is the variant array, A:
spmd
A = [1:250; 251:500; 501:750; 751:1000] + 250 * (labindex - 1);
end
Now combine these segments into an array that is distributed by the first
dimension (rows). The array is now 16-by-250, with a 4-by-250 segment
residing on each lab:
spmd
D = codistributed.build(A, codistributor1d(1,[4 4 4 4],[16 250]))
end
Lab 1:
This lab stores D(1:4,:).
You could also use replicated arrays in the same fashion, if you wanted
to create a codistributed array whose segments were all identical to start
with. See the codistributed function reference page for syntax and usage
information.
Local Arrays
That part of a codistributed array that resides on each lab is a piece of a
larger array. Each lab can work on its own segment of the common array, or
it can make a copy of that segment in a variant or private array of its own.
This local copy of a codistributed array segment is called a local array.
spmd(4)
A = [1:80; 81:160; 161:240];
D = codistributed(A);
size(D)
L = getLocalPart(D);
size(L)
end
3 80
3 20
Each lab recognizes that the codistributed array D is 3-by-80. However, notice
that the size of the local part, L, is 3-by-20 on each lab, because the 80 columns
of D are distributed over four labs.
Continuing the previous example, take the local variant arrays L and put
them together as segments to build a new codistributed array X.
spmd
codist = codistributor1d(2,[20 20 20 20],[3 80]);
X = codistributed.build(L, codist);
size(X)
end
3 80
Dimension: 2
Partition: [20 20 20 20]
spmd
C = getCodistributor(X);
part = C.Partition
dim = C.Dimension
end
spmd
D = rand(8, 16, codistributor());
size(getLocalPart(D))
end
8 4
spmd
X = redistribute(D, codistributor1d(1));
size(getLocalPart(X))
end
2 16
Lab 1:
4 3
Lab 2:
4 3
Lab 3:
4 2
Lab 4:
4 2
Restore the undistributed segments to the full array form by gathering the
segments:
With codistributed arrays, these values are not so easily obtained. For
example, the second segment of an array (that which resides in the workspace
of lab 2) has a starting index that depends on the array distribution. For a
200-by-1000 array with a default distribution by columns over four labs, the
starting index on lab 2 is 251. For a 1000-by-200 array also distributed by
columns, that same index would be 51. As for the ending index, this is not
given by using the end keyword, as end in this case refers to the end of the
entire array; that is, the last subscript of the final segment. The length of
each segment is also not given by using the length or size functions, as they
only return the length of the entire array.
The MATLAB colon operator and end keyword are two of the basic tools
for indexing into nondistributed arrays. For codistributed arrays, MATLAB
provides a version of the colon operator, called codistributed.colon. This
actually is a function, not a symbolic operator like colon.
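For example (a sketch), codistributed.colon can build a distributed vector of subscripts:

```matlab
spmd
    C = codistributed.colon(1, 1000);   % 1-by-1000 vector, partitioned across the labs
end
```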
Note When using arrays to index into codistributed arrays, you can use
only replicated or codistributed arrays for indexing. The toolbox does not
check to ensure that the index is replicated, as that would require global
communications. Therefore, the use of unsupported variants (such as
labindex) to index into codistributed arrays might create unexpected results.
If you run this code on a pool of four workers you get this result:
Lab 1:
Element is in position 225000 on lab 1.
If you run this code on a pool of five workers you get this result:
Lab 2:
Element is in position 25000 on lab 2.
Notice if you use a pool of a different size, the element ends up in a different
location on a different lab, but the same code can be used to locate the element.
2-Dimensional Distribution
As an alternative to distributing by a single dimension of rows or columns,
you can distribute a matrix by blocks using '2dbc' or two-dimensional
block-cyclic distribution. Instead of segments that comprise a number of
complete rows or columns of the matrix, the segments of the codistributed
array are 2-dimensional square blocks.
For example, consider a simple 8-by-8 matrix with ascending element values.
You can create this array in an spmd statement, parallel job, or pmode. This
example uses pmode for a visual display.
P>> A = reshape(1:64, 8, 8)
1 9 17 25 33 41 49 57
2 10 18 26 34 42 50 58
3 11 19 27 35 43 51 59
4 12 20 28 36 44 52 60
5 13 21 29 37 45 53 61
6 14 22 30 38 46 54 62
7 15 23 31 39 47 55 63
8 16 24 32 40 48 56 64
Suppose you want to distribute this array among four labs, with a 4-by-4
block as the local part on each lab. In this case, the lab grid is a 2-by-2
arrangement of the labs, and the block size is a square of four elements on
a side (i.e., each block is a 4-by-4 square). With this information, you can
define the codistributor object:
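A sketch of that definition, using a 2-by-2 lab grid and a block size of 4 (the variable name DIST is an assumption):

```matlab
P>> DIST = codistributor2dbc([2 2], 4)
```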
Now you can use this codistributor object to distribute the original matrix:
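For example (a sketch, assuming the codistributor object is named DIST):

```matlab
P>> AA = codistributed(A, DIST)
```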
This distributes the array among the labs according to this scheme:
             cols 1-4    cols 5-8
rows 1-4:     LAB 1       LAB 2
rows 5-8:     LAB 3       LAB 4

 1  9 17 25 | 33 41 49 57
 2 10 18 26 | 34 42 50 58
 3 11 19 27 | 35 43 51 59
 4 12 20 28 | 36 44 52 60
 -----------+------------
 5 13 21 29 | 37 45 53 61
 6 14 22 30 | 38 46 54 62
 7 15 23 31 | 39 47 55 63
 8 16 24 32 | 40 48 56 64
If the lab grid does not perfectly overlay the dimensions of the codistributed
array, you can still use '2dbc' distribution, which is block cyclic. In this case,
you can imagine the lab grid being repeatedly overlaid in both dimensions
until all the original matrix elements are included.
Using the same original 8-by-8 matrix and 2-by-2 lab grid, consider a block
size of 3 instead of 4, so that 3-by-3 square blocks are distributed among the
labs. The code looks like this:
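A sketch of that code (variable names are assumptions):

```matlab
P>> DIST = codistributor2dbc([2 2], 3)
P>> AA = codistributed(A, DIST)
```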
The first “row” of the lab grid is distributed to lab 1 and lab 2, but that contains
only six of the eight columns of the original matrix. Therefore, the next two
columns are distributed to lab 1. This process continues until all columns in
the first rows are distributed. Then a similar process applies to the rows as
you proceed down the matrix, as shown in the following distribution scheme:
Original matrix

 1  9 17 25 33 41 49 57
 2 10 18 26 34 42 50 58
 3 11 19 27 35 43 51 59
 4 12 20 28 36 44 52 60
 5 13 21 29 37 45 53 61
 6 14 22 30 38 46 54 62
 7 15 23 31 39 47 55 63
 8 16 24 32 40 48 56 64

             cols 1-3    cols 4-6    cols 7-8
rows 1-3:     LAB 1       LAB 2       LAB 1
rows 4-6:     LAB 3       LAB 4       LAB 3
rows 7-8:     LAB 1       LAB 2       LAB 1
The diagram above shows a scheme that requires four overlays of the lab
grid to accommodate the entire original matrix. The following pmode session
shows the code and resulting distribution of data to each of the labs:
Using a for-Loop Over a Distributed Range (for-drange)
Parallelizing a for-Loop
If you already have a coarse-grained application to perform, but you do
not want to bother with the overhead of defining jobs and tasks, you can
take advantage of the ease-of-use that pmode provides. Where an existing
program might take hours or days to process all its independent data sets,
you can shorten that time by distributing these independent computations
over your cluster.
The following changes make this code operate in parallel, either interactively
in spmd or pmode, or in a parallel job:
results = zeros(1, numDataSets, codistributor());
for i = drange(1:numDataSets)
    results(i) = processDataSet(i);
end
res = gather(results, 1);
if labindex == 1
plot(1:numDataSets, res);
print -dtiff -r300 fig.tiff;
save \\central\myResults\today.mat res
end
Note that the length of the for iteration and the length of the codistributed
array results need to match in order to index into results within a for-drange
loop. This way, no communication is required between the labs. If results
were simply a replicated array, as it would have been when running the
original code in parallel, each lab would have assigned into its part of
results, leaving the remaining parts of results 0. At the end, results would
have been a variant, and without explicitly calling labSend and labReceive
or gcat, there would be no way to get the total results back to one (or all) labs.
When using the load function, you need to be careful that the data files are
accessible to all labs if necessary. The best practice is to use explicit paths to
files on a shared file system.
Correspondingly, when using the save function, you should be careful to only
have one lab save to a particular file (on a shared file system) at a time. Thus,
wrapping the code in if labindex == 1 is recommended.
Because results is distributed across the labs, this example uses gather to
collect the data onto lab 1.
A lab cannot plot a visible figure, so the print function creates a viewable
file of the plot.
To illustrate this characteristic, you can try the following example, in which
one for loop works, but the other does not.
At the pmode prompt, create two codistributed arrays, one an identity matrix,
the other set to zeros, distributed across four labs.
D = eye(8, 8, codistributor())
E = zeros(8, 8, codistributor())
By default, these arrays are distributed by columns; that is, each of the
four labs contains two columns of each array. If you use these arrays in a
for-drange loop, any calculations must be self-contained within each lab. In
other words, you can only perform calculations that are limited within each
lab to the two columns of the arrays that the labs contain.
For example, suppose you want to set each column of array E to some multiple
of the corresponding column of array D:
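Written with a for-drange loop, that operation looks like this:

```matlab
for j = drange(1:size(D,2))
    E(:,j) = j*D(:,j);    % each lab updates only the columns it stores
end
```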
This statement sets the j-th column of E to j times the j-th column of D. In
effect, while D is an identity matrix with 1s down the main diagonal, E has
the sequence 1, 2, 3, etc., down its main diagonal.
This works because each lab has access to the entire column of D and the
entire column of E necessary to perform the calculation, as each lab works
independently and simultaneously on two of the eight columns.
Suppose, however, that you attempt to set the values of the columns of E
according to different columns of D:
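For example, an attempt along these lines fails:

```matlab
for j = drange(1:size(D,2)-1)
    E(:,j) = j*D(:,j+1);  % column j+1 of D may reside on a different lab
end
```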
This method fails, because when j is 2, you are trying to set the second
column of E using the third column of D. These columns are stored in different
labs, so an error occurs, indicating that communication between the labs is
not allowed.
Restrictions
To use for-drange on a codistributed array, each lab must be able to perform
its portion of the loop using only the segments of the arrays stored locally
on that lab, and the loop's drange dimension must correspond to the arrays'
dimension of distribution.
To loop over all elements in the array, you can use for-drange on the
dimension of distribution, and regular for-loops on all other dimensions. The
following example executes in an spmd statement running on a MATLAB
pool of 4 labs:
spmd
    PP = codistributed.zeros(6,8,12);
    RR = rand(6,8,12,codistributor());
    % Default distribution:
    %   by third dimension, evenly across 4 labs.
    for ii = 1:6
        for jj = 1:8
            for kk = drange(1:12)
                PP(ii,jj,kk) = RR(ii,jj,kk) + labindex;
            end
        end
    end
end
PP
Using MATLAB® Functions on Codistributed Arrays
To get command-line help on a MATLAB function that is enhanced for
codistributed arrays, type
help codistributed/functionname
For example,
help codistributed/normest
The following table lists the enhanced MATLAB functions that operate on
codistributed arrays.
6
Programming Overview
This chapter provides information you need for programming with Parallel
Computing Toolbox software. Further details of evaluating functions in
a cluster, programming distributed jobs, and programming parallel jobs
are covered in later chapters. This chapter describes features common to
programming all kinds of jobs.
Product Introduction
In this section...
“Overview” on page 6-2
“Toolbox and Server Components” on page 6-3
Overview
Parallel Computing Toolbox and MATLAB Distributed Computing Server
software let you solve computationally and data-intensive problems using
MATLAB and Simulink on multicore and multiprocessor computers. Parallel
processing constructs such as parallel for-loops and code blocks, distributed
arrays, parallel numerical algorithms, and message-passing functions let
you implement task-parallel and data-parallel algorithms at a high level
in MATLAB without programming for specific hardware and network
architectures.
A job is some large operation that you need to perform in your MATLAB
session. A job is broken down into segments called tasks. You decide how best
to divide your job into tasks. You could divide your job into identical tasks,
but tasks do not have to be identical.
The MATLAB session in which the job and its tasks are defined is called the
client session. Often, this is on the machine where you program MATLAB.
The client uses Parallel Computing Toolbox software to perform the definition
of jobs and tasks. MATLAB Distributed Computing Server software is the
product that performs the execution of your job by evaluating each of its tasks
and returning the result to your client session.
The job manager is the part of the engine that coordinates the execution of
jobs and the evaluation of their tasks. The job manager distributes the tasks
for evaluation to the server’s individual MATLAB sessions called workers.
Use of the MathWorks® job manager is optional; the distribution of tasks to
workers can also be performed by a third-party scheduler, such as Microsoft®
Windows HPC Server (including CCS) or Platform LSF® schedulers.
See the “Glossary” on page Glossary-1 for definitions of the parallel computing
terms used in this manual.
[Figure: A cluster of machines, each running MATLAB Distributed Computing
Server software to host one or more MATLAB worker sessions.]
Each worker is given a task from the running job by the job manager, executes
the task, returns the result to the job manager, and then is given another
task. When all tasks for a running job have been assigned to workers, the job
manager starts running the next job with the next available worker.
Note For testing your application locally or other purposes, you can configure
a single computer as client, worker, and job manager. You can also have more
than one worker session or more than one job manager session on a machine.
[Figure: Basic parallel computing setup — the client sends a job and its tasks
to the scheduler or job manager, which distributes the tasks to workers; the
workers' results return through the scheduler or job manager to the client.]
[Figure: Cluster configurations — several clients can share one job manager or
third-party scheduler and its workers, and a single machine can host client
and worker sessions simultaneously.]
Local Scheduler
A feature of Parallel Computing Toolbox software is the ability to run a local
scheduler and up to eight workers on the client machine, so that you can run
distributed and parallel jobs without requiring a remote cluster or MATLAB
Distributed Computing Server software. In this case, all the processing
required for the client, scheduling, and task evaluation is performed on the
same computer. This gives you the opportunity to develop, test, and debug
your distributed or parallel application before running it on your cluster.
Third-Party Schedulers
As an alternative to using the MathWorks job manager, you can use a
third-party scheduler. This could be a Microsoft Windows HPC Server
(including CCS), Platform LSF scheduler, PBS Pro® scheduler, TORQUE
scheduler, mpiexec, or a generic scheduler.
mdce Service
If you are using the MathWorks job manager, every machine that hosts a
worker or job manager session must also run the mdce service.
The mdce service controls the worker and job manager sessions and recovers
them when their host machines crash. If a worker or job manager machine
crashes, when the mdce service starts up again (usually configured to start
at machine boot time), it automatically restarts the job manager and worker
sessions to resume their sessions from before the system crash. These
processes are covered more fully in the MATLAB Distributed Computing
Server System Administrator’s Guide.
When you create a job in the client session, the job actually exists in the job
manager or in the scheduler’s data location. The client session has access to
the job through a job object. Likewise, tasks that you define for a job in the
client session exist in the job manager or in the scheduler’s data location, and
you access them through task objects.
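For example, a single dfeval call can evaluate sum on three input arrays as a job of three tasks (a minimal sketch using the default configuration's scheduler):

```matlab
results = dfeval(@sum, {[1 1] [2 2] [3 3]})
% results is a 3-by-1 cell array containing 2, 4, and 6
```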
This example runs the job as three tasks in three separate MATLAB worker
sessions, reporting the results back to the session from which you ran dfeval.
For more information about dfeval and in what circumstances you can use it,
see Chapter 7, “Evaluating Functions in a Cluster”.
Using Parallel Computing Toolbox™ Software
This example illustrates the basic steps in creating and running a job that
contains a few simple tasks. Each task evaluates the sum function for an
input array.
1 Identify a scheduler. Use findResource to indicate that you are using the
local scheduler and create the object sched, which represents the scheduler.
(For more information, see “Find a Job Manager” on page 8-8 or “Creating
and Running Jobs” on page 8-21.)
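The call, as described above:

```matlab
sched = findResource('scheduler', 'type', 'local')
```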
2 Create a job. Create job j on the scheduler. (For more information, see
“Create a Job” on page 8-10.)
j = createJob(sched)
3 Create three tasks within the job j. Each task evaluates the sum of the
array that is passed as an input argument. (For more information, see
“Create Tasks” on page 8-12.)
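Consistent with the results shown in step 5, the three tasks can be created as:

```matlab
createTask(j, @sum, 1, {[1 1]});
createTask(j, @sum, 1, {[2 2]});
createTask(j, @sum, 1, {[3 3]});
```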
4 Submit the job to the scheduler queue for evaluation. The scheduler then
distributes the job’s tasks to MATLAB workers that are available for
evaluating. The local scheduler actually starts a MATLAB worker session
for each task, up to eight at one time. (For more information, see “Submit a
Job to the Job Queue” on page 8-13.)
submit(j);
5 Wait for the job to complete, then get the results from all the tasks of the
job. (For more information, see “Retrieve the Job’s Results” on page 8-14.)
waitForState(j)
results = getAllOutputArguments(j)
results =
[2]
[4]
[6]
6 Destroy the job. When you have the results, you can permanently remove
the job from the scheduler’s data location.
destroy(j)
Getting Help
• “Command-Line Help” on page 6-10
• “Help Browser” on page 6-11
Command-Line Help
You can get command-line help on the toolbox object functions by using the
syntax
help distcomp.objectType/functionName
For example,
help distcomp.job/createTask
The available choices for objectType are listed in Chapter 11, “Object
Reference”.
Listing Available Functions. To find the functions available for each type of
object, type
methods(obj)
For example, to see the functions available for job manager objects, type
jm = findResource('scheduler','type','jobmanager');
methods(jm)
job1 = createJob(jm)
methods(job1)
task1 = createTask(job1,1,@rand,{3})
methods(task1)
Help Browser
You can open the Help browser with the doc command. To open the browser
on a specific reference page for a function or property, type
doc distcomp/RefName
where RefName is the name of the function or property whose reference page
you want to read.
For example, to open the Help browser on the reference page for the
createJob function, type
doc distcomp/createJob
To open the Help browser on the reference page for the UserData property,
type
doc distcomp/UserData
1 Run code normally on your local machine. First verify all your
functions so that as you progress, you are not trying to debug the functions
and the distribution at the same time. Run your functions in a single
instance of MATLAB software on your local computer. For programming
suggestions, see “Techniques for Improving Performance” in the MATLAB
documentation.
3 Modify your code for division. Decide how you want your code divided.
For a distributed job, determine how best to divide it into tasks; for
example, each iteration of a for-loop might define one task. For a parallel
job, determine how best to take advantage of parallel processing; for
example, a large array can be distributed across all your labs.
4 Use pmode to develop parallel functionality. Use pmode with the local
scheduler to develop your functions on several workers (labs) in parallel.
As you progress and use pmode on the remote cluster, that might be all you
need to complete your work.
5 Run the job with a local scheduler. Run your job using the local
scheduler with several local workers. This verifies that your code is set up for
batch execution, and in the case of a distributed job, that its computations
are properly divided into tasks.
6 Run the distributed job on only one cluster node. Run your
distributed job with one task to verify that remote distribution is
working between your client and the cluster, and to verify file and path
dependencies.
Note The client session of MATLAB must be running the Java™ Virtual
Machine (JVM™) to use Parallel Computing Toolbox software. Do not start
MATLAB with the -nojvm flag.
The figure below illustrates the stages in the life cycle of a job. In the
job manager, the jobs are shown categorized by their state. Some of
the functions you use for managing a job are createJob, submit, and
getAllOutputArguments.
[Figure: Stages of a Job — createJob creates a pending job in the client;
submit sends it to the queue; the job manager or scheduler runs queued jobs
on the workers; getAllOutputArguments retrieves the results of a finished job.]
The following table describes each stage in the life cycle of a job.
Defining Configurations
Configurations allow you to define certain parameters and properties, then
have your settings applied when creating objects in the MATLAB client.
Several toolbox functions support the use of configurations.
Programming with User Configurations
The first time you open the Configurations Manager, it lists only one
configuration called local, which at first is the default configuration and has
only default settings.
Note Fields that indicate “Unset” or that you leave empty have no effect
on their property values. For those properties, the configuration does not
alter the values that you had set programmatically before applying the
configuration.
3 In the Jobs tab, enter 4 and 4 for the maximum and minimum number of
workers. This specifies that jobs using this configuration require at least
four workers and use no more than four workers. Therefore, the job runs
on exactly four workers, even if it has to wait until four workers are
available before starting.
4 Click OK to save the configuration and close the dialog box. Your new
configuration now appears in the Configurations Manager listing.
d Edit the description field to change its text to My job manager and any
workers.
6 Select the Jobs tab. Remove the 4 from each of the fields for minimum and
maximum workers.
You now have two configurations that differ only in the number of workers
required for running a job.
After creating a job, you can apply either configuration to that job as a way
of specifying how many workers it should run on.
2 Click File > Export. (Alternatively, you can right-click the configuration
in the listing and select Export.)
3 In the Export Configuration dialog box, specify a location and name for the
file. The default file name is the same as the name of the configuration it
contains, with a .mat extension appended; these do not need to be the
same, so you can alter the names if you want to.
2 In the Import Configuration dialog box, browse to find the .mat file for the
configuration you want to import. Select the file and click Import.
Note MATLAB Compiler does not support configurations that use the local
scheduler or local workers.
Validating Configurations
The Configurations Manager includes a tool for validating configurations.
While the tests are running, the Configurations Manager displays their
progress as shown here.
You can adjust the timeout allowed for each stage of the testing. If your
cluster does not have enough workers available to perform the validation, the
test times out and returns a failure.
Note You cannot run a configuration validation if you have a MATLAB pool
open.
The configuration listing displays the overall validation result for each
configuration. The following figure shows overall validation results for one
configuration that passed and one that failed. The selected configuration
is the one that failed.
For each stage of the validation testing, you can click Details to get more
information about that stage. This information includes any error messages,
debug logs, and other data that might be useful in diagnosing problems or
helping to determine proper configuration or network settings.
The Configuration Validation tool keeps the test results available until the
current MATLAB session closes.
For example, to make a configuration the default and open a MATLAB pool
with its settings:
defaultParallelConfig('MyJMconfig1')
matlabpool open
Finding Schedulers
When executing the findResource function, you can use configurations to
identify a particular scheduler and apply property values. For example,
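A sketch of such a call, using the configuration named in the following paragraph:

```matlab
jm = findResource('scheduler', 'Configuration', 'our_jobmanager')
```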
This command finds the scheduler defined by the settings of the configuration
named our_jobmanager and sets property values on the scheduler object
based on settings in the configuration. The advantage of configurations is
that you can alter your scheduler choices without changing your MATLAB
application code, merely by changing the configuration settings.
For a third-party scheduler such as Platform LSF, the command might look
like
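A sketch, with a hypothetical configuration name my_lsf_config:

```matlab
lsfsched = findResource('scheduler', 'Configuration', 'my_lsf_config')
```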
Creating Jobs
Because the properties of scheduler, job, and task objects can be defined in a
configuration, you do not have to define them in your application. Therefore,
the code itself can accommodate any type of scheduler. For example,
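A sketch, where sched is assumed to be a scheduler object found earlier:

```matlab
job1 = createJob(sched, 'Configuration', 'MyConfig');
```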
The configuration defined as MyConfig must define any and all properties
necessary and appropriate for your scheduler and configuration, and the
configuration must not include any parameters inconsistent with your setup.
All changes necessary to use a different scheduler can now be made in the
configuration, without any modification needed in the application.
You can see which configuration was applied to an object by examining its
Configuration property:
get(job1, 'Configuration')
our_jobmanager_config
Programming Tips and Notes
At startup, a worker session's current working directory is set to
CHECKPOINTBASE\HOSTNAME_WORKERNAME_mlworker_log\work
For example, if the worker named worker22 is running on host nodeA52, and
its CHECKPOINTBASE value is C:\TEMP\MDCE\Checkpoint, the starting current
directory for that worker session is
C:\TEMP\MDCE\Checkpoint\nodeA52_worker22_mlworker_log\work
The command
clear functions
clears all Parallel Computing Toolbox objects from the current MATLAB
session, but they still remain in the job manager. For information on
recreating these objects in the client session, see “Recovering Objects” on
page 8-18.
Interrupting a Job
Because jobs and tasks are run outside the client session, you cannot use
Ctrl+C (^C) in the client session to interrupt them. To control or interrupt
the execution of jobs and tasks, use such functions as cancel, destroy,
demote, promote, pause, and resume.
Speeding Up a Job
You might find that your code runs slower on multiple workers than it does
on one desktop computer. This can occur when task startup and stop time
is not negligible relative to the task run time. The most common mistake in
this regard is to make the tasks too small, i.e., too fine-grained. Another
common mistake is to send large amounts of input or output data with each
task. In both of these cases, the time it takes to transfer data and initialize
a task is far greater than the actual time it takes for the worker to evaluate
the task function.
Introduction
The parallel profiler provides an extension of the profile command and the
profile viewer specifically for parallel jobs, to enable you to see how much time
each lab spends evaluating each function and how much time communicating
or waiting for communications with the other labs. Before using the parallel
profiler, familiarize yourself with the standard profiler and its views, as
described in “Profiling for Improving Performance”.
Note The parallel profiler works on parallel jobs, including inside pmode. It
does not work on parfor-loops.
To turn on the parallel profiler to start collecting data, enter the following
line in your parallel job task code file, or type at the pmode prompt in the
Parallel Command Window:
mpiprofile on
Now the profiler is collecting information about the execution of code on each
lab and the communications between the labs, such as the execution time of
each function on each lab and the time and amount of data involved in
interlab communication.
With the parallel profiler on, you can proceed to execute your code while the
profiler collects the data.
In the pmode Parallel Command Window, to find out if the profiler is on, type:
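The status option reports whether the profiler is currently collecting data:

```matlab
P>> mpiprofile status
```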
For a complete list of options regarding profiler data details, clearing data,
etc., see the mpiprofile reference page.
When the Parallel Command Window (pmode) starts, type the following code
at the pmode prompt:
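A short session that profiles a codistributed matrix multiplication might look like this (a sketch; any parallel code can be substituted for the multiplication):

```matlab
P>> R1 = rand(16, codistributor());
P>> R2 = rand(16, codistributor());
P>> mpiprofile on
P>> P = R1*R2;
P>> mpiprofile viewer
```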
The last command opens the Profiler window, first showing the Parallel
Profile Summary (or function summary report) for lab 1.
The function summary report displays the data for each function executed on
a lab in sortable columns with the following headers:
Click the name of any function in the list for more details about the execution
of that function. The function detail report for codistributed.mtimes
includes this listing:
The code that is displayed in the report is taken from the client. If the code
has changed on the client since the parallel job ran on the labs, or if the
labs are running a different version of the functions, the display might not
accurately reflect what actually executed.
You can display information for each lab, or use the comparison controls to
display information for several labs simultaneously. Two buttons provide
Automatic Comparison Selection, allowing you to compare the data from
the labs that took the most versus the least amount of time to execute the code,
or data from the labs that spent the most versus the least amount of time in
performing interlab communication. Manual Comparison Selection allows
you to compare data from specific labs or labs that meet certain criteria.
The following listing from the summary report shows the result of using
the Automatic Comparison Selection of Compare (max vs. min
TotalTime). The comparison shows data from lab 3 compared to lab 1
because these are the labs that spend the most versus least amount of time
executing the code.
The following figure shows a summary of all the functions executed during the
profile collection time. The Manual Comparison Selection of max Time
Aggregate means that data is considered from all the labs for all functions to
determine which lab spent the maximum time on each function. Next to each
function’s name is the lab that took the longest time to execute that function.
The other columns list the data from that lab.
The next figure shows a summary report for the labs that spend the most
versus least time for each function. A Manual Comparison Selection of
max Time Aggregate against min Time >0 Aggregate generated this
summary. Both aggregate settings indicate that the profiler should consider
data from all labs for all functions, for both maximum and minimum. This
report lists the data for codistributed.mtimes from labs 3 and 1, because
they spent the maximum and minimum times on this function. Similarly,
other functions are listed.
Plots like those in the previous two figures can help you determine the best
way to balance work among your labs, perhaps by altering the partition
scheme of your codistributed arrays.
Benchmarking Performance
In this section...
“Demos” on page 6-44
“HPC Challenge Benchmarks” on page 6-44
Demos
Several benchmarking demos can help you understand and evaluate
performance of the parallel computing products. You can access these demos
in the Help Browser under the Parallel Computing Toolbox node: expand
the node for Demos then Benchmarks.
Troubleshooting and Debugging
The worker that ran the task did not have access to the function
function_name. One solution is to make sure the location of the function’s
file, function_name.m, is included in the job’s PathDependencies property.
Another solution is to transfer the function file to the worker by adding
function_name.m to the FileDependencies property of the job.
• If using a generic scheduler, make sure the submit function redirects error
messages to a log file.
• The MATLAB worker failed to start due to licensing errors, the executable
is not on the default path on the worker machine, or is not installed in the
location where the scheduler expected it to be.
• MATLAB could not read/write the job input/output files in the scheduler’s
data location. The data location may not be accessible to all the worker
nodes, or the user that MATLAB runs as does not have permission to
read/write the job files.
• If using a generic scheduler
- The environment variable MDCE_DECODE_FUNCTION was not defined
before the MATLAB worker started.
- The decode function was not on the worker’s path.
• If using mpiexec
- The passphrase to smpd was incorrect or missing.
- The smpd daemon was not running on all the specified machines.
Task Errors
If your job returned no results (i.e., getAllOutputArguments(job) returns an
empty cell array), it is probable that the job failed and some of its tasks have
their ErrorMessage and ErrorIdentifier properties set.
You can use the following code to identify tasks with error messages:
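A sketch, where yourjob is assumed to be your job object:

```matlab
errmsgs = get(yourjob.Tasks, {'ErrorMessage'});
nonempty = ~cellfun(@isempty, errmsgs);   % logical index of tasks that errored
celldisp(errmsgs(nonempty));
```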
This code displays the nonempty error messages of the tasks found in the job
object yourjob.
Debug Logs
If you are using a supported third-party scheduler, you can use the
getDebugLog function to read the debug log from the scheduler for a particular
job or task.
For example, find the failed job on your LSF scheduler, and read its debug log.
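A sketch of that sequence:

```matlab
sched = findResource('scheduler', 'type', 'lsf');
failedjob = findJob(sched, 'State', 'failed');
message = getDebugLog(sched, failedjob(1))
```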
The following sections can help you identify the general nature of some
connection problems.
If you cannot locate your job manager with
findResource('scheduler','type','jobmanager')
the most likely reasons for this failure are:
• The client cannot contact the job manager host via multicast. Try to fully
specify where to look for the job manager by using the LookupURL property
in your call to findResource:
findResource('scheduler','type','jobmanager', ...
'LookupURL','JobMgrHostName')
• Firewalls do not allow traffic from the job manager to the client.
• The job manager cannot resolve the short hostname of the client computer.
Use pctconfig to change the hostname that the job manager will use for
contacting the client.
7
Evaluating Functions in a
Cluster
In many cases, the tasks of a job are all the same, or there are a limited
number of different kinds of tasks in a job. Parallel Computing Toolbox
software offers a solution for these cases that alleviates you from having to
define individual tasks and jobs when evaluating a function in a cluster of
workers. The two ways of evaluating a function on a cluster are described in
the following sections:
Scope of dfeval
When you evaluate a function in a cluster of computers with dfeval, you
provide basic required information, such as the function to be evaluated,
the number of tasks to divide the job into, and the variable into which the
results are returned. Synchronous (sync) evaluation in a cluster means that
your MATLAB session is blocked until the evaluation is complete and the
results are assigned to the designated variable. So you provide the necessary
information, while Parallel Computing Toolbox software handles all the
job-related aspects of the function evaluation.
When executing the dfeval function, the toolbox performs all these steps
of running a job:
1 Finds a job manager or scheduler
2 Creates a job
3 Creates tasks in that job
4 Submits the job to the queue
5 Retrieves the results when the job is finished
6 Destroys the job
By allowing the system to perform all the steps for creating and running jobs
with a single function call, you do not have access to the full flexibility offered
by Parallel Computing Toolbox software. However, this narrow functionality
meets the requirements of many straightforward applications. To focus the
scope of dfeval, the following limitations apply:
Evaluating Functions Synchronously
• You can pass property values to the job object; but you cannot set any
task-specific properties, including callback functions, unless you use
configurations.
• All the tasks in the job must have the same number of input arguments.
• All the tasks in the job must have the same number of output arguments.
• If you are using a third-party scheduler instead of the job manager, you
must use configurations in your call to dfeval. See “Programming with
User Configurations” on page 6-16, and the reference page for dfeval.
• You do not have direct access to the job manager, job, or task objects, i.e.,
there are no objects in your MATLAB workspace to manipulate (though
you can get them using findResource and the properties of the scheduler
object). Note that dfevalasync returns a job object.
• Without access to the objects and their properties, you do not have control
over the handling of errors.
Arguments of dfeval
Suppose the function myfun accepts three input arguments, and generates two
output arguments. To run a job with four tasks that call myfun, you could type
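A sketch, where the a, b, and c variables stand for hypothetical task inputs:

```matlab
[X, Y] = dfeval(@myfun, {a1 a2 a3 a4}, {b1 b2 b3 b4}, {c1 c2 c3 c4});
```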
The number of elements of the input argument cell arrays determines the
number of tasks in the job. All input cell arrays must have the same number
of elements. In this example, there are four tasks.
Because myfun returns two arguments, the results of your job will be assigned
to two cell arrays, X and Y. These cell arrays will have four elements each, for
the four tasks. The first element of X will have the first output argument from
the first task, the first element of Y will have the second argument from the
first task, and so on.
The following table shows how the job is divided into tasks and where the
results are returned.
So using one dfeval line would be equivalent to the following code, except
that dfeval can run all the statements simultaneously on separate machines.
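A sketch of that equivalent code, using the same hypothetical inputs:

```matlab
[X{1,1}, Y{1,1}] = myfun(a1, b1, c1);
[X{2,1}, Y{2,1}] = myfun(a2, b2, c2);
[X{3,1}, Y{3,1}] = myfun(a3, b3, c3);
[X{4,1}, Y{4,1}] = myfun(a4, b4, c4);
```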
For further details and examples of the dfeval function, see the dfeval
reference page.
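Suppose the task function computes the mean and median of three input values. A definition consistent with the results shown below is:

```matlab
function [avg, med] = averages(in1, in2, in3)
% AVERAGES Return the mean and median of three input values
avg = mean([in1, in2, in3]);
med = median([in1, in2, in3]);
```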
You can use dfeval to run this function on four sets of data using four tasks
in a single job. The input data can be represented by the four vectors,
[1 2 6]
[10 20 60]
[100 200 600]
[1000 2000 6000]
A quick look at the first set of data tells you that its mean is 3, while its
median is 2. So,
[x,y] = averages(1,2,6)
x =
3
y =
2
When calling dfeval, its input requires that the data be grouped together
such that the first input argument to each task function is in the first cell
array argument to dfeval, all second input arguments to the task functions
are grouped in the next cell array, and so on. Because we want to evaluate
four sets of data with four tasks, each of the three cell arrays will have four
elements. In this example, the first arguments for the task functions are 1,
10, 100, and 1000. The second inputs to the task functions are 2, 20, 200, and
2000. With the task inputs arranged thus, the call to dfeval looks like this.
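A call consistent with this arrangement might be (a sketch; averages is the task function from this example):

```matlab
% First inputs, second inputs, and third inputs each form one cell array.
[A, B] = dfeval(@averages, {1 10 100 1000}, ...
    {2 20 200 2000}, {6 60 600 6000})
```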
A =
[ 3]
[ 30]
[ 300]
[3000]
B =
[ 2]
[ 20]
[ 200]
[2000]
Notice that the first task evaluates the first element of the three cell arrays.
The results of the first task are returned as the first elements of each of the
two output values. In this case, the first task returns a mean of 3 and median
of 2. The second task returns a mean of 30 and median of 20.
If the original function were written to accept one input vector, instead of
three input values, it might make the programming of dfeval simpler. For
example, suppose your task function were
Now the function requires only one argument, so a call to dfeval requires
only one cell array. Furthermore, each element of that cell array can be a
vector containing all the values required for an individual task. The first
vector is sent as a single argument to the first task, the second vector to the
second task, and so on.
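The vectorized task function might look like this (a sketch; the function name vecaverages is illustrative):

```matlab
function [avg, med] = vecaverages(v)
% v is one vector containing all the values for this task.
avg = mean(v);
med = median(v);
```

A call such as `dfeval(@vecaverages, {[1 2 6], [10 20 60], [100 200 600], [1000 2000 6000]})` would then produce the outputs shown below.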
A =
[ 3]
[ 30]
[ 300]
[3000]
B =
[ 2]
[ 20]
[ 200]
[2000]
If you cannot vectorize your function, you might have to manipulate your
data arrangement for using dfeval. Returning to our original data in this
example, suppose you want to start with data in three vectors.
v1 = [1 2 6];
v2 = [10 20 60];
v3 = [100 200 600];
v4 = [1000 2000 6000];
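These vectors can be stacked into one matrix, one row per task (a sketch):

```matlab
% Each row holds the inputs for one task; omitting the semicolon
% displays the matrix shown below.
dataset = [v1; v2; v3; v4]
```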
1 2 6
10 20 60
100 200 600
1000 2000 6000
c1 = num2cell(dataset(:,1));
c2 = num2cell(dataset(:,2));
c3 = num2cell(dataset(:,3));
Now you can use these cell arrays as your input arguments for dfeval.
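Such a call might be (a sketch):

```matlab
[A, B] = dfeval(@averages, c1, c2, c3)
```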
A =
[ 3]
[ 30]
[ 300]
[3000]
B =
[ 2]
[ 20]
[ 200]
[2000]
Evaluating Functions Asynchronously
The dfevalasync function operates in the same way as dfeval, except that it
does not block the MATLAB command line, and it does not directly return
results.
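A call corresponding to this example might be (a sketch; averages and the cell arrays c1, c2, and c3 come from the preceding example, and the 2 specifies the number of output arguments):

```matlab
% Returns a job object immediately instead of blocking on results.
job1 = dfevalasync(@averages, 2, c1, c2, c3);
```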
Note that you have to specify the number of output arguments that each
task will return (2, in this example).
The MATLAB session does not wait for the job to execute, but returns the
prompt immediately. Instead of assigning results to cell array variables, the
function creates a job object in the MATLAB workspace that you can use to
access job status and results.
You can use the MATLAB session to perform other operations while the job is
being run on the cluster. When you want to get the job’s results, you should
make sure it is finished before retrieving the data.
waitForState(job1, 'finished')
results = getAllOutputArguments(job1)
results =
[ 3] [ 2]
[ 30] [ 20]
[ 300] [ 200]
[3000] [2000]
The structure of the output arguments is now slightly different than it was for
dfeval. The getAllOutputArguments function returns all output arguments
from all tasks in a single cell array, with one row per task. In this example,
each row of the cell array results will have two elements. So, results{1,1}
contains the first output argument from the first task, results{1,2} contains
the second argument from the first task, and so on.
For further details and examples of the dfevalasync function, see the
dfevalasync reference page.
8 Programming Distributed Jobs
A distributed job is one whose tasks do not directly communicate with each
other. The tasks do not need to run simultaneously, and a worker might
run several tasks of the same job in succession. Typically, all tasks perform
the same or similar functions on different data sets in an embarrassingly
parallel configuration.
Using a Local Scheduler
This section details the steps of a typical programming session with Parallel
Computing Toolbox software using a local scheduler:
Note that the objects that the client session uses to interact with the scheduler
are only references to data that is actually contained in the scheduler’s data
location, not in the client session. After jobs and tasks are created, you can
close your client session and restart it, and your job is still stored in the data
location. You can find existing jobs using the findJob function or the Jobs
property of the scheduler object.
sched = findResource('scheduler','type','local');
Create a Job
You create a job with the createJob function. This statement creates a job
in the scheduler’s data location, creates the job object job1 in the client
session, and if you omit the semicolon at the end of the command, displays
some information about the job.
job1 = createJob(sched)
Job ID 1 Information
====================
UserName : eng864
State : pending
SubmitTime :
StartTime :
Running Duration :
- Data Dependencies
FileDependencies : {}
PathDependencies : {}
- Associated Task(s)
Number Pending : 0
Number Running : 0
Number Finished : 0
TaskID of errors :
You can use the get function to see all the properties of this job object.
get(job1)
Configuration: ''
Name: 'Job1'
ID: 1
UserName: 'eng864'
Tag: ''
State: 'pending'
CreateTime: 'Mon Jan 08 15:40:18 EST 2007'
SubmitTime: ''
StartTime: ''
FinishTime: ''
Tasks: [0x1 double]
FileDependencies: {0x1 cell}
PathDependencies: {0x1 cell}
JobData: []
Parent: [1x1 distcomp.localscheduler]
UserData: []
Note that the job’s State property is pending. This means the job has not yet
been submitted (queued) for running, so you can now add tasks to it.
The scheduler’s display now indicates the existence of your job, which is the
pending one.
sched
Type : local
ClusterOsType : pc
DataLocation : C:\WINNT\Profiles\eng864\App...
HasSharedFilesystem : true
- Assigned Jobs
Number Pending : 1
Number Queued : 0
Number Running : 0
Number Finished : 0
ClusterMatlabRoot : D:\apps\matlab
Create Tasks
After you have created your job, you can create tasks for the job using
the createTask function. Tasks define the functions to be evaluated by
the workers during the running of the job. Often, the tasks of a job are all
identical. In this example, five tasks will each generate a 3-by-3 matrix
of random numbers.
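The five tasks might be created like this (a sketch):

```matlab
% Each task calls rand(3,3) and returns one output argument.
createTask(job1, @rand, 1, {3,3});
createTask(job1, @rand, 1, {3,3});
createTask(job1, @rand, 1, {3,3});
createTask(job1, @rand, 1, {3,3});
createTask(job1, @rand, 1, {3,3});
```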
get(job1,'Tasks')
ans =
Tasks: 5 by 1
=============
submit(job1)
The local scheduler starts up to eight workers and distributes the tasks of
job1 to its workers for evaluation.
waitForState(job1)
results = getAllOutputArguments(job1);
results{1:5}
After the job is complete, you can repeat the commands to examine the
updated status of the scheduler, job, and task objects:
sched
job1
get(job1,'Tasks')
The local scheduler has no interaction with any other scheduler, nor with any
other workers that might also be running on your client machine under the
mdce service. Multiple MATLAB sessions on your computer can each start
its own local scheduler with its own eight workers, but these groups do not
interact with each other, so you cannot combine local groups of workers to
increase your local cluster size.
When you end your MATLAB client session, its local scheduler and any
workers that happen to be running at that time also stop immediately.
Using a Job Manager
This section details the steps of a typical programming session with Parallel
Computing Toolbox software using a MathWorks job manager:
Note that the objects that the client session uses to interact with the job
manager are only references to data that is actually contained in the job
manager process, not in the client session. After jobs and tasks are created,
you can close your client session and restart it, and your job is still stored in
the job manager. You can find existing jobs using the findJob function or the
Jobs property of the job manager object.
jm = findResource('scheduler','type','jobmanager', ...
'Name','MyJobManager','LookupURL','MyJMhost')
jm =
Jobmanager Information
======================
Type : jobmanager
ClusterOsType : 'pc'
DataLocation : database on MyJobManager@MyJMhost
- Assigned Jobs
Number Pending : 0
Number Queued : 0
Number Running : 0
Number Finished : 0
UserName : myloginname
SecurityLevel : 0
Name : MyJobManager
Hostname : MyJMhost
HostAddress(s) : 123.123.123.123
State : running
ClusterSize : 2
NumberOfIdleWorkers : 2
NumberOfBusyWorkers : 0
You can view all the accessible properties of the job manager object with
the get function:
get(jm)
If your network supports multicast, you can omit property values to search
on, and findResource returns all available job managers.
all_managers = findResource('scheduler','type','jobmanager')
You can then examine the properties of each job manager to identify which
one you want to use.
for i = 1:length(all_managers)
get(all_managers(i))
end
When you have identified the job manager you want to use, you can isolate
it and create a single object.
jm = all_managers(3)
Create a Job
You create a job with the createJob function. Although this command
executes in the client session, it actually creates the job on the job manager,
jm, and creates a job object, job1, in the client session.
job1 = createJob(jm)
job1 =
Job ID 1 Information
====================
UserName : myloginname
AuthorizedUsers : {}
State : pending
SubmitTime :
StartTime :
Running Duration :
- Data Dependencies
FileDependencies : {}
PathDependencies : {}
- Associated Task(s)
Number Pending : 0
Number Running : 0
Number Finished : 0
TaskID of errors :
MaximumNumberOfWorkers : Inf
MinimumNumberOfWorkers : 1
Timeout : Inf
RestartWorker : false
QueuedFcn :
RunningFcn :
FinishedFcn :
Use get to see all the accessible properties of this job object.
get(job1)
Note that the job’s State property is pending. This means the job has not
been queued for running yet, so you can now add tasks to it.
jm
jm =
Jobmanager Information
======================
Type : jobmanager
ClusterOsType : 'pc'
DataLocation : database on MyJobManager@MyJMhost
- Assigned Jobs
Number Pending : 1
Number Queued : 0
Number Running : 0
Number Finished : 0
UserName : myloginname
SecurityLevel : 0
Name : MyJobManager
Hostname : MyJMhost
HostAddress(s) : 123.123.123.123
State : running
ClusterSize : 2
NumberOfIdleWorkers : 2
NumberOfBusyWorkers : 0
You can transfer files to the worker by using the FileDependencies property
of the job object. For details, see the FileDependencies reference page and
“Sharing Code” on page 8-14.
Create Tasks
After you have created your job, you can create tasks for the job using
the createTask function. Tasks define the functions to be evaluated by
the workers during the running of the job. Often, the tasks of a job are all
identical. In this example, each task will generate a 3-by-3 matrix of random
numbers.
get(job1,'Tasks')
ans =
distcomp.task: 5-by-1
Alternatively, you can create the five tasks with one call to createTask by
providing a cell array of five cell arrays defining the input arguments to each
task.
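Such a call might look like this (a sketch):

```matlab
% One call creates five tasks; each inner cell array holds one
% task's input arguments.
T = createTask(job1, @rand, 1, {{3,3} {3,3} {3,3} {3,3} {3,3}});
```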
submit(job1)
The job manager distributes the tasks of job1 to its registered workers for
evaluation.
2 Run the jobStartup function the first time evaluating a task for this job.
You can specify this function in FileDependencies or PathDependencies.
If the same worker evaluates subsequent tasks for this job, jobStartup
does not run between tasks.
4 If the worker is part of forming a new MATLAB pool, run the poolStartup
function. (This occurs when executing matlabpool open or when running
other types of jobs that form and use a MATLAB pool.)
results = getAllOutputArguments(job1);
results{1:5}
Sharing Code
Because the tasks of a job are evaluated on different machines, each machine
must have access to all the files needed to evaluate its tasks. The basic
mechanisms for sharing code are explained in the following sections:
You must define each worker session's path so that it looks for files in the
right places. You can define the path by using the job's PathDependencies
property, or by placing path commands in the workers' startup files.
Access to files among shared resources can depend upon permissions based
on the user name. You can set the user name with which the job manager
and worker services of MATLAB Distributed Computing Server software
run by setting the MDCEUSER value in the mdce_def file before starting
the services. For Microsoft Windows operating systems, there is also
MDCEPASS for providing the account password for the specified user. For an
explanation of service default settings and the mdce_def file, see “Defining
the Script Defaults” in the MATLAB Distributed Computing Server System
Administrator’s Guide.
could include MATLAB code necessary for task evaluation, or the input
data for processing or output data resulting from task evaluation. All these
properties are described in detail in their own reference pages:
There is a default maximum amount of data that can be sent in a single call
for setting properties. This limit applies to the OutputArguments property as
well as to data passed into a job as input arguments or FileDependencies. If
the limit is exceeded, you get an error message. For more information about
this data transfer size limit, see “Object Data Size Limitations” on page 6-45.
These additional files can initialize and clean up a worker session as it begins
or completes evaluations of tasks for a job:
matlabroot/toolbox/distcomp/user
You can edit these files to include whatever MATLAB code you want the
worker to execute at the indicated times.
Alternatively, you can create your own versions of these files and pass them to
the job as part of the FileDependencies property, or include the path names
to their locations in the PathDependencies property.
Therefore, if you have submitted your job to the job queue for execution, you
can quit your client session of MATLAB, and the job will be executed by the
job manager. The job manager maintains its job and task objects. You can
retrieve the job results later in another client session.
Recovering Objects
A client session of Parallel Computing Toolbox software can access any of the
objects in MATLAB Distributed Computing Server software, whether the
current client session or another client session created these objects.
You create job manager and worker objects in the client session by using
the findResource function. These client objects refer to sessions running in
the engine.
jm = findResource('scheduler','type','jobmanager', ...
'Name','Job_Mgr_123','LookupURL','JobMgrHost')
If your network supports multicast, you can find all available job managers by
omitting any specific property information.
jm_set = findResource('scheduler','type','jobmanager')
The array jm_set contains all the job managers accessible from the client
session. You can index through this array to determine which job manager
is of interest to you.
jm = jm_set(2)
When you have access to the job manager by the object jm, you can create
objects that reference all those objects contained in that job manager. All the
jobs contained in the job manager are accessible in its Jobs property, which is
an array of job objects.
all_jobs = get(jm,'Jobs')
You can index through the array all_jobs to locate a specific job.
Alternatively, you can use the findJob function to search in a job manager for
particular job identified by any of its properties, such as its State.
finished_jobs = findJob(jm,'State','finished')
This command returns an array of job objects that reference all finished jobs
on the job manager jm.
For example, find and destroy all finished jobs in your job manager that
belong to the user joep.
jm = findResource('scheduler','type','jobmanager', ...
'Name','MyJobManager','LookupURL','JobMgrHost')
finished_jobs = findJob(jm,'State','finished','UserName','joep')
destroy(finished_jobs)
clear finished_jobs
The destroy function permanently removes these jobs from the job manager.
The clear function removes the object references from the local MATLAB
workspace.
Starting a Job Manager from a Clean State. When a job manager starts,
by default it starts so that it resumes its former session with all jobs intact.
Alternatively, a job manager can start from a clean state with all its former
history deleted. Starting from a clean state permanently removes all job and
task data from the job manager of the specified name on a particular host.
Using a Fully Supported Third-Party Scheduler
This section details the steps of a typical programming session with Parallel
Computing Toolbox software for jobs distributed to workers by a fully
supported third-party scheduler.
This section assumes you have an LSF, PBS Pro, TORQUE, or Windows
HPC Server (including CCS and HPC Server 2008) scheduler installed
and running on your network. For more information about LSF, see
http://www.platform.com/Products/. For more information about
Windows HPC Server, see http://www.microsoft.com/hpc.
You specify the scheduler type for findResource to search for with one of
the following:
sched = findResource('scheduler','type','lsf')
sched = findResource('scheduler','type','pbspro')
sched = findResource('scheduler','type','torque')
Alternatively, you can use a parallel configuration to find the scheduler and
set the object properties with a single findResource statement.
If DataLocation is not set, the default location for job data is the current
working directory of the MATLAB client the first time you use findResource
to create an object for this type of scheduler. All settable property values on a
scheduler object are local to the MATLAB client, and are lost when you close
the client session or when you remove the object from the client workspace
with delete or clear all.
Note In a shared file system, all nodes require access to the directory
specified in the scheduler object's DataLocation property. See the
DataLocation reference page for information on setting this property for a
mixed-platform environment.
You can look at all the property settings on the scheduler object. If no jobs are
in the DataLocation directory, the Jobs property is a 0-by-1 array.
get(sched)
Configuration: ''
Type: 'lsf'
DataLocation: '\\share\scratch\jobdata'
HasSharedFilesystem: 1
Jobs: [0x1 double]
ClusterMatlabRoot: '\\apps\matlab\'
ClusterOsType: 'unix'
UserData: []
ClusterSize: Inf
ClusterName: 'CENTER_MATRIX_CLUSTER'
MasterName: 'masterhost.clusternet.ourdomain.com'
SubmitArguments: ''
ParallelSubmissionWrapperScript: [1x92 char]
You specify 'hpcserver' as the scheduler type for findResource to search for.
sched = findResource('scheduler','type','hpcserver')
Alternatively, you can use a parallel configuration to find the scheduler and
set the object properties with a single findResource statement.
If DataLocation is not set, the default location for job data is the current
working directory of the MATLAB client the first time you use findResource
to create an object for this type of scheduler. All settable property values on a
scheduler object are local to the MATLAB client, and are lost when you close
the client session or when you remove the object from the client workspace
with delete or clear all.
Note Because Windows HPC Server requires a shared file system, all
nodes require access to the directory specified in the scheduler object's
DataLocation property.
You can look at all the property settings on the scheduler object. If no jobs are
in the DataLocation directory, the Jobs property is a 0-by-1 array.
get(sched)
Configuration: ''
Type: 'hpcserver'
DataLocation: '\\share\scratch\jobdata'
HasSharedFilesystem: 1
Jobs: [0x1 double]
ClusterMatlabRoot: '\\apps\matlab\'
ClusterOsType: 'pc'
UserData: []
ClusterSize: Inf
SchedulerHostname: 'server04'
UseSOAJobSubmission: 0
JobTemplate: ''
JobDescriptionFile: ''
ClusterVersion: 'HPCServer2008'
Create a Job
You create a job with the createJob function, which creates a job object in
the client session. The job data is stored in the directory specified by the
scheduler object’s DataLocation property.
j = createJob(sched)
This statement creates the job object j in the client session. Use get to see
the properties of this job object.
get(j)
Configuration: ''
Name: 'Job1'
ID: 1
UserName: 'eng1'
Tag: ''
State: 'pending'
CreateTime: 'Fri Jul 29 16:15:47 EDT 2005'
SubmitTime: ''
StartTime: ''
FinishTime: ''
Tasks: [0x1 double]
FileDependencies: {0x1 cell}
PathDependencies: {0x1 cell}
JobData: []
Parent: [1x1 distcomp.lsfscheduler]
UserData: []
This output varies only slightly between jobs that use LSF and Windows
HPC Server schedulers, but is quite different from a job that uses a job
manager. For example, jobs on LSF or Windows HPC Server schedulers have
no callback functions.
The job’s State property is pending. This state means the job has not been
queued for running yet. This new job has no tasks, so its Tasks property
is a 0-by-1 array.
get(sched, 'Jobs')
Note In a shared file system, MATLAB clients on many computers can access
the same job data on the network. Properties of a particular job or task should
be set from only one computer at a time.
Create Tasks
After you have created your job, you can create tasks for the job. Tasks define
the functions to be evaluated by the workers during the running of the job.
Often, the tasks of a job are all identical except for different arguments or
data. In this example, each task will generate a 3-by-3 matrix of random
numbers.
get(j,'Tasks')
ans =
distcomp.simpletask: 5-by-1
Alternatively, you can create the five tasks with one call to createTask by
providing a cell array of five cell arrays defining the input arguments to each
task.
submit(j)
4 If the worker is part of forming a new MATLAB pool, run the poolStartup
function. (This occurs when executing matlabpool open or when running
other types of jobs that form and use a MATLAB pool.)
The job runs asynchronously with the MATLAB client. If you need to wait for
the job to complete before you continue in your MATLAB client session, you
can use the waitForState function.
waitForState(j)
The default state to wait for is finished. This function causes MATLAB to
pause until the State property of j is 'finished'.
Note When you use an LSF scheduler in a nonshared file system, the
scheduler might report that a job is in the finished state even though the LSF
scheduler might not yet have completed transferring the job’s files.
results = getAllOutputArguments(j);
results{1:5}
Sharing Code
Because different machines evaluate the tasks of a job, each machine must
have access to all the files needed to evaluate its tasks. The following sections
explain the basic mechanisms for sharing data:
You must define each worker session's path so that it looks for files in the
correct places. You can define the path by using the job's PathDependencies
property, or by placing path commands in the workers' startup files.
Three additional files can initialize and clean a worker session as it begins or
completes evaluations of tasks for a job:
matlabroot/toolbox/distcomp/user
You can edit these files to include whatever MATLAB code you want the
worker to execute at the indicated times.
Alternatively, you can create your own versions of these files and pass them to
the job as part of the FileDependencies property, or include the pathnames
to their locations in the PathDependencies property.
Managing Objects
Objects that the client session uses to interact with the scheduler are only
references to data that is actually contained in the directory specified by
the DataLocation property. After jobs and tasks are created, you can shut
down your client session, restart it, and your job will still be stored in that
remote location. You can find existing jobs using the Jobs property of the
recreated scheduler object.
The following sections describe how to access these objects and how to
permanently remove them:
Therefore, if you have submitted your job to the scheduler job queue for
execution, you can quit your client session of MATLAB, and the job will be
executed by the scheduler. The scheduler maintains its job and task data.
You can retrieve the job results later in another client session.
Recovering Objects
A client session of Parallel Computing Toolbox software can access any of the
objects in the DataLocation, whether the current client session or another
client session created these objects.
You create scheduler objects in the client session by using the findResource
function.
When you have access to the scheduler by the object sched, you can create
objects that reference all the data contained in the specified location for that
scheduler. All the job and task data contained in the scheduler data location
are accessible in the scheduler object’s Jobs property, which is an array of job
objects.
You can index through the array all_jobs to locate a specific job.
Alternatively, you can use the findJob function to search in a scheduler object
for a particular job identified by any of its properties, such as its State.
This command returns an array of job objects that reference all finished jobs
on the scheduler sched, whose data is found in the specified DataLocation.
Destroying Jobs
Jobs in the scheduler continue to exist even after they are finished. From
the command line in the MATLAB client session, you can call the destroy
function for any job object. If you destroy a job, you destroy all tasks contained
in that job. The job and task data is deleted from the DataLocation directory.
For example, find and destroy all finished jobs in your scheduler whose data
is stored in a specific directory.
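The sequence might look like this (a sketch; the scheduler type and data location are illustrative):

```matlab
sched = findResource('scheduler', 'type', 'lsf');
set(sched, 'DataLocation', '\\share\scratch\jobdata');
finished_jobs = findJob(sched, 'State', 'finished');
destroy(finished_jobs)   % delete job and task data from DataLocation
clear finished_jobs      % remove the stale references from the workspace
```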
Overview
Parallel Computing Toolbox software provides a generic interface that lets you
interact with third-party schedulers, or use your own scripts for distributing
tasks to other nodes on the cluster for evaluation.
To evaluate a task, a worker requires five parameters that you must pass from
the client to the worker. The parameters can be passed any way you want to
transfer them, but because a particular one must be an environment variable,
the examples in this section pass all parameters as environment variables.
Using the Generic Scheduler Interface
[Figure: parameters flow from the MATLAB client through the submit function
to the scheduler as environment variables, and from the scheduler to the
MATLAB worker, where the decode function reads them.]
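The discussion that follows refers to a statement of this form (a sketch; mysubmitfunc is the example submit function name used later in this section):

```matlab
set(sched, 'SubmitFcn', @mysubmitfunc)
```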
where sched is the scheduler object in the client session, created with the
findResource function. In this case, the submit function gets called with its
three default arguments: scheduler, job, and properties object, in that order.
The function declaration line of the function might look like this:
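For example (a sketch):

```matlab
function mysubmitfunc(scheduler, job, props)
```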
Inside the function of this example, the three argument objects are known as
scheduler, job, and props.
You can write a submit function that accepts more than the three default
arguments, and then pass those extra arguments by including them in the
definition of the SubmitFcn property.
time_limit = 300
testlocation = 'Plant30'
set(sched, 'SubmitFcn', {@mysubmitfunc, time_limit, testlocation})
In this example, the submit function requires five arguments: the three
defaults, along with the numeric value of time_limit and the string value of
testlocation. The function’s declaration line might look like this:
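For example (a sketch):

```matlab
function mysubmitfunc(scheduler, job, props, time_limit, testlocation)
```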
The submit function has three purposes:
• To identify the decode function that MATLAB workers run when they start
• To make information about job and task data locations available to the
workers via their decode function
• To instruct your scheduler how to start a MATLAB worker on the cluster
for each task of your job
[Figure: on the client node, Parallel Computing Toolbox invokes the submit
function specified by the job's SubmitFcn property; the submit function
calls setenv to set MDCE_DECODE_FUNCTION, MDCE_STORAGE_CONSTRUCTOR,
MDCE_STORAGE_LOCATION, MDCE_JOB_LOCATION, and MDCE_TASK_LOCATION, and then
invokes the scheduler.]
You do not set the values of any of these properties. They are automatically
set by the toolbox so that you can program your submit function to forward
them to the worker nodes.
With these values passed into your submit function, the function can pass
them to the worker nodes by any of several means. However, because the
name of the decode function must be passed as an environment variable, the
examples that follow pass all the other necessary property values also as
environment variables.
The submit function writes the values of these object properties out to
environment variables with the setenv function.
Scheduler           Scheduler Command   Passes Environment Variables
Condor®             condor_submit       Not by default; the command can pass
                                        all or specific variables.
LSF                 bsub                Yes, by default.
PBS                 qsub                The command must specify which
                                        variables to pass.
Sun™ Grid Engine    qsub                The command must specify which
                                        variables to pass.
Your submit function might also use some of these properties and others
when constructing and invoking your scheduler command. scheduler, job,
and props (so named only for this example) refer to the first three arguments
to the submit function.
This example function uses only the three default arguments. You can
have additional arguments passed into your submit function, as discussed
in “MATLAB Client Submit Function” on page 8-35.
2 Identify the values you want to send to your environment variables. For
convenience, you define local variables for use in this function.
decodeFcn = 'mydecodefunc';
jobLocation = get(props, 'JobLocation');
taskLocations = get(props, 'TaskLocations'); %This is a cell array
storageLocation = get(props, 'StorageLocation');
storageConstructor = get(props, 'StorageConstructor');
The name of the decode function that must be available on the MATLAB
worker path is mydecodefunc.
3 Set the environment variables, other than the task locations. All the
MATLAB workers use these values when evaluating tasks of the job.
setenv('MDCE_DECODE_FUNCTION', decodeFcn);
setenv('MDCE_JOB_LOCATION', jobLocation);
setenv('MDCE_STORAGE_LOCATION', storageLocation);
setenv('MDCE_STORAGE_CONSTRUCTOR', storageConstructor);
Your submit function can use any names you choose for the environment
variables, with the exception of MDCE_DECODE_FUNCTION; the MATLAB
worker looks for its decode function identified by this variable. If you use
alternative names for the other environment variables, be sure that the
corresponding decode function also uses your alternative variable names.
You can see the variable names used in the standard decode function by
typing
edit parallel.cluster.generic.distributedDecodeFcn
4 Set the task-specific variables and scheduler commands. This is where you
instruct your scheduler to start MATLAB workers for each task.
for i = 1:props.NumberOfTasks
setenv('MDCE_TASK_LOCATION', taskLocations{i});
constructSchedulerCommand;
end
Note If you are not familiar with your network scheduler, ask your system
administrator for help.
When working with the decode function, you must be aware of how the decode
function file gets from the client node to the worker node. Your scheduler
might perform this task for you automatically; if it does not, you must
arrange for this copying.
[Figure: on the worker node, the scheduler starts the MATLAB worker, whose
decode function uses getenv to read the MDCE_DECODE_FUNCTION,
MDCE_STORAGE_CONSTRUCTOR, MDCE_STORAGE_LOCATION, MDCE_JOB_LOCATION, and
MDCE_TASK_LOCATION environment variables.]
Note The decode function must be available on the MATLAB worker’s path.
You can get the decode function on the worker’s path by either moving the file
into a directory on the path (for example, matlabroot/toolbox/local), or by
having the scheduler use cd in its command so that it starts the MATLAB
worker from within the directory that contains the decode function.
In practice, the decode function might be identical for all workers on the
cluster. In this case, all workers can use the same decode function file if it is
accessible on a shared drive.
With those values from the environment variables, the decode function must
set the appropriate property values of the object that is its argument. The
property values that must be set are the same as those in the corresponding
submit function, except that instead of the cell array TaskLocations, each
worker has only the individual string TaskLocation, which is one element of
the TaskLocations cell array. Therefore, the properties you must set within
the decode function on its argument object are as follows:
8-42
Using the Generic Scheduler Interface
• StorageConstructor
• StorageLocation
• JobLocation
• TaskLocation
When the object is returned from the decode function to the MATLAB worker
session, its values are used internally for managing job and task data.
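Putting these pieces together, a minimal decode function might look like the following sketch. The file name mydecodefunc comes from the earlier example; this is not the shipped implementation (see parallel.cluster.generic.distributedDecodeFcn for the standard version).

```matlab
function props = mydecodefunc(props)
% Sketch of a decode function: read the environment variables set by
% the submit function and copy their values onto the argument object.
set(props, 'StorageConstructor', getenv('MDCE_STORAGE_CONSTRUCTOR'));
set(props, 'StorageLocation',    getenv('MDCE_STORAGE_LOCATION'));
set(props, 'JobLocation',        getenv('MDCE_JOB_LOCATION'));
set(props, 'TaskLocation',       getenv('MDCE_TASK_LOCATION'));
```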
You can specify 'generic' as the name for findResource to search for.
(Any scheduler name starting with the string 'generic' creates a generic
scheduler object.)
Generic schedulers must use a shared file system for workers to access job
and task data. Set the DataLocation and HasSharedFilesystem properties
to specify where the job data is stored and that the workers should access job
data directly in a shared file system.
Note All nodes require access to the directory specified in the scheduler
object’s DataLocation directory. See the DataLocation reference page for
information on setting this property for a mixed-platform environment.
If DataLocation is not set, the default location for job data is the current
working directory of the MATLAB client the first time you use findResource
to create an object for this type of scheduler, which might not be accessible
to the worker nodes.
You can look at all the property settings on the scheduler object. If no jobs
are in the DataLocation directory, the Jobs property is a 0-by-1 array. All
settable property values on a scheduler object are local to the MATLAB client,
and are lost when you close the client session or when you remove the object
from the client workspace with delete or clear all.
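For example, a generic scheduler object consistent with the display below might be created and configured like this (the data location is the illustrative path from that display):

```matlab
sched = findResource('scheduler', 'type', 'generic');
set(sched, 'DataLocation', '\\share\scratch\jobdata');  % shared location
set(sched, 'HasSharedFilesystem', true);
```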
get(sched)
Configuration: ''
Type: 'generic'
DataLocation: '\\share\scratch\jobdata'
HasSharedFilesystem: 1
You must set the SubmitFcn property to specify the submit function for this
scheduler.
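For example, using the submit function name that appears in the scheduler display later in this section:

```matlab
set(sched, 'SubmitFcn', @mysubmitfunc)
```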
With the scheduler object and the user-defined submit and decode functions
defined, programming and running a job is now similar to doing so with a job
manager or any other type of scheduler.
2. Create a Job
You create a job with the createJob function, which creates a job object in
the client session. The job data is stored in the directory specified by the
scheduler object’s DataLocation property.
j = createJob(sched)
This statement creates the job object j in the client session. Use get to see
the properties of this job object.
get(j)
Configuration: ''
Name: 'Job1'
ID: 1
UserName: 'neo'
Tag: ''
State: 'pending'
CreateTime: 'Fri Jan 20 16:15:47 EDT 2006'
SubmitTime: ''
StartTime: ''
FinishTime: ''
Tasks: [0x1 double]
Note Properties of a particular job or task should be set from only one
computer at a time.
This generic scheduler job has somewhat different properties than a job that
uses a job manager. For example, this job has no callback functions.
The job’s State property is pending. This state means the job has not been
queued for running yet. This new job has no tasks, so its Tasks property
is a 0-by-1 array.
get(sched)
Configuration: ''
Type: 'generic'
DataLocation: '\\share\scratch\jobdata'
HasSharedFilesystem: 1
Jobs: [1x1 distcomp.simplejob]
ClusterMatlabRoot: '\\apps\matlab\'
ClusterOsType: 'pc'
UserData: []
ClusterSize: Inf
MatlabCommandToRun: 'worker'
SubmitFcn: @mysubmitfunc
ParallelSubmitFcn: []
3. Create Tasks
After you have created your job, you can create tasks for the job. Tasks define
the functions to be evaluated by the workers during the running of the job.
Often, the tasks of a job are identical except for different arguments or data.
In this example, each task generates a 3-by-3 matrix of random numbers.
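The task-creation calls themselves are not reproduced above; a loop like the following sketch would create five such tasks, each returning one 3-by-3 random matrix:

```matlab
for i = 1:5
    createTask(j, @rand, 1, {3,3});  % one output, input args {3,3}
end
```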
get(j,'Tasks')
ans =
distcomp.simpletask: 5-by-1
Alternatively, you can create the five tasks with one call to createTask by
providing a cell array of five cell arrays defining the input arguments to each
task.
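Such a call might look like this (a sketch consistent with the five 3-by-3 rand tasks in this example):

```matlab
T = createTask(j, @rand, 1, {{3,3} {3,3} {3,3} {3,3} {3,3}});
```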
submit(j)
The job runs asynchronously. If you need to wait for it to complete before
you continue in your MATLAB client session, you can use the waitForState
function.
waitForState(j)
The default state to wait for is finished. This function pauses
MATLAB until the State property of j is 'finished' or 'failed'.
The results of each task's evaluation are available after the job finishes:

results = getAllOutputArguments(j);
results{1:5}
Example submit and decode scripts for several schedulers are provided with
the product in the folder matlabroot/toolbox/distcomp/examples/integration
At the time of publication, there are folders for Condor (condor), PBS (pbs),
and Platform LSF (lsf) schedulers, generic UNIX-based scripts (ssh), Sun
Grid Engine (sge), and mpiexec on Microsoft Windows operating systems
(winmpiexec). In addition, the pbs, lsf, and sge folders have subfolders called
shared, nonshared, and remoteSubmission, which contain scripts for use in
particular cluster configurations. Each of these subfolders contains a file
called README, which provides instructions on where and how to use its scripts.
Filename - Description
distributedSubmitFcn.m - Submit function for a distributed job
parallelSubmitFcn.m - Submit function for a parallel job
distributedJobWrapper.sh - Script that is submitted to PBS to start workers that evaluate the tasks of a distributed job
parallelJobWrapper.sh - Script that is submitted to PBS to start labs that evaluate the tasks of a parallel job
destroyJobFcn.m - Script to destroy a job from the scheduler
extractJobId.m - Script to get the job's ID from the scheduler
getJobStateFcn.m - Script to get the job's state from the scheduler
getSubmitString.m - Script to get the submission string for the scheduler
These files are all programmed to use the standard decode functions provided
with the product, so they do not have specialized decode functions.
8-49
8 Programming Distributed Jobs
The folders for other scheduler types contain similar files. Because files
or solutions for more schedulers might become available at any time,
visit the support page for this product on the MathWorks Web site at
http://www.mathworks.com/support/product/product.html?product=DM.
This Web page also provides contact information in case you have any
questions.
Managing Jobs
While you can use the get, cancel, and destroy methods on jobs that use
the generic scheduler interface, by default these methods access or affect only
the job data where it is stored on disk. To cancel or destroy a job or task
that is currently running or queued, you must provide instructions to the
scheduler directing it what to do and when to do it. To accomplish this, the
toolbox provides a means of saving data associated with each job or task from
the scheduler, and a set of properties to define instructions for the scheduler
upon each cancel or destroy request.
for ii = 1:props.NumberOfTasks
    % define scheduler command per task
end
% submit job to scheduler
data_array = ... % parse data returned from scheduler (possibly a NumberOfTasks-by-2 matrix)
setJobSchedulerData(scheduler, job, data_array)
If your scheduler accepts only submissions of individual tasks, you might get
return data pertaining only to each individual task. In this case, your submit
function might have code structured like this:
for ii = 1:props.NumberOfTasks
    % submit task to scheduler
    % per-task settings:
    data_array(1,ii) = ... % parse string returned from scheduler
    data_array(2,ii) = ... % save ID returned from scheduler
end
setJobSchedulerData(scheduler, job, data_array)
Write the functions that tell your scheduler what to do when you cancel or destroy a job or task; for example, you might create these four function files:

• myCancelJob.m
• myDestroyJob.m
• myCancelTask.m
• myDestroyTask.m
In a similar way, you can define what to do for destroying a job, and what to
do for canceling and destroying tasks, by setting the following scheduler properties:
• CancelJobFcn
• DestroyJobFcn
• CancelTaskFcn
• DestroyTaskFcn
You can set the properties in the Configurations Manager for your scheduler,
or on the command line:
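For example, using the hypothetical function files named above:

```matlab
set(sched, 'CancelJobFcn',   @myCancelJob)
set(sched, 'DestroyJobFcn',  @myDestroyJob)
set(sched, 'CancelTaskFcn',  @myCancelTask)
set(sched, 'DestroyTaskFcn', @myDestroyTask)
```

With these properties set, continue with job creation and submission as usual.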
j1 = createJob(schdlr);
for ii = 1:n
t(ii) = createTask(j1,...)
end
submit(j1)
While it is running or queued, you can cancel or destroy the job or a task.
This command cancels the task and moves it to the finished state, and
triggers execution of myCancelTask, which sends the appropriate commands
to the scheduler:
cancel(t(4))
This command deletes job data for j1, and triggers execution of myDestroyJob,
which sends the appropriate commands to the scheduler:
destroy(j1)
Whenever you use a toolbox function such as get, waitForState, etc., that
accesses the state of a job on the generic scheduler, after retrieving the state
from storage, the toolbox runs the function specified by the GetJobStateFcn
property, and returns its result in place of the stored state. The function
you write for this purpose must return a valid string value for the State of
a job object.
Summary
The following list summarizes the sequence of events that occur when running
a job that uses the generic scheduler interface:
submit(job)
5 The submit function sets environment variables with values derived from
its arguments.
6 The submit function makes calls to the scheduler — generally, a call for
each task (with environment variables identified explicitly, if necessary).
7 For each task, the scheduler starts a MATLAB worker session on a cluster
node.
10 The decode function sets the properties of its argument object with values
from the environment variables.
11 The MATLAB worker uses these object property values in processing its
task without your further intervention.
9 Programming Parallel Jobs
Parallel jobs are those in which the workers (or labs) can communicate
with each other during the evaluation of their tasks. The following sections
describe how to program parallel jobs:
Introduction
A parallel job consists of only a single task that runs simultaneously on
several workers, usually with different data. More specifically, the task is
duplicated on each worker, so each worker can perform the task on a different
set of data, or on a particular segment of a large data set. The workers can
communicate with each other as each executes its task. In this configuration,
workers are referred to as labs.
1 Find a scheduler.
2 Create a parallel job.
3 Create a task.
4 Submit the job for running. For details about what each worker performs
for evaluating a task, see “Submit a Job to the Job Queue” on page 8-13.
The differences between distributed jobs and parallel jobs are summarized
in the following table.
A parallel job has only one task that runs simultaneously on every lab. The
function that the task runs can take advantage of a lab’s awareness of how
many labs are running the job, which lab this is among those running the job,
and the features that allow labs to communicate with each other.
To use this supported interface for parallel jobs, the following conditions
must apply:
• You must have a shared file system between client and cluster machines
• You must be able to submit jobs directly to the scheduler from the client
machine
Note If all these conditions are not met, you must use the generic scheduler
interface with any third-party scheduler running a parallel job, including
pmode, matlabpool, spmd, and parfor. See “Using the Generic Scheduler
Interface” on page 9-8.
Using a Supported Scheduler
In this example, the task function sums the columns of a magic square; the results
of these column sums are combined with the gplus function to calculate the
total sum of the elements of the original magic square.
This function is saved as the file colsum.m on the path of the MATLAB client.
It will be sent to each lab by the job’s FileDependencies property.
While this example has one lab create the magic square and broadcast it
to the other labs, there are alternative methods of getting data to the labs.
Each lab could create the matrix for itself. Alternatively, each lab could read
its part of the data from a file on disk, the data could be passed in as an
argument to the task function, or the data could be sent in a file contained in
the job’s FileDependencies property. The solution to choose depends on your
network configuration and the nature of the data.
You can create and configure the scheduler object with this code:
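The configuration code itself is not reproduced here; for a supported scheduler such as Platform LSF, a minimal sketch might be (the cluster MATLAB root is an assumed path):

```matlab
sched = findResource('scheduler', 'type', 'lsf');
set(sched, 'ClusterMatlabRoot', '/apps/matlab');  % assumed installation path
```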
When your scheduler object is defined, you create the job object with the
createParallelJob function.
pjob = createParallelJob(sched);
The function file colsum.m (created in “Coding the Task Function” on page
9-4) is on the MATLAB client path, but it has to be made available to the labs.
One way to do this is with the job’s FileDependencies property, which can be
set in the configuration you used, or by:

set(pjob, 'FileDependencies', {'colsum.m'})
Here you might also set other properties on the job, for example, setting the
number of workers to use. Again, configurations might be useful in your
particular situation, especially if most of your jobs require many of the same
property settings. To run this example on four labs, you can establish this
in the configuration, or by the following client code:
set(pjob, 'MaximumNumberOfWorkers', 4)
set(pjob, 'MinimumNumberOfWorkers', 4)
You create the job’s one task with the usual createTask function. In this
example, the task returns only one argument from each lab, and there are no
input arguments to the colsum function.
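A call consistent with this description (one output argument, no input arguments) would be:

```matlab
t = createTask(pjob, @colsum, 1, {});
```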
submit(pjob)
Make the MATLAB client wait for the job to finish before collecting the
results. The results consist of one value from each lab. The gplus function in
the task shares data between the labs, so that each lab has the same result.
waitForState(pjob)
results = getAllOutputArguments(pjob)
results =
[136]
[136]
[136]
[136]
Introduction
This section discusses programming parallel jobs using the generic scheduler
interface. This interface lets you execute jobs on your cluster with any
scheduler you might have.
The principles of using the generic scheduler interface for parallel jobs are the
same as those for distributed jobs. The overview of the concepts and details of
submit and decode functions for distributed jobs are discussed fully in “Using
the Generic Scheduler Interface” on page 8-34 in the chapter on Programming
Distributed Jobs.
1 Create an object representing your scheduler with findResource.
2 Set the appropriate properties on the scheduler object if they are not
defined in the configuration. Because the scheduler itself is often
common to many users and applications, it is probably best to use a
configuration for programming these properties. See “Programming with
User Configurations” on page 6-16.
3 Use createParallelJob to create a parallel job object for your scheduler.
4 Create a task, run the job, and retrieve the results as usual.
Example submit and decode scripts for several schedulers are provided with
the product in the folder matlabroot/toolbox/distcomp/examples/integration
At the time of publication, there are folders for Condor (condor), PBS (pbs),
and Platform LSF (lsf) schedulers, generic UNIX-based scripts (ssh), Sun
Grid Engine (sge), and mpiexec on Microsoft Windows operating systems
(winmpiexec). In addition, the pbs, lsf, and sge folders have subfolders called
shared, nonshared, and remoteSubmission, which contain scripts for use in
particular cluster configurations. Each of these subfolders contains a file
called README, which provides instructions on where and how to use its scripts.
Filename - Description
distributedSubmitFcn.m - Submit function for a distributed job
parallelSubmitFcn.m - Submit function for a parallel job
distributedJobWrapper.sh - Script that is submitted to PBS to start workers that evaluate the tasks of a distributed job
parallelJobWrapper.sh - Script that is submitted to PBS to start labs that evaluate the tasks of a parallel job
destroyJobFcn.m - Script to destroy a job from the scheduler
extractJobId.m - Script to get the job's ID from the scheduler
getJobStateFcn.m - Script to get the job's state from the scheduler
getSubmitString.m - Script to get the submission string for the scheduler
These files are all programmed to use the standard decode functions provided
with the product, so they do not have specialized decode functions. For
parallel jobs, the standard decode function provided with the product is
parallel.cluster.generic.parallelDecodeFcn. You can view the required
variables in this file by typing
edit parallel.cluster.generic.parallelDecodeFcn
The folders for other scheduler types contain similar files. Because files
or solutions for more schedulers might become available at any time,
visit the support page for this product on the MathWorks Web site at
http://www.mathworks.com/support/product/product.html?product=DM.
This Web page also provides contact information in case you have any
questions.
Further Notes on Parallel Jobs
Suppose you have a codistributed array D, and you want to use the gather
function to assemble the entire array in the workspace of a single lab.
if labindex == 1
assembled = gather(D);
end
This fails because the gather function requires communication
between all the labs across which the array is distributed. When the if
statement limits execution to a single lab, the other labs required for
execution of the function are not executing the statement. As an alternative,
you can use gather itself to collect the data into the workspace of a single lab:

assembled = gather(D, 1)
In another example, suppose you want to transfer data from every lab to the
next lab on the right (defined as the next higher labindex). First you define
for each lab what the labs on the left and right are.
This code might fail because, depending on the size of the data
being transferred, the labSend function can block execution in a lab until the
corresponding receiving lab executes its labReceive function. In this case, all
the labs are attempting to send at the same time, and none are attempting to
receive while labSend has them blocked. In other words, none of the labs get
to their labReceive statements because they are all blocked at the labSend
statement. To avoid this particular problem, you can use the labSendReceive
function.
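A sketch of the deadlock-free version using labSendReceive (the variable mydata is illustrative):

```matlab
% Each lab simultaneously sends its data to the lab on the right and
% receives from the lab on the left, so no lab blocks indefinitely.
labTo   = mod(labindex, numlabs) + 1;      % next higher labindex, wrapping
labFrom = mod(labindex - 2, numlabs) + 1;  % next lower labindex, wrapping
received = labSendReceive(labTo, labFrom, mydata);
```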
10 GPU Computing
Introduction
In this section...
“Capabilities” on page 10-2
“Requirements” on page 10-2
“Demos” on page 10-3
Capabilities
This chapter describes how to program MATLAB to use your computer’s
graphics processing unit (GPU) for matrix operations. In many cases,
execution in the GPU is faster than in the CPU, so the techniques described
in this chapter might offer improved performance.
The particular workflows for these capabilities are described in the following
sections of this chapter.
Requirements
The following are required for using the GPU with MATLAB:
Demos
Demos showing the usage of the GPU are available in the Demos node under
Parallel Computing Toolbox in the help browser. You can also access the
product demos by entering the following command at the MATLAB prompt:
Using GPUArray
In this section...
“Transferring Data Between Workspace and GPU” on page 10-4
“Directly Creating GPU Data” on page 10-5
“Examining Data Characteristics with GPUArray Functions” on page 10-7
“Using Built-in Functions on GPUArray” on page 10-8
N = 6;
M = magic(N);
G = gpuArray(M);
G is now a MATLAB GPUArray object that represents the data of the magic
square stored on the GPU. The data provided as input to gpuArray must
be nonsparse, and either 'single', 'double', 'int8', 'int16', 'int32',
'int64', 'uint8', 'uint16', 'uint32', 'uint64', or 'logical'. (For more
information, see “Data Types” on page 10-26.)
Use the gather function to return data from the GPU back to the MATLAB
workspace:
M2 = gather(G);
G = gpuArray(ones(100, 'uint32'));
D = gather(G);
OK = isequal(D, ones(100, 'uint32'))
X = rand(1000);
G = gpuArray(X)
parallel.gpu.GPUArray
--------------------
Size: [1000 1000]
ClassUnderlying: 'double'
Complexity: 'real'
X = rand(1000);
G = gpuArray(single(X));
G = gpuArray(ones(100, 'uint32'));
parallel.gpu.GPUArray.ones
parallel.gpu.GPUArray.zeros
parallel.gpu.GPUArray.Inf
parallel.gpu.GPUArray.NaN
parallel.gpu.GPUArray.eye
parallel.gpu.GPUArray.colon
parallel.gpu.GPUArray.true
parallel.gpu.GPUArray.false
methods('parallel.gpu.GPUArray')
The static constructors appear at the bottom of the output from this command.
For help on one of the constructors, type

help parallel.gpu.GPUArray/functionname

For example, to see help for the colon constructor, type

help parallel.gpu.GPUArray/colon
II = parallel.gpu.GPUArray.eye(1024,'int32')
parallel.gpu.GPUArray:
---------------------
Size: [1024 1024]
ClassUnderlying: 'int32'
Complexity: 'real'
G = parallel.gpu.GPUArray.ones(100, 100, 50)
parallel.gpu.GPUArray:
---------------------
Size: [100 100 50]
ClassUnderlying: 'double'
Complexity: 'real'
The default class of the data is double, so you do not have to specify it.
Z = parallel.gpu.GPUArray.zeros(8192, 1)
parallel.gpu.GPUArray:
---------------------
Size: [8192 1]
ClassUnderlying: 'double'
Complexity: 'real'
Function - Description
classUnderlying - Class of the underlying data in the array
isreal - Indication if array data is real
length - Length of vector or largest array dimension
ndims - Number of dimensions in the array
size - Size of array dimensions
G = gpuArray(rand(100));
s = size(G)
100 100
The following functions and their symbol operators are enhanced to accept
GPUArray input arguments so that they execute on the GPU:
To get specific help on the overloaded functions, and to learn about any
restrictions concerning their support for GPUArray objects, type:
help parallel.gpu.GPUArray/functionname

For example, to see help for the overloaded lu function, type

help parallel.gpu.GPUArray/lu
Ga = gpuArray(rand(1000, 'single'));
Gfft = fft(Ga);
Gb = (real(Gfft) + Ga) * 6;
G = gather(Gb);
The whos command is instructive for showing where each variable’s data
is stored.
whos
Name Size Bytes Class
Notice that all the arrays are stored on the GPU (GPUArray), except for G,
which is the result of the gather function.
• You can transfer or create data on the GPU, and use the resulting
GPUArray as input to enhanced built-in functions that support them. For
more information and a list of functions that support GPUArray as inputs,
see “Using Built-in Functions on GPUArray” on page 10-8.
• You can run your own MATLAB function file on a GPU.
See the arrayfun reference page for descriptions of the available options.
Executing MATLAB Code on the GPU
The function allows the gain and offset to be arrays of the same size
as rawdata, so that unique corrections can be applied to individual
measurements. In a typical situation, you might keep the correction data on
the GPU so that you do not have to transfer it for each application:
gn = gpuArray(rand(1000))/100 + 0.995;
offs = gpuArray(rand(1000))/50 - 0.01;
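The correction function and its invocation were elided here; a sketch might look like the following (the function name applyCorrection and the variable rawdata are assumptions based on the surrounding text):

```matlab
% In the file applyCorrection.m (hypothetical name):
function c = applyCorrection(rawdata, gain, offset)
% Element-wise sensor correction: scale by gain, then shift by offset
c = (rawdata .* gain) + offset;
```

You might then apply it element-wise with arrayfun:

```matlab
corrected = arrayfun(@applyCorrection, rawdata, gn, offs);
```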
This runs on the GPU because the input arguments gn and offs are already
in GPU memory.
Retrieve the corrected results from the GPU to the MATLAB workspace:
results = gather(corrected);
Function - Description
gpuDeviceCount - The number of GPU cards in your computer
gpuDevice - Select which card to use, or see which card is selected and view its properties
1 Determine how many GPU devices are in your computer:

gpuDeviceCount

2 With two devices, the first is the default. You can examine its properties
to determine if that is the one you want to use:
gpuDevice
parallel.gpu.CUDADevice handle
Package: parallel.gpu
Properties:
Name: 'Tesla C1060'
Index: 1
ComputeCapability: '1.3'
SupportsDouble: 1
DriverVersion: 3.1
MaxThreadsPerBlock: 512
MaxShmemPerBlock: 16384
MaxThreadBlockSize: [512 512 64]
MaxGridSize: [65535 65535]
SIMDWidth: 32
TotalMemory: 4.2948e+09
FreeMemory: 4.2563e+09
MultiprocessorCount: 30
GPUOverlapsTransfers: 1
KernelExecutionTimeout: 0
DeviceSupported: 1
DeviceSelected: 1
3 To use another device, call gpuDevice with the index of the other card, and
view its properties to verify that it is the one you want. For example, this
step chooses and views the second device (indexing is 1-based):
gpuDevice(2)
Note If you select a device that does not have sufficient compute capability,
you get a warning and you will not be able to use that device.
k = parallel.gpu.CUDAKernel('myfun.ptx', 'myfun.cu');
Executing CUDA or PTX Code on the GPU
k = parallel.gpu.CUDAKernel('conv.ptx', 'conv.cu');
Because the inputs are MATLAB variables for data in the MATLAB
workspace, the output is also in the MATLAB workspace.
k = parallel.gpu.CUDAKernel('conv.ptx', 'conv.cu');
i1 = gpuArray(rand(100, 1, 'single'));
i2 = gpuArray(rand(100, 1, 'single'));
Because the inputs are GPUArray objects, the output is also GPUArray. You
can now perform other operations using this input or output data without
further transfers between the MATLAB workspace and the GPU. When all
your GPU computations are complete, gather your final result data into the
MATLAB workspace:
r1 = gather(o1);
r2 = gather(o2);
For example, if the C kernel within a CU file has the following signature:
the corresponding kernel object (k) in MATLAB has the following properties:
MaxNumLHSArguments: 1
NumRHSArguments: 2
ArgumentTypes: {'inout single vector' 'in single scalar'}
Therefore, to use the kernel object from this code with feval, you need to
provide feval two input arguments (in addition to the kernel object), and
you can use one output argument:

y = feval(k, x1, x2)
MaxNumLHSArguments: 2
NumRHSArguments: 3
ArgumentTypes: {'in single vector' 'inout single vector' 'inout single vector'}
You can use feval on this code’s kernel (k) with the syntax:

[y1, y2] = feval(k, x1, x2, x3)
The three input arguments x1, x2, and x3 correspond to the three arguments
that are passed into the C function. The output arguments y1 and y2
correspond to the values of pInOut1 and pInOut2 after the C kernel has
executed.
Property - Description

ThreadBlockSize - Size of block of threads on the kernel. This can be an integer vector of length 1, 2, or 3 (since thread blocks can be up to 3-dimensional). The product of the elements of ThreadBlockSize must not exceed the MaxThreadsPerBlock for this kernel, and no element of ThreadBlockSize can exceed the corresponding element of the gpuDevice property MaxThreadBlockSize.

MaxThreadsPerBlock - Maximum number of threads permissible in a single block for this CUDA kernel. The product of the elements of ThreadBlockSize must not exceed this value.

GridSize - Size of grid (effectively the number of thread blocks that will be launched independently by the GPU). This is an integer vector of length 1 or 2. There is no upper bound on the product of these numbers, but do note that if a GPU is not being used in exclusive mode (e.g., it is also being used to drive a display), there is an upper bound of 5 seconds on any CUDA kernel, after which the CUDA driver times out the kernel and returns an error.

SharedMemorySize - The amount of dynamic shared memory (in bytes) that each thread block can use. Each thread block has an available shared memory region. The size of this region is limited in current cards to about 16 kB, and is shared with registers on the multiprocessors. As with all memory, this needs to be allocated before the kernel is launched. It is also common for the size of this shared memory region to be tied to the size of the thread block. Setting this value on the kernel ensures that each thread in a block can access this available shared memory region.

EntryPoint - (read-only) A string containing the actual entry point name in the PTX code that this kernel is going to call. An example might look like '_Z13returnPointerPKfPy'.

MaxNumLHSArguments - (read-only) The maximum number of left-hand-side arguments that this kernel supports. It cannot be greater than the number of right-hand-side arguments, and if any inputs are constant or scalar it will be less.

NumRHSArguments - (read-only) The required number of right-hand-side arguments needed to call this kernel. All inputs need to define either the scalar value of an input, the data for a vector input/output, or the size of an output argument.

ArgumentTypes - (read-only) Cell array of strings, the same length as NumRHSArguments. Each of the strings indicates what the expected MATLAB type for that input is (a numeric type such as uint8, single, or double followed by the word scalar or vector to indicate if it is passed by value or by reference). In addition, if that argument is only an input to the kernel, it is prefixed by in; and if it is an input/output, it is prefixed by inout. This allows you to decide how to efficiently call the kernel with both MATLAB data and GPUArray, and to see which of the kernel inputs are being treated as outputs.
k = parallel.gpu.CUDAKernel('conv.ptx', 'conv.cu')
k =
parallel.gpu.CUDAKernel handle
Package: parallel.gpu
Properties:
ThreadBlockSize: [1 1 1]
MaxThreadsPerBlock: 512
GridSize: [1 1]
SharedMemorySize: 0
EntryPoint: '_Z8theEntryPf'
MaxNumLHSArguments: 1
NumRHSArguments: 2
ArgumentTypes: {'in single vector' 'inout single vector'}
A single PTX file can contain multiple entry points to different kernels. Each
of these entry points has a unique name. These names are generally mangled
(as in C++ mangling). However, when generated by nvcc the PTX name
always contains the original function name from the CU. For example, if the
CU file defines the kernel function as
When you have multiple entry points, specify the entry name for the
particular kernel when calling CUDAKernel to generate your kernel.
Note The CUDAKernel function searches for your entry name in the PTX file,
and matches on any substring occurrences. Therefore, you should not name
any of your entries as substrings of any others.
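When you do specify an entry point, you pass it as an additional argument to CUDAKernel (the entry name here is illustrative):

```matlab
k = parallel.gpu.CUDAKernel('myfun.ptx', 'myfun.cu', 'myKernelEntryName');
```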
In parsing the C prototype, the supported C data types are listed in the following
table.
• The kernel must return nothing, and operate only on its input arguments
(scalars or pointers).
• A kernel is unable to allocate any form of memory, so all outputs must
be pre-allocated before the kernel is executed. Therefore, the sizes of all
outputs must be known before you run the kernel.
• In principle, all pointers passed into the kernel that are not const could
contain output data, since the many threads of the kernel could modify
that data.
These rules have some implications. The most notable is that every output
from a kernel must necessarily also be an input to the kernel, since the input
allows the user to define the size of the output (which follows from being
unable to allocate memory on the GPU).
2 Compile the CU code at the shell command line to generate a PTX file
called test.ptx.
3 Create the kernel in MATLAB. Currently this PTX file only has one entry
so you do not need to specify it. If you were to put more kernels in, you
would specify add1 as the entry.
k = parallel.gpu.CUDAKernel('test.ptx', 'test.cu');
4 Run the kernel with two inputs of 1. By default, a kernel runs on one
thread.

result = feval(k, 1, 1)
1 The CU code is slightly different from the last example. Both inputs are
pointers, and one is declared const because the kernel does not change it.
Each thread adds the elements at its own index, using the thread index to
work out which element it should handle. (Getting these thread- and
block-specific values is a very common pattern in CUDA programming.)
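A sketch of such a kernel (names illustrative), showing how the thread and block values combine to give each thread a unique element index:

```cuda
// pOut is preallocated by the caller and also carries the result;
// pIn is const because the kernel only reads it.
__global__ void add2(double * pOut, const double * pIn)
{
    // Common CUDA pattern: unique global index from block and thread IDs.
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    pOut[idx] += pIn[idx];
}
```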
3 If this code were put in the same CU file as the first example, you would
need to specify the entry point name to distinguish the two kernels.
4 When you run the kernel, you need to set the number of threads correctly
for the vectors you want to add.
>> N = 128;
>> k.ThreadBlockSize = N;
>> o = feval(k, ones(N, 1), ones(N, 1));
Data Types
Code in a function passed to arrayfun for execution on the GPU can use only
these GPU native data types: single, double, int32, uint32, and logical.
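For example, a function built only from these types runs elementwise on the GPU (the function used here is an arbitrary illustration):

```matlab
g = gpuArray(rand(100, 'single'));        % single is a GPU native type
h = arrayfun(@(x) 1./(1 + exp(-x)), g);   % evaluates on the GPU
```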
Complex Numbers
If the output of a function running on the GPU could potentially be complex,
you must explicitly specify its input arguments as complex. This applies
both to gpuArray data and to functions called in code run by arrayfun.
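For example, sqrt of a negative number has a complex result, so the input must be made complex first (a sketch):

```matlab
x = gpuArray(-4);
y = sqrt(complex(x));   % input declared complex, so y can be 0 + 2i
gather(y)
```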
The following table lists the functions that might return complex data, along
with the input range over which the output remains real.
Characteristics and Limitations
MATLAB Compiler
GPU computing with the MATLAB Compiler is not supported.
11
Object Reference
Data
codistributed       Access data of arrays distributed among workers in MATLAB pool
codistributor1d     1-D distribution scheme for codistributed array
codistributor2dbc   2-D block-cyclic distribution scheme for codistributed array
Composite           Access nondistributed data on multiple labs from client
distributed         Access data of distributed arrays from client
GPUArray            Array of data stored on Graphics Processing Unit (GPU)
Schedulers
ccsscheduler        Access Microsoft Windows HPC Server scheduler
genericscheduler    Access generic scheduler
jobmanager          Control job queue and execution
localscheduler      Access local scheduler on client machine
lsfscheduler        Access Platform LSF scheduler
mpiexec             Directly access mpiexec for job distribution
pbsproscheduler     Access PBS Pro scheduler
torquescheduler     Access TORQUE scheduler
Jobs
job                   Define job behavior and properties when using job manager
matlabpooljob         Define MATLAB pool job behavior and properties when using job manager
paralleljob           Define parallel job behavior and properties when using job manager
simplejob             Define job behavior and properties when using local or third-party scheduler
simplematlabpooljob   Define MATLAB pool job behavior and properties when using local or third-party scheduler
simpleparalleljob     Define parallel job behavior and properties when using local or third-party scheduler
Tasks
simpletask   Define task behavior and properties when using local or third-party scheduler
task         Define task behavior and properties when using job manager
Workers
worker   Access information about MATLAB worker session
12
ccsscheduler
Constructor findResource
codistributed
Description Data of distributed arrays that exist on the labs are accessible from the
other labs as codistributed array objects.
Codistributed arrays on labs that you create inside spmd statements can
be accessed via distributed arrays on the client.
codistributor1d
Constructor codistributor1d
Methods
codistributor1d.defaultPartition   Default partition for codistributed array
globalIndices                      Global indices for local part of codistributed array
isComplete                         True if codistributor object is complete
codistributor2dbc
Constructor codistributor2dbc
Methods
codistributor2dbc.defaultLabGrid   Default computational grid for 2-D block-cyclic distributed arrays
globalIndices                      Global indices for local part of codistributed array
isComplete                         True if codistributor object is complete
Composite
Constructor Composite
Description Variables that exist on the labs running an spmd statement are
accessible on the client as Composite objects. A Composite resembles a
cell array with one element for each lab.
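For example, with an open MATLAB pool, a variable assigned inside spmd comes back to the client as a Composite, indexed with braces like a cell array:

```matlab
spmd
    c = labindex;   % each lab stores its own lab index
end
c{1}        % read the value held for lab 1
c{2} = 0;   % set the value held for lab 2
```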
distributed
Constructor distributed
Description Data of distributed arrays that exist on the labs are accessible on the
client as a distributed array. A distributed array resembles a normal
array in the way you access and manipulate its elements, but none of
its data exists on the client.
Codistributed arrays that you create inside spmd statements are
accessible via distributed arrays on the client. You can also create a
distributed array explicitly on the client with the distributed function.
genericscheduler
Constructor findResource
GPUArray
Constructor gpuArray
Methods The methods for a GPUArray object are too numerous to list here. Most
resemble and behave the same as built-in MATLAB functions. See
“Using GPUArray” on page 10-4. For the complete list, use the methods
function on the GPUArray class:
methods('parallel.gpu.GPUArray')
GPUDevice
Constructor gpuDevice
Methods For the complete list, use the methods function on the GPUDevice class:
methods('parallel.gpu.GPUDevice')
job
Purpose Define job behavior and properties when using job manager
Constructor createJob
Description A job object contains all the tasks that define what each worker does
as part of the complete job execution. A job object is used only with a
job manager as scheduler.
jobmanager
Constructor findResource
Description A jobmanager object provides access to the job manager, which controls
the job queue, distributes job tasks to workers or labs for execution, and
maintains job results. The job manager is provided with the MATLAB
Distributed Computing Server product, and its use as a scheduler is
optional.
localscheduler
Constructor findResource
lsfscheduler
Constructor findResource
matlabpooljob
Purpose Define MATLAB pool job behavior and properties when using job
manager
Constructor createMatlabPoolJob
mpiexec
Constructor findResource
Description An mpiexec object provides direct access to the mpiexec executable for
distribution of a job’s tasks to workers or labs for execution.
paralleljob
Purpose Define parallel job behavior and properties when using job manager
Constructor createParallelJob
Description A paralleljob object contains all the tasks that define what each
lab does as part of the complete job execution. A parallel job runs
simultaneously on all labs and uses communication among the labs
during task evaluation. A paralleljob object is used only with a job
manager as scheduler.
pbsproscheduler
Constructor findResource
simplejob
Purpose Define job behavior and properties when using local or third-party
scheduler
Constructor createJob
Description A simplejob object contains all the tasks that define what each worker
does as part of the complete job execution. A simplejob object is used
only with a local or third-party scheduler.
simplematlabpooljob
Purpose Define MATLAB pool job behavior and properties when using local or
third-party scheduler
Constructor createMatlabPoolJob
simpleparalleljob
Purpose Define parallel job behavior and properties when using local or
third-party scheduler
Constructor createParallelJob
Description A simpleparalleljob object contains all the tasks that define what each
lab does as part of the complete job execution. A parallel job runs
simultaneously on all labs and uses communication among the labs
during task evaluation. A simpleparalleljob object is used only with
a local or third-party scheduler.
simpletask
Purpose Define task behavior and properties when using local or third-party
scheduler
Constructor createTask
Description A simpletask object defines what each lab or worker does as part of the
complete job execution. A simpletask object is used only with a local
or third-party scheduler.
task
Purpose Define task behavior and properties when using job manager
Constructor createTask
Description A task object defines what each lab or worker does as part of the
complete job execution. A task object is used only with a job manager
as scheduler.
torquescheduler
Constructor findResource
worker
Constructor getCurrentWorker
Description A worker object represents the MATLAB worker session that evaluates
tasks in a job scheduled by a job manager. Only worker sessions started
with the startworker script can be represented by a worker object.
Methods None
13
Function Reference
Distributed and Codistributed Arrays
Interactive Functions
mpiprofile   Profile parallel communication and execution times
pmode        Interactive Parallel Command Window
Toolbox Functions
codistributed         Create codistributed array from replicated local data
codistributed.build   Create codistributed array from distributed data
codistributed.colon   Distributed colon operation
codistributor         Create codistributor object for codistributed arrays
Job Creation
createJob             Create job object in scheduler and client
createMatlabPoolJob   Create MATLAB pool job
Jobs and Tasks
Job Management
cancel               Cancel job or task
changePassword       Prompt user to change job manager password
clearLocalPassword   Delete local store of user's job manager password
Interlab Communication Within a Parallel Job
Object Control
clear     Remove objects from MATLAB workspace
get       Object properties
inspect   Open Property Inspector
length    Length of object array
methods   List functions of object class
set       Configure or display object properties
size      Size of object array
Graphics Processing Unit
Utilities
help                    Help for toolbox functions in Command Window
pctRunDeployedCleanup   Clean up after deployed parallel applications
14
Functions — Alphabetical List
arrayfun
Syntax A = arrayfun(FUN, B)
A = arrayfun(FUN, B, C, ...)
[A, B, ...] = arrayfun(FUN, C, ...)
s1 = gpuArray(rand(400));
s2 = gpuArray(rand(400));
s3 = gpuArray(rand(400));
[o1, o2] = arrayfun(@aGpuFunction, s1, s2, s3)
o1 =
parallel.gpu.GPUArray:
---------------------
Size: [400 400]
ClassUnderlying: 'double'
Complexity: 'real'
o2 =
parallel.gpu.GPUArray:
---------------------
Use gather to retrieve the data from the GPU to the MATLAB
workspace.
d = gather(o2);
batch
Syntax j = batch('aScript')
j = batch(schedobj, 'aScript')
j = batch(fcn, N, {x1, ..., xn})
j = batch(schedobj, fcn, N, {x1, ..., xn})
j = batch(..., 'p1', v1, 'p2', v2, ...)
batch(...,'Configuration', defaultParallelConfig)
Remarks As a matter of good programming practice, when you no longer need it,
you should destroy the job created by the batch function so that it does
not continue to consume cluster storage resources.
Run a batch MATLAB pool job on a remote cluster, using eight workers
for the MATLAB pool in addition to the worker running the batch script.
Capture the diary, and load the results of the job into the workspace.
This job requires a total of nine workers:
Run a batch MATLAB pool job on a local worker, which employs two
other local workers:
Clean up a batch job’s data after you are finished with it:
destroy(j)
cancel
Syntax cancel(t)
cancel(j)
Description cancel(t) stops the task object, t, that is currently in the pending or
running state. The task’s State property is set to finished, and no
output arguments are returned. An error message stating that the task
was canceled is placed in the task object’s ErrorMessage property, and
the worker session running the task is restarted.
cancel(j) stops the job object, j, that is pending, queued, or running.
The job’s State property is set to finished, and a cancel is executed
on all tasks in the job that are not in the finished state. A job object
that has been canceled cannot be started again.
If the job is running in a job manager, any worker sessions that are
evaluating tasks belonging to the job object will be restarted.
Examples Cancel a task. Note afterward the task’s State, ErrorMessage, and
OutputArguments properties.
job1 = createJob(jm);
t = createTask(job1, @rand, 1, {3,3});
cancel(t)
get(t)
ID: 1
Function: @rand
NumberOfOutputArguments: 1
InputArguments: {[3] [3]}
OutputArguments: {1x0 cell}
CaptureCommandWindowOutput: 0
CommandWindowOutput: ''
State: 'finished'
ErrorMessage: 'Task cancelled by user'
ErrorIdentifier: 'distcomp:task:Cancelled'
Timeout: Inf
CreateTime: 'Fri Oct 22 11:38:39 EDT 2004'
StartTime: 'Fri Oct 22 11:38:46 EDT 2004'
FinishTime: 'Fri Oct 22 11:38:46 EDT 2004'
Worker: []
Parent: [1x1 distcomp.job]
UserData: []
RunningFcn: []
FinishedFcn: []
changePassword
Syntax changePassword(jm)
changePassword(jm, username)
Description changePassword(jm) prompts the user to change the password for the
current user. The user’s current password must be entered as well as
the new password.
changePassword(jm, username) prompts the job manager’s admin
user to change the password for the specified user. The admin user’s
password must be entered as well as the user’s new password. This
enables the admin user to reset a password if the user has forgotten it.
For more information on job manager security, see “Setting Job
Manager Security”.
clear
Remarks If obj references an object in the job manager, it is cleared from the
workspace, but it remains in the job manager. You can restore obj to
the workspace with the findResource, findJob, or findTask function;
or with the Jobs or Tasks property.
Examples This example creates two job objects on the job manager jm. The
variables for these job objects in the MATLAB workspace are job1 and
job2. job1 is copied to a new variable, job1copy; then job1 and job2
are cleared from the MATLAB workspace. The job objects are then
restored to the workspace from the job object’s Jobs property as j1
and j2, and the first job in the job manager is shown to be identical to
job1copy, while the second job is not.
job1 = createJob(jm);
job2 = createJob(jm);
job1copy = job1;
clear job1 job2;
j1 = jm.Jobs(1);
j2 = jm.Jobs(2);
isequal (job1copy, j1)
ans =
1
isequal (job1copy, j2)
ans =
0
clearLocalPassword
Syntax clearLocalPassword(jm)
codistributed
Syntax C = codistributed(X)
C = codistributed(X, codist)
C = codistributed(X, codist, lab)
C = codistributed(C1, codist)
spmd
N = 1000;
X = magic(N); % Replicated on every lab
C1 = codistributed(X); % Partitioned among the labs
end
spmd
N = 1000;
X = magic(N);
C2 = codistributed(X, codistributor1d(1));
end
codistributed.build
spmd
N = 1001;
globalSize = [N, N];
% Distribute the matrix over the second dimension (columns),
% and let the codistributor derive the partition from the
% global size.
codistr = codistributor1d(2, ...
codistributor1d.unsetPartition, globalSize)
codistributed.cell
Syntax C = codistributed.cell(n)
C = codistributed.cell(m, n, p, ...)
C = codistributed.cell([m, n, p, ...])
C = cell(n, codist)
C = cell(m, n, p, ..., codist)
C = cell([m, n, p, ...], codist)
spmd
C = cell(8, codistributor1d());
end
spmd(4)
C = codistributed.cell(1000);
end
spmd(4)
codist = codistributor1d(2, 1:numlabs);
C = cell(10, 10, codist);
end
codistributed.colon
Syntax codistributed.colon(a,d,b)
codistributed.colon(a,b)
Examples Partition the vector 1:10 into four subvectors among four labs.
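A call consistent with that description, run in an spmd block with four labs, might look like:

```matlab
spmd(4)
    C = codistributed.colon(1, 10)  % no semicolon, so each lab displays its part
end
```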
LocalPart: [7 8]
Codistributor: [1x1 codistributor1d]
Lab 4:
This lab stores C(9:10).
LocalPart: [9 10]
Codistributor: [1x1 codistributor1d]
codistributed.eye
Syntax C = codistributed.eye(n)
C = codistributed.eye(m, n)
C = codistributed.eye([m, n])
C = eye(n, codist)
C = eye(m, n, codist)
C = eye([m, n], codist)
spmd
C = eye(8, codistributor1d());
end
spmd(4)
C = codistributed.eye(1000);
end
spmd(4)
codist = codistributor('1d', 2, 1:numlabs);
C = eye(10, 10, 'uint16', codist);
end
codistributed.false
Syntax F = codistributed.false(n)
F = codistributed.false(m, n, ...)
F = codistributed.false([m, n, ...])
F = false(n, codist)
F = false(m, n, ..., codist)
F = false([m, n, ...], codist)
spmd
F = false(8, codistributor1d());
end
spmd(4)
F = false(1000, codistributor());
end
spmd
codist = codistributor('1d', 2, 1:numlabs);
F = false(10, 10, codist);
end
codistributed.Inf
Syntax C = codistributed.Inf(n)
C = codistributed.Inf(m, n, ...)
C = codistributed.Inf([m, n, ...])
C = Inf(n, codist)
C = Inf(m, n, ..., codist)
C = Inf([m, n, ...], codist)
spmd
C = Inf(8, codistributor1d());
end
spmd(4)
C = Inf(1000, codistributor())
end
spmd(4)
codist = codistributor('1d', 2, 1:numlabs);
C = Inf(10, 10, 'single', codist);
end
codistributed.NaN
Syntax C = codistributed.NaN(n)
C = codistributed.NaN(m, n, ...)
C = codistributed.NaN([m, n, ...])
C = NaN(n, codist)
C = NaN(m, n, ..., codist)
C = NaN([m, n, ...], codist)
spmd
C = NaN(8, codistributor1d());
end
spmd(4)
C = NaN(1000, codistributor())
end
spmd(4)
codist = codistributor('1d', 2, 1:numlabs);
C = NaN(10, 10, 'single', codist);
end
codistributed.ones
Syntax C = codistributed.ones(n)
C = codistributed.ones(m, n, ...)
C = codistributed.ones([m, n, ...])
C = ones(n, codist)
C = ones(m, n, codist)
C = ones([m, n], codist)
spmd
C = ones(8, codistributor1d());
end
spmd(4)
C = codistributed.ones(1000, codistributor());
end
spmd(4)
codist = codistributor('1d', 2, 1:numlabs);
C = ones(10, 10, 'uint16', codist);
end
codistributed.rand
Syntax R = codistributed.rand(n)
R = codistributed.rand(m, n, ...)
R = codistributed.rand([m, n, ...])
R = rand(n, codist)
R = rand(m, n, codist)
R = rand([m, n], codist)
spmd
R = codistributed.rand(8, codistributor1d());
end
Remarks When you use rand on the workers in the MATLAB pool, or in a
distributed or parallel job (including pmode), each worker or lab sets its
random generator seed to a value that depends only on the lab index
or task ID. Therefore, the array on each lab is unique for that job.
However, if you repeat the job, you get the same random data.
spmd(4)
R = codistributed.rand(1000, codistributor())
end
spmd(4)
codist = codistributor('1d', 2, 1:numlabs);
R = codistributed.rand(10, 10, 'uint16', codist);
end
codistributed.randn
Syntax RN = codistributed.randn(n)
RN = codistributed.randn(m, n, ...)
RN = codistributed.randn([m, n, ...])
RN = randn(n, codist)
RN = randn(m, n, codist)
RN = randn([m, n], codist)
spmd
RN = codistributed.randn(8, codistributor1d());
end
Remarks When you use randn on the workers in the MATLAB pool, or in a
distributed or parallel job (including pmode), each worker or lab sets its
random generator seed to a value that depends only on the lab index
or task ID. Therefore, the array on each lab is unique for that job.
However, if you repeat the job, you get the same random data.
spmd(4)
RN = codistributed.randn(1000);
end
spmd(4)
codist = codistributor('1d', 2, 1:numlabs);
RN = randn(10, 10, 'uint16', codist);
end
codistributed.spalloc
codistributed.speye
Syntax CS = codistributed.speye(n)
CS = codistributed.speye(m, n)
CS = codistributed.speye([m, n])
CS = speye(n, codist)
CS = speye(m, n, codist)
CS = speye([m, n], codist)
spmd
CS = codistributed.speye(8, codistributor1d());
end
spmd(4)
CS = speye(1000, codistributor())
end
spmd(4)
codist = codistributor1d(2, 1:numlabs);
CS = speye(10, 10, codist);
end
codistributed.sprand
spmd
CS = codistributed.sprand(8, 8, 0.2, codistributor1d());
end
Remarks When you use sprand on the workers in the MATLAB pool, or in a
distributed or parallel job (including pmode), each worker or lab sets its
random generator seed to a value that depends only on the lab index
or task ID. Therefore, the array on each lab is unique for that job.
However, if you repeat the job, you get the same random data.
spmd(4)
CS = codistributed.sprand(1000, 1000, .001);
end
spmd(4)
codist = codistributor1d(2, 1:numlabs);
CS = sprand(10, 10, .1, codist);
end
codistributed.sprandn
spmd
CS = codistributed.sprandn(8, 8, 0.2, codistributor1d());
end
Remarks When you use sprandn on the workers in the MATLAB pool, or in a
distributed or parallel job (including pmode), each worker or lab sets its
random generator seed to a value that depends only on the lab index
or task ID. Therefore, the array on each lab is unique for that job.
However, if you repeat the job, you get the same random data.
spmd(4)
CS = codistributed.sprandn(1000, 1000, .001);
end
spmd(4)
codist = codistributor1d(2, 1:numlabs);
CS = sprandn(10, 10, .1, codist);
end
codistributed.true
Syntax T = codistributed.true(n)
T = codistributed.true(m, n, ...)
T = codistributed.true([m, n, ...])
T = true(n, codist)
T = true(m, n, ..., codist)
T = true([m, n, ...], codist)
spmd
T = true(8, codistributor1d());
end
spmd(4)
T = true(1000, codistributor());
end
spmd(4)
codist = codistributor('1d', 2, 1:numlabs);
T = true(10, 10, codist);
end
codistributed.zeros
Syntax C = codistributed.zeros(n)
C = codistributed.zeros(m, n, ...)
C = codistributed.zeros([m, n, ...])
C = zeros(n, codist)
C = zeros(m, n, codist)
C = zeros([m, n], codist)
spmd
C = zeros(8, codistributor1d());
end
spmd(4)
C = codistributed.zeros(1000, codistributor());
end
spmd(4)
codist = codistributor('1d', 2, 1:numlabs);
C = zeros(10, 10, 'uint16', codist);
end
codistributor
Description There are two schemes for distributing arrays. The scheme denoted by
the string '1d' distributes an array along a single specified subscript,
the distribution dimension, in a noncyclic, partitioned manner.
The scheme denoted by '2dbc', employed by the parallel matrix
computation software ScaLAPACK, applies only to two-dimensional
arrays, and varies both subscripts over a rectangular computational
grid of labs in a blocked, cyclic manner.
codist = codistributor(), with no arguments, returns a default
codistributor object with zero-valued or empty parameters, which
can then be used as an argument to other functions to indicate that
the function is to create a codistributed array if possible with default
distribution. For example,
Z = zeros(..., codistributor())
R = randn(..., codistributor())
spmd
dim = 2; % distribution dimension
codist = codistributor('1d', dim, [1 2 1 2], [2 6 4]);
if mod(labindex, 2)
L = rand(2,1,4);
else
L = rand(2,2,4);
end
A = codistributed.build(L, codist)
end
A
spmd
dim = 1; % distribution dimension
partn = codistributor1d.defaultPartition(20);
codist = codistributor('1d', dim, partn, [20 5]);
L = magic(5) + labindex;
A = codistributed.build(L, codist)
end
A
codistributor1d
N = 1000;
spmd
codistr = codistributor1d(1); % 1 specifies the first dimension (rows).
C = codistributed.ones(N, codistr);
end
N = 1000;
spmd
codistr = codistributor1d( ...
codistributor1d.unsetDimension, ...
codistributor1d.unsetPartition, ...
[N, N]);
myLocalSize = [N, N]; % start with full size on each lab
% then set myLocalSize to default part of whole array:
myLocalSize(codistr.Dimension) = codistr.Partition(labindex);
myLocalPart = labindex*ones(myLocalSize); % arbitrary values
D = codistributed.build(myLocalPart, codistr);
end
spy(D == 2);
codistributor1d.defaultPartition
Syntax P = codistributor1d.defaultPartition(n)
spmd
P = codistributor1d.defaultPartition(10)
end
codistributor2dbc
Description The 2-D block-cyclic codistributor can be used only for two-dimensional
arrays. It distributes arrays along two subscripts over a rectangular
computational grid of labs in a block-cyclic manner. For a complete
description of 2-D block-cyclic distribution, default parameters, and
the relationship between block size and lab grid, see “2-Dimensional
Distribution” on page 5-17. The 2-D block-cyclic codistributor is used by
the ScaLAPACK parallel matrix computation software library.
codist = codistributor2dbc() forms a 2-D block-cyclic codistributor
object using default lab grid and block size.
codist = codistributor2dbc(lbgrid) forms a 2-D block-cyclic
codistributor object using the specified lab grid and default block size.
lbgrid must be a two-element vector defining the rows and columns
of the lab grid, and the rows times columns must equal the number of
labs for the codistributed array.
codist = codistributor2dbc(lbgrid, blksize) forms a 2-D
block-cyclic codistributor object using the specified lab grid and block
size.
codist = codistributor2dbc(lbgrid, blksize, orient) allows an
orientation argument. Valid values for the orientation argument are
'row' for row orientation, and 'col' for column orientation of the lab
grid. The default is row orientation.
The resulting codistributor of any of the above syntax is incomplete
because its global size is not specified. A codistributor constructed
this way can be used as an argument to other functions as a template
codistributor when creating codistributed arrays.
N = 1000;
spmd
codistr = codistributor2dbc();
D = codistributed.ones(N, codistr);
end
N = 1000;
spmd
codistr = codistributor2dbc(...
codistributor2dbc.defaultLabGrid, ...
codistributor2dbc.defaultBlockSize, ...
'row', [N, N]);
myLocalSize = [length(codistr.globalIndices(1)), ...
length(codistr.globalIndices(2))];
myLocalPart = labindex*ones(myLocalSize);
D = codistributed.build(myLocalPart, codistr);
end
spy(D == 2);
codistributor2dbc.defaultLabGrid
Examples View the computational grid layout of the default distribution scheme
for the open MATLAB pool.
spmd
grid = codistributor2dbc.defaultLabGrid
end
Composite
Syntax C = Composite()
C = Composite(nlabs)
Examples Create a Composite object with no defined entries, then assign its
values:
createJob
Description obj = createJob() creates a job using the scheduler identified by the
default parallel configuration and sets the property values of the job as
specified in the default configuration.
obj = createJob(scheduler) creates a job object at the data location
for the identified scheduler, or in the job manager. When you specify a
scheduler without using the configuration option, no configuration
is used, so no configuration properties are applied to the job object.
obj = createJob(..., 'p1', v1, 'p2', v2, ...) creates a job
object with the specified property values. For a listing of the valid
properties of the created object, see the job object reference page (if
using a job manager) or simplejob object reference page (if using a
third-party scheduler). If an invalid property name or property value is
specified, the object will not be created.
Note that the property value pairs can be in any format supported
by the set function, i.e., param-value string pairs, structures, and
param-value cell array pairs. If a structure is used, the structure field
names are job object property names and the field values specify the
property values.
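For example, the same property values can be given as param-value pairs or as a structure (the scheduler object jm and the property values here are illustrative):

```matlab
obj1 = createJob(jm, 'Name', 'jobA', 'Tag', 'nightly');

% Struct form: field names are job object property names.
props.Name = 'jobB';
props.Tag  = 'nightly';
obj2 = createJob(jm, props);
```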
obj = createJob();
for i = 1:10
createTask(obj, @rand, 1, {10});
end
submit(obj);
Wait for the job to finish running, and retrieve the job results.
waitForState(obj);
out = getAllOutputArguments(obj);
disp(out{3});
destroy(obj);
createMatlabPoolJob
j = createMatlabPoolJob('Name', 'testMatlabPooljob');
j.MinimumNumberOfWorkers = 5;
j.MaximumNumberOfWorkers = 10;
submit(j)
waitForState(j, 'finished');
out = getAllOutputArguments(j);
celldisp(out);
destroy(j);
createParallelJob
pjob = createParallelJob();
set(pjob,'MinimumNumberOfWorkers',3);
set(pjob,'MaximumNumberOfWorkers',3);
submit(pjob);
Wait for the job to finish running, and retrieve the job results.
waitForState(pjob);
out = getAllOutputArguments(pjob);
celldisp(out);
out{1} =
0.9501 0.4860 0.4565
0.2311 0.8913 0.0185
0.6068 0.7621 0.8214
out{2} =
0.9501 0.4860 0.4565
0.2311 0.8913 0.0185
destroy(pjob);
createTask
jm = findResource('scheduler','type','jobmanager', ...
'name','MyJobManager','LookupURL','JobMgrHost');
j = createJob(jm);
t = createTask(j, @rand, 1, {10,10});
submit(j);
Wait for the job to finish running, and get the output from the task
evaluation.
waitForState(j);
taskoutput = get(t, 'OutputArguments');
disp(taskoutput{1});
jm = findResource('scheduler','type','jobmanager', ...
'name','MyJobManager','LookupURL','JobMgrHost');
j = createJob(jm);
t = createTask(j, @rand, 1, {{10,10} {10,10} {10,10}});
defaultParallelConfig
If the default configuration has been deleted, or if it has never been set,
defaultParallelConfig returns 'local' as the default configuration.
Examples Read the name of the default parallel configuration that is currently in
effect, and get a listing of all available configurations.
[p, allConfigs] = defaultParallelConfig
Set the configuration MyConfig as the default.
defaultParallelConfig('MyConfig')
demote
Description demote(jm, job) demotes the job object job that is queued in the job
manager jm.
If job is not the last job in the queue, demote exchanges the position
of job and the job that follows it in the queue.
Examples Create and submit multiple jobs to the job manager identified by the
default parallel configuration:
jm = findResource();
j1 = createJob('name','Job A');
j2 = createJob('name','Job B');
j3 = createJob('name','Job C');
submit(j1);submit(j2);submit(j3);
demote(jm, j2)
'Job A'
'Job C'
'Job B'
destroy
Syntax destroy(obj)
Description destroy(obj) removes the job object reference or task object reference
obj from the local session, and removes the object from the job manager
memory. When obj is destroyed, it becomes an invalid object. You can
remove an invalid object from the workspace with the clear command.
If multiple references to an object exist in the workspace, destroying
one reference to that object invalidates all the remaining references to
it. You should remove these remaining references from the workspace
with the clear command.
The task objects contained in a job will also be destroyed when a job
object is destroyed. This means that any references to those task objects
will also be invalid.
Remarks Because its data is lost when you destroy an object, destroy should be
used after output data has been retrieved from a job object.
jm = findResource('scheduler','type','jobmanager', ...
'name','MyJobManager','LookupURL','JobMgrHost');
j = createJob(jm, 'Name', 'myjob');
t = createTask(j, @rand, 1, {10});
destroy(j);
clear t
clear j
dfeval
tasks that each generate three output arguments, the results of dfeval
are three cell arrays of 10 elements each. When evaluation is complete,
dfeval destroys the job.
y = dfeval( ..., 'P1',V1,'P2',V2,...) accepts additional
arguments for configuring different properties associated with the job.
Valid properties and property values are
Note that dfeval runs synchronously (sync); that is, it does not return
the MATLAB prompt until the job is completed. For further discussion
of the usage of dfeval, see “Evaluating Functions Synchronously” on
page 7-2.
Examples Create three tasks that return a 1-by-1, a 2-by-2, and a 3-by-3 random
matrix.
y = dfeval(@rand,{1 2 3})
y =
[ 0.9501]
[2x2 double]
[3x3 double]
Create two tasks that return random matrices of size 2-by-3 and 1-by-4.
Create two tasks, where the first task creates a 1-by-2 random array
and the second task creates a 3-by-4 array of zeros.
Evaluate the user function myFun using the cluster as defined in the
configuration myConfig.
dfevalasync
When the job is finished, you can obtain the results associated with
the job.
waitForState(job);
data = getAllOutputArguments(job)
data =
[ 3]
[ 7]
[11]
diary
Syntax diary(job)
diary(job, 'filename')
Description diary(job) displays the Command Window output from the batch job
in the MATLAB Command Window. The Command Window output will
be captured only if the batch command included the 'CaptureDiary'
argument with a value of true.
diary(job, 'filename') causes the Command Window output from
the batch job to be appended to the specified file.
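For example, a minimal sketch (assuming a script file myScript.m exists on the client):

j = batch('myScript', 'CaptureDiary', true);
wait(j);
diary(j)                  % display the captured Command Window output
diary(j, 'mydiary.txt')   % append the same output to a file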
distributed
Syntax D = distributed(X)
Nsmall = 50;
D1 = distributed(magic(Nsmall));
Nlarge = 1000;
D2 = distributed.rand(Nlarge);
distributed.cell
Syntax D = distributed.cell(n)
D = distributed.cell(m, n, p, ...)
D = distributed.cell([m, n, p, ...])
D = distributed.cell(1000)
distributed.eye
Syntax D = distributed.eye(n)
D = distributed.eye(m, n)
D = distributed.eye([m, n])
D = distributed.eye(..., classname)
D = distributed.eye(1000)
distributed.false
Syntax F = distributed.false(n)
F = distributed.false(m, n, ...)
F = distributed.false([m, n, ...])
F = distributed.false(1000);
distributed.Inf
Syntax D = distributed.Inf(n)
D = distributed.Inf(m, n, ...)
D = distributed.Inf([m, n, ...])
D = distributed.Inf(..., classname)
D = distributed.Inf(1000)
distributed.NaN
Syntax D = distributed.NaN(n)
D = distributed.NaN(m, n, ...)
D = distributed.NaN([m, n, ...])
D = distributed.NaN(..., classname)
D = distributed.NaN(1000)
distributed.ones
Syntax D = distributed.ones(n)
D = distributed.ones(m, n, ...)
D = distributed.ones([m, n, ...])
D = distributed.ones(..., classname)
D = distributed.ones(1000);
distributed.rand
Syntax R = distributed.rand(n)
R = distributed.rand(m, n, ...)
R = distributed.rand([m, n, ...])
R = distributed.rand(..., classname)
Remarks When you use rand on the workers in the MATLAB pool, or in a
distributed or parallel job (including pmode), each worker or lab sets its
random generator seed to a value that depends only on the lab index
or task ID. Therefore, the array on each lab is unique for that job.
However, if you repeat the job, you get the same random data.
R = distributed.rand(1000);
distributed.randn
Syntax RN = distributed.randn(n)
RN = distributed.randn(m, n, ...)
RN = distributed.randn([m, n, ...])
RN = distributed.randn(..., classname)
Remarks When you use randn on the workers in the MATLAB pool, or in a
distributed or parallel job (including pmode), each worker or lab sets its
random generator seed to a value that depends only on the lab index
or task ID. Therefore, the array on each lab is unique for that job.
However, if you repeat the job, you get the same random data.
RN = distributed.randn(1000);
distributed.spalloc
Examples Allocate space for a 1000-by-1000 sparse distributed matrix with room
for up to 2000 nonzero elements, then define several elements:
N = 1000;
SD = distributed.spalloc(N, N, 2*N);
for ii=1:N-1
SD(ii,ii:ii+1) = [ii ii];
end
distributed.speye
Syntax DS = distributed.speye(n)
DS = distributed.speye(m, n)
DS = distributed.speye([m, n])
N = 1000;
DS = distributed.speye(N);
distributed.sprand
Remarks When you use sprand on the workers in the MATLAB pool, or in a
distributed or parallel job (including pmode), each worker or lab sets its
random generator seed to a value that depends only on the lab index
or task ID. Therefore, the array on each lab is unique for that job.
However, if you repeat the job, you get the same random data.
distributed.sprandn
Remarks When you use sprandn on the workers in the MATLAB pool, or in a
distributed or parallel job (including pmode), each worker or lab sets its
random generator seed to a value that depends only on the lab index
or task ID. Therefore, the array on each lab is unique for that job.
However, if you repeat the job, you get the same random data.
distributed.true
Syntax T = distributed.true(n)
T = distributed.true(m, n, ...)
T = distributed.true([m, n, ...])
T = distributed.true(1000);
distributed.zeros
Syntax D = distributed.zeros(n)
D = distributed.zeros(m, n, ...)
D = distributed.zeros([m, n, ...])
D = distributed.zeros(..., classname)
D = distributed.zeros(1000);
dload
Syntax dload
dload filename
dload filename X
dload filename X Y Z ...
dload -scatter ...
[X, Y, Z, ...] = dload('filename', 'X', 'Y', 'Z', ...)
Description dload without any arguments retrieves all variables from the binary
file named matlab.mat. If matlab.mat is not available, the command
generates an error.
dload filename retrieves all variables from a file given a full pathname
or a relative partial pathname. If filename has no extension, dload
looks for filename.mat. dload loads the contents of distributed arrays
and Composite objects onto MATLAB pool workers; other data types are
loaded directly into the workspace of the MATLAB client.
dload filename X loads only variable X from the file. dload filename
X Y Z ... loads only the specified variables. dload does not support
wildcards, nor the -regexp option. If any requested variable is not
present in the file, a warning is issued.
dload -scatter ... distributes nondistributed data if possible. If the
data cannot be distributed, a warning is issued.
[X, Y, Z, ...] = dload('filename', 'X', 'Y', 'Z', ...)
returns the specified variables as separate output arguments (rather
than a structure, which the load function returns). If any requested
variable is not present in the file, an error occurs.
When loading distributed arrays, the data is distributed over the
available MATLAB pool workers using the default distribution scheme.
It is not necessary to have the same size MATLAB pool open when
loading as when saving using dsave.
When loading Composite objects, the data is sent to the available
MATLAB pool workers. If the Composite is too large to fit on the current
MATLAB pool, the data is not loaded. If the Composite is smaller than
the current MATLAB pool, a warning is issued.
dload fname X Y Z
Use the function form of dload to load distributed arrays P and Q from
file fname.mat:
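A sketch of such a call (P and Q as named in the text above):

[P, Q] = dload('fname', 'P', 'Q');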
dsave
Syntax dsave
dsave filename
dsave filename X
dsave filename X Y Z
Description dsave without any arguments creates the binary file named matlab.mat
and writes to the file all workspace variables, including distributed
arrays and Composite objects. You can retrieve the variable data using
dload.
dsave filename saves all workspace variables to the binary file named
filename.mat. If you do not specify an extension for filename, it
assumes the extension .mat.
dsave filename X saves only variable X to the file.
dsave filename X Y Z saves X, Y, and Z. dsave does not support
wildcards, nor the -regexp option.
dsave does not support saving sparse distributed arrays.
Examples With a MATLAB pool open, create and save several variables to
mydatafile.mat:
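A sketch of such a session (the variable names are illustrative):

x = rand(10);                 % ordinary client variable
D = distributed.rand(1000);   % distributed array
dsave mydatafile x D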
exist
Examples Define a variable on a random number of labs. Check on which labs the
Composite entries are defined, and get all those values:
spmd
if rand() > 0.5
c = labindex;
end
end
ind = exist(c);
cvals = c(ind);
feval
Description feval(KERN, x1, ..., xn) evaluates the CUDA kernel KERN with
the given arguments x1, ..., xn. The number of input arguments,
n, must equal the value of the NumRHSArguments property of KERN, and
their types must match the description in the ArgumentTypes property
of KERN. The input data can be regular MATLAB data, GPU arrays, or a
mixture of the two.
[y1, ..., ym] = feval(KERN, x1, ..., xn) returns multiple
output arguments from the evaluation of the kernel. Each output
argument corresponds to the value of the non-const pointer inputs to
the CUDA kernel after it has executed. If an input value is a GPU
array, the corresponding output value is also a GPU array. The
number of output arguments, m, must not exceed the value of the
MaxNumLHSArguments property of KERN.
Examples Suppose the CUDA kernel within a CU file takes one constant
(input-only) pointer and two non-constant (in-out) pointers, so that
its kernel object KERN has the properties:
MaxNumLHSArguments: 2
NumRHSArguments: 3
ArgumentTypes: {'in single vector' ...
'inout single vector' 'inout single vector'}
You can use feval on this code’s kernel (KERN) with the syntax:
[y1, y2] = feval(KERN, x1, x2, x3)
The three input arguments, x1, x2, and x3, correspond to the three
arguments that are passed into the CUDA function. The output
arguments, y1 and y2, correspond to the values of pInOut1 and pInOut2
after the CUDA kernel has executed. Thus, if x2 and x3 are GPU
arrays, y1 and y2 are also GPU arrays.
findJob
Description out = findJob(sched) returns an array, out, of all job objects stored
in the scheduler sched. Jobs in the array are ordered by the ID property
of the jobs, indicating the sequence in which they were created.
[pending queued running completed] = findJob(sched) returns
arrays of all job objects stored in the scheduler sched, by state. Within
pending, running, and completed, the jobs are returned in sequence
of creation. Jobs in the array queued are in the order in which they
are queued, with the job at queued(1) being the next to execute. The
completed jobs include those that failed. Jobs that are destroyed or
whose status is unavailable are not returned by this function.
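For example, a sketch using a job manager located with findResource (the scheduler name and lookup host are illustrative):

jm = findResource('scheduler', 'type', 'jobmanager', ...
    'name', 'MyJobManager', 'LookupURL', 'JobMgrHost');
[pending, queued, running, completed] = findJob(jm);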
findResource
all_job_managers = findResource('scheduler','type','jobmanager')
all_job_managers =
distcomp.jobmanager: 1-by-4
Find all job managers accessible from the lookup service on a particular
host.
jm = findResource('scheduler','type','jobmanager', ...
'LookupURL', 'subnet2.hostalpha:6789', 'Name', 'SN2JMgr');
lsf_sched = findResource('scheduler','type','LSF')
Create a local scheduler that will start workers on the client machine
for running your job.
local_sched = findResource('scheduler','type','local')
sched = findResource();
findTask
field names are object property names and the field values are the
appropriate property values to match.
When a property value is specified, it must use the same exact value
that the get function returns, including letter case. For example, if get
returns the Name property value as MyTask, then findTask will not find
that object while searching for a Name property value of mytask.
jm = findResource('scheduler','type','jobmanager', ...
'name','MyJobManager','LookupURL','JobMgrHost');
obj = createJob(jm);
Create the task object t, which refers to the task we just added to obj.
t = findTask(obj)
for
Examples Perform a Monte Carlo approximation of π. Each lab counts
random points that fall inside the unit circle, and gplus combines
the counts across the labs.
m = 10000;
for p = drange(1:numlabs)
z = rand(m, 1) + i*rand(m, 1);
c = sum(abs(z) < 1)
end
k = gplus(c)
p = 4*k/(m*numlabs);
Attempt to compute Fibonacci numbers. This will not work, because the
loop bodies are dependent.
gather
Syntax X = gather(A)
X = gather(C, lab)
Examples Distribute a magic square across your labs, then gather the whole
matrix onto every lab and then onto the client. This code results in the
equivalent of M = magic(n) on all labs and the client.
n = 10;
spmd
C = codistributed(magic(n));
M = gather(C) % Gather data on all labs
end
S = gather(C) % Gather data on client
Gather all of the data in C onto lab 1, so that it can be saved from there.
n = 10;
spmd
C = codistributed(magic(n));
out = gather(C, 1);
if labindex == 1
save data.mat out;
end
end
Gather all of the data from a distributed array into D on the client.
n = 10;
D = distributed(magic(n)); % Distribute data to labs
M = gather(D) % Return data to client
G = gpuArray(rand(1024,1));
F = sqrt(G); %input and output both GPUArray
W = gather(G); % Return data to client
whos
Name Size Bytes Class
gcat
Syntax Xs = gcat(X)
Xs = gcat(X, dim)
Xs = gcat(X, dim, targetlab)
Description Xs = gcat(X) concatenates the variant array X from each lab in the
second dimension. The result is replicated on all labs.
Xs = gcat(X, dim) concatenates the variant array X from each lab in
the dimension indicated by dim.
Xs = gcat(X, dim, targetlab) performs the concatenation and places
the result into Xs only on the lab indicated by targetlab. Xs is set
to [] on all other labs.
Xs = gcat(labindex)
get
Syntax get(obj)
out = get(obj)
out = get(obj,'PropertyName')
Description get(obj) returns all property names and their current values to the
command line for obj.
out = get(obj) returns the structure out where each field name is the
name of a property of obj, and each field contains the value of that
property.
out = get(obj,'PropertyName') returns the value out of the property
specified by PropertyName for obj. If PropertyName is replaced by a
1-by-n or n-by-1 cell array of strings containing property names, then
get returns a 1-by-n cell array of values to out. If obj is an array of
objects, then out will be an m-by-n cell array of property values where m
is equal to the length of obj and n is equal to the number of properties
specified.
Remarks When specifying a property name, you can do so without regard to case,
and you can make use of property name completion. For example, if jm
is a job manager object, then these commands are all valid and return
the same result.
out = get(jm,'HostAddress');
out = get(jm,'hostaddress');
out = get(jm,'HostAddr');
Examples This example illustrates some of the ways you can use get to return
property values for the job object j1.
get(j1,'State')
ans =
pending
get(j1,'Name')
ans =
MyJobManager_job
out = get(j1);
out.State
ans =
pending
out.Name
ans =
MyJobManager_job
getAllOutputArguments
jm = findResource('scheduler','type','jobmanager', ...
'name','MyJobManager','LookupURL','JobMgrHost');
disp(data{1});
destroy(j);
getCodistributor
Examples Get the codistributor object for a 1-D codistributed array that uses
default distribution on 4 labs:
spmd (4)
I1 = codistributed.eye(64, codistributor1d());
codist1 = getCodistributor(I1)
dim = codist1.Dimension
partn = codist1.Partition
end
Get the codistributor object for a 2-D block cyclic codistributed array
that uses default distribution on 4 labs:
spmd (4)
I2 = codistributed.eye(128, codistributor2dbc());
codist2 = getCodistributor(I2)
blocksz = codist2.BlockSize
partn = codist2.LabGrid
ornt = codist2.Orientation
end
spmd (4)
isComplete(codist1)
isComplete(codist2)
end
getCurrentJob
Arguments job The job object that contains the task currently being
evaluated by the worker session.
Description job = getCurrentJob returns the job object that is the Parent of the
task currently being evaluated by the worker session.
getCurrentJobmanager
Syntax jm = getCurrentJobmanager
Arguments jm The job manager object that scheduled the task currently
being evaluated by the worker session.
getCurrentTask
Arguments task The task object that the worker session is currently
evaluating.
Description task = getCurrentTask returns the task object that is currently being
evaluated by the worker session.
getCurrentWorker
Arguments worker The worker object that is currently evaluating the task
that contains this function.
Examples Create a job with one task, and have the task return the name of the
worker that evaluates it.
jm = findResource('scheduler','type','jobmanager', ...
'name','MyJobManager','LookupURL','JobMgrHost');
j = createJob(jm);
t = createTask(j, @() get(getCurrentWorker,'Name'), 1, {});
submit(j)
waitForState(j)
get(t,'OutputArguments')
ans =
'c5_worker_43'
getDebugLog
Purpose Read output messages from job run by supported third-party or local
scheduler
Examples Construct a scheduler object so you can create a parallel job. Assume
that you have already defined a configuration called mpiexec to define
the properties of the scheduler object.
job = createParallelJob(mpiexecObj);
createTask(job, @labindex, 1, {});
submit(job);
getDebugLog(mpiexecObj, job);
getFileDependencyDir
Examples Change to the file dependency directory of the current job,
then change back when finished:
ddir = getFileDependencyDir;  % directory containing the file dependencies
cdir = cd(ddir);              % change to it, saving the current directory
cd(cdir);                     % change back
Properties
FileDependencies
getJobSchedulerData
Arguments userdata Information that was previously stored for this job.
sched Scheduler object identifying the generic third-party
scheduler running the job.
job Job object identifying the job for which to retrieve data.
getLocalPart
Syntax L = getLocalPart(A)
Description L = getLocalPart(A) returns the local portion of the
codistributed array A.
globalIndices
Examples Create a 2-by-22 codistributed array among four labs, and view the
global indices on each lab:
spmd
C = codistributed.zeros(2, 22, codistributor1d(2,[6 6 5 5]));
if labindex == 1
K = globalIndices(C, 2); % returns K = 1:6.
elseif labindex == 2
[E,F] = globalIndices(C, 2); % returns E = 7, F = 12.
end
K = globalIndices(C, 2, 3); % returns K = 13:17.
[E,F] = globalIndices(C, 2, 4); % returns E = 18, F = 22.
end
spmd
siz = [1000, 1000];
codistr = codistributor1d(2, [], siz);
gop
Description res = gop(@F, x) is the reduction via the function F of the quantities
x from each lab. The result is duplicated on all labs.
The function F(x,y) should accept two arguments of the same type and
produce one result of that type, so it can be used iteratively, that is,
F(F(x1,x2),F(x3,x4))
Calculate the sum, and separately the maximum, of x from all labs:
res = gop(@plus, x)
res = gop(@max, x)
Concatenate the x values from all labs horizontally:
res = gop(@horzcat, x)
gplus
Syntax S = gplus(X)
S = gplus(X, targetlab)
Description S = gplus(X) returns the addition of the variant array X from each lab.
The result S is replicated on all labs.
S = gplus(X, targetlab) performs the addition, and places the result
into S only on the lab indicated by targetlab. S is set to [] on all
other labs.
S = gplus(labindex)
gpuArray
Syntax G = gpuArray(X)
Description G = gpuArray(X) copies the numeric data X to the GPU, and returns
a GPUArray object. You can operate on this data by passing it to the
feval method of a CUDA kernel object, or by using one of the methods
defined for GPUArray objects in “Using GPUArray” on page 10-4.
The MATLAB data X must be numeric (for example: single, double,
int8, etc.) or logical, and the GPU device must have sufficient free
memory to store the data. X must be a full matrix, not sparse.
If the input argument is already a GPUArray, the output is the same
as the input.
X = rand(10, 'single');
G = gpuArray(X);
isequal(gather(G), X) % Returns true
classUnderlying(G) % Returns 'single'
G2 = G .* G % Uses times method defined for
% GPUArray objects
gpuDevice
Syntax D = gpuDevice
D = gpuDevice(IDX)
g = gpuDevice
for ii = 1:gpuDeviceCount
g = gpuDevice(ii);
fprintf(1, 'Device %i has ComputeCapability %s \n', ...
g.Index, g.ComputeCapability)
end
gpuDeviceCount
Syntax n = gpuDeviceCount
Examples Determine how many GPU devices you have available in your computer
and examine the properties of each.
n = gpuDeviceCount;
for ii = 1:n
gpuDevice(ii)
end
help
Examples Get help on functions from each of the Parallel Computing Toolbox
object classes.
help distcomp.jobmanager/createJob
help distcomp.job/cancel
help distcomp.task/waitForState
class(j1)
ans =
distcomp.job
help distcomp.job/createTask
importParallelConfig
Examples Import a configuration from the file Config01.mat and use it to open a
pool of MATLAB workers:
conf_1 = importParallelConfig('Config01')
matlabpool('open', conf_1)
def_config = importParallelConfig('ConfigMaster')
defaultParallelConfig(def_config)
inspect
Syntax inspect(obj)
Description inspect(obj) opens the Property Inspector and allows you to inspect
and set properties for the object obj.
Remarks You can also open the Property Inspector via the Workspace browser by
double-clicking an object.
The Property Inspector does not automatically update its display. To
refresh the Property Inspector, open it again.
Note that properties that are arrays of objects are expandable. In
the figure of the example below, the Tasks property is expanded to
enumerate the individual task objects that make up this property.
These individual task objects can also be expanded to display their
own properties.
Examples Open the Property Inspector for the job object j1.
inspect(j1)
isaUnderlying
Examples N = 1000;
D_uint8 = distributed.ones(1, N, 'uint8');
D_cell = distributed.cell(1, N);
isUint8 = isaUnderlying(D_uint8, 'uint8') % returns true
isDouble = isaUnderlying(D_cell, 'double') % returns false
iscodistributed
Syntax tf = iscodistributed(X)
spmd
L = ones(100, 1);
D = codistributed.ones(100, 1);
iscodistributed(L) % returns false
iscodistributed(D) % returns true
end
isComplete
Syntax tf = isComplete(codist)
isdistributed
Syntax tf = isdistributed(X)
L = ones(100, 1);
D = distributed.ones(100, 1);
isdistributed(L) % returns false
isdistributed(D) % returns true
isreplicated
Syntax tf = isreplicated(X)
spmd
A = magic(3);
t = isreplicated(A) % returns t = true
B = magic(labindex);
f = isreplicated(B) % returns f = false
end
jobStartup
Syntax jobStartup(job)
Arguments job The job for which this startup is being executed.
matlabroot/toolbox/distcomp/user/jobStartup.m
You add MATLAB code to the file to define job initialization actions to
be performed on the worker when it first evaluates a task for this job.
Alternatively, you can create a file called jobStartup.m and include it
as part of the job’s FileDependencies property. The version of the file
in FileDependencies takes precedence over the version in the worker’s
MATLAB installation.
For further detail, see the text in the installed jobStartup.m file.
Properties
FileDependencies, PathDependencies
labBarrier
Syntax labBarrier
Description labBarrier blocks execution of a parallel algorithm until all labs have
reached the call to labBarrier. This is useful for coordinating access to
shared resources such as file I/O.
For a demonstration that uses labSend, labReceive, labBarrier,
and labSendReceive, see the demo Profiling Explicit Parallel
Communication.
Examples In this example, all labs know the shared data filename.
fname = 'c:\data\datafile.mat';
Lab 1 writes some data to the file, which all other labs will read.
if labindex == 1
data = randn(100, 1);
save(fname, 'data');
pause(5) %allow time for file to become available to other labs
end
All labs wait until all have reached the barrier; this ensures that no lab
attempts to load the file until lab 1 writes to it.
labBarrier;
load(fname);
labBroadcast
Purpose Send data to all labs or receive data sent to all labs
broadcast_id = 1;
if labindex == broadcast_id
    data = randn(10);
    shared_data = labBroadcast(broadcast_id, data);
else
    shared_data = labBroadcast(broadcast_id);
end
labindex
Syntax id = labindex
Description id = labindex returns the index of the lab currently executing the
function. labindex is assigned to each lab when a job begins execution,
and applies only for the duration of that job. The value of labindex
spans from 1 to n, where n is the number of labs running the current
job, defined by numlabs.
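For example, inside an spmd block each lab sees its own index:

spmd
    fprintf('This is lab %d of %d.\n', labindex, numlabs);
end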
labProbe
Purpose Test to see if messages are ready to be received from another lab
labReceive
Description data = labReceive receives data from any lab with any tag.
data = labReceive(source) receives data from the specified lab with
any tag.
data = labReceive('any',tag) receives data from any lab with the
specified tag.
data = labReceive(source,tag) receives data from only the specified
lab with the specified tag.
[data, source, tag] = labReceive returns the source and tag with
the data.
Remarks This function blocks execution in the lab until the corresponding call to
labSend occurs in the sending lab.
For a demonstration that uses labSend, labReceive, labBarrier,
and labSendReceive, see the demo Profiling Explicit Parallel
Communication.
labSend
Arguments data Data sent to the other lab; any MATLAB data
type.
destination labindex of receiving lab.
tag Nonnegative integer to identify data.
labSendReceive
Purpose Simultaneously send data to and receive data from another lab
labSend(data, labTo);
received = labReceive(labFrom);
with the important exception that both the sending and receiving of
data happens concurrently. This can eliminate deadlocks that might
otherwise occur if the equivalent call to labSend would block.
If labTo is an empty array, labSendReceive does not send data, but
only receives. If labFrom is an empty array, labSendReceive does not
receive data, but only sends.
received = labSendReceive(labTo, labFrom, data, tag) uses
the specified tag for the communication. tag can be any integer from
0 to 32767.
For a demonstration that uses labSend, labReceive, labBarrier,
and labSendReceive, see the demo Profiling Explicit Parallel
Communication.
Examples Create a unique set of data on each lab, and transfer each lab’s data one
lab to the right (to the next higher labindex).
First use magic to create a unique value for the variant array mydata
on each lab.
mydata = magic(labindex)
Lab 1:
mydata =
1
Lab 2:
mydata =
1 3
4 2
Lab 3:
mydata =
8 1 6
3 5 7
4 9 2
Define the lab on either side, so that each lab will receive data from the
lab on the “left” while sending data to the lab on the “right,” cycling
data from the end lab back to the beginning lab.
Transfer the data, sending each lab’s mydata into the next lab’s
otherdata variable, wrapping the third lab’s data back to the first lab.
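These two steps can be sketched as follows (a reconstruction; the neighbor indices wrap around using mod):

labTo = mod(labindex, numlabs) + 1;        % lab to the "right"
labFrom = mod(labindex - 2, numlabs) + 1;  % lab to the "left"
otherdata = labSendReceive(labTo, labFrom, mydata)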
1
Lab 3:
otherdata =
1 3
4 2
Transfer data to the next lab without wrapping data from the last lab
to the first lab.
length
Syntax length(obj)
length(j1.Tasks)
ans =
9
load
Syntax load(job)
load(job, 'X')
load(job, 'X', 'Y', 'Z*')
load(job, '-regexp', 'PAT1', 'PAT2')
S = load(job ...)
Description load(job) retrieves all variables from a batch job and assigns them
into the current workspace. If the job is not finished, or if the job
encountered an error while running, load will throw an error.
load(job, 'X') loads only the variable named X from the job.
load(job, 'X', 'Y', 'Z*') loads only the specified variables. The
wildcard '*' loads variables that match a pattern (MAT-file only).
load(job, '-regexp', 'PAT1', 'PAT2') can be used to load all
variables matching the specified patterns using regular expressions.
For more information on using regular expressions, type doc regexp
at the command prompt.
S = load(job ...) returns the contents of job into variable S, which
is a struct containing fields matching the variables retrieved.
Examples Run a batch job and load its results into your client workspace.
j = batch('myScript');
wait(j)
load(j)
Load only the variables whose names start with 'a':
load(j, 'a*')
matlabpool
Syntax matlabpool
matlabpool open
matlabpool open poolsize
matlabpool open configname
matlabpool open configname poolsize
matlabpool poolsize
matlabpool configname
matlabpool configname poolsize
matlabpool(schedobj)
matlabpool(schedobj, 'open')
matlabpool(schedobj, 'open', ...)
matlabpool(schedobj, poolsize)
matlabpool close
matlabpool close force
matlabpool close force configname
matlabpool size
matlabpool('open', ...)
matlabpool('close', ...)
matlabpool('open', ..., 'FileDependencies', filecell)
matlabpool('addfiledependencies', filecell)
matlabpool updatefiledependencies
Remarks When a pool of workers is open, the following commands entered in the
client’s Command Window also execute on all the workers:
• cd
• addpath
• rmpath
This enables you to set the working directory and the path on all the
workers, so that a subsequent parfor-loop executes in the proper
context.
If any of these commands does not work on the client, it is not executed
on the workers either. For example, if addpath specifies a directory that
the client cannot see or access, the addpath command is not executed on
the workers. However, if the working directory or path can be set on the
client, but cannot be set as specified on any of the workers, you do not
get an error message returned to the client Command Window.
This slight difference in behavior is an issue especially in a
mixed-platform environment where the client is not the same platform
as the workers, where directories local to or mapped from the client
are not available in the same way to the workers, or where directories
are in a nonshared file system. For example, if you have a MATLAB
client running on a Microsoft Windows operating system while the
MATLAB workers are all running on Linux® operating systems, the
same argument to addpath cannot work on both. In this situation, you
can use the function pctRunOnAll to assure that a command runs on
all the workers.
Another difference between client and workers is that any addpath
arguments that are part of the matlabroot folder are not set on the
workers. The assumption is that the MATLAB install base is already
included in the workers’ paths. For example, suppose you enter the
following addpath command on the client:
addpath('P1',
'P2',
'C:\Applications\matlab\T3',
'C:\Applications\matlab\T4',
'P5',
'C:\Applications\matlab\T6',
'P7',
'P8');
Because T3, T4, and T6 are subfolders of matlabroot, they are not set
on the workers’ paths. So on the workers, the pertinent part of the path
resulting from this command is:
P1
P2
<worker original matlabroot folders...>
P5
P7
P8
Examples Start a pool using the default configuration to define the number of labs:
matlabpool
Start a pool of 2 workers using the local configuration:
matlabpool local 2
Start a pool with the default configuration, and pass two code files to
the workers:
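One way to write this call (the file names are illustrative):

matlabpool('open', 'FileDependencies', {'fun1.m', 'fun2.m'})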
Start a MATLAB pool with the scheduler and pool size determined by
the default configuration:
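A sketch using a scheduler object obtained from the default configuration:

schedobj = findResource();   % scheduler per the default configuration
matlabpool(schedobj)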
methods
Syntax methods(obj)
out = methods(obj)
Description methods(obj) returns the names of all methods for the class of which
obj is an instance.
out = methods(obj) returns the names of the methods as a cell array
of strings.
Examples Create job manager, job, and task objects, and examine what methods
are available for each.
jm = findResource('scheduler','type','jobmanager', ...
'name','MyJobManager','LookupURL','JobMgrHost');
methods(jm)
Methods for class distcomp.jobmanager:
createJob demote pause resume
createParallelJob findJob promote
j1 = createJob(jm);
methods(j1)
Methods for class distcomp.job:
cancel destroy getAllOutputArguments waitForState
createTask findTask submit
mpiLibConf
Remarks Under all circumstances, the MPI library must support all MPI-1
functions. Additionally, the MPI library must support null arguments
to MPI_Init as defined in section 4.2 of the MPI-2 standard. The
library must also use an mpi.h header file that is fully compatible
with MPICH2.
When used with the MathWorks job manager or the local scheduler, the
MPI library must support the following additional MPI-2 functions:
• MPI_Open_port
• MPI_Comm_accept
• MPI_Comm_connect
Examples Use the mpiLibConf function to view the current MPI implementation
library:
mpiLibConf
mpich2.dll
mpiprofile
Syntax mpiprofile
mpiprofile on <options>
mpiprofile off
mpiprofile resume
mpiprofile clear
mpiprofile status
mpiprofile reset
mpiprofile info
mpiprofile viewer
mpiprofile('viewer', <profinfoarray>)
Option Description
-detail mmex
-detail builtin
This option specifies the set of functions for which profiling
statistics are gathered. -detail mmex (the default) records
information about functions, subfunctions, and MEX-functions.
-detail builtin additionally records information about built-in
functions such as eig or labReceive.
-messagedetail default
-messagedetail simplified
This option specifies the detail at which communication information
is stored. -messagedetail default collects information on a per-lab
instance. -messagedetail simplified turns off collection for the
*PerLab data fields, which reduces the profiling overhead. If you
have a very large cluster, you might want to use this option;
however, you will not get all the detailed inter-lab communication
plots in the viewer. For information about the structure of returned
data, see mpiprofile info below.
-history
-nohistory
-historysize <size>
mpiprofile supports these options in the same way as the standard profile. No other profile options are supported by mpiprofile. These three options have no effect on the data displayed by mpiprofile viewer.
mpiprofile off stops the parallel profiler. To reset the state of the
profiler and disable collecting communication information, you should
also call mpiprofile reset.
mpiprofile resume restarts the profiler without clearing previously
recorded function statistics. This works only in pmode or in the same
MATLAB worker session.
mpiprofile clear clears the profile information.
Field Description
BytesSent Records the quantity of data sent
BytesReceived Records the quantity of data received
TimeWasted Records communication waiting time
CommTime Records the communication time
CommTimePerLab Vector of communication receive time for
each lab
TimeWastedPerLab Vector of communication waiting time for
each lab
BytesReceivedPerLab Vector of data received from each lab
The three *PerLab fields are collected only on a per-function basis, and
can be turned off by typing the following command in pmode:
mpiprofile on -messagedetail simplified
Examples In pmode, turn on the parallel profiler, run your function in parallel,
and call the viewer:
mpiprofile on;
% call your function;
mpiprofile viewer;
If you want to obtain the profiler information from a parallel job outside
of pmode (i.e., in the MATLAB client), you need to return output
arguments of mpiprofile info by using the functional form of the
command. Define your function foo(), and make it the task function
in a parallel job:
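A minimal sketch of such a task function (the name foo and the placeholder computation are illustrative; the key point is returning the result of mpiprofile('info') as an output argument):

```matlab
function [pInfo, result] = foo
% Runs on each lab of the parallel job.
mpiprofile on                 % start collecting statistics on this lab
result = labindex * magic(4); % placeholder for your parallel code
mpiprofile off                % stop the profiler
pInfo = mpiprofile('info');   % return the data so the client can fetch it
end
```

Make foo the task function of the parallel job so that each lab returns its profiling data along with its result.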
After the job runs and foo() is evaluated on your cluster, get the data
on the client:
A = getAllOutputArguments(yourJob);
mpiSettings
Syntax mpiSettings('DeadlockDetection','on')
mpiSettings('MessageLogging','on')
mpiSettings('MessageLoggingDestination','CommandWindow')
mpiSettings('MessageLoggingDestination','stdout')
mpiSettings('MessageLoggingDestination','File','filename')
mpiSettings has to be called on the lab, not the client. That is, it
should be called within the task function, within jobStartup.m, or
within taskStartup.m.
Examples Set deadlock detection for a parallel job inside the jobStartup.m file
for that job:
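Using the documented mpiSettings syntax, the jobStartup.m file for the job would contain:

```matlab
% In jobStartup.m, which runs on each lab as the job starts:
mpiSettings('DeadlockDetection', 'on');
```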
Turn off deadlock detection for all subsequent spmd statements that use
the same MATLAB pool:
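Because mpiSettings must run on the labs rather than the client, this is a sketch using pctRunOnAll from the client session:

```matlab
% Run on every lab in the current MATLAB pool:
pctRunOnAll mpiSettings('DeadlockDetection', 'off');
```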
numlabs
Syntax n = numlabs
parfor
You can enter a parfor-loop on multiple lines, but if you put more
than one segment of the loop statement on the same line, separate the
segments with commas or semicolons:
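For example, this one-line loop separates the loop header from its body with a comma and terminates the body statement with a semicolon:

```matlab
parfor i = 1:10, x(i) = i; end
```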
parfor i = 1:length(A)
B(i) = f(A(i));
end
parfor i = 1:n
t = f(A(i));
u = g(B(i));
C(i) = h(t, u);
end
s = 0;
parfor i = 1:n
if p(i) % assume p is a function
s = s + 1;
end
end
parallel.gpu.CUDAKernel
Purpose Create GPU CUDA kernel object from PTX and CU code
/*
* Add a constant to a vector.
*/
__global__ void addToVector(float * pi, float c, int vecLen) {
int idx = blockIdx.x * blockDim.x + threadIdx.x;
if (idx < vecLen) {
pi[idx] += c;
    }
}
kern = parallel.gpu.CUDAKernel('simpleEx.ptx', ...
'simpleEx.cu');
kern = parallel.gpu.CUDAKernel('simpleEx.ptx', ...
'float *, float, int');
pause
Syntax pause(jm)
Description pause(jm) pauses the job manager’s queue so that jobs waiting in the
queued state will not run. Jobs that are already running also pause,
after completion of tasks that are already running. No further jobs or
tasks will run until the resume function is called for the job manager.
The pause function does nothing if the job manager is already paused.
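A sketch of pausing and later resuming a job manager queue (the job manager name and lookup host are placeholders):

```matlab
jm = findResource('scheduler', 'type', 'jobmanager', ...
    'Name', 'MyJobManager', 'LookupURL', 'JobMgrHost');
pause(jm)   % queued jobs stop running; running jobs pause after current tasks
% ... later ...
resume(jm)  % the queue resumes processing jobs
```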
pctconfig
Remarks The values set by this function do not persist between MATLAB
sessions. To guarantee its effect, call pctconfig before calling any
other Parallel Computing Toolbox functions.
config = pctconfig()
config =
portrange: [27370 27470]
hostname: 'machine32'
Set the current client session port range to 21000-22000 with hostname
fdm4.
pctconfig('portrange', [21000 22000], 'hostname', 'fdm4');
Set the client hostname to a fully qualified domain name.
pctconfig('hostname', 'desktop24.subnet6.companydomain.com');
pctRunDeployedCleanup
Syntax pctRunDeployedCleanup
pctRunOnAll
Description pctRunOnAll command runs the specified command on all the workers
of the matlabpool as well as the client, and prints any command-line
output back to the client Command Window. The specified command
runs in the base workspace of the workers and does not have any return
variables. This is useful if there are setup changes that need to be
performed on all the labs and the client.
pctRunOnAll cd /opt/projects/c1456
pload
Syntax pload(fileroot)
Arguments fileroot Part of filename common to all saved files being loaded.
Description pload(fileroot) loads the data from the files named [fileroot
num2str(labindex)] into the labs running a parallel job. The files
should have been created by the psave command. The number of
labs should be the same as the number of files. The files should be
accessible to all the labs. Any codistributed arrays are reconstructed
by this function. If fileroot contains an extension, the character
representation of the labindex will be inserted before the extension.
Thus, pload('abc') attempts to load the file abc1.mat on lab 1,
abc2.mat on lab 2, and so on.
Examples Create three variables — one replicated, one variant, and one
codistributed. Then save the data.
clear all;
rep = speye(numlabs);
var = magic(labindex);
D = eye(numlabs,codistributor());
psave('threeThings');
clear all
whos
Load the previously saved data into the labs. Confirm its presence.
pload('threeThings');
whos
isreplicated(rep)
iscodistributed(D)
pmode
Examples In the following examples, the pmode prompt (P>>) indicates commands
entered in the Parallel Command Window. Other commands are
entered in the MATLAB Command Window.
Start pmode using the default configuration to identify the scheduler
and number of labs.
pmode start
Start pmode using the local configuration with four local labs.
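Using the documented pmode start syntax:

```matlab
pmode start local 4
```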
Start pmode using the configuration myconfig and eight labs on the
cluster.
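Again with pmode start, naming the configuration and the number of labs:

```matlab
pmode start myconfig 8
```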
P>> x = 2*labindex;
pmode lab2client x 7
P>> pwd
poolStartup
Purpose File for user-defined options to run on each worker when MATLAB
pool starts
Syntax poolStartup
matlabroot/toolbox/distcomp/user/poolStartup.m
1 FileDependencies
2 PathDependencies
poolStartup is the ideal location for startup code required for parallel
execution on the MATLAB pool. For example, you might want to include
code for using mpiSettings. Because jobStartup and taskStartup
execute before poolStartup, they are not suited to pool-specific code.
In other words, you should use taskStartup for setup code on your
worker regardless of whether the task is from a distributed job, parallel
job, or using a MATLAB pool; while poolStartup is for setup code for
pool usage only.
For further details, see the text in the installed poolStartup.m file.
Properties
FileDependencies, PathDependencies
promote
Description promote(jm, job) promotes the job object job, which is queued in the
job manager jm.
If job is not the first job in the queue, promote exchanges the position
of job and the previous job.
Examples Create and submit multiple jobs to the scheduler identified by the
default parallel configuration:
j1 = createJob('name','Job A');
j2 = createJob('name','Job B');
j3 = createJob('name','Job C');
submit(j1);submit(j2);submit(j3);
Promote Job C by one position in its queue:
jm = findResource();
promote(jm, j3)
Examine the new queue sequence:
[pjobs, qjobs, rjobs, fjobs] = findJob(jm);
get(qjobs, 'Name')
ans =
'Job A'
'Job C'
'Job B'
psave
Syntax psave(fileroot)
Description psave(fileroot) saves the data from the labs’ workspace into the
files named [fileroot num2str(labindex)]. The files can be loaded
by using the pload command with the same fileroot, which should
point to a directory accessible to all the labs. If fileroot contains an
extension, the character representation of the labindex is inserted
before the extension. Thus, psave('abc') creates the files 'abc1.mat',
'abc2.mat', etc., one for each lab.
Examples Create three variables — one replicated, one variant, and one
codistributed. Then save the data.
clear all;
rep = speye(numlabs);
var = magic(labindex);
D = eye(numlabs,codistributor());
psave('threeThings');
clear all
whos
Load the previously saved data into the labs. Confirm its presence.
pload('threeThings');
whos
isreplicated(rep)
iscodistributed(D)
redistribute
spmd
% First, create a magic square distributed by columns:
M = codistributed(magic(10), codistributor1d(2, [1 2 3 4]));
resume
Syntax resume(jm)
set
Syntax set(obj)
props = set(obj)
set(obj,'PropertyName')
props = set(obj,'PropertyName')
set(obj,'PropertyName',PropertyValue,...)
set(obj,PN,PV)
set(obj,S)
set(obj,'configuration', 'ConfigurationName',...)
Description set(obj) displays all configurable properties for obj. If a property has
a finite list of possible string values, these values are also displayed.
props = set(obj) returns all configurable properties for obj and their
possible values to the structure props. The field names of props are the
property names of obj, and the field values are cell arrays of possible
Remarks You can use any combination of property name/property value pairs,
structure arrays, and cell arrays in one call to set. Additionally, you
can specify a property name without regard to case, and you can make
use of property name completion. For example, if j1 is a job object, the
following commands are all valid and have the same result:
set(j1,'Timeout',20)
set(j1,'timeout',20)
set(j1,'timeo',20)
Examples This example illustrates some of the ways you can use set to configure
property values for the job object j1.
set(j1,'Name','Job_PT109','Timeout',60);
S.Name = 'Job_PT109';
S.Timeout = 60;
set(j1,S);
setJobSchedulerData
setupForParallelExecution
Examples From any client, set up the scheduler to run parallel jobs only on
Windows-based (PC) workers.
From any client, set up the scheduler to run parallel jobs only on
UNIX-based workers.
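A sketch of both calls, assuming sched is a scheduler object previously obtained from findResource, and that 'pc' and 'unix' are among the supported option strings in your release:

```matlab
setupForParallelExecution(sched, 'pc');    % parallel jobs run only on Windows workers
setupForParallelExecution(sched, 'unix');  % parallel jobs run only on UNIX workers
```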
size
Syntax d = size(obj)
[m,n] = size(obj)
[m1,m2,m3,...,mn] = size(obj)
m = size(obj,dim)
sparse
Syntax SD = sparse(FD)
SC = sparse(m, n, codist)
SC = sparse(m, n, codist, 'noCommunication')
spmd
SC = logical(sparse(m, n, codistributor1d()));
end
spmd(4)
C = sparse(1000, 1000, codistributor1d())
end
spmd(4)
codist = codistributor1d(2, 1:numlabs)
C = sparse(10, 10, codist);
end
R = distributed.rand(1000);
D = floor(2*R); % D also is distributed
SD = sparse(D); % SD is sparse distributed
spmd
Description The general form of an spmd (single program, multiple data) statement
is:
spmd
statements
end
Remarks For information about restrictions and limitations when using spmd, see
“Limitations” on page 3-15.
matlabpool(3)
spmd
% build magic squares in parallel
q = magic(labindex + 2);
end
for ii=1:length(q)
% plot each magic square
figure, imagesc(q{ii});
end
matlabpool close
submit
Syntax submit(obj)
Description submit(obj) queues the job object, obj, in the scheduler queue. The
scheduler used for this job was determined when the job was created.
Examples Find the job manager named jobmanager1 using the lookup service
on host JobMgrHost.
jm1 = findResource('scheduler', 'type', 'jobmanager', ...
'Name', 'jobmanager1', 'LookupURL', 'JobMgrHost');
Create a job and submit it to the job manager's queue.
j1 = createJob(jm1);
submit(j1);
subsasgn
Description subsasgn assigns remote values to Composite objects. The values reside
on the labs in the current MATLAB pool.
C(i) = {B} sets the entry of C on lab i to the value B.
C(1:end) = {B} sets all entries of C to the value B.
C([i1, i2]) = {B1, B2} assigns different values on labs i1 and i2.
C{i} = B sets the entry of C on lab i to the value B.
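For example, a sketch of both assignment forms against an open MATLAB pool of two labs:

```matlab
matlabpool open 2
c = Composite();   % one entry per lab in the pool
c(1) = {10};       % cell-style assignment: entry on lab 1 becomes 10
c{2} = 20;         % direct assignment: entry on lab 2 becomes 20
matlabpool close
```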
subsref
Syntax B = C(i)
B = C([i1, i2, ...])
B = C{i}
[B1, B2, ...] = C{[i1, i2, ...]}
Description subsref retrieves remote values of a Composite object from the labs in
the current MATLAB pool.
B = C(i) returns the entry of Composite C from lab i as a cell array.
B = C([i1, i2, ...]) returns multiple entries as a cell array.
B = C{i} returns the value of Composite C from lab i as a single entry.
[B1, B2, ...] = C{[i1, i2, ...]} returns multiple entries.
taskFinish
Syntax taskFinish(task)
matlabroot/toolbox/distcomp/user/taskFinish.m
You add MATLAB code to the file to define task finalization actions to
be performed on the worker every time it finishes evaluating a task
for this job.
Alternatively, you can create a file called taskFinish.m and include it
as part of the job’s FileDependencies property. The version of the file
in FileDependencies takes precedence over the version in the worker’s
MATLAB installation.
For further detail, see the text in the installed taskFinish.m file.
Properties
FileDependencies, PathDependencies
taskStartup
Syntax taskStartup(task)
matlabroot/toolbox/distcomp/user/taskStartup.m
You add MATLAB code to the file to define task initialization actions to
be performed on the worker every time it evaluates a task for this job.
Alternatively, you can create a file called taskStartup.m and include
it as part of the job’s FileDependencies property. The version of the
file in FileDependencies takes precedence over the version in the
worker’s MATLAB installation.
For further detail, see the text in the installed taskStartup.m file.
Properties
FileDependencies, PathDependencies
wait
Syntax wait(obj)
wait(obj, 'state')
wait(obj, 'state', timeout)
Description wait(obj) blocks execution in the client session until the job identified
by the object obj reaches the 'finished' state or fails. This occurs
when all the job’s tasks are finished processing on remote workers.
wait(obj, 'state') blocks execution in the client session until the
specified job object changes state to the value of 'state'. The valid
states to wait for are 'queued', 'running', and 'finished'.
If the object is currently or has already been in the specified state,
a wait is not performed and execution returns immediately. For
example, if you execute wait(job, 'queued') for a job already in the
'finished' state, the call returns immediately.
wait(obj, 'state', timeout) blocks execution until either the job
reaches the specified 'state', or timeout seconds elapse, whichever
happens first.
Examples Submit a job to the queue, and wait for it to finish running before
retrieving its results.
submit(job);
wait(job, 'finished')
results = getAllOutputArguments(job)
Submit a batch job and wait for it to finish before retrieving its variables.
job = batch('myScript');
wait(job)
load(job)
waitForState
Syntax waitForState(obj)
waitForState(obj, 'state')
waitForState(obj, 'state', timeout)
OK = waitForState(..., timeout)
Arguments obj Job or task object whose change in state to wait for.
'state' Value of the object’s State property to wait for.
timeout Maximum time to wait, in seconds.
OK Boolean true if wait succeeds, false if times out.
Examples Submit a job to the queue, and wait for it to finish running before
retrieving its results.
submit(job)
waitForState(job, 'finished')
results = getAllOutputArguments(job)
15
Property Reference
Job Manager
BusyWorkers Workers currently running tasks
ClusterOsType Specify operating system of nodes on
which scheduler will start workers
ClusterSize Number of workers available to
scheduler
Configuration Specify configuration to apply to
object or toolbox function
HostAddress IP address of host running job
manager or worker session
HostName Name of host running job manager
or worker session
IdleWorkers Idle workers available to run tasks
IsUsingSecureCommunication True if job manager and workers use
secure communication
Jobs Jobs contained in job manager
service or in scheduler’s data
location
Name Name of job manager, job, or worker
object
NumberOfBusyWorkers Number of workers currently
running tasks
NumberOfIdleWorkers Number of idle workers available to
run tasks
PromptForPassword Specify if system should prompt for
password when authenticating user
SecurityLevel Security level controlling access to
job manager and its jobs
State Current state of task, job, job
manager, or worker
Type Type of scheduler object
Schedulers
CancelJobFcn Specify function to run when
canceling job on generic scheduler
CancelTaskFcn Specify function to run when
canceling task on generic scheduler
ClusterMatlabRoot Specify MATLAB root for cluster
ClusterName Name of Platform LSF cluster
ClusterOsType Specify operating system of nodes on
which scheduler will start workers
ClusterSize Number of workers available to
scheduler
ClusterVersion Version of HPC Server scheduler
Configuration Specify configuration to apply to
object or toolbox function
DataLocation Specify directory where job data is
stored
DestroyJobFcn Specify function to run when
destroying job on generic scheduler
DestroyTaskFcn Specify function to run when
destroying task on generic scheduler
EnvironmentSetMethod Specify means of setting
environment variables for mpiexec
scheduler
GetJobStateFcn Specify function to run when
querying job state on generic
scheduler
Jobs
AuthorizedUsers Specify users authorized to access
job
Configuration Specify configuration to apply to
object or toolbox function
CreateTime When task or job was created
FileDependencies Directories and files that worker can
access
FinishedFcn Specify callback to execute after task
or job runs
FinishTime When task or job finished
ID Object identifier
JobData Data made available to all workers
for job’s tasks
MaximumNumberOfWorkers Specify maximum number of
workers to perform job tasks
MinimumNumberOfWorkers Specify minimum number of workers
to perform job tasks
Tasks
AttemptedNumberOfRetries Number of times failed task was
rerun
CaptureCommandWindowOutput Specify whether to return Command
Window output
CommandWindowOutput Text produced by execution of task
object’s function
Configuration Specify configuration to apply to
object or toolbox function
CreateTime When task or job was created
Error Task error information
ErrorIdentifier Task error identifier
ErrorMessage Message from task error
FailedAttemptInformation Information returned from failed
task
FinishedFcn Specify callback to execute after task
or job runs
FinishTime When task or job finished
Function Function called when evaluating
task
ID Object identifier
InputArguments Input arguments to task object
MaximumNumberOfRetries Specify maximum number of times
to rerun failed task
NumberOfOutputArguments Number of arguments returned by
task function
OutputArguments Data returned from execution of task
Parent Parent object of job or task
RunningFcn Specify function file to execute when
job or task starts running
Workers
Computer Information about computer on
which worker is running
CurrentJob Job whose task this worker session
is currently evaluating
CurrentTask Task that worker is currently
running
HostAddress IP address of host running job
manager or worker session
HostName Name of host running job manager
or worker session
JobManager Job manager that this worker is
registered with
Name Name of job manager, job, or worker
object
PreviousJob Job whose task this worker
previously ran
PreviousTask Task that this worker previously ran
State Current state of task, job, job
manager, or worker
16
Properties — Alphabetical
List
AttemptedNumberOfRetries
Description If a task reruns because of certain system failures, the task property
AttemptedNumberOfRetries stores a count of the number of attempted
reruns.
AuthorizedUsers
Values You can populate AuthorizedUsers with the names of any users. At
security levels 1–3, the users must be recognized by the job manager as
authenticated in the session in which you are setting the property.
Examples This example creates a job named Job33, then adds the users sammy and
bob to the job’s AuthorizedUsers.
BlockSize
Properties
LabGrid, Orientation
BusyWorkers
Description The BusyWorkers property value indicates which workers are currently
running tasks for the job manager.
Values As workers complete tasks and assume new ones, the lists of workers
in BusyWorkers and IdleWorkers can change rapidly. If you examine
these two properties at different times, you might see the same worker
on both lists if that worker has changed its status between those times.
If a worker stops unexpectedly, the job manager’s knowledge of that as
a busy or idle worker does not get updated until the job manager runs
the next job and tries to send a task to that worker.
Examples Examine the workers currently running tasks for a particular job
manager.
jm = findResource('scheduler','type','jobmanager', ...
'name','MyJobManager','LookupURL','JobMgrHost');
workers_running_tasks = get(jm, 'BusyWorkers')
CancelJobFcn
Description CancelJobFcn specifies a function to run when you call cancel for a job
running on a generic scheduler. This function lets you communicate
with the scheduler, to provide any instructions beyond the normal
toolbox action of changing the state of the job. To identify the job for the
scheduler, the function should include a call to getJobSchedulerData.
For more information and examples on using these functions and
properties, see “Managing Jobs” on page 8-50.
Properties
CancelTaskFcn, DestroyJobFcn, DestroyTaskFcn
CancelTaskFcn
Properties
CancelJobFcn, DestroyJobFcn, DestroyTaskFcn
CaptureCommandWindowOutput
Examples Set all tasks in a job to retain any command window output generated
during task evaluation.
jm = findResource('scheduler','type','jobmanager', ...
'name','MyJobManager','LookupURL','JobMgrHost');
j = createJob(jm);
createTask(j, @myfun, 1, {x});
createTask(j, @myfun, 1, {x});
.
.
.
alltasks = get(j, 'Tasks');
set(alltasks, 'CaptureCommandWindowOutput', true)
ClusterMatlabRoot
ClusterName
Description ClusterName indicates the name of the LSF cluster on which this
scheduler will run your jobs.
ClusterOsType
Purpose Specify operating system of nodes on which scheduler will start workers
Values The valid values for this property are 'pc', 'unix', and 'mixed'.
Properties
ClusterName, MasterName, SchedulerHostname
ClusterSize
Values For job managers this property is read-only. The value for a job manager
represents the number of workers registered with that job manager.
For local or third-party schedulers, this property is settable,
and its value specifies the maximum number of workers or labs
that this scheduler can start for running a job. A parallel job’s
MaximumNumberOfWorkers property value must not exceed the value
of ClusterSize.
ClusterVersion
Values This property can have the value 'CCS' (for CCS) or 'HPCServer2008'
(for HPC Server 2008).
Remarks If you change the value of ClusterVersion, this resets the values of
ClusterSize, JobTemplate, and UseSOAJobSubmission.
codistributor2dbc.defaultBlockSize
Properties
BlockSize, LabGrid
CommandWindowOutput
Examples Get the Command Window output from all tasks in a job.
jm = findResource('scheduler','type','jobmanager', ...
'name','MyJobManager','LookupURL','JobMgrHost');
j = createJob(jm);
createTask(j, @myfun, 1, {x});
createTask(j, @myfun, 1, {x});
.
.
alltasks = get(j, 'Tasks')
set(alltasks, 'CaptureCommandWindowOutput', true)
submit(j)
outputmessages = get(alltasks, 'CommandWindowOutput')
Computer
Description The Computer property of a worker is set to the string that would be
returned from running the computer function on that worker.
Values Some possible values for the Computer property are GLNX86, GLNXA64,
MACI, PCWIN, and PCWIN64. For more information about specific values,
see the computer function reference page.
Properties
HostAddress, HostName, WorkerMachineOsType
Configuration
jm = findResource('scheduler','configuration','myConfig')
job1 = createJob(jm,'Configuration','jobmanager')
job2 = createJob(jm)
set(job2,'Configuration','myjobconfig')
CreateTime
Description CreateTime holds a date number specifying the time when a task or job
was created, in the format 'day mon dd hh:mm:ss tz yyyy'.
Values CreateTime is assigned the job manager’s system time when a task
or job is created.
jm = findResource('scheduler','type','jobmanager', ...
'name','MyJobManager','LookupURL','JobMgrHost');
j = createJob(jm);
get(j,'CreateTime')
ans =
Mon Jun 28 10:13:47 EDT 2004
Properties
FinishTime, StartTime, SubmitTime
CurrentJob
Description CurrentJob indicates the job whose task the worker is evaluating at
the present time.
CurrentTask
Description CurrentTask indicates the task that the worker is evaluating at the
present time.
DataLocation
sch = findResource('scheduler','name','LSF')
set(sch, 'DataLocation','/depot/jobdata')
DestroyJobFcn
Properties
CancelJobFcn, CancelTaskFcn, DestroyTaskFcn
DestroyTaskFcn
Properties
CancelJobFcn, CancelTaskFcn, DestroyJobFcn
Dimension
Properties
Partition
EnvironmentSetMethod
Values A value of '-env' instructs the mpiexec scheduler to insert into the
mpiexec command line additional directives of the form -env VARNAME
value.
A value of 'setenv' instructs the mpiexec scheduler to set the
environment variables in the environment that launches mpiexec.
Error
Description If an error occurs during the task evaluation, Error contains the
MException object thrown. See the MException reference page for more
information about returned information.
Values Error is empty before an attempt to run a task. Error remains empty if
the evaluation of a task object’s function does not produce an error or if
a task does not complete because of cancellation or worker crash.
ErrorIdentifier
ErrorMessage
jm = findResource('scheduler','type','jobmanager', ...
'name','MyJobManager','LookupURL','JobMgrHost');
j = createJob(jm);
a = [1 2 3 4]; %Note: matrix not square
t = createTask(j, @inv, 1, {a});
submit(j)
get(t,'ErrorMessage')
ans =
Error using ==> inv
Matrix must be square.
FailedAttemptInformation
Description If a task reruns because of certain system failures, the task property
FailedAttemptInformation stores information related to the failure
and rerun attempts.
FileDependencies
Remarks There is a default limitation on the size of data transfers via the
FileDependencies property. For more information on this limit, see
“Object Data Size Limitations” on page 6-45. For alternative means of
making data available to workers, see “Sharing Code” on page 8-29.
Examples Example 1
Make available to a job’s workers the contents of the directories fd1
and fd2, and the file fdfile1.m.
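Assuming the job object is named job1, the property takes a cell array of directory and file names:

```matlab
set(job1, 'FileDependencies', {'fd1', 'fd2', 'fdfile1.m'})
```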
Example 2
Suppose in your client MATLAB session you have the following folders
on your MATLAB path:
dirA
dirA\subdir1
dirA\subdir2
dirB
Transfer the content of these folders to the worker machines, and add
all these folders to the paths of the worker MATLAB sessions. On the
client, execute the following code:
In the task function that executes on the workers, include the following
code:
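A sketch of both sides, assuming sched is your scheduler object; the worker-side code uses getFileDependencyDir to locate where the dependencies were unpacked:

```matlab
% On the client: send both folder trees to the workers.
j = createJob(sched, 'FileDependencies', {'dirA', 'dirB'});

% In the task function, on each worker: add the unpacked
% folders to the worker's MATLAB path.
ddir = getFileDependencyDir;
addpath(fullfile(ddir, 'dirA'), ...
        fullfile(ddir, 'dirA', 'subdir1'), ...
        fullfile(ddir, 'dirA', 'subdir2'), ...
        fullfile(ddir, 'dirB'));
```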
Properties
PathDependencies
FinishedFcn
Description FinishedFcn specifies the function file to execute when a job or task
completes its execution.
The callback executes in the local MATLAB session, that is, the session
that sets the property, the MATLAB client.
Examples Create a job and set its FinishedFcn property using a function handle
to an anonymous function that sends information to the display.
jm = findResource('scheduler','type','jobmanager', ...
'name','MyJobManager','LookupURL','JobMgrHost');
j = createJob(jm, 'Name', 'Job_52a');
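A sketch of the property settings that would produce the callback output shown below (the task function @rand and its arguments are placeholders):

```matlab
% Job-level callback: an anonymous function that reports completion.
set(j, 'FinishedFcn', @(job, eventdata) ...
    disp([job.Name ' ' job.State]))
% Task-level callback: the clientTaskCompleted function defined below.
t = createTask(j, @rand, 1, {2, 4}, ...
    'FinishedFcn', @clientTaskCompleted);
```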
function clientTaskCompleted(task,eventdata)
disp(['Finished task: ' num2str(task.ID)])
Run the job and note the output messages from the job and task
FinishedFcn callbacks.
submit(j)
Finished task: 1
Job_52a finished
FinishTime
Description FinishTime holds a date number specifying the time when a task or job
finished executing, in the format 'day mon dd hh:mm:ss tz yyyy'.
If a task or job is stopped or is aborted due to an error condition,
FinishTime will hold the time when the task or job was stopped or
aborted.
Values FinishTime is assigned the job manager’s system time when the task
or job has finished.
Examples Create and submit a job, then get its StartTime and FinishTime.
jm = findResource('scheduler','type','jobmanager', ...
'name','MyJobManager','LookupURL','JobMgrHost');
j = createJob(jm);
t1 = createTask(j, @rand, 1, {12,12});
t2 = createTask(j, @rand, 1, {12,12});
t3 = createTask(j, @rand, 1, {12,12});
t4 = createTask(j, @rand, 1, {12,12});
submit(j)
waitForState(j,'finished')
get(j,'StartTime')
ans =
Mon Jun 21 10:02:17 EDT 2004
get(j,'FinishTime')
ans =
Mon Jun 21 10:02:52 EDT 2004
Properties
CreateTime, StartTime, SubmitTime
Function
Properties
InputArguments, NumberOfOutputArguments, OutputArguments
GetJobStateFcn
Purpose Specify function to run when querying job state on generic scheduler
Properties
State, SubmitFcn
HasSharedFilesystem
HostAddress
Examples Create a job manager object and examine its HostAddress property.
jm = findResource('scheduler','type','jobmanager', ...
'name','MyJobManager','LookupURL','JobMgrHost');
get(jm, 'HostAddress')
ans =
123.123.123.123
Properties
Computer, HostName, WorkerMachineOsType
HostName
Description You can match the HostName property to find a desired job manager
or worker when creating the job manager or worker object with
findResource.
Examples Create a job manager object and examine its HostName property.
jm = findResource('scheduler','type','jobmanager', ...
'Name', 'MyJobManager')
get(jm, 'HostName')
ans =
JobMgrHost
Properties
Computer, HostAddress, WorkerMachineOsType
ID
Description Each object has a unique identifier within its parent object. The ID
value is assigned at the time of object creation. You can use the ID
property value to distinguish one object from another, such as different
tasks in the same job.
Values The first job created in a job manager has the ID value of 1, and jobs are
assigned ID values in numerical sequence as they are created after that.
The first task created in a job has the ID value of 1, and tasks are
assigned ID values in numerical sequence as they are created after that.
jm = findResource('scheduler','type','jobmanager', ...
'name','MyJobManager','LookupURL','JobMgrHost');
j = createJob(jm)
createTask(j, @rand, 1, {2,4});
createTask(j, @rand, 1, {2,4});
tasks = get(j, 'Tasks');
get(tasks, 'ID')
ans =
[1]
[2]
The ID values are the only unique properties distinguishing these two
tasks.
Properties
Jobs, Tasks
IdleWorkers
Description The IdleWorkers property value indicates which workers are currently
available to the job manager for the performance of job tasks.
Values As workers complete tasks and assume new ones, the lists of workers
in BusyWorkers and IdleWorkers can change rapidly. If you examine
these two properties at different times, you might see the same worker
on both lists if that worker has changed its status between those times.
If a worker stops unexpectedly, the job manager’s knowledge of that as
a busy or idle worker does not get updated until the job manager runs
the next job and tries to send a task to that worker.
Examples Examine which workers are available to a job manager for immediate
use to perform tasks.
jm = findResource('scheduler','type','jobmanager', ...
'name','MyJobManager','LookupURL','JobMgrHost');
get(jm, 'NumberOfIdleWorkers')
InputArguments
Values The forms and values of the input arguments depend entirely on the
task function.
Examples Create a task requiring two input arguments, then examine the task’s
InputArguments property.
jm = findResource('scheduler','type','jobmanager', ...
'name','MyJobManager','LookupURL','JobMgrHost');
j = createJob(jm);
t = createTask(j, @rand, 1, {2, 4});
get(t, 'InputArguments')
ans =
[2] [4]
Properties
Function, OutputArguments
IsUsingSecureCommunication
Properties
PromptForPassword, SecurityLevel, UserName
JobData
Description The JobData property holds data that eventually gets stored in the local
memory of the worker machines, so that it does not have to be passed
to the worker for each task in a job that the worker evaluates. Passing
the data only once per job to each worker is more efficient than passing
data with each task.
Note that to access the data contained in a job's JobData property,
the worker session evaluating the task needs to have access to the job,
which it gets from a call to the function getCurrentJob, as discussed in
the example below.
Examples Create job1 and set its JobData property value to the contents of
array1.
job1 = createJob(jm)
set(job1, 'JobData', array1)
createTask(job1, @myfunction, 1, {task_data})
Now the contents of array1 are available to all the tasks in the job.
Because the job itself must be accessible to the tasks, myfunction must
include a call to the function getCurrentJob. That is, the task function
myfunction needs to call getCurrentJob to get the job object through
which it can get the JobData property. So myfunction should contain
lines like the following:
cj = getCurrentJob
array1 = get(cj, 'JobData')
JobDescriptionFile
Purpose Name of XML job description file for Microsoft Windows HPC Server
scheduler
Description The XML file you specify by the JobDescriptionFile property defines
the base state from which the job is created. The file must exist on
the MATLAB path or the property must specify the full path name
to the file.
Any job properties that are specified as part of MATLAB job objects
(e.g., MinimumNumberOfWorkers, MaximumNumberOfWorkers, etc., for
parallel or MATLAB pool jobs) override the values specified in the job
description file. Scheduler properties (e.g., nonempty JobTemplate
property) also override the values specified in the job description file.
For SOA jobs the values in the job description file are ignored.
For version 2 of Windows HPC Server 2008, the values for HPC Server
job properties specified in the job description file must be compatible
with the values in the job template that is applied to the job (either the
default job template or the job template specified by the JobTemplate
property). Incompatibilities between property values specified by the
job description file and the job template might result in an error when
you submit a job. For example, if the job template imposes property
restrictions that you violate in your job description file, you get an error.
For information about job description files, consult Microsoft online
documentation at:
http://technet.microsoft.com/en-us/library/cc972801(WS.10).aspx
JobManager
Description JobManager indicates the job manager with which the worker is
registered.
Jobs
Description The Jobs property contains an array of all the job objects in a scheduler.
Job objects will be in the order indicated by their ID property, consistent
with the sequence in which they were created, regardless of their
State. (To see the jobs categorized by state or the scheduled execution
sequence for jobs in the queue, use the findJob function.)
Examples Examine the Jobs property for a job manager, and use the resulting
array of objects to set property values.
jm = findResource('scheduler','type','jobmanager', ...
'name','MyJobManager','LookupURL','JobMgrHost');
j1 = createJob(jm);
j2 = createJob(jm);
j3 = createJob(jm);
j4 = createJob(jm);
.
.
.
all_jobs = get(jm, 'Jobs')
set(all_jobs, 'MaximumNumberOfWorkers', 10);
Properties
Tasks
JobTemplate
Description JobTemplate identifies the name of a job template to use with your HPC
Server scheduler. The property value is not case-sensitive.
With HPC Server 2008, if you do not specify a value for the JobTemplate
property, the scheduler uses the default job template to run the job. Ask
your system administrator which job template you should use.
The job template used for submitting SOA jobs must not impose any
restrictions on the name of the job; otherwise, these jobs fail.
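For example, assuming your administrator has provided a template named MyTemplate (a hypothetical name used here for illustration), you might set the property as follows:

```matlab
% Find an HPC Server scheduler object.
sched = findResource('scheduler', 'type', 'hpcserver');
% Use the hypothetical job template 'MyTemplate' for jobs submitted
% through this scheduler object. The value is not case-sensitive.
set(sched, 'JobTemplate', 'MyTemplate');
```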
LabGrid
Properties
BlockSize, Orientation
MasterName
Description MasterName indicates the name of the LSF cluster master node.
MatlabCommandToRun
MaximumNumberOfRetries
Description If a task cannot complete because of certain system failures, the job
manager can attempt to rerun the task. MaximumNumberOfRetries
specifies how many times to try to run the task after such failures. The
task reruns until it succeeds or until it reaches the specified maximum
number of attempts.
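For example, you might allow a task to be rerun up to three times after such failures (the task shown here is illustrative):

```matlab
jm = findResource('scheduler','type','jobmanager', ...
    'name','MyJobManager','LookupURL','JobMgrHost');
j = createJob(jm);
t = createTask(j, @rand, 1, {2,4});
% Let the job manager attempt this task up to 3 times after
% certain system failures before giving up on it.
set(t, 'MaximumNumberOfRetries', 3);
```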
MaximumNumberOfWorkers
Values You can set the value to anything equal to or greater than the value of
the MinimumNumberOfWorkers property.
jm = findResource('scheduler','type','jobmanager', ...
'name','MyJobManager','LookupURL','JobMgrHost');
j = createJob(jm);
set(j, 'MaximumNumberOfWorkers', 12);
In this example, the job will use no more than 12 workers, regardless
of how many tasks are in the job and how many workers are available
on the cluster.
MinimumNumberOfWorkers
Values The default value is 1. You can set the value anywhere from 1 up to the
value of the MaximumNumberOfWorkers property.
jm = findResource('scheduler','type','jobmanager', ...
'name','MyJobManager','LookupURL','JobMgrHost');
j = createJob(jm);
set(j, 'MinimumNumberOfWorkers', 6);
In this example, when the job is queued, it will not begin running tasks
until at least six workers are available to perform task evaluations.
MpiexecFileName
Remarks Ask your network administrator which mpiexec you should run. The
default value of the property points to the mpiexec included in your
MATLAB installation.
Properties
SubmitArguments
Name
Description The descriptive name of a job manager or worker is set when its
service is started, as described in "Customizing Engine Services" in the
MATLAB Distributed Computing Server System Administrator’s Guide.
This is reflected in the Name property of the object that represents the
service. You can use the name of the job manager or worker service
to search for the particular service when creating an object with the
findResource function.
You can configure Name as a descriptive name for a job object at any
time before the job is submitted to the queue.
Examples Construct a job manager object by searching for the name of the service
you want to use.
jm = findResource('scheduler','type','jobmanager', ...
'Name','MyJobManager');
j = createJob(jm);
get(j, 'Name')
ans =
MyJobManager_job
Change the job’s Name property and verify the new setting.
set(j,'Name','MyJob')
get(j,'Name')
ans =
MyJob
NumberOfBusyWorkers
Examples Examine the number of workers currently running tasks for a job
manager.
jm = findResource('scheduler','type','jobmanager', ...
'name','MyJobManager','LookupURL','JobMgrHost');
get(jm, 'NumberOfBusyWorkers')
NumberOfIdleWorkers
jm = findResource('scheduler','type','jobmanager', ...
'name','MyJobManager','LookupURL','JobMgrHost');
get(jm, 'NumberOfIdleWorkers')
NumberOfOutputArguments
Description When you create a task with the createTask function, you define how
many output arguments are expected from the task function.
jm = findResource('scheduler','type','jobmanager', ...
'name','MyJobManager','LookupURL','JobMgrHost');
j = createJob(jm);
t = createTask(j, @rand, 1, {2, 4});
get(t,'NumberOfOutputArguments')
ans =
1
Properties
OutputArguments
Orientation
Properties
BlockSize, LabGrid
OutputArguments
Values The forms and values of the output arguments are totally dependent
on the task function.
Examples Create a job with a task and examine its result after running the job.
jm = findResource('scheduler','type','jobmanager', ...
'name','MyJobManager','LookupURL','JobMgrHost');
j = createJob(jm);
t = createTask(j, @rand, 1, {2, 4});
submit(j)
Properties
Function, InputArguments, NumberOfOutputArguments
ParallelSubmissionWrapperScript
Properties
ClusterName, ClusterMatlabRoot, MasterName, SubmitArguments
ParallelSubmitFcn
Purpose Specify function to run when parallel job submitted to generic scheduler
Properties
MatlabCommandToRun, SubmitFcn
Parent
Description A job’s Parent property indicates the job manager or scheduler object
that contains the job. A task’s Parent property indicates the job object
that contains the task.
Partition
returns [3 3 2 2].
Properties
Dimension
PathDependencies
sch = findResource('scheduler','name','LSF')
job1 = createJob(sch)
p = {'/central/funcs','/dept1/funcs', ...
'\\OurDomain\central\funcs','\\OurDomain\dept1\funcs'}
set(job1, 'PathDependencies', p)
PreviousJob
Description PreviousJob indicates the job whose task the worker most recently
evaluated.
PreviousTask
Description PreviousTask indicates the task that the worker most recently
evaluated.
PromptForPassword
Purpose Specify whether system should prompt for password when authenticating user
Properties
IsUsingSecureCommunication, SecurityLevel, UserName
QueuedFcn
Purpose Specify function file to execute when job is submitted to job manager
queue
Description QueuedFcn specifies the function file to execute when a job is submitted
to a job manager queue.
The callback executes in the local MATLAB session, that is, the session
that sets the property.
Examples Create a job and set its QueuedFcn property, using a function handle to
an anonymous function that sends information to the display.
jm = findResource('scheduler','type','jobmanager', ...
'name','MyJobManager','LookupURL','JobMgrHost');
j = createJob(jm, 'Name', 'Job_52a');
set(j, 'QueuedFcn', ...
@(job,eventdata) disp([job.Name ' now queued for execution.']))
.
.
.
submit(j)
Job_52a now queued for execution.
Properties
FinishedFcn, RunningFcn
RcpCommand
Description When using a nonshared file system, the command specified by this
property’s value is used on the cluster to copy files from the client
machine. The syntax of the command must be compatible with standard
rcp. On Microsoft Windows operating systems, the cluster machines
must have a suitable installation of rcp.
ResourceTemplate
Description The value of this property is used to build the resource selection portion
of the qsub command, generally identified by the -l flag. The toolbox
uses this to identify the number of tasks in a parallel job, and you might
want to fill out other selection subclauses (such as the OS type of the
workers). You should specify a value for this property that includes the
literal string ^N^ , which the toolbox will replace with the number of
workers in the parallel job prior to submission.
Values You might set the property value as follows, to accommodate your
cluster size and to set the “wall time” limit of the job (i.e., how long it is
allowed to run in real time) to one hour:
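For example, for a PBS Pro scheduler you might use a value like the following, where the exact syntax of the -l selection subclause depends on your cluster's qsub:

```matlab
% Find a PBS Pro scheduler object.
pbs = findResource('scheduler', 'type', 'pbspro');
% The toolbox replaces ^N^ with the number of workers in the parallel
% job at submission time; walltime=1:00:00 limits the job to one hour
% of real time.
set(pbs, 'ResourceTemplate', '-l select=^N^,walltime=1:00:00');
```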
RestartWorker
Purpose Specify whether to restart MATLAB workers before evaluating job tasks
Description In some cases, you might want to restart MATLAB on the workers
before they evaluate any tasks in a job. This action resets defaults,
clears the workspace, frees available memory, and so on.
Values Set RestartWorker to true (or logical 1) if you want the job to restart
the MATLAB session on any workers before they evaluate their first
task for that job. The workers are not reset between tasks of the same
job. Set RestartWorker to false (or logical 0) if you do not want
MATLAB restarted on any workers. When you perform get on the
property, the value returned is logical 1 or logical 0. The default value
is 0, which does not restart the workers.
Examples Create a job and set it so that MATLAB workers are restarted before
evaluating tasks in a job.
jm = findResource('scheduler','type','jobmanager', ...
'name','MyJobManager','LookupURL','JobMgrHost');
j = createJob(jm);
set(j, 'RestartWorker', true)
.
.
.
submit(j)
RshCommand
Purpose Remote execution command used on worker nodes during parallel job
Description Used only on UNIX operating systems, the value of this property is the
command used at the beginning of running parallel jobs, typically to
start MPI daemon processes on the nodes allocated to run MATLAB
workers. The remote execution must be able to proceed without user
interaction, for example, without prompting for user credentials.
RunningFcn
Purpose Specify function file to execute when job or task starts running
Description RunningFcn specifies the function file to execute when a job or task
begins its execution.
The callback executes in the local MATLAB client session, that is, the
session that sets the property.
Examples Create a job and set its RunningFcn property, using a function handle to
an anonymous function that sends information to the display.
jm = findResource('scheduler','type','jobmanager', ...
'name','MyJobManager','LookupURL','JobMgrHost');
j = createJob(jm, 'Name', 'Job_52a');
set(j, 'RunningFcn', ...
@(job,eventdata) disp([job.Name ' now running.']))
.
.
.
submit(j)
Properties
FinishedFcn, QueuedFcn
SchedulerHostname
Remarks If you change the value of SchedulerHostname, this resets the values of
ClusterSize, JobTemplate, and UseSOAJobSubmission.
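For example, to point a scheduler object at a different head node (the host name myheadnode is hypothetical):

```matlab
% Find an HPC Server scheduler object.
sched = findResource('scheduler', 'type', 'hpcserver');
% Point the scheduler object at the hypothetical head node
% 'myheadnode'. Note that this resets the values of ClusterSize,
% JobTemplate, and UseSOAJobSubmission.
set(sched, 'SchedulerHostname', 'myheadnode');
```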
SecurityLevel
Purpose Security level controlling access to job manager and its jobs
Values The property values indicating security level and their effects are shown
in the following list.

Level 0: No security. All users can access all jobs; the
AuthorizedUsers property of the job is ignored.

Level 1: You are warned when you try to access other users' jobs
and tasks, but can still perform all actions. You can suppress the
warning by adding your user name to the AuthorizedUsers property
of the job.

Level 2: Authentication required. You must enter a password to
access any jobs and tasks. You cannot access other users' jobs
unless your user name is included in the job's AuthorizedUsers
property.

Level 3: Same as level 2, but in addition, tasks run on the workers
as the user to whom the job belongs. The user name and password for
authentication in the client session need to be the same as the
system password used to log on to a worker machine. NOTE: This
level requires secure communication between job manager and
workers. Secure communication is also set in the mdce_def file, and
is indicated by a job manager's IsUsingSecureCommunication property.
The job manager and the workers should run at the same security level.
A worker running at too low a security level will fail to register with the
job manager, because the job manager does not trust it.
Properties
AuthorizedUsers, IsUsingSecureCommunication,
PromptForPassword, UserName
ServerName
Description ServerName indicates the name of the node on which the PBS Pro or
TORQUE scheduler is running.
StartTime
Description StartTime holds a date string specifying the time when a job or task
starts running, in the format 'day mon dd hh:mm:ss tz yyyy'.
Values StartTime is assigned the job manager’s system time when the task
or job has started running.
Examples Create and submit a job, then get its StartTime and FinishTime.
jm = findResource('scheduler','type','jobmanager', ...
'name','MyJobManager','LookupURL','JobMgrHost');
j = createJob(jm);
t1 = createTask(j, @rand, 1, {12,12});
t2 = createTask(j, @rand, 1, {12,12});
t3 = createTask(j, @rand, 1, {12,12});
t4 = createTask(j, @rand, 1, {12,12});
submit(j)
waitForState(j, 'finished')
get(j, 'StartTime')
ans =
Mon Jun 21 10:02:17 EDT 2004
get(j, 'FinishTime')
ans =
Mon Jun 21 10:02:52 EDT 2004
Properties
CreateTime, FinishTime, SubmitTime
State
Description The State property reflects the stage of an object in its life cycle,
indicating primarily whether or not it has yet been executed. The
possible State values for all Parallel Computing Toolbox objects are
discussed below in the “Values” section.
Note The State property of the task object is different from the State
property of the job object. For example, a task that is finished may be
part of a job that is running if other tasks in the job have not finished.
Task Object
For a task object, possible values for State are
• pending — Tasks that have not yet started to evaluate the task
object's Function property are in the pending state.
• running — Task objects that are currently in the process of
evaluating the Function property are in the running state.
• finished — Task objects that have finished evaluating the task
object’s Function property are in the finished state.
• unavailable — Communication cannot be established with the job
manager.
Job Object
For a job object, possible values for State are
• pending — Job objects that have not yet been submitted to a job
queue are in the pending state.
• queued — Job objects that have been submitted to a job queue but
have not yet started to run are in the queued state.
• running — Job objects that are currently in the process of running
are in the running state.
• finished — Job objects that have completed running all their tasks
are in the finished state.
• failed — Job objects that could not run because of unexpected or
missing information, when using a third-party scheduler, are in the
failed state.
• destroyed — Job objects whose data has been permanently removed
from the data location or job manager.
• unavailable — Communication cannot be established with the job
manager.
Job Manager
For a job manager, possible values for State are
When a job manager first starts up, the default value for State is
running.
Worker
For a worker, possible values for State are
Examples Create a job manager object representing a job manager service, and
create a job object; then examine each object’s State property.
jm = findResource('scheduler','type','jobmanager', ...
'name','MyJobManager','LookupURL','JobMgrHost');
get(jm, 'State')
ans =
running
j = createJob(jm);
get(j, 'State')
ans =
pending
SubmitArguments
Description SubmitArguments is a string that is passed via the bsub or qsub
command to the LSF, PBS Pro, or TORQUE scheduler at submit time,
or passed to the mpiexec command if using an mpiexec scheduler.
mpiexec Scheduler
The following SubmitArguments values might be useful when using an
mpiexec scheduler. They can be combined to form a single string when
separated by spaces.
-phrase MATLAB
    Use MATLAB as passphrase to connect with smpd.

-noprompt
    Suppress prompting for any user information.

-localonly
    Run only on the local computer.

-host <hostname>
    Run only on the identified host.

-machinefile <filename>
    Run only on the nodes listed in the specified file (one hostname
    per line).
For a complete list, see the command-line help for the mpiexec
command:
mpiexec -help
mpiexec -help2
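For example, you might combine several of these options into a single string (the scheduler object here is illustrative):

```matlab
% Find an mpiexec scheduler object.
sched = findResource('scheduler', 'type', 'mpiexec');
% Connect to smpd with the passphrase MATLAB and suppress any
% prompting for user information.
set(sched, 'SubmitArguments', '-phrase MATLAB -noprompt');
```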
Properties
MatlabCommandToRun, MpiexecFileName
SubmitFcn
Description SubmitFcn identifies the function to run when you submit a job to the
generic scheduler. The function runs in the MATLAB client. This
user-defined submit function provides certain job and task data for
the MATLAB worker, and identifies a corresponding decode function
for the MATLAB worker to run.
For further information, see “MATLAB Client Submit Function” on
page 8-35.
Values SubmitFcn can be set to any valid MATLAB callback value that uses
the user-defined submit function.
For a description of the user-defined submit function, how it is used,
and its relationship to the worker decode function, see “Using the
Generic Scheduler Interface” on page 8-34.
Properties
MatlabCommandToRun
SubmitTime
Description SubmitTime holds a date string specifying the time when a job was
submitted to the job queue, in the format
'day mon dd hh:mm:ss tz yyyy'.
Values SubmitTime is assigned the job manager’s system time when the job is
submitted.
jm = findResource('scheduler','type','jobmanager', ...
'name','MyJobManager','LookupURL','JobMgrHost');
j = createJob(jm);
createTask(j, @rand, 1, {12,12});
submit(j)
get(j, 'SubmitTime')
ans =
Wed Jun 30 11:33:21 EDT 2004
Properties
CreateTime, FinishTime, StartTime
Tag
Description You configure Tag to be a string value that uniquely identifies a job
object.
Tag is particularly useful in programs that would otherwise need to
define the job object as a global variable, or pass the object as an
argument between callback routines.
You can return the job object with the findJob function by specifying
the Tag property value.
Examples Suppose you create a job object in the job manager jm.
job1 = createJob(jm);
set(job1,'Tag','MyFirstJob')
You can identify and access job1 using the findJob function and the
Tag property value.
job_one = findJob(jm,'Tag','MyFirstJob');
Task
Description The Task property contains the task object for the MATLAB pool
job, which has only this one task. This is the same as the first task
contained in the Tasks property.
Properties
Tasks
Tasks
Description The Tasks property contains an array of all the task objects in a job,
whether the tasks are pending, running, or finished. Tasks are always
returned in the order in which they were created.
Examples Examine the Tasks property for a job object, and use the resulting array
of objects to set property values.
jm = findResource('scheduler','type','jobmanager', ...
'name','MyJobManager','LookupURL','JobMgrHost');
j = createJob(jm);
createTask(j, ...)
.
.
.
createTask(j, ...)
alltasks = get(j, 'Tasks')
alltasks =
distcomp.task: 10-by-1
set(alltasks, 'Timeout', 20);
The last line of code sets the Timeout property value to 20 seconds for
each task in the job.
Properties
Jobs
Timeout
Description Timeout holds a double value specifying the number of seconds to wait
before giving up on a task or job.
The time for timeout begins counting when the task State property
value changes from Pending to Running, or when the job object
State property value changes from Queued to Running.
When a task times out, the behavior of the task is the same as if the
task were stopped with the cancel function, except a different message
is placed in the task object’s ErrorMessage property.
When a job times out, the behavior of the job is the same as if the job
were stopped using the cancel function, except all pending and running
tasks are treated as having timed out.
Values The default value for Timeout is large enough so that in practice, tasks
and jobs will never time out. You should set the value of Timeout to the
number of seconds you want to allow for completion of tasks and jobs.
jm = findResource('scheduler','type','jobmanager', ...
'name','MyJobManager','LookupURL','JobMgrHost');
j = createJob(jm);
set(j, 'Timeout', 60)
Properties
ErrorMessage, State
Type
UserData
Description You configure UserData to store data that you want to associate with
an object. The object does not use this data directly, but you can access
it using the get function or dot notation.
UserData is stored in the local MATLAB client session, not in the job
manager, job data location, or worker. So, one MATLAB client session
cannot access the data stored in this property by another MATLAB
client session. Even on the same machine, if you close the client session
where UserData is set for an object, and then access the same object
from a later client session via the job manager or job data location, the
original UserData is not recovered. Likewise, commands such as
clear all
clear functions
will clear an object in the local session, permanently removing the data
in the UserData property.
job1 = createJob(jm);
coeff.a = 1.0;
coeff.b = -1.25;
job1.UserData = coeff
get(job1,'UserData')
ans =
a: 1
b: -1.2500
UserName
Description On a job, the UserName property value is a string indicating the login
name of the user who created the job.
On a job manager object, the UserName property value indicates the
user who created the object or who is using the job manager object to
access jobs in its queue.
get(job1, 'UserName')
ans =
jsmith
Change the user for a job manager object in your current MATLAB
session. Certain security levels display a password prompt.
jm = findResource('scheduler','type','jobmanager','name','central-jm');
set(jm, 'UserName', 'MyNewName')
See Also These references apply to using the UserName property for job manager
objects.
Functions
changePassword, clearLocalPassword
Properties
IsUsingSecureCommunication, PromptForPassword, SecurityLevel
UseSOAJobSubmission
Note The MATLAB client from which you submit SOA jobs to the HPC
Server 2008 scheduler must remain open for the duration of these jobs.
Closing the MATLAB client session while SOA jobs are in the pending,
queued, or running state causes the scheduler to cancel these jobs.
Worker
Description The Worker property value is an object representing the worker session
that evaluated the task.
Values Before a task is evaluated, its Worker property value is an empty vector.
submit(job1)
waitForState(job1,'finished')
t1 = findTask(job1,'ID',1)
t1.Worker.Name
ans =
node55_worker1
WorkerMachineOsType
Purpose Specify operating system of nodes on which mpiexec scheduler will start
labs
Values The only values the property can have are 'pc' and 'unix'. The nodes
running the labs of an mpiexec job must all be the same platform. The
only heterogeneous mixing allowed in the cluster for the same mpiexec
job is Intel® Macintosh-based systems together with 32-bit Linux-based
systems.
Glossary
CHECKPOINTBASE
The name of the parameter in the mdce_def file that defines the location
of the job manager and worker checkpoint directories.
checkpoint directory
Location where job manager checkpoint information and worker
checkpoint information is stored.
client
The MATLAB session that defines and submits the job. This is the
MATLAB session in which the programmer usually develops and
prototypes applications. Also known as the MATLAB client.
client computer
The computer running the MATLAB client.
cluster
A collection of computers that are connected via a network and intended
for a common purpose.
coarse-grained application
An application for which run time is significantly greater than
the communication time needed to start and stop the program.
Coarse-grained distributed applications are also called embarrassingly
parallel applications.
codistributed array
An array partitioned into segments, with each segment residing in the
workspace of a different lab.
Composite
An object in a MATLAB client session that provides access to data
values stored on the labs in a MATLAB pool, such as the values of
variables that are assigned inside an spmd statement.
computer
A system with one or more processors.
distributed application
The same application that runs independently on several nodes,
possibly with different input parameters. There is no communication,
shared data, or synchronization points between the nodes. Distributed
applications can be either coarse-grained or fine-grained.
distributed computing
Computing with distributed applications, running the application on
several nodes simultaneously.
DNS
Domain Name System. A system that translates Internet domain
names into IP addresses.
dynamic licensing
The ability of a MATLAB worker or lab to employ all the functionality
you are licensed for in the MATLAB client, while checking out only
an engine license. When a job is created in the MATLAB client
with Parallel Computing Toolbox software, the products for which
the client is licensed will be available for all workers or labs that
evaluate tasks for that job. This allows you to run any code on the
cluster that you are licensed for on your MATLAB client, without
requiring extra licenses for the worker beyond MATLAB Distributed
Computing Server software. For a list of products that are not
eligible for use with Parallel Computing Toolbox software, see
http://www.mathworks.com/products/ineligible_programs/.
fine-grained application
An application for which run time is significantly less than the
communication time needed to start and stop the program. Compare to
coarse-grained applications.
head node
Usually, the node of the cluster designated for running the job manager
and license manager. It is often useful to run all the nonworker related
processes on a single machine.
heterogeneous cluster
A cluster that is not homogeneous.
homogeneous cluster
A cluster of identical machines, in terms of both hardware and software.
job
The complete large-scale operation to perform in MATLAB, composed
of a set of tasks.
job manager
The MathWorks process that queues jobs and assigns tasks to workers.
A third-party process that performs this function is called a scheduler.
The general term "scheduler" can also refer to a job manager.
lab
When workers start, they work independently by default. They can
then connect to each other and work together as peers, and are then
referred to as labs.
LOGDIR
The name of the parameter in the mdce_def file that defines the
directory where logs are stored.
MATLAB client
See client.
MATLAB pool
A collection of labs that are reserved by the client for execution of
parfor-loops or spmd statements. See also lab.
MATLAB worker
See worker.
mdce
The service that has to run on all machines before they can run a job
manager or worker. This is the engine foundation process, making sure
that the job manager and worker processes that it controls are always
running.
Note that the program and service name is all lowercase letters.
mdce_def file
The file that defines all the defaults for the mdce processes by allowing
you to set preferences or definitions in the form of parameter values.
MPI
Message Passing Interface, the means by which labs communicate with
each other while running tasks in the same job.
node
A computer that is part of a cluster.
parallel application
The same application that runs on several labs simultaneously, with
communication, shared data, or synchronization points between the
labs.
private array
An array which resides in the workspaces of one or more, but perhaps
not all labs. There might or might not be a relationship between the
values of these arrays among the labs.
random port
A random unprivileged TCP port, i.e., a random TCP port above 1024.
register a worker
The action that happens when both worker and job manager are started
and the worker contacts the job manager.
replicated array
An array which resides in the workspaces of all labs, and whose size and
content are identical on all labs.
scheduler
The process, either third-party or the MathWorks job manager, that
queues jobs and assigns tasks to workers.
task
One segment of a job to be evaluated by a worker.
variant array
An array which resides in the workspaces of all labs, but whose content
differs on these labs.
worker
The MATLAB session that performs the task computations. Also known
as the MATLAB worker or worker process.
Index
A
arrayfun function 14-2
arrays
    codistributed 5-4
    local 5-11
    private 5-4
    replicated 5-2
    types of 5-2
    variant 5-3
AttemptedNumberOfRetries property 16-2
AuthorizedUsers property 16-3

B
batch function 14-5
BlockSize property 16-5
BusyWorkers property 16-6

C
cancel function 14-9
CancelJobFcn property 16-7
CancelTaskFcn property 16-8
CaptureCommandWindowOutput property 16-9
ccsscheduler object 12-2
changePassword function 14-11
clear function 14-12
clearLocalPassword function 14-13
ClusterMatlabRoot property 16-11
ClusterName property 16-12
ClusterOsType property 16-13
ClusterSize property 16-14
ClusterVersion property 16-15
codistributed arrays
    constructor functions 5-10
    creating 5-7
    defined 5-4
    indexing 5-15
    working with 5-5
codistributed function 14-14
codistributed object 12-4
codistributed.build function 14-16
codistributed.cell function 14-18
codistributed.colon function 14-20
codistributed.eye function 14-22
codistributed.false function 14-24
codistributed.Inf function 14-26
codistributed.NaN function 14-28
codistributed.ones function 14-30
codistributed.rand function 14-32
codistributed.randn function 14-34
codistributed.spalloc function 14-36
codistributed.speye function 14-38
codistributed.sprand function 14-40
codistributed.sprandn function 14-42
codistributed.true function 14-44
codistributed.zeros function 14-46
codistributor function 14-48
codistributor1d function 14-51
codistributor1d object 12-6
codistributor1d.defaultPartition function 14-54
codistributor2dbc function 14-55
codistributor2dbc object 12-7
codistributor2dbc.defaultBlockSize property 16-16
codistributor2dbc.defaultLabGrid function 14-57
CommandWindowOutput property 16-17
Composite
    getting started 1-10
    outside spmd 3-10
Composite function 14-58
Composite object 12-8
Computer property 16-19
Configuration property 16-20
configurations 6-16
    importing and exporting 6-23
    using in application 6-27
    validating 6-24
D
DataLocation property 16-25
defaultParallelConfig function 14-69
demote function 14-71
destroy function 14-73
DestroyJobFcn property 16-27
DestroyTaskFcn property 16-28
dfeval function 14-74
dfevalasync function 14-78
diary function 14-80
Dimension property 16-29
distributed function 14-81
distributed object 12-10
distributed.cell function 14-82
distributed.eye function 14-83
distributed.false function 14-84
distributed.Inf function 14-85
distributed.NaN function 14-86
distributed.ones function 14-87
distributed.rand function 14-88
distributed.randn function 14-89
distributed.spalloc function 14-90
distributed.speye function 14-91
distributed.sprand function 14-92
distributed.sprandn function 14-93
distributed.true function 14-94
distributed.zeros function 14-95
dload function 14-96

F
FailedAttemptInformation property 16-34
feval function 14-100
FileDependencies property 16-35
files
    sharing 8-14
    using an LSF scheduler 8-29
findJob function 14-102
findResource function 14-104
findTask function 14-109
FinishedFcn property 16-38
FinishTime property 16-40
for loop
    distributed 14-111
Function property 16-42
functions
    arrayfun 14-2
    batch 14-5
    cancel 14-9
    changePassword 14-11
    clear 14-12
    clearLocalPassword 14-13
    codistributed 14-14
    codistributed.build 14-16
    codistributed.cell 14-18
    codistributed.colon 14-20
    codistributed.eye 14-22
I
ID property 16-47
IdleWorkers property 16-49
importParallelConfig function 14-141
InputArguments property 16-50
inspect function 14-143
isaUnderlying function 14-145
iscodistributed function 14-146
isComplete function 14-147
isdistributed function 14-148
isreplicated function 14-149
IsUsingSecureCommunication property 16-51

J
job
    creating
        example 8-10
    creating on generic scheduler
        example 8-45
    creating on LSF or HPC Server scheduler
        example 8-25
    life cycle 6-14
    local scheduler 8-3
    submitting to generic scheduler queue 8-47
    submitting to local scheduler 8-5
    submitting to LSF or HPC Server scheduler queue 8-27
    submitting to queue 8-13
job manager
    finding
        example 8-3, 8-8
job object 12-17
JobData property 16-52
JobDescriptionFile property 16-54
jobmanager object 12-20
JobManager property 16-55
Jobs property 16-56
jobStartup function 14-150
JobTemplate property 16-58

L
labBarrier function 14-151
labBroadcast function 14-152
LabGrid property 16-59
labindex function 14-154
labProbe function 14-155
labReceive function 14-156
labSend function 14-158
labSendReceive function 14-159
length function 14-162
load function 14-163
localscheduler object 12-23
LSF scheduler 8-21
lsfscheduler object 12-25

M
MasterName property 16-60
MatlabCommandToRun property 16-61
matlabpool
    parfor 2-3
    spmd 3-3
matlabpool function 14-165
matlabpooljob object 12-27
MaximumNumberOfRetries property 16-62
MaximumNumberOfWorkers property 16-63
methods function 14-171
MinimumNumberOfWorkers property 16-64
mpiexec object 12-30
MpiexecFileName property 16-65
mpiLibConf function 14-173
mpiprofile function 14-175
mpiSettings function 14-180

N
Name property 16-66
NumberOfBusyWorkers property 16-68
NumberOfIdleWorkers property 16-69
NumberOfOutputArguments property 16-70
T
Tag property 16-104
task
    creating
        example 8-12
    creating on generic scheduler
        example 8-46
    creating on LSF scheduler
        example 8-26

W
wait function 14-223
waitForState function 14-225
Windows HPC Server scheduler 8-21
worker object 12-53
Worker property 16-116
WorkerMachineOsType property 16-117