VT 2010 Parfor
MATLAB Parallel Computing
Introduction
QUAD Example
Executing a PARFOR Program
MD Example
PRIME Example
ODE SWEEP Example
FMINCON Example
Conclusion
INTRO: Parallel MATLAB
INTRO: Local and Remote MATLAB Workers
INTRO: User Programming
INTRO: Execution
INTRO: PARFOR: Parallel FOR Loops
INTRO: "SPMD" Single Program Multiple Data
INTRO: "SPMD" Distributed Arrays
INTRO: "TASK" Computing
INTRO: Direct Execution for PARFOR
Parallel MATLAB jobs can be run directly, that is, interactively.
The matlabpool command is used to reserve a given number of
workers on the local (or perhaps remote) machine.
Once these workers are available, the user can type commands, run
scripts, or evaluate functions, which contain parfor statements.
The workers will cooperate in producing results.
Interactive parallel execution is great for desktop debugging of
short jobs.
It’s an inefficient way to work on a cluster, though, because no one
else can use the workers until you release them!
So don't log into Ithaca interactively and treat it as though it
were your desktop machine! In our examples, we will indeed use
Ithaca, but always through the indirect batch system.
INTRO: Indirect Execution for PARFOR
INTRO: ITHACA
QUAD: Estimating an Integral
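QUAD: The QUAD_FUN Function
function q = quad_fun ( n, a, b )
  % Estimate the integral of bessely(4.5,x) over [a,b] from n sample points.
  q = 0.0;
  w = ( b - a ) / n;
  for i = 1 : n
    x = ( ( n - i ) * a + ( i - 1 ) * b ) / ( n - 1 );
    fx = bessely ( 4.5, x );
    q = q + w * fx;
  end
  return
end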
QUAD: Comments
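QUAD: The Parallel QUAD_FUN Function
function q = quad_fun ( n, a, b )
  % Same estimate; parfor divides the iterations among the workers
  % and combines the partial sums of q.
  q = 0.0;
  w = ( b - a ) / n;
  parfor i = 1 : n
    x = ( ( n - i ) * a + ( i - 1 ) * b ) / ( n - 1 );
    fx = bessely ( 4.5, x );
    q = q + w * fx;
  end
  return
end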
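As a quick sanity check, the sequential and parallel loops can be run side by side in one script. This is only a sketch: the names q_seq and q_par and the interval [1, 2] are assumptions made for illustration (the interval is kept away from x = 0, where bessely(4.5,x) is unbounded). Because parfor may add the terms in a different order, the two sums are compared with a tolerance rather than for exact equality.

```matlab
% Hypothetical comparison script; the interval [1,2] is an assumption.
n = 10000;
a = 1.0;
b = 2.0;
w = ( b - a ) / n;

% Sequential accumulation.
q_seq = 0.0;
for i = 1 : n
  x = ( ( n - i ) * a + ( i - 1 ) * b ) / ( n - 1 );
  q_seq = q_seq + w * bessely ( 4.5, x );
end

% Parallel accumulation: q_par is a reduction variable, x is a temporary.
q_par = 0.0;
parfor i = 1 : n
  x = ( ( n - i ) * a + ( i - 1 ) * b ) / ( n - 1 );
  q_par = q_par + w * bessely ( 4.5, x );
end

fprintf ( 1, '  sequential: %g\n', q_seq );
fprintf ( 1, '  parfor:     %g\n', q_par );
fprintf ( 1, '  difference: %g\n', abs ( q_seq - q_par ) );
```

If no pool of workers is open, the parfor loop simply runs on the client, so the script can be tried before any workers are reserved.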
QUAD: Comments
EXECUTION: What Do You Need?
EXECUTION: Ways to Run
EXECUTION: Interactive MATLABPOOL
n = 10000; a = 0; b = 1;
matlabpool open local 4
q = quad_fun ( n, a, b );
matlabpool close
The word local is choosing the local configuration, that is, the
cores assigned to be workers will be on the local machine.
The value "4" is the number of workers you are asking for. It can
be up to 8 on a local machine. It does not have to match the
number of cores you have.
EXECUTION: Indirect Local BATCH
n = 10000; a = 0; b = 1;
q = quad_fun ( n, a, b )
EXECUTION: Indirect Remote BATCH
The batch command can send your job anywhere, and get the
results back, as long as you have set up an account on the remote
machine, and you have defined a configuration on your desktop
that tells it how to access the remote machine.
At Virginia Tech, if your Ithaca account has been set up properly,
your desktop can send a batch job there as easily as running locally:
EXECUTION: Submitting a Job and Waiting
Doing this requires that you stay logged in so that the value of job
can be used to identify output to the load() command.
EXECUTION: Submitting a Job and Coming Back Later
If you don’t want to wait for a remote job to finish, you can exit
after the submit(), turn off your computer, and go home.
However, when you think your job has run, you now have to try to
retrieve the job identifier before you can load the results.
MD: A Molecular Dynamics Simulation
MD: The Molecular Dynamics Example
MD: Profile the Sequential Code
>> profile on
>> md
>> profile viewer
MD: Where is Execution Time Spent?
Profile Summary
Generated 27-Apr-2009 15:37:30 using cpu time.
Function Name   Calls   Total Time   Self Time*
md              1       415.847 s    0.096 s
* Self time is the time spent in a function excluding the time spent in its child functions. Self time also includes overhead resulting from the process of profiling.
MD: The COMPUTE Function
f = zeros ( nd, np );
pot = 0.0;
for i = 1 : np
  for j = 1 : np
    if ( i ~= j )
      % Displacement and distance between particles i and j.
      rij(1:nd) = pos(1:nd,i) - pos(1:nd,j);
      d = sqrt ( sum ( rij(1:nd).^2 ) );
      % Truncate the distance at pi/2 before evaluating the potential.
      d2 = min ( d, pi / 2.0 );
      pot = pot + 0.5 * sin ( d2 ) * sin ( d2 );
      f(1:nd,i) = f(1:nd,i) - rij(1:nd) * sin ( 2.0 * d2 ) / d;
    end
  end
end
kin = 0.5 * mass * sum ( vel(1:nd,1:np).^2 );
return
end
MD: Can We Use PARFOR?
The compute function fills the force vector f(i) using a for loop.
Iteration i computes the force on particle i, determining the
distance to each particle j, squaring, truncating, taking the sine.
The computation for each particle is “independent”; nothing
computed in one iteration is needed by, nor affects, the
computation in another iteration. We could compute each value on
a separate worker, at the same time.
The MATLAB command parfor will distribute the iterations of this
loop across the available workers.
Tricky question: Could we parallelize the j loop instead?
Tricky question: Could we parallelize both loops?
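One way the i loop might be written so that parfor can classify every variable is sketched below. It is an illustration under stated assumptions, not the code used for the timings: the particle positions are made up, and the local names fi and pot_i are hypothetical. Each iteration builds the force on particle i in a private column fi, stores it with a single sliced assignment f(:,i) = fi, and adds its share of the potential to the reduction variable pot once per iteration.

```matlab
% Hypothetical test data: np particles in nd dimensions at distinct points.
nd = 3;
np = 20;
pos = reshape ( sin ( 1 : nd * np ), nd, np );

f = zeros ( nd, np );
pot = 0.0;
parfor i = 1 : np
  fi = zeros ( nd, 1 );    % temporary: force on particle i
  pot_i = 0.0;             % temporary: potential contribution of particle i
  for j = 1 : np
    if ( i ~= j )
      rij = pos(:,i) - pos(:,j);
      d = sqrt ( sum ( rij .^ 2 ) );
      d2 = min ( d, pi / 2.0 );
      pot_i = pot_i + 0.5 * sin ( d2 ) * sin ( d2 );
      fi = fi - rij * sin ( 2.0 * d2 ) / d;
    end
  end
  f(:,i) = fi;             % sliced output: one column per iteration
  pot = pot + pot_i;       % reduction across iterations
end

% The pairwise forces are equal and opposite, so the net force should vanish.
fprintf ( 1, '  potential = %g\n', pot );
fprintf ( 1, '  largest net force component = %g\n', max ( abs ( sum ( f, 2 ) ) ) );
```

The inner j loop stays an ordinary for loop here; only the outer loop is handed to the workers.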
MD: Speedup
Replacing “for i” by “parfor i”, here is our speedup:
MD: PARFOR is Particular
PRIME: The Prime Number Example
PRIME: The Sieve of Eratosthenes
PRIME: Program Text
function total = prime ( n )
  total = 0;
  for i = 2 : n
    prime = 1;
    for j = 2 : i - 1
      if ( mod ( i, j ) == 0 )
        prime = 0;
      end
    end
    total = total + prime;
  end
  return
end
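A parallel counterpart can be sketched by changing the outer loop to parfor. The version below is written as a short script rather than a function so it can be pasted and run directly; the flag name is_prime and the choice n = 500 are assumptions for illustration. The flag is reset at the top of every iteration, which makes it a temporary variable, and total is a reduction variable.

```matlab
% Hypothetical script form of the parallel prime count.
n = 500;
total = 0;
parfor i = 2 : n
  is_prime = 1;              % temporary: reset on every iteration
  for j = 2 : i - 1
    if ( mod ( i, j ) == 0 )
      is_prime = 0;
    end
  end
  total = total + is_prime;  % reduction across iterations
end
fprintf ( 1, '  %8d  %8d\n', n, total );
```

For n = 500 the count should be 95, the number of primes not exceeding 500. The work per iteration grows with i, so the iterations are far from equal in cost.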
PRIME: We can run this in parallel
PRIME: Local Execution With MATLABPOOL
matlabpool ( 'open', 'local', 4 )
n = 50;
while ( n <= 500000 )
  primes = prime_parfor ( n );
  fprintf ( 1, ' %8d %8d\n', n, primes );
  n = n * 10;
end
matlabpool ( 'close' )
PRIME: Timing
PRIME_PARFOR_RUN
Run PRIME_PARFOR with 0, 1, 2, and 4 labs.
PRIME: Timing Comments
There are many thoughts that come to mind from these results!
Why does 500 take less time than 50? (It doesn’t, really).
How can "1+1" take longer than "1+0"?
(It does, but it’s probably not as bad as it looks!)
This data suggests two conclusions:
Parallelism doesn’t pay until your problem is big enough;
AND
Parallelism doesn’t pay until you have a decent number of workers.
ODE: A Parameterized Problem
Consider a favorite ordinary differential equation, which describes
the motion of a spring-mass system:
$$ m \frac{d^2 x}{dt^2} + b \frac{dx}{dt} + k x = f(t) $$
ODE: Each Solution has a Maximum Value
ODE: The Parallel Code
m = 5.0;
bVals = 0.1 : 0.05 : 5;
kVals = 1.5 : 0.05 : 5;
% Form every (b, k) combination on a grid.
[ bGrid, kGrid ] = meshgrid ( bVals, kVals );
peakVals = nan ( size ( kGrid ) );
tic;
parfor ij = 1 : numel ( kGrid )
  [ T, Y ] = ode45 ( @( t, y ) odesystem ( t, y, m, bGrid(ij), kGrid(ij) ), ...
    [ 0, 25 ], [ 0, 1 ] );
  peakVals(ij) = max ( Y(:,1) );
end
toc;
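The listing relies on a function odesystem that is not shown. The sketch below fills that gap with an assumed right-hand side: the second-order equation is rewritten as a first-order system with y(1) = x and y(2) = dx/dt, and the forcing f(t) is taken to be zero. The coarse three-by-three grid is also an assumption, chosen only to keep the run short.

```matlab
% Hypothetical, self-contained version of the sweep on a coarse grid.
m = 5.0;
bVals = [ 0.1, 1.0, 5.0 ];
kVals = [ 1.5, 3.0, 5.0 ];
[ bGrid, kGrid ] = meshgrid ( bVals, kVals );
peakVals = nan ( size ( kGrid ) );

parfor ij = 1 : numel ( kGrid )
  b = bGrid(ij);             % sliced input
  k = kGrid(ij);             % sliced input
  % Assumed unforced system: m x'' + b x' + k x = 0.
  rhs = @( t, y ) [ y(2); - ( b * y(2) + k * y(1) ) / m ];
  [ T, Y ] = ode45 ( rhs, [ 0, 25 ], [ 0, 1 ] );
  peakVals(ij) = max ( Y(:,1) );   % sliced output
end

disp ( peakVals );
```

With x(0) = 0 and unit initial velocity, conservation of energy bounds the peak displacement by sqrt(m/k), which gives a simple check on the numbers that come back.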
ODE: MATLABPOOL or BATCH Execution
%
%  Display the results.
%
figure;
surf ( bVals, kVals, peakVals );
title ( 'Results of ODE Parameter Sweep' )
xlabel ( 'Damping B' );
ylabel ( 'Stiffness K' );
zlabel ( 'Peak Displacement' );
view ( 50, 30 )
ODE: A Very Loosely Coupled Calculation
FMINCON: Hidden Parallelism
A*X <= B,
Aeq*X = Beq (linear constraints)
C(X) <= 0,
Ceq(X) = 0 (nonlinear constraints)
LB <= X <= UB (bounds)
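To show how these constraint forms line up with the arguments of fmincon, here is a small sketch. The objective, the starting point and all constraint data are made up for illustration, and the call requires the Optimization Toolbox. The option 'UseParallel' with the value 'always' is the spelling used in releases of this era (later releases use true); with it set, fmincon may estimate gradients in parallel when a pool of workers is open.

```matlab
% Hypothetical problem: find the feasible point closest to (1, 2).
fun = @( x ) ( x(1) - 1 ) ^ 2 + ( x(2) - 2 ) ^ 2;
x0 = [ 0; 0 ];

A = [ 1, 1 ];                % linear inequality A*X <= B
b = 2;
Aeq = [];                    % no linear equalities Aeq*X = Beq
beq = [];
lb = [ 0; 0 ];               % bounds LB <= X <= UB
ub = [ 3; 3 ];

% Nonlinear constraints: C(X) <= 0 keeps X in a disk of radius 2; no Ceq.
nonlcon = @( x ) deal ( x(1) ^ 2 + x(2) ^ 2 - 4, [] );

options = optimset ( 'Display', 'off', 'UseParallel', 'always' );
[ x, fval ] = fmincon ( fun, x0, A, b, Aeq, beq, lb, ub, nonlcon, options );

fprintf ( 1, '  x = ( %g, %g ), f(x) = %g\n', x(1), x(2), fval );
```

For this made-up problem only the linear inequality is active at the solution, so the minimizer is the projection of (1, 2) onto the line x(1) + x(2) = 2, namely (0.5, 1.5).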
FMINCON: Riding the Helpful Current
CONCLUSION: Summary of Examples
CONCLUSION: Desktop Experiments
CONCLUSION: Cluster Experiments
http://www.arc.vt.edu/index.php
CONCLUSION: Desktop-to-Cluster Submission
If you want to use parallel MATLAB regularly, you may want to set
up a way to submit jobs from your desktop to Ithaca, without
logging in directly.
This requires defining a configuration file on your desktop, adding
some scripts to your MATLAB directory, and setting up a secure
connection to Ithaca.
The steps for doing this are described in the document:
https://portal.arc.vt.edu/matlab/...
RemoteMatlabSubmission.pdf
CONCLUSION: VT MATLAB LISTSERV
CONCLUSION: Where is it?