MIMD4
There are several terms which are often used in a confusing way.
Definition:
Multiprocessors are computers capable of running multiple instruction
streams simultaneously to cooperatively execute a single program.
Definition:
Multiprogramming is the sharing of computing equipment by many
independent jobs. They interact only through their requests for the same
resources. Multiprocessors can be used to multiprogram single stream
programs.
Definition:
A process is a dynamic instance of an instruction stream: a combination of
code and process state, for example the program counter and the status words.
Multiprocessing is either shared memory, in which processes communicate
through a common address space, or message passing, in which processes
communicate by sending messages through an interconnection network.
Cluster:
Data may be aggregated into long messages before being sent into
the interconnecting switch.
Large data transmissions may mask long and variable latency in the
communications network.
The distinctions are often in the details of the low-level switching protocol
rather than in the high-level switch topology:
Each processor connects directly with log2N others, whose indices are
obtained by changing one bit of the binary number of the reference processor
(Gray code). Up to log2N steps are needed to transmit a message between
processors.

[Figure: form of a four-dimensional hypercube]
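To make the neighbor rule concrete, here is a minimal C sketch (not from the
original slides) that enumerates a node's log2N hypercube neighbors by
flipping one bit of its index at a time:

#include <stdio.h>

/* Print the neighbors of node `me` in a hypercube of 2^dim nodes.
 * Each neighbor differs from `me` in exactly one bit of its index. */
void print_neighbors(unsigned me, unsigned dim)
{
    for (unsigned b = 0; b < dim; b++)
        printf("neighbor across dimension %u: %u\n", b, me ^ (1u << b));
}

int main(void)
{
    print_neighbors(5, 4);   /* node 0101 in a 4-dimensional (16-node) cube */
    return 0;
}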
1. At the level of microcode in the Kmap, there are explicit send and
receive instructions and message-passing software, so it is not shared
memory.
All accesses of CE's and IP's to the bus are through cache memory.
There can be up to 512 Kbytes of cache shared by CE's and up to 128 Kbytes
of cache shared by IP's. Every 3 IP's share 32 Kbytes of cache.
Each IP contains a Motorola 68000 CPU. IP's are used for interactive
processes and I/O.
CE’s have custom chips to support M68000 instructions and floating point
instructions (Weitek processor chip), vector arithmetic instructions, and
concurrency instructions.
The vector registers are 32 elements long, for both integer and
floating-point types.
Programming Shared Memory Multiprocessors
Key Features needed to Program Shared memory MIMD Computers:
• Process Management:
– Fork/Join
– Create/Quit
– Parbegin/Parend
• Data Sharing:
– Shared Variables
– Private Variables
• Synchronization:
– Control-based:
» Critical Sections
» Barriers
– Data-based:
» Lock/Unlock
» Produce/Consume
where
• Control Oriented:
Progress past some point in the program is controlled. (Critical Sections)
• Data Oriented:
Access to a data item is controlled by the state of the data item. (Lock and
Unlock)
Process 1                    Process 2
  •••••••                      •••••••
  Critical                     Critical
    code body1                   code body2
  End critical                 End critical
Definitions/Notations:
Test&Set(v) : temp := v ; v := true ; return temp   (one indivisible operation)
In other words, Test&Set returns the old value of v and sets it to true
regardless of its previous value.
Now we can implement the critical section entry and exit sections using
Test&Set and Swap instructions:
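The slide's code is not reproduced here; below is a minimal C sketch of the
same idea using the C11 atomic_flag, whose test-and-set operation plays the
role of the Test&Set instruction (a Swap-based version would instead exchange
true into v and examine the returned value). The names lock and unlock are
illustrative:

#include <stdatomic.h>

atomic_flag v = ATOMIC_FLAG_INIT;   /* shared; clear = critical section free */

void lock(void)    /* entry section: spin until Test&Set returns false */
{
    while (atomic_flag_test_and_set(&v))
        ;          /* busy-wait: another process is in the critical section */
}

void unlock(void)  /* exit section: reset v so another process may enter */
{
    atomic_flag_clear(&v);
}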
Definition:
Semaphore S is a
• Shared Integer variable and
• can only be accessed through 2 indivisible operations P(S) and V(S)
P(S) : S:= S - 1;
If S < 0 Then Block(S);
V(S) : S:= S + 1;
If S ≤ 0 Then Wakeup(S);
P(mutex);
•••
Critical section
V(mutex);
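This P/V pair maps directly onto POSIX semaphores. A minimal C sketch
(initializing the semaphore to 1 makes it behave exactly as the mutex above):

#include <semaphore.h>

sem_t mutex;                 /* shared semaphore */

void worker(void)
{
    sem_wait(&mutex);        /* P(mutex): S := S - 1, block if S < 0      */
    /* ... critical section ... */
    sem_post(&mutex);        /* V(mutex): S := S + 1, wake a blocked one  */
}

int main(void)
{
    sem_init(&mutex, 0, 1);  /* 0 = shared among threads, initial value 1 */
    /* ... create threads that call worker() ... */
    sem_destroy(&mutex);
    return 0;
}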
LOCK L: wait until L = 0, then set L = 1 (indivisibly)
UNLOCK L: L = 0
Copy private var = asynch var - wait for full, read the value,
don't change the state.
Void asynch var - initialize the state to empty.
Note: the private variable privf is used to obtain a copy of Vf, the state of
the asynch variable, before the process attempts to perform the Produce
operation. This way, if the test in statement 3 reveals that the state is
full, the process returns to statement 1 and tries again.
Problem (4-11):
procedure produce(X,V) {
R : lock(X.l)
if X.full then
{ unlock(X.l) ;
goto R }
else
{X.full := true ;
X.value := V ;
unlock(X.l) } }
procedure consume(X,V)
{
R : lock(X.l)
if not X.full then
{ unlock(X.l) ;
goto R }
else
{ V := X.value ;
X.full := false ;
unlock(X.l) }
}
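The lock-and-retry loops above busy-wait on the goto. A blocking C sketch of
the same asynchronous-variable semantics, using a pthread mutex and condition
variable (the struct layout and all names are illustrative assumptions, not
from the text):

#include <pthread.h>

typedef struct {
    pthread_mutex_t l;
    pthread_cond_t  changed;
    int    full;              /* state of the asynch variable: full/empty */
    double value;
} asynch_var;

void produce(asynch_var *x, double v)
{
    pthread_mutex_lock(&x->l);
    while (x->full)                       /* wait for empty                */
        pthread_cond_wait(&x->changed, &x->l);
    x->value = v;
    x->full  = 1;
    pthread_cond_broadcast(&x->changed);  /* wake waiting consumers        */
    pthread_mutex_unlock(&x->l);
}

void consume(asynch_var *x, double *v)
{
    pthread_mutex_lock(&x->l);
    while (!x->full)                      /* wait for full                 */
        pthread_cond_wait(&x->changed, &x->l);
    *v = x->value;
    x->full = 0;
    pthread_cond_broadcast(&x->changed);  /* wake waiting producers        */
    pthread_mutex_unlock(&x->l);
}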
Different systems provide alternative ways to fork and join processes. Some
common alternatives are outlined below:
• Instead of Fork Label, the Unix fork gives two identical processes
returning from the fork call, with the return value being the only
distinction between the two processes.
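For reference, the Unix idiom looks roughly like this in C; fork returns 0 in
the child and the child's pid in the parent:

#include <unistd.h>
#include <sys/wait.h>
#include <stdio.h>

int main(void)
{
    pid_t pid = fork();        /* two identical processes return from here */
    if (pid == 0)
        printf("child\n");     /* return value 0: this is the new process  */
    else {
        printf("parent of %d\n", (int)pid);
        wait(NULL);            /* join: wait for the child to terminate    */
    }
    return 0;
}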
Let's use:
CREATE subr(A, B, C, ...) starts a subroutine in parallel with the main
program. Parameters are passed by reference.
Do 10 I = 1, N-1
10 CREATE sub(I)
Problem: how does the main program know when all the created processes have
finished?
Assume integers IC, N and logical DONE are in shared common; the main
program executes the code:

    Void IC
    Void DONE
    Produce IC = N
    Do 10 I = 1, N-1
10  CREATE proc(...)          (forks N streams of processes)
    CALL proc(...)            (calling process continues)
    • • • • •
    Consume F = DONE          (DONE was voided initially)
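The body of proc is not shown; one plausible completion (an assumption,
consistent with the voided IC and DONE) has each stream decrement IC, with
the last one producing DONE. A C sketch with C11 atomics and threads standing
in for CREATE and Produce/Consume (all names illustrative):

#include <stdatomic.h>
#include <stdbool.h>
#include <threads.h>

#define N 8

atomic_int  ic = N;             /* shared counter: "Produce IC = N"        */
atomic_bool done = false;       /* "Void DONE": not yet produced           */

int proc(void *arg)
{
    (void)arg;
    /* ... this stream's share of the work ... */
    if (atomic_fetch_sub(&ic, 1) == 1)  /* I was the last stream to finish */
        atomic_store(&done, true);      /* "Produce DONE"                  */
    return 0;
}

int main(void)
{
    thrd_t t[N - 1];
    for (int i = 0; i < N - 1; i++)
        thrd_create(&t[i], proc, NULL); /* CREATE proc(...) N-1 times      */
    proc(NULL);                         /* CALL proc(...): the Nth stream  */
    while (!atomic_load(&done))         /* "Consume F = DONE"              */
        ;
    for (int i = 0; i < N - 1; i++)
        thrd_join(t[i], NULL);
    return 0;
}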
…
The SPMD style of programming is almost the only choice for managing
many processes in a so-called massively parallel processor (MPP) with
hundreds or thousands of processors.
Two standard ways to map loop iterations onto processes (sketched below):
• Block mapping
• Cyclic mapping
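A minimal C sketch of both mappings; me is this process's index
(0 ≤ me < np), and body is an assumed per-iteration routine. Both loops visit
each iteration 0 .. n-1 exactly once:

void body(int i);   /* assumed per-iteration work */

/* Cyclic mapping: process me takes iterations me, me+np, me+2*np, ... */
void cyclic(int me, int np, int n)
{
    for (int i = me; i < n; i += np)
        body(i);
}

/* Block mapping: process me takes one contiguous chunk of iterations. */
void block(int me, int np, int n)
{
    int chunk = (n + np - 1) / np;          /* ceiling(n / np)           */
    int lo = me * chunk;
    int hi = (lo + chunk < n) ? lo + chunk : n;
    for (int i = lo; i < hi; i++)
        body(i);
}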
[Figure: a serial imperfect two-loop nest split into parallel perfect nests]
A simple example to show the dynamic scheduling concept where the amount of
work to be done depends on the outcome of the ongoing computation.
[Figure: the integrand over the interval (a, b)]
Sequential procedure for this integration can be described with
the following steps:
…
3) a. If the error is small enough, add the contribution to
      the integral.
   b. If not, split the interval in two and recursively
      do each half.

The corresponding parallel procedure changes step 3:
3) a. If the error is small enough, cooperatively add to
      the integral and quit.
   b. If not, split the interval into two, create a process to
      do one half, and do the other half.
…
5. Go to step 1.
approx(a, b, f): returns an approximation to the integral of f
over the interval (a, b),
getwork(task): returns true and a task if the work list is not
empty, and false otherwise,
task, task1, task2: the current task and the two tasks (1 & 2) resulting
from splitting the current task; private
shared P, idle, integral;
private more, t, ok, cent, task, task1, task2;
so the outer while does not terminate until all processes have failed to
find more work to do (break statement).
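The slide's program is not reproduced here; below is a condensed C sketch
(all names illustrative) of the pattern it describes: a shared work list, an
idle counter, and an outer loop that ends only when every process has failed
to find work:

#include <pthread.h>

#define P 4                            /* number of worker processes      */

typedef struct { double a, b; } task_t;

task_t list[10000];                    /* shared work list (stack = LIFO) */
int    top  = 0;                       /* tasks currently on the list     */
int    idle = 0;                       /* workers currently out of work   */
double integral = 0.0;                 /* shared result                   */
pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;

int getwork(task_t *t)                 /* true and a task if non-empty    */
{
    int ok = 0;
    pthread_mutex_lock(&m);
    if (top > 0) { *t = list[--top]; ok = 1; }
    pthread_mutex_unlock(&m);
    return ok;
}

void putwork(task_t t)
{
    pthread_mutex_lock(&m);
    list[top++] = t;
    pthread_mutex_unlock(&m);
}

void *worker(void *arg)
{
    task_t t;
    int am_idle = 0;
    for (;;) {                                   /* the outer while        */
        if (getwork(&t)) {
            if (am_idle) {                       /* found work: not idle   */
                pthread_mutex_lock(&m); idle--; pthread_mutex_unlock(&m);
                am_idle = 0;
            }
            /* ... either add t's contribution to integral (under the
               mutex) or split t and putwork() the two halves ...          */
        } else {
            int all_done;
            pthread_mutex_lock(&m);
            if (!am_idle) { idle++; am_idle = 1; }
            all_done = (idle == P);              /* nobody has work left   */
            pthread_mutex_unlock(&m);
            if (all_done) break;                 /* the break statement    */
        }
    }
    return arg;
}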
The order of putting work onto, and taking it from, the list is important.
Breadth-first (FIFO) order adds more and more tasks to the work list,
generating new work much faster than work is completed until the end of the
computation nears. NOT DESIRABLE. Depth-first (LIFO) order completes the
subtasks of one task before starting another, keeping the work list short.
Properly managed, the framework of a set of processes cooperatively
accessing a shared work list can be a very effective form of dynamic
scheduling.
OpenMP
• A language extension (API)
• Extensions exist for C, C++, and Fortran
• OpenMP constructs are limited to compiler directives (!$OMP in Fortran,
treated as comments by a non-OpenMP compiler) and library subroutine calls,
so OpenMP programs also correspond to legal programs in the base language
SUBROUTINE OMP_SET_NUM_THREADS(integer)
PRIVATE(list)
SHARED(list)
REDUCTION(operator : list)
Variables in the list must be updated with statements of the form:
X = X operator expression
or
X = intrinsic(X, expression)
PROGRAM MAIN
INTEGER K
REAL A(10), X
CALL INPUT(A)
X = 0.0
CALL OMP_SET_NUM_THREADS(10)
!$OMP PARALLEL SHARED(A) PRIVATE(K) REDUCTION(+:X)
K = OMP_GET_THREAD_NUM()
X = X + A(K+1)
!$OMP END PARALLEL
PRINT *, 'Sum of As: ', X
STOP
END
A FLUSH of shared variables is implicit in: BARRIER, CRITICAL, END CRITICAL,
END DO, END SECTIONS, END PARALLEL, END SINGLE, ORDERED, END ORDERED (unless
NOWAIT is specified)
OpenMP Fortran synchronization constructs
SUBROUTINE OMP_DESTROY_LOCK(var)
Destroy lock var, where var is type integer
SUBROUTINE OMP_SET_LOCK(var)
Wait until lock var is unset, then set it
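A small sketch of the corresponding C lock API (omp_lock_t and these routines
are part of standard OpenMP C):

#include <omp.h>
#include <stdio.h>

int main(void)
{
    omp_lock_t lck;
    int sum = 0;

    omp_init_lock(&lck);              /* create the lock, initially unset */
    #pragma omp parallel
    {
        omp_set_lock(&lck);           /* wait until unset, then set it    */
        sum += omp_get_thread_num();  /* critical section                 */
        omp_unset_lock(&lck);         /* unset the lock                   */
    }
    omp_destroy_lock(&lck);
    printf("%d\n", sum);
    return 0;
}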
do i = 1, nsteps
call compute(n, pos, vel, mass, f, pot, kin)
call update(n, pos, vel, f, a, mass, dt)
enddo
This is a minimal change to the program, but threads are created and
terminated twice for each time step (once for each call). If
process-management overhead is large, this can slow the program
significantly.
A better method is described in the text; see Program 4-13.
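One common remedy, sketched here in OpenMP C rather than the text's Fortran
(consult Program 4-13 for the actual version), is to keep a single parallel
region alive across all the time steps and do the work sharing inside it:

/* compute() and update() are assumed to contain orphaned
   "#pragma omp for" loops that split their work among the team. */
void compute(void);
void update(void);

void run(int nsteps)
{
    #pragma omp parallel               /* team created once, not per step  */
    for (int i = 0; i < nsteps; i++) { /* every thread runs the step loop  */
        compute();                     /* omp for inside: split + barrier  */
        update();                      /* implicit barriers keep the steps
                                          of all threads in lockstep       */
    }
}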
We will present the idea of global parallelism by introducing the
FORCE Parallel Language
Development of Parallel Programming Languages
Issues to consider:
• SPMD:
Single program executed by many processes
• Global parallelism:
parallel execution is the norm,
sequential execution must be explicitly specified
• Generic Synchronizations:
Synch. operations do not identify specific
processes. They use quantifiers such as all, none,
only one, or state of a variable.
Barrier
    GMAX = 0.0
End barrier

Forcecall <name>([parameters])
Variable Declarations (6 Types):
Cyclic mapping:
Shared i1, i2, i3, np;
Private var, me;
for var := i1 + me*i3 step np*i3 until i2,
    loop body
Block mapping:
…

Barrier
<code block>
End barrier
• Data oriented: