Unit - 2 HPC
We need to compute the products A[i, j] × b[j] only for those values of j for
which A[i, j] ≠ 0.
For example, y[0] = A[0, 0]·b[0] + A[0, 1]·b[1] + A[0, 4]·b[4] + A[0, 8]·b[8].
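A minimal sketch of this in Python, assuming the sparse matrix is stored as a list of (column, value) pairs per row (a simplified CSR-like layout; the names are illustrative):

def sparse_matvec(rows, b):
    # rows[i] holds (j, a_ij) pairs for the nonzero entries of row i
    y = [0.0] * len(rows)
    for i, row in enumerate(rows):
        for j, a_ij in row:      # only nonzero A[i, j] contribute
            y[i] += a_ij * b[j]
    return y

Since each y[i] depends only on row i of A and on b, every row (or group of rows) naturally becomes one task.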
Processes
• Process: The tasks into which a problem is decomposed are carried out by
processes, which in turn run on physical processors.
• It is an abstract entity that uses the code and data corresponding to a
task to produce the output of that task within a finite amount of time
after the task is activated by the parallel program.
• Process = Task + task data + task code required to produce the task’s
output
• Mapping: The mechanism by which tasks are assigned to processes for
execution.
• Processor: It is a hardware unit that physically performs the
computations.
Why use processes rather than processors?
• We refer to the mapping as being from tasks to processes, as opposed to
processors.
• This is because typical programming APIs do not allow easy binding of
tasks to physical processors.
• Rather, we aggregate tasks into processes and rely on the system to map
these processes to physical processors.
• We use the word process here not in the UNIX sense, but simply to mean a
collection of tasks and associated data.
Basis for Choosing Mapping
• Task-dependency graph: used to ensure maximum concurrency.
• Task-interaction graph: used to minimize communication.
Decomposition Techniques
1. Recursive decomposition
2. Data decomposition
3. Exploratory decomposition
4. Speculative decomposition
Recursive Decomposition Technique
• Ideal for problems that can be solved by the divide-and-conquer method.
Steps:
1. Decompose a problem into a set of independent sub-problems.
2. Recursively decompose each sub-problem.
3. Stop decomposition when the minimum desired granularity is achieved or a
(partial) result is obtained.
Quick Sort Example
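A minimal sketch of quicksort in Python showing the recursive decomposition, assuming a simple list-based partition around the first element as pivot:

def quicksort(a):
    # Base case: a list of length 0 or 1 is already sorted (desired granularity reached).
    if len(a) <= 1:
        return a
    pivot = a[0]
    left = [x for x in a[1:] if x < pivot]     # independent sub-problem 1
    right = [x for x in a[1:] if x >= pivot]   # independent sub-problem 2
    return quicksort(left) + [pivot] + quicksort(right)

The two recursive calls work on disjoint data, so the calls at each level of the recursion can be assigned to different tasks.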
Recursive Decomposition for Finding Min
Find the minimum in an array of numbers A of length n
procedure SERIAL_MIN(A, n)
begin
  min := A[0];
  for i := 1 to n − 1 do
    if (A[i] < min) min := A[i];
  endfor;
  return min;
end SERIAL_MIN
Finding Min Using a Recursive Procedure
procedure RECURSIVE_MIN(A, n)
begin
  if (n = 1) then
    min := A[0];
  else
    lmin := RECURSIVE_MIN(A, n/2);
    rmin := RECURSIVE_MIN(&(A[n/2]), n - n/2);
    if (lmin < rmin) then
      min := lmin;
    else
      min := rmin;
    endelse;
  endelse;
  return min;
end RECURSIVE_MIN
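A runnable Python sketch of the same idea, assuming the top-level split is handed to two concurrent tasks (deeper levels are kept serial for brevity):

from concurrent.futures import ProcessPoolExecutor

def recursive_min(a):
    # Serial divide-and-conquer minimum, mirroring RECURSIVE_MIN above.
    if len(a) == 1:
        return a[0]
    mid = len(a) // 2
    return min(recursive_min(a[:mid]), recursive_min(a[mid:]))

def parallel_min(a):
    # The two halves are independent sub-problems, so they can run as separate tasks.
    if len(a) < 2:
        return a[0]
    mid = len(a) // 2
    with ProcessPoolExecutor(max_workers=2) as pool:
        left = pool.submit(recursive_min, a[:mid])
        right = pool.submit(recursive_min, a[mid:])
        return min(left.result(), right.result())

if __name__ == "__main__":
    print(parallel_min([4, 9, 1, 7, 8, 11, 2, 12]))   # prints 1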
Data Decomposition Technique:
• Ideal for problems that operate on large data structures
• Steps:
1. The data on which the computations are performed are partitioned
2. Data partition is used to induce a partitioning of the computations
into tasks.
• Data partitioning can be based on:
Partition output data
Partition input data
Partition input + output data
Partition intermediate data
Data Decomposition Based on Partitioning Output Data
• If each element of the output can be computed independently of the others
as a function of the input, then partitioning the computations into tasks is
natural.
• Each task is assigned the work of computing a portion of the output.
Matrix Multiplication Example
Matrix-matrix multiplication: 𝐶 = 𝐴 × 𝐵
• Partition matrix C into 2 × 2 submatrices.
• The computation of C can then be partitioned into four tasks, as shown below.
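With each of A, B, and C viewed as 2 × 2 blocks, one possible assignment of the block computations to the four tasks is:
Task 1: C1,1 = A1,1 B1,1 + A1,2 B2,1
Task 2: C1,2 = A1,1 B1,2 + A1,2 B2,2
Task 3: C2,1 = A2,1 B1,1 + A2,2 B2,1
Task 4: C2,2 = A2,1 B1,2 + A2,2 B2,2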
• A partitioning of output data does not result in a unique
decomposition into tasks.
• For example, for the same problem with an identical output-data
distribution, we can derive other decompositions, because the two block
products that contribute to each block of C can be grouped into tasks in
different ways.
Problem: Counting itemsets in a transaction database
• Consider the problem of computing the frequency of a set of itemsets in
a transaction database.
• In this problem there is a set T containing n transactions and a set I
containing m itemsets.
• Each transaction and itemset contains a small number of items, out of a
possible set of items.
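A small serial Python sketch of the computation, with transactions and itemsets represented as sets (the sample data is illustrative):

def itemset_counts(transactions, itemsets):
    # For each itemset, count how many transactions contain all of its items.
    counts = [0] * len(itemsets)
    for t in transactions:
        for k, s in enumerate(itemsets):
            if s <= t:           # itemset s occurs in transaction t
                counts[k] += 1
    return counts

transactions = [{1, 3, 5}, {1, 2, 3}, {2, 4}, {1, 3}]
itemsets = [{1, 3}, {2, 4}, {5}]
print(itemset_counts(transactions, itemsets))   # [3, 1, 1]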
Data Decomposition Based on Partitioning Input Data
• Ideal if the output is a single unknown value, or if the individual elements
of the output cannot be efficiently determined in isolation.
• Example: finding the minimum, maximum, or sum of a set of numbers.
• Example: sorting a set.
• Partition the input data and associate a task with each partition; each task
performs as much computation as it can with its local data, and the partial
results are then combined (see the sketch below).
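A minimal Python sketch of input partitioning for the sum example, assuming the input is split into equal-sized chunks and a follow-up step combines the partial results:

from concurrent.futures import ProcessPoolExecutor

def partial_sum(chunk):
    # Each task computes a partial result from the input partition it owns.
    return sum(chunk)

def parallel_sum(data, parts=4):
    size = (len(data) + parts - 1) // parts
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with ProcessPoolExecutor(max_workers=parts) as pool:
        partials = list(pool.map(partial_sum, chunks))
    return sum(partials)         # follow-up computation combines the partial results

if __name__ == "__main__":
    print(parallel_sum(list(range(100))))   # 4950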
Data Decomposition Based on Partitioning Input/Output Data
• Often, input and output data decomposition can be combined for a higher
degree of concurrency.
• A decomposition based on partitioning input or output data is widely
referred to as the owner-computes rule.
• Each partition performs all the computations involving data that it
owns.
• Input data decomposition: a task performs all the computations that can be
done using the input data it owns.
• Output data decomposition: a task computes all the results in the output
partition assigned to it.
For the itemset counting example, the transaction set (input) and the itemset
counts (output) can both be partitioned: each task counts its subset of the
itemsets over its subset of the transactions, and the partial counts are then
summed.
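A minimal sketch of this combined decomposition, reusing itemset_counts and the sample data from the earlier sketch (the two-way split is illustrative):

def combined_decomposition(transactions, itemsets):
    t_half = len(transactions) // 2
    i_half = len(itemsets) // 2
    t_parts = [transactions[:t_half], transactions[t_half:]]   # input partitions
    i_parts = [itemsets[:i_half], itemsets[i_half:]]           # output partitions
    results = []
    for i_part in i_parts:
        # one task per (transaction partition, itemset partition) pair
        partials = [itemset_counts(t_part, i_part) for t_part in t_parts]
        # follow-up step: sum the partial counts for this itemset partition
        results.append([sum(pair) for pair in zip(*partials)])
    return results

print(combined_decomposition(transactions, itemsets))   # [[3], [1, 1]]  (same totals as the serial version)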
Data Decomposition Based on Partitioning Intermediate
Data
• Applicable for problems which can be solved by
multi-stage computations such that the output of
one stage is the input to the subsequent stage.
• Partitioning can be based on input or output of an
intermediate stage.
Example: Dense matrix multiplication
• The original output-data decomposition yields a maximum degree of
concurrency of 4.
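Using the same 2 × 2 block view, the computation can instead be expressed as two stages through an intermediate array D (the standard reformulation):
Stage 1: Dk,i,j = Ai,k Bk,j for i, j, k ∈ {1, 2} (eight independent block multiplications)
Stage 2: Ci,j = D1,i,j + D2,i,j (four independent block additions)
Partitioning the intermediate data D therefore yields 8 + 4 = 12 tasks and raises the maximum degree of concurrency from 4 to 8.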
Exploratory Decomposition
• In many cases, the decomposition of a problem goes hand-in-hand with its
execution.
• Exploratory decomposition is used to decompose problems
whose underlying computations correspond to a search of a
space for solutions.
• In exploratory decomposition, the search space is partitioned into smaller
parts, and each of these parts is searched concurrently until the desired
solutions are found.
• Problems in this class include a variety of discrete optimization problems,
theorem proving, game playing, etc.
• A simple application of exploratory decomposition is the solution of the
15-puzzle (a tile puzzle): a sequence of three moves transforms a given
initial state (a) into the desired final state (d).
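A toy Python sketch of exploratory decomposition (a generic search rather than the 15-puzzle): the space is split into disjoint parts, each part is searched by its own task, and the search stops as soon as any task finds a solution. The goal test is a stand-in:

from concurrent.futures import ProcessPoolExecutor, as_completed

def search_part(part, target):
    # Exhaustively search one partition of the space.
    for candidate in part:
        if candidate == target:      # stand-in for a real goal test
            return candidate
    return None

def exploratory_search(space, target, parts=4):
    size = (len(space) + parts - 1) // parts
    chunks = [space[i:i + size] for i in range(0, len(space), size)]
    with ProcessPoolExecutor(max_workers=parts) as pool:
        futures = [pool.submit(search_part, c, target) for c in chunks]
        for f in as_completed(futures):   # return as soon as any part succeeds
            if f.result() is not None:
                return f.result()
    return None

if __name__ == "__main__":
    print(exploratory_search(list(range(10000)), 8191))

Note that the total work done by the parallel search can be more or less than that of a serial search, which is characteristic of exploratory decomposition.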
Speculative Decomposition
• In some applications, dependencies between tasks are not known a priori.
• For such applications, it is impossible to identify independent tasks.
• There are generally two approaches to dealing with such applications:
conservative approaches, which identify independent tasks only when
they are guaranteed to not have dependencies, and, optimistic
approaches, which schedule tasks even when they may potentially be
erroneous.
• Conservative approaches may yield little concurrency, and optimistic
approaches may require a roll-back mechanism in the case of an error.
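A toy Python sketch of the optimistic approach: while one task evaluates a slow branch condition, two other tasks speculatively compute both possible branches; the result of the branch not taken is simply discarded. All function names are illustrative:

from concurrent.futures import ThreadPoolExecutor

def slow_condition(x):
    # Stand-in for a long-running computation whose result selects the branch.
    return x % 2 == 0

def branch_if_true(x):
    return x * 10

def branch_if_false(x):
    return x - 1

def speculative(x):
    with ThreadPoolExecutor(max_workers=3) as pool:
        cond = pool.submit(slow_condition, x)
        # Speculatively start both branches before the condition is known.
        t_branch = pool.submit(branch_if_true, x)
        f_branch = pool.submit(branch_if_false, x)
        # Keep the result of the branch actually taken; the other is wasted (rolled-back) work.
        return t_branch.result() if cond.result() else f_branch.result()

print(speculative(6))   # condition is true, so the result is 60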
Example
• A classic example of speculative decomposition is in discrete event
simulation.
• The central data structure in a discrete event simulation is a time-ordered
event list.
• Events are extracted precisely in time order, processed, and if required,
resulting events are inserted back into the event list.
• Consider your day today as a discrete event system - you get up, get
ready, drive to work, work, eat lunch, work some more, drive back, eat
dinner, and sleep.
• Each of these events may be processed independently, however, in
driving to work, you might meet with an unfortunate accident and not
get to work at all.
• Therefore, an optimistic scheduling of other events will have to be rolled
back.
• Another Example: The simulation of a network of nodes (for instance, an
assembly line or a computer network through which packets pass). The
task is to simulate the behavior of this network for various inputs and
node delay parameters (note that networks may become unstable for
certain values of service rates, queue sizes, etc.).
Example: A simple network for discrete event
simulation
Characteristics of Tasks
• Identify the concurrency that is available in a problem and decompose it
into tasks that can be executed in parallel.
• The nature of the tasks and the interactions among them has a bearing on
the mapping.
• The characteristics of these tasks critically impact choice and performance
of parallel algorithms.
• Relevant task characteristics include:
1. Task generation.
2. Task sizes.
3. Size of data associated with tasks.
Task Generation: The tasks that constitute a parallel algorithm may be generated
either statically or dynamically.
• Static task generation refers to the case where all the tasks are known before
the algorithm starts execution.
• Dynamic task generation refers to the case where tasks are generated on the
fly as the computation proceeds (for example, the sub-problems produced by
the recursive decomposition in quicksort).
Task Sizes: the size of a task is the relative amount of time required to complete it.
• The complexity of mapping schemes often depends on whether or not the
tasks are uniform; i.e., whether or not they require roughly the same amount
of time. If the amount of time required by the tasks varies significantly, then
they are said to be non-uniform.
• If the size of all the tasks is known, then this information can often be
used in mapping of tasks to processes.
Examples of two-dimensional distributions of an array: (a) on a 4 × 4 process grid, and (b) on a 2 × 8 process grid.
• For multiplying two dense matrices A and B, we can partition the output
matrix C using a block decomposition.
• For load balance, we give each task the same number of elements of C.
(Note that each element of C corresponds to a single dot product.)
• The choice of precise decomposition (1-D or 2-D) is determined by the
associated communication overhead.
• In general, a higher-dimensional decomposition allows the use of a larger
number of processes.
• Example: 𝑛 × 𝑛 dense matrix multiplication 𝐶 = 𝐴 × 𝐵 using 𝑝
processes
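A brief worked comparison, under the usual assumptions that each process owns the block of C assigned to it and that p divides n evenly:
• 1-D (row-wise) decomposition: each process computes n/p rows of C, so it needs its n/p rows of A plus all of B, i.e. roughly n²/p + n² matrix elements; at most n processes can be used.
• 2-D (block) decomposition: each process computes an (n/√p) × (n/√p) block of C, so it needs n/√p rows of A and n/√p columns of B, i.e. roughly 2n²/√p elements; up to n² processes can be used.
For large p, 2n²/√p is much smaller than n², so the 2-D decomposition incurs less communication per process while also permitting more processes.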