
Unit-2

Parallel Algorithm Design


By: NEETHU V A
Assistant Professor,
CSE department, PIET, PU
 Algorithm
 Sequential Algorithm
 Parallel Algorithm
 Principles of Parallel Algorithms
 Preliminaries
Example: Multiplying a Dense Matrix with a Vector
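The slide illustrates this example with a figure. As a rough serial sketch (not from the slides; the function name and row-major layout are assumptions), each entry y[i] of the result can be treated as one independent task that forms the dot product of row i of A with the vector b:

#include <stddef.h>

/* Row-wise decomposition sketch: task i computes y[i] = A[i, *] . b.
   Illustrative only; row-major storage is assumed. */
void dense_matvec(size_t n, const double A[], const double b[], double y[])
{
    for (size_t i = 0; i < n; i++) {        /* each iteration is one independent task */
        double sum = 0.0;
        for (size_t j = 0; j < n; j++)
            sum += A[i * n + j] * b[j];
        y[i] = sum;
    }
}

With n tasks of equal size, this decomposition gives a maximum degree of concurrency of n.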
Example: Database Query Processing
ID# Model Year Color Dealer Price
4523 Civic 2002 Blue MN $18,000
3476 Corolla 1999 White IL $15,000
7623 Camry 2001 Green NY $21,000
9834 Prius 2001 Green CA $18,000
6734 Civic 2001 White OR $17,000
5342 Altima 2001 Green FL $19,000
3845 Maxima 2001 Blue NY $22,000
8354 Accord 2000 Green VT $18,000
4395 Civic 2001 Red CA $17,000
7352 Civic 2002 Red WA $18,000
Consider the execution of the query:

MODEL = "CIVIC" AND YEAR = 2001 AND (COLOR = "GREEN" OR COLOR = "WHITE")
Task-dependency graph
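The task-dependency graph itself appears as a figure in the slides. The following sketch (hypothetical names, one possible grouping of the predicates) shows the tasks and their dependencies for this query:

#include <stdbool.h>
#include <stddef.h>

/* One possible decomposition: T1..T4 scan one attribute each and are independent,
   so the maximum concurrency is 4; T5 and T6 combine their results and form the
   critical path. The figure in the slides may group the predicates differently. */
void query_tasks(size_t n, const int model[], const int year[], const int color[],
                 int CIVIC, int GREEN, int WHITE, bool result[])
{
    for (size_t i = 0; i < n; i++) {
        bool is_civic = (model[i] == CIVIC);        /* T1 */
        bool is_2001  = (year[i]  == 2001);         /* T2 */
        bool is_green = (color[i] == GREEN);        /* T3 */
        bool is_white = (color[i] == WHITE);        /* T4 */
        bool g_or_w   = is_green || is_white;       /* T5: depends on T3 and T4 */
        result[i] = is_civic && is_2001 && g_or_w;  /* T6: depends on T1, T2, T5 */
    }
}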
Example: Database Query Processing
ID# Model Year Color Dealer Price
4523 Civic 2002 Blue MN $18,000
3476 Corolla 1999 White IL $15,000
7623 Camry 2001 Green NY $21,000
9834 Prius 2001 Green CA $18,000
6734 Civic 2001 White OR $17,000
5342 Altima 2001 Green FL $19,000
3845 Maxima 2001 Blue NY $22,000
8354 Accord 2000 Green VT $18,000
4395 Civic 2001 Red CA $17,000
7352 Civic 2002 Red WA $18,000
An alternate decomposition of the given problem into subtasks, along with their data dependencies.
Fine-grained decomposition of the task
Coarse-grained decomposition of the task
Abstraction of task graphs
Limits on Parallel Performance
 Granularity of the decomposition and how well the resulting concurrency can be utilized
 Inherent bounds on how far a problem can be decomposed
Task Interaction Graph
 To capture interactions among tasks
 Node = task
 Edge (undirected/directed) = interaction or data exchange

 Task-dependency graph vs. task-interaction graph


Example: Sparse matrix vector multiplication
Tasks: each task computes one entry of y[].
• Assign the i-th row of A and the entry b[i] to Task i.

We need to compute the products A[i, j] × b[j] only for those values of j for
which A[i, j] ≠ 0.
For example, y[0] = A[0, 0]·b[0] + A[0, 1]·b[1] + A[0, 4]·b[4] + A[0, 8]·b[8].
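A minimal serial sketch of this decomposition, assuming CSR (compressed sparse row) storage, which the slides do not specify; in the parallel setting each task would first obtain the b[j] values it needs from the tasks that own them:

#include <stddef.h>

/* Task i computes y[i] using only the nonzeros of row i of A (CSR storage assumed). */
void sparse_matvec(size_t n, const size_t row_ptr[], const size_t col_idx[],
                   const double val[], const double b[], double y[])
{
    for (size_t i = 0; i < n; i++) {                 /* one task per output entry */
        double sum = 0.0;
        for (size_t k = row_ptr[i]; k < row_ptr[i + 1]; k++)
            sum += val[k] * b[col_idx[k]];           /* only A[i, j] != 0 contribute */
        y[i] = sum;
    }
}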
Processes
• The tasks into which a problem is decomposed run on physical processors; we use the term process for the computing agent that performs the tasks.
• A process is an abstract entity that uses the code and data corresponding to a task to produce the output of that task within a finite amount of time after the task is activated by the parallel program.
• Process = task + task data + task code required to produce the task's output.
• Mapping: the mechanism by which tasks are assigned to processes for execution.
• Processor: the hardware unit that physically performs the computations.
Why use processes rather than processors?
• We refer to the mapping as being from tasks to processes, as opposed to
processors.
• This is because typical programming APIs do not allow easy binding of
tasks to physical processors.
• Rather, we aggregate tasks into processes and rely on the system to map
these processes to physical processors.
• We use processes, not in the UNIX sense of a process, rather, simply as a
collection of tasks and associated data.
Basis for Choosing Mapping
• Task-dependency graph: used to ensure maximum concurrency.
• Task-interaction graph: used to ensure minimum communication.
Criteria of Mapping

1. Mapping independent tasks to different processes.
2. Assigning tasks on the critical path to processes as soon as they become available.
3. Minimizing interaction between processes by mapping tasks with dense interactions to the same process.
Mapping Database Query to Processes

• 4 processes can be used in total since the maximum concurrency is 4.
• Assign all tasks within a level to different processes.
Decomposition Techniques

1. Recursive decomposition
2. Data decomposition
3. Exploratory decomposition
4. Speculative decomposition
Recursive Decomposition Technique
• Ideal for problems that can be solved by the divide-and-conquer method.
Steps:
1. Decompose the problem into a set of independent sub-problems.
2. Recursively decompose each sub-problem.
3. Stop decomposing when the minimum desired granularity is reached or a (partial) result is obtained.
Quick Sort Example
Recursive Decomposition for Finding Min
Find the minimum in an array of numbers A of length n
1. procedure SERIAL_MIN(A, n)
2. begin
3.   min := A[0];
4.   for i := 1 to n − 1 do
5.     if (A[i] < min) min := A[i];
6.   endfor;
7.   return min;
8. end SERIAL_MIN
Finding Min by using Recursive Procedure
1. procedure RECURSIVE_MIN (A, n)
2. begin
3. if ( n = 1 ) then
4. min := A [0] ;
5. else
6. lmin := RECURSIVE_MIN ( A, n/2 );
7. rmin := RECURSIVE_MIN ( &(A[n/2]), n - n/2 );
8. if (lmin < rmin) then
9. min := lmin;
10. else
11. min := rmin;
12. endelse;
13. endelse;
14. return min;
15. end RECURSIVE_MIN
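A compilable C rendering of RECURSIVE_MIN is sketched below. The OpenMP task directives are not part of the original pseudocode; they are added only to mark where the two independent halves could run concurrently (the function would be called from inside a parallel/single region for that to happen):

#include <omp.h>

/* C version of RECURSIVE_MIN: the two recursive calls work on disjoint halves of A,
   so they can run as independent tasks; the taskwait reflects that both lmin and
   rmin are needed before the final comparison. */
static double recursive_min(const double *A, int n)
{
    if (n == 1)
        return A[0];

    double lmin, rmin;
    #pragma omp task shared(lmin)
    lmin = recursive_min(A, n / 2);
    #pragma omp task shared(rmin)
    rmin = recursive_min(A + n / 2, n - n / 2);
    #pragma omp taskwait

    return (lmin < rmin) ? lmin : rmin;
}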
Data Decomposition Technique:
• Ideal for problems that operate on large data structures
• Steps:
1. The data on which the computations are performed are partitioned
2. Data partition is used to induce a partitioning of the computations
into tasks.
• Data Partitioning –
Partition output data
Partition input data
Partition input + output data
Partition intermediate data
Data Decomposition Based on Partitioning Output Data
• If each element of the output can be computed independently of the others as a function of the input, partitioning the computations into tasks is natural.
• Each task is assigned the work of computing a portion of the output.
Matrix Multiplication Example

Matrix-matrix multiplication: 𝐶 = 𝐴 × 𝐵
• Partition matrix C into 2 × 2 submatrices
• Computation of C then can be partitioned into four tasks.
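With A, B, and C each viewed as a 2 × 2 grid of submatrices, one possible four-task decomposition, in which each task produces one submatrix of C, is:

Task 1: C1,1 = A1,1·B1,1 + A1,2·B2,1
Task 2: C1,2 = A1,1·B1,2 + A1,2·B2,2
Task 3: C2,1 = A2,1·B1,1 + A2,2·B2,1
Task 4: C2,2 = A2,1·B1,2 + A2,2·B2,2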
• A partitioning of output data does not result in a unique
decomposition into tasks.
• For example, for the same problem as in previous, with
identical output data distribution, we can derive the
following two (other) decompositions:
Problem: Counting itemsets in a transaction database
• Consider the problem of computing the frequency of a set of itemsets in
a transaction database.
• In this problem there is a set T containing n transactions and a set I
containing m itemsets.
• Each transaction and itemset contains a small number of items, out of a
possible set of items.
Data Decomposition Based on Partitioning Input Data
• Ideal if the output is a single unknown value or if the individual elements of the output cannot be efficiently determined in isolation.
• Example: finding the minimum, maximum, or sum of a set of numbers (see the sketch below).
• Example: sorting a set.
• Partition the input data and associate a task with each partition of the input data.
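A minimal serial sketch of input partitioning for the sum example, assuming P contiguous chunks (chunk boundaries and names are illustrative, not from the slides): each task reduces its own chunk, and a follow-up task combines the partial results.

#include <stddef.h>

/* Input-data decomposition sketch: task p sums the contiguous chunk
   [p*n/P, (p+1)*n/P); a final combining task adds up the P partial sums. */
double partitioned_sum(const double x[], size_t n, size_t P, double partial[])
{
    for (size_t p = 0; p < P; p++) {                 /* task p */
        double s = 0.0;
        for (size_t i = p * n / P; i < (p + 1) * n / P; i++)
            s += x[i];
        partial[p] = s;
    }
    double total = 0.0;                              /* combining task */
    for (size_t p = 0; p < P; p++)
        total += partial[p];
    return total;
}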
Data Decomposition Based on Partitioning Input/Output Data
• Often input and output data decomposition can be combined for a higher degree of concurrency.
• Decomposition based on partitioning input or output data is also referred to as the owner-computes rule.
• Each partition performs all the computations involving data that it owns.
• Input data decomposition: – A task performs all the computations that
can be done using these input data.
• Output data decomposition: – A task computes all the results in the
partition assigned to it.
For the itemset counting example, the transaction set (input) and itemset counts
(output) can both be decomposed as follows:
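The exact decomposition appears as a figure in the slides. As a rough sketch (the bitmask encoding and names below are assumptions, not from the slides), the transactions and the itemsets can each be split into chunks; the task owning transaction chunk [t_lo, t_hi) and itemset chunk [s_lo, s_hi) produces partial counts, which are later summed across transaction chunks to give each itemset's frequency:

#include <stddef.h>

/* Combined input/output decomposition sketch: transactions and itemsets are encoded
   as bitmasks over the item universe; this task counts only its own itemset chunk
   against its own transaction chunk. */
void count_chunk(const unsigned long long txn[], size_t t_lo, size_t t_hi,
                 const unsigned long long set[], size_t s_lo, size_t s_hi,
                 size_t partial_count[])
{
    for (size_t s = s_lo; s < s_hi; s++) {
        size_t c = 0;
        for (size_t t = t_lo; t < t_hi; t++)
            if ((txn[t] & set[s]) == set[s])   /* itemset s is contained in transaction t */
                c++;
        partial_count[s - s_lo] = c;           /* partial count, to be summed over t-chunks */
    }
}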
Data Decomposition Based on Partitioning Intermediate Data
• Applicable for problems which can be solved by
multi-stage computations such that the output of
one stage is the input to the subsequent stage.
• Partitioning can be based on input or output of an
intermediate stage.
Example: Dense matrix multiplication
• Original output data decomposition yields a maximum degree of
concurrency of 4.
Exploratory Decomposition
• In many cases, the decomposition of the problem goes hand-in-
hand with its execution.
• Exploratory decomposition is used to decompose problems
whose underlying computations correspond to a search of a
space for solutions.
• In exploratory decomposition, the search space is partitioned into smaller parts, and each of these parts is searched concurrently until the desired solutions are found.
• Problems in this class include a variety of discrete optimization problems, theorem proving, game playing, etc.
• A simple application of exploratory decomposition is the solution of a 15-puzzle (a tile puzzle). A sequence of three moves transforms a given initial state (a) into the desired final state (d).
Speculative Decomposition
• In some applications, dependencies between tasks are not known a priori.
• For such applications, it is impossible to identify independent tasks.
• There are generally two approaches to dealing with such applications:
conservative approaches, which identify independent tasks only when
they are guaranteed to not have dependencies, and, optimistic
approaches, which schedule tasks even when they may potentially be
erroneous.
• Conservative approaches may yield little concurrency and optimistic
approaches may require roll-back mechanism in the case of an error.
Example
• A classic example of speculative decomposition is in discrete event
simulation.
• The central data structure in a discrete event simulation is a time-ordered
event list.
• Events are extracted precisely in time order, processed, and if required,
resulting events are inserted back into the event list.
• Consider your day today as a discrete event system - you get up, get
ready, drive to work, work, eat lunch, work some more, drive back, eat
dinner, and sleep.
• Each of these events may be processed independently, however, in
driving to work, you might meet with an unfortunate accident and not
get to work at all.
• Therefore, an optimistic scheduling of other events will have to be rolled
back.
• Another Example: The simulation of a network of nodes (for instance, an
assembly line or a computer network through which packets pass). The
task is to simulate the behavior of this network for various inputs and
node delay parameters (note that networks may become unstable for
certain values of service rates, queue sizes, etc.).
Example: A simple network for discrete event
simulation
Characteristics of Tasks
• Identify the concurrency that is available in a problem and decompose it
into tasks that can be executed in parallel.
• The nature of the tasks and the interactions among them has a bearing on
the mapping.
• The characteristics of these tasks critically impact choice and performance
of parallel algorithms.
• Relevant task characteristics include:
1. Task generation.
2. Task sizes.
3. Size of data associated with tasks.
Task Generation: The tasks that constitute a parallel algorithm may be generated
either statically or dynamically.
• Static task generation: all the tasks are known before the algorithm starts execution.
• Dynamic task generation: tasks are generated on the fly as the computation proceeds, as in recursive or exploratory decompositions.
Task Sizes: the size of a task is the relative amount of time required to complete it.
• The complexity of mapping schemes often depends on whether or not the
tasks are uniform; i.e., whether or not they require roughly the same amount
of time. If the amount of time required by the tasks varies significantly, then
they are said to be non-uniform.
• If the size of all the tasks is known, then this information can often be
used in mapping of tasks to processes.

Size of Data Associated with Tasks:


• It is important because the data associated with a task must be available
to the process performing that task, and the size and the location of
these data may determine the process that can perform the task without
incurring excessive data-movement overheads.
Characteristics of Inter-Task Interactions
• Tasks need to interact with each other to share data, work, or synchronization
information.
• Different parallel algorithms require different types of interactions among concurrent
tasks.
• The nature of these interactions makes them more suitable for certain programming
paradigms and mapping schemes
1) Static versus dynamic –
Static: interactions are known prior to execution.
2) Regular versus irregular –
Regular: interaction pattern can be exploited for efficient implementation.
3) Read-only versus read-write
4) One-way versus two-way
Static vs. Dynamic Interactions
• Static interaction –
• Tasks and associated interactions are predetermined:
• Task-interaction graph and times that interactions occur are
known:
• Example: matrix multiplication – Easy to program
• Dynamic interaction –
• Timing of interaction or sets of tasks to interact with can not be
determined prior to the execution.
• Difficult to program using message passing;
• shared-address-space programming may be simpler.
Regular vs. Irregular Interactions
• Regular interactions –
• Interaction has a spatial structure that can be exploited for
efficient implementation:
• ring, mesh
• Irregular interactions –
• Interactions have no well-defined structure
• Example: Sparse matrix-vector multiplication
Example: Image dithering
Read-Only versus Read-Write
• Read-Only interactions:- tasks require only a read-access
to the data shared among many concurrent tasks.

• Read-Write interactions:- multiple tasks need both read and write access to some shared data.
One-way versus Two-way
• One-way interaction:-
only one of a pair of communicating tasks initiates the interaction
and completes it without interrupting the other one.
• Two-way interactions:-
The data or work needed by a task or a subset of tasks is explicitly supplied by another task or subset of tasks; such interactions usually involve predefined producer and consumer tasks.
• All read-only interactions can be formulated as one-way interactions.
• Read-write interactions can be either one way or two-way.
Mapping Techniques for Load Balancing
• Problem → decomposed into a number of tasks → tasks mapped onto processes
• Objective: minimize the execution time
 In order to achieve a small execution time, the overheads of executing the tasks in parallel must be minimized.
• For a given decomposition, there are two key sources of overhead:
1. time spent in inter-process interaction and
2. time that some processes may spend being idle.
• A good mapping must ensure that the computations and interactions among
processes at each stage of the execution of the parallel algorithm are well balanced.
Dense Matrix example
Schemes for Static Mapping
• Mapping must simultaneously minimize idling and maintain good load balance.
• It distributes the tasks among processes prior to the execution
of the algorithm.
• For this to work, we must have a good estimate of the size of
each task.
• Types of techniques:
1. Mapping Based on Data Partitioning
2. Task Graph Partitioning
3. Hybrid Strategies
Mapping Based on Data Partitioning
• By owner-computes rule, mapping the relevant data onto processes is
equivalent to mapping tasks onto processes
• For mappings based on data partitioning, the two most common ways of representing data in algorithms are arrays (or matrices) and graphs.
• Arrays or matrices (regular data):
– Block distributions
– Cyclic and block-cyclic distributions
• Irregular data:
– Graph partitioning
Block Distribution
• In these distributions, an n-dimensional array is distributed among the processes such that each process receives a contiguous block of array entries along a specified subset of array dimensions.
• Consider a 1-D block distribution for the dense matrix example:
• Number of processes = 8
• Hint: distribute rows or columns of the matrix to different processes, as in the sketch below.
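A minimal sketch of the row-wise 1-D block distribution (the closed-form block boundaries are the usual ones; they are not spelled out on the slide):

#include <stddef.h>

/* 1-D block distribution: process p owns the contiguous rows [p*n/P, (p+1)*n/P).
   With P = 8, each process receives roughly n/8 consecutive rows of the matrix. */
void block_range(size_t p, size_t n, size_t P, size_t *lo, size_t *hi)
{
    *lo = p * n / P;
    *hi = (p + 1) * n / P;
}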
1-D Dense matrix multiplication
Multi-D Block Distribution
• Distribute blocks of matrix to different processes

Examples of two-dimensional distributions of an array, (a) on a 4 x 4 process grid, and (b) on a 2 x 8 process grid.
• For multiplying two dense matrices A and B, we can partition the output
matrix C using a block decomposition.
• For load balance, we give each task the same number of elements of C.
(Note that each element of C corresponds to a single dot product.)
• The choice of precise decomposition (1-D or 2-D) is determined by the
associated communication overhead.
• In general, a higher-dimensional decomposition allows the use of a larger number of processes.
• Example: 𝑛 × 𝑛 dense matrix multiplication 𝐶 = 𝐴 × 𝐵 using 𝑝
processes

• Each entry of 𝐶 use the same amount of computation.


• Decomposition based on output data.

• Either 1D or 2D block distribution can be used


• Multi-D distribution allows higher degree of concurrency.
• Multi-D distribution can also help to reduce interactions
Block Distribution
Block Cyclic Distributions
• If the amount of work differs for different entries of a matrix, a block
distribution can lead to load imbalances.
• Example. Doolittle’s method of LU factorization of dense matrix
• The amount of computation increases from the top left to the bottom
right of the matrix.
Work used to compute Entries of L and U
• Block distribution of LU factorization tasks leads to load imbalance.
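For reference, a serial Doolittle-style elimination sketch (a standard textbook formulation without pivoting, not taken from the slides). Step k updates only the trailing (n−k−1) × (n−k−1) submatrix, so the entry at position (i, j) is touched in roughly min(i, j) steps; entries toward the bottom-right therefore carry far more work, which is why a plain block distribution of these tasks is imbalanced.

#include <stddef.h>

/* In-place LU factorization sketch (Doolittle form, no pivoting). A is n x n, row-major;
   on return the strictly lower triangle holds L (unit diagonal implied) and the upper
   triangle holds U. */
void lu_doolittle(size_t n, double A[])
{
    for (size_t k = 0; k < n; k++) {
        for (size_t i = k + 1; i < n; i++) {
            A[i * n + k] /= A[k * n + k];                    /* multiplier L(i, k) */
            for (size_t j = k + 1; j < n; j++)
                A[i * n + j] -= A[i * n + k] * A[k * n + j]; /* trailing-submatrix update */
        }
    }
}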
Block-Cyclic Distribution
• A variation of block distribution that can be used to alleviate the
load-imbalance.
• Steps:
1. Partition an array into many more blocks than the
number of available processes
2. Assign blocks to processes in a round-robin manner so
that each process gets several non-adjacent blocks.
Cyclic Distribution
• A cyclic distribution is an extreme case of a block-cyclic
distribution and can result in an almost perfect load balance due
to the extreme fine-grained underlying decomposition.
• Cyclic distribution → block-cyclic distribution with block size = 1
→ achieves fine-grained granularity (see the sketch below)
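The owner of a row (or block) under these distributions reduces to one line; a sketch, assuming a 1-D distribution with block size b over P processes (b = 1 gives the pure cyclic case):

#include <stddef.h>

/* 1-D block-cyclic mapping: row i lies in block i/b, and blocks are dealt out
   round-robin, so process (i/b) % P owns row i. Setting b = 1 yields the cyclic
   distribution; setting b = n/P yields the plain block distribution. */
size_t block_cyclic_owner(size_t i, size_t b, size_t P)
{
    return (i / b) % P;
}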
Task Graph Partitioning

• Partitioning a given task-dependency graph across processes.

• Determining an optimal mapping for a general task-dependency


graph is an NP-complete problem.

• Excellent heuristics exist for structured graphs.


• In case of sparse matrices, block decompositions are more complex.

• Consider the problem of multiplying a sparse matrix with a vector.

• The graph of the matrix is a useful indicator of the work (number of


nodes) and communication (the degree of each node).

• In this case, we would like to partition the graph so as to assign an equal number of nodes to each process, while minimizing the number of edges cut by the partition.
Task Graph Partitioning
Sparse-matrix vector multiplication
Dynamic Mapping Techniques
• Dynamic mapping is sometimes also referred to as dynamic load
balancing, since load balancing is the primary motivation for dynamic
mapping.

• Dynamic mapping is necessary in situations where a static mapping may


result in a highly imbalanced distribution of work among processes

• Dynamic mapping schemes can be


1. centralized
2. distributed
Centralized Schemes
• All executable tasks are maintained in a common central data structure or they are
maintained by a special process or a subset of processes.
• Processes are designated as masters or slaves.
• When a process runs out of work, it requests the master for more work.
• When the number of processes increases, the master may become the bottleneck.
• To alleviate this, a process may pick up a number of tasks (a chunk) at one time.
This is called Chunk scheduling.
• Selecting large chunk sizes may lead to significant load imbalances as well. A
number of schemes have been used to gradually decrease chunk size as the
computation progresses.
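A rough shared-counter sketch of chunk scheduling (OpenMP atomics are used here only as one possible realization; the slides do not prescribe an API or a chunk-shrinking policy):

#include <stddef.h>

/* The shared counter `next` plays the role of the master's central work pool.
   Each worker repeatedly claims the next chunk of `chunk` task indices until the
   pool is exhausted, then runs do_task() on each index it claimed. */
void chunk_scheduled(size_t n_tasks, size_t chunk, void (*do_task)(size_t))
{
    size_t next = 0;                             /* central pool of unassigned tasks */
    #pragma omp parallel
    for (;;) {
        size_t start;
        #pragma omp atomic capture
        { start = next; next += chunk; }         /* one request for a chunk of work */
        if (start >= n_tasks)
            break;                               /* pool empty: this worker stops */
        size_t end = (start + chunk < n_tasks) ? start + chunk : n_tasks;
        for (size_t i = start; i < end; i++)
            do_task(i);
    }
}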
Distributed Schemes
• Each process can send or receive work from other processes.

• This alleviates the bottleneck in centralized schemes.

• There are four critical questions:


• How are sending and receiving processes paired together,
• Who initiates work transfer,
• How much work is transferred, and
• When is a transfer triggered?
Methods for Containing Interaction Overheads
• Maximize data locality: Where possible, reuse intermediate data. Restructure
computation so that data can be reused in smaller time windows.
• Minimize volume of data exchange: There is a cost associated with each word
that is communicated. For this reason, we must minimize the volume of data
communicated.
• Minimize frequency of interactions: There is a startup cost associated with
each interaction. Therefore, try to merge multiple interactions to one, where
possible.
• Minimize contention and hot-spots: Use decentralized techniques, replicate
data where necessary.
• Overlapping computations with interactions: Use non-blocking communications, multithreading, and prefetching to hide latencies.
• Replicating data or computations.
• Using group communications instead of point-to-point primitives.
• Overlapping interactions with other interactions.

Parallel Algorithm Models
• An algorithm model is a way of structuring a parallel algorithm by selecting a
decomposition and mapping technique and applying the appropriate strategy to
minimize interactions.
• Data Parallel Model: Tasks are statically (or semi-statically) mapped to processes and
each task performs similar operations on different data.
• Usually based on data decomposition followed by static mapping
• Uniform partitioning of data followed by static mapping guarantees load balance
• Example algorithm: dense matrix multiplication
• Task Graph Model: Starting from a task dependency graph, the interrelationships
among the tasks are utilized to promote locality or to reduce interaction costs.
• Typically used to solve problems where amount of data associated with a task is
large relative to computation
• Static mapping usually used to optimize data movement costs
• Example algorithm: parallel quicksort, sparse matrix factorization
• Master-Slave Model: One or more processes generate work and allocate it
to worker processes. This allocation may be static or dynamic.
• Pipeline / Producer-Consumer Model: A stream of data is passed through a succession of processes, each of which performs some task on it.
• Hybrid Models: A hybrid model may be composed either of multiple models
applied hierarchically or multiple models applied sequentially to different
phases of a parallel algorithm.
