
CS 3006

Parallel and Distributed Computing


Lecture 6
Danyal Farhat
FAST School of Computing
NUCES Lahore
Principles of Parallel Algorithm Design,
Task Dependency Graphs, Granularity
and Concurrency, and Task Interaction
Graphs
Outline
• Parallel Algorithm Design Life Cycle
• Parallel Computing Example
• Task Decomposition
• Task Dependency Graph
• Granularity
Fine Grained
Coarse Grained
Outline (Cont.)
• Concurrency
Maximum Degree of Concurrency
Critical Path and Critical Path Length
Average Degree of Concurrency
• Task Interaction Graphs
• Processes and Mapping
• Summary
• Additional Resources

Steps in Parallel Algorithm Design
• Identification: Identifying portions of the work that can be
performed concurrently.
Work-units are also known as tasks
E.g., initializing two very large arrays constitutes two tasks that can be
performed in parallel

• Mapping: The process of mapping concurrent pieces of the work, or
tasks, onto multiple processes running in parallel.
Goal: balance load; maximize data locality
Approach: static vs. dynamic task assignment
A process is a logical agent that performs computation on a physical
processing element (processor).
Steps in Parallel Algorithm Design (Cont.)
• Data Partitioning: Distributing the input, output, and intermediate
data associated with the program.
One way is to copy the whole data to each processing node
 Memory challenges for huge-size problems
The other way is to give a fragment of the data to each processing node
 Communication overheads

• Defining Access Protocol: Managing accesses to data shared by
multiple processors
The access protocol manages communication and synchronization
Parallel Computing Example
Chess Player
• A parallel program to play chess might look at all the possible first
moves it could make
• Each different first move could be explored by a different
processor, to see how the game would continue from that point
• Results have to be combined to figure out which is the best first
move
• Famous IBM Deep Blue machine that beat Kasparov
Brute force computing power
Massively parallel, with 30 nodes, each containing a
120 MHz P2SC microprocessor
Task Decomposition
Decomposition
• “The process of dividing a computation into smaller parts, some or all of
which may potentially be executed in parallel.”
Tasks
• Programmer-defined units of computation into which the main
computation is subdivided by means of decomposition
• Tasks can be of arbitrary size, but once defined, they are regarded as
indivisible units of computation
• Simultaneous execution of multiple tasks is the key to reducing the time
required to solve the entire problem
Multiplication of a Dense Matrix with a Vector

• The problem can be decomposed into n tasks

• Computation of each element of vector y is independent of the
other elements
• There are no control dependencies, so no task-dependency graph is needed
Vector Multiplication n x 1
• The sequential multiplication program looks like:
for (row = 0; row < n; row++)
    y[row] = dot_product(get_row(A, row), get_col(b));

• It can be transformed into a parallel program:
for (row = 0; row < n; row++)
    y[row] = create_thread(dot_product(get_row(A, row), get_col(b)));
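• The create_thread call above is textbook pseudocode rather than a real
API. A minimal runnable sketch of the same fine-grained decomposition
using POSIX threads (the names row_task and N and the sample data are
illustrative, not from the slides):

#include <pthread.h>

#define N 4   /* illustrative problem size */

static double A[N][N], b[N], y[N];

/* One task per row: compute the dot product of row `row` of A with b. */
static void *row_task(void *arg) {
    long row = (long)arg;
    double sum = 0.0;
    for (long col = 0; col < N; col++)
        sum += A[row][col] * b[col];
    y[row] = sum;
    return NULL;
}

int main(void) {
    pthread_t threads[N];

    /* Fill A and b with sample values. */
    for (long i = 0; i < N; i++) {
        b[i] = 1.0;
        for (long j = 0; j < N; j++)
            A[i][j] = (double)(i + j);
    }

    /* Fine-grained decomposition: n independent tasks, one thread each. */
    for (long row = 0; row < N; row++)
        pthread_create(&threads[row], NULL, row_task, (void *)row);

    /* The tasks are independent, so the only synchronization is the join. */
    for (long row = 0; row < N; row++)
        pthread_join(threads[row], NULL);
    return 0;
}

(Compile with cc -pthread.)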
Matrix Multiplication n x n
Sequential:
for (row = 0; row < n; row++)
    for (col = 0; col < n; col++)
        c[row][col] = dot_product(get_row(a, row), get_col(b, col));

Multithreaded:
for (row = 0; row < n; row++)
    for (col = 0; col < n; col++)
        c[row][col] = create_thread(dot_product(get_row(a, row),
                                                get_col(b, col)));
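• Note that, written this way, the program creates one thread per output
element: n threads for the vector case and n² for the matrix case.
Whether such fine-grained task creation pays off is exactly the
granularity question discussed next.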
Task Dependency Graph
• The tasks in the previous examples are independent and
can be performed in any sequence.

• In most problems, there exist some dependencies
between the tasks.

• An abstraction used to express such dependencies among
tasks and their relative order of execution is known as a
task-dependency graph.
Task Dependency Graph (Cont.)
• “It is a directed acyclic graph in which nodes are tasks and
the directed edges indicate the dependencies between them”

• The task corresponding to a node can be executed when all
tasks connected to this node by incoming edges have
completed
Some tasks may use data produced by other tasks and thus may need to
wait for these tasks to finish execution
Example of Task Dependence

Execution of the query:

MODEL = “CIVIC” AND YEAR = 2001 AND
(COLOR = “GREEN” OR COLOR = “WHITE”)
Example of Task Dependence (Cont.)
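• The original slide depicts the task-dependency graph for this query as a
figure (not reproduced here). A minimal sketch of the same graph in C,
assuming the textbook-style decomposition into four leaf selections and
three combining tasks (task names and numbering are illustrative):

#include <stdio.h>

/* Nodes of the task-dependency graph, listed in topological order. */
enum { CIVIC, YEAR, GREEN, WHITE, G_OR_W, C_AND_Y, FINAL, NTASKS };

static const char *name[NTASKS] = {
    "MODEL=CIVIC", "YEAR=2001", "COLOR=GREEN", "COLOR=WHITE",
    "GREEN OR WHITE", "CIVIC AND 2001", "final AND"
};

/* deps[t] lists the (at most two) tasks whose output task t consumes;
 * -1 means none, so the four selections are leaf tasks. */
static const int deps[NTASKS][2] = {
    {-1, -1}, {-1, -1}, {-1, -1}, {-1, -1},
    {GREEN, WHITE}, {CIVIC, YEAR}, {C_AND_Y, G_OR_W}
};

int main(void) {
    int done[NTASKS] = {0};
    int remaining = NTASKS;

    /* Execute in "waves": in each pass, run every task all of whose
     * incoming edges come from tasks finished in earlier passes. The
     * tasks of one wave are mutually independent and could run in
     * parallel. */
    for (int wave = 1; remaining > 0; wave++) {
        int snapshot[NTASKS];
        for (int t = 0; t < NTASKS; t++)
            snapshot[t] = done[t];
        for (int t = 0; t < NTASKS; t++) {
            if (done[t])
                continue;
            int ready = 1;
            for (int d = 0; d < 2; d++)
                if (deps[t][d] >= 0 && !snapshot[deps[t][d]])
                    ready = 0;
            if (ready) {
                printf("wave %d: %s\n", wave, name[t]);
                done[t] = 1;
                remaining--;
            }
        }
    }
    return 0;
}

The four selections run in wave 1, the two intermediate combinations in
wave 2, and the final AND in wave 3, mirroring how the dependency graph
constrains the order of execution.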
Granularity
• The number and sizes of the tasks into which a problem is
decomposed determine the granularity of the decomposition
Granularity refers to the grain size, i.e., how small or large the pieces are
A decomposition into a large number of small tasks is called fine-grained
A decomposition into a small number of large tasks is called coarse-grained

• For matrix-vector multiplication, the decomposition would
usually be considered fine-grained, although coarse-grained
could also be an option
Granularity (Cont.)
• The figure on the original slide (omitted here) shows a coarse-grained
decomposition in which each task computes n/3 of the entries of the
output vector of length n
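• A sketch of that coarser decomposition, reusing the POSIX-thread pattern
from the earlier slide (three tasks, each owning a block of n/3 rows;
assumes n is divisible by 3, and all names are again illustrative):

#include <pthread.h>

#define N 9        /* illustrative; assumed divisible by NTASKS */
#define NTASKS 3   /* coarse-grained: 3 tasks, each owns N/3 rows */

static double A[N][N], b[N], y[N];   /* zero-initialized; fill as before */

/* Each task computes a contiguous block of N/NTASKS entries of y. */
static void *block_task(void *arg) {
    long task = (long)arg;
    long first = task * (N / NTASKS);   /* first row owned by this task */
    long last = first + (N / NTASKS);   /* one past the last owned row */
    for (long row = first; row < last; row++) {
        double sum = 0.0;
        for (long col = 0; col < N; col++)
            sum += A[row][col] * b[col];
        y[row] = sum;
    }
    return NULL;
}

int main(void) {
    pthread_t threads[NTASKS];
    for (long t = 0; t < NTASKS; t++)
        pthread_create(&threads[t], NULL, block_task, (void *)t);
    for (long t = 0; t < NTASKS; t++)
        pthread_join(threads[t], NULL);
    return 0;
}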
Maximum Degree of Concurrency
• “The maximum number of tasks that can be executed simultaneously in
a parallel program at any given time is known as its maximum degree of
concurrency.”

• It is usually less than the total number of tasks due to dependencies.

• E.g., the maximum degree of concurrency in the task-graphs of Figure 3.3
of the textbook is 4.

• Rule of thumb: For task-dependency graphs that are trees, the
maximum degree of concurrency is always equal to the number of
leaves in the tree
Maximum Degree of Concurrency (Cont.)
• Exercise: determine the maximum degree of concurrency of the task
graph on the slide (figure not reproduced)
Average Degree of Concurrency
• A relatively better measure of the performance of a parallel program

• “The average number of tasks that can run concurrently over
the entire duration of execution of the program”

It is the ratio of the total amount of work to the critical-path length:
average degree of concurrency = total amount of work / critical-path length
Total amount of work = sum of the weights of all the nodes / tasks
The weight of a node is the size or the amount of work associated with
the corresponding task
So, what is the critical path in a graph?
Critical Path and Critical Path Length
• Critical Path: The longest directed path between any pair of
start and finish nodes is known as the critical path.
• Critical Path Length: The sum of the weights of nodes along the
critical path

• A shorter critical path favors a higher average degree of
concurrency
• Both the maximum and the average degree of concurrency increase
as tasks become smaller (finer)
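• All three quantities are mechanical to compute once the task graph and
node weights are known. A small sketch, using a made-up task graph (the
weights and edges below are hypothetical, chosen only to exercise the
definitions):

#include <stdio.h>

#define NTASKS 7
#define MAXPRED 2

/* Hypothetical task graph: nodes appear in topological order, so every
 * edge goes from a lower index to a higher one; -1 means no predecessor. */
static const int weight[NTASKS] = {10, 10, 10, 10, 6, 6, 8};
static const int pred[NTASKS][MAXPRED] = {
    {-1, -1}, {-1, -1}, {-1, -1}, {-1, -1},
    {0, 1}, {2, 3}, {4, 5}
};

int main(void) {
    int total_work = 0;
    int longest[NTASKS];   /* longest weighted path ending at each node */
    int critical = 0;

    for (int t = 0; t < NTASKS; t++) {
        total_work += weight[t];
        longest[t] = weight[t];
        for (int p = 0; p < MAXPRED; p++)
            if (pred[t][p] >= 0 &&
                weight[t] + longest[pred[t][p]] > longest[t])
                longest[t] = weight[t] + longest[pred[t][p]];
        if (longest[t] > critical)
            critical = longest[t];
    }

    printf("total work           = %d\n", total_work);
    printf("critical path length = %d\n", critical);
    printf("average concurrency  = %.2f\n", (double)total_work / critical);
    return 0;
}

For this hypothetical graph the total work is 60, the critical-path length
is 24, and the average degree of concurrency is therefore 60 / 24 = 2.5.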
Exercise – Task Dependence Graph

• Maximum degree of concurrency: ?
• Critical path length: ?
• Total amount of work: ?
• Average degree of concurrency: ?
Task Interaction Graph
• Depicts pattern of interaction between the tasks
• Dependency graphs show how the output of one task becomes
input to a next-level task
• How tasks interact with each other to access
distributed data is depicted by task-interaction graphs
• The nodes in a task-interaction graph represent tasks
• The edges connect tasks that interact with each other
• Example: Dense matrix-vector multiplication
Task Interaction Graph (Cont.)
• The edges in a task interaction graph are usually undirected
But directed edges can be used to indicate the direction of flow of
data, if it is unidirectional

• The edge-set of a task-interaction graph is a superset of the
edge-set of the task-dependency graph
E.g., tasks 1, 2, 3, 4, 5, and 6 interact with each other,
while task 4 is dependent on the result of task 2

• In the database query processing example, the task-interaction
graph is the same as the task-dependency graph.
Task Interaction Graph (Cont.)
Processes and Mapping
• Logical processing or computing agent that performs tasks is
called process
• The mechanism by which tasks are assigned to processes for
execution is called mapping
• Multiple tasks can be mapped onto a single process
• Independent tasks should be mapped onto different processes
• Map tasks with high mutual interaction onto the same
process
• A parallel program must have several processors active and
simultaneously working on different tasks to gain a significant
speedup over the sequential program
Processes and Mapping (Cont.)
Processes vs Processors
• Processes are logical computing agents that perform tasks

• Processors are the hardware units that physically perform
computations

• Depending on the problem, multiple processes can be mapped
onto a single processor

• But, in most cases, there is a one-to-one correspondence
between processors and processes
Summary
• Steps in Parallel Algorithm Design
Identification - Identification of parallel portion in the program
Mapping - Mapping concurrent tasks onto multiple processes
Data Partitioning - Distribution of input, output, and intermediate data
associated with the program
Defining Access Protocol - Managing accesses to data shared by multiple
processors
• Parallel Computing Example
IBM Deep Blue machine beat Chess World Champion Kasparov using 30
nodes with 120 MHz microprocessors
• Task Decomposition
Dividing a computation into multiple tasks that may be of arbitrary sizes
Summary (Cont.)
• Multiplication of a Dense Matrix with a Vector
Vector Multiplication n x 1
Matrix Multiplication n x n
 Independent computation, no task dependency graph

• Task Dependency Graph
An abstraction used to express dependencies among tasks and their relative
order of execution
A directed acyclic graph in which nodes are tasks and the directed edges
indicate the dependencies between them
• Example of Task Dependence
Execution of a database query
Summary (Cont.)
• Granularity
Number and sizes of tasks into which a problem is decomposed determine
the granularity of the decomposition
 A decomposition into a large number of small tasks is called fine-grained
 A decomposition into a small number of large tasks is called coarse-grained

• Maximum and Average Degree of Concurrency
Maximum number of tasks that can be executed simultaneously in a parallel
program at any given time is known as its maximum degree of concurrency
Average number of tasks that can run concurrently over the entire duration
of execution of the program
Summary (Cont.)
• Critical Path and Critical Path Length
Critical Path: The longest directed path between any pair of start and finish
nodes.
Critical Path Length: The sum of the weights of nodes along the critical path

• Task Interaction Graph
Dependency graphs show how the output of one task becomes input to
a next-level task
How tasks interact with each other to access distributed data is
depicted by task-interaction graphs
Summary (Cont.)
• Processes and Mapping
Logical processing or computing agent that performs tasks is called process
The mechanism by which tasks are assigned to processes for execution is
called mapping

• Processes vs Processors
Processes are logical computing agents that perform tasks
Processors are the hardware units that physically perform computations
Additional Resources
• Introduction to Parallel Computing by Ananth Grama and
Anshul Gupta
Chapter 3: Principles of Parallel Algorithm Design
 Section 3.1: Preliminaries


Questions?
