
Module - 3 Parallel Algorithm Design - Preliminaries

This document discusses key concepts in parallel algorithm design such as: 1. Identifying portions of work that can be performed concurrently and mapping them to multiple processes. 2. Distributing a program's input, output, and intermediate data across processes. 3. Managing access to shared data to avoid conflicts and synchronizing processes.

Uploaded by

Bantu Aadhf
Copyright © All Rights Reserved

Module – 3

Parallel Algorithm Design – Preliminaries

Constructing a Parallel Algorithm

• identify portions of work that can be performed concurrently
• map concurrent portions of work onto multiple processes running in parallel
• distribute a program’s input, output, and intermediate data
• manage accesses to shared data: avoid conflicts
• synchronize the processes at stages of the parallel program execution
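The five steps above can be seen together in a small shared-memory sketch. It assumes Python threads stand in for the parallel processes and that the computation is summing a list; the name `parallel_sum` and the strided partitioning are illustrative choices, not taken from the slides. Because of CPython's global interpreter lock, it shows the structure of a parallel algorithm rather than real speedup.

```python
import threading

def parallel_sum(data, num_workers=4):
    total = 0
    lock = threading.Lock()  # manages access to the shared accumulator

    def worker(chunk):
        nonlocal total
        partial = sum(chunk)  # concurrent portion of work: independent partial sums
        with lock:            # avoid conflicting updates to shared data
            total += partial

    # Decompose and distribute the input: worker k gets every num_workers-th element.
    chunks = [data[k::num_workers] for k in range(num_workers)]
    # Map each portion of work onto its own thread.
    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:  # synchronize: wait until every worker has finished
        t.join()
    return total

print(parallel_sum(list(range(1, 101))))  # 5050
```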
Task Decomposition and Dependency Graphs

Decomposition: divide a computation into smaller parts, which can be executed concurrently.
Task: a programmer-defined unit of computation.
Task-dependency graph:
Node represents a task.
Edge represents control dependence: a task cannot start until the tasks on its incoming edges have finished.
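A task-dependency graph can be stored as a map from each task to the tasks it depends on. The sketch below uses the standard-library `graphlib.TopologicalSorter` (Python 3.9+) to release tasks in batches: every task in a batch has all of its dependencies complete, so the batch could run concurrently. The graph and task names `t1`…`t5` are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical graph: task -> set of tasks it depends on (its incoming edges).
graph = {
    "t3": {"t1", "t2"},
    "t4": {"t3"},
    "t5": {"t3"},
}

def ready_batches(graph):
    """Return successive groups of tasks whose dependencies have all completed."""
    ts = TopologicalSorter(graph)
    ts.prepare()
    batches = []
    while ts.is_active():
        ready = sorted(ts.get_ready())  # tasks with no unfinished predecessors
        batches.append(ready)
        ts.done(*ready)                 # mark the whole batch finished
    return batches

print(ready_batches(graph))  # [['t1', 't2'], ['t3'], ['t4', 't5']]
```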

Example 1: Dense Matrix-Vector Multiplication

• For y = A × b, computing y[i] uses only the ith row of A and the vector b, so treat computing each y[i] as a task.
• Remark:
– Task size is uniform
– No dependence between tasks
– All tasks need b
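A minimal sketch of this decomposition, assuming A is a list of rows and b a list: each task computes one y[i] from row i of A and the shared, read-only vector b. The function names are hypothetical, and threads are used only to show the task structure (CPython's global interpreter lock limits real speedup for this pure-Python arithmetic).

```python
from concurrent.futures import ThreadPoolExecutor

def row_task(A, b, i):
    """Task i: compute y[i] from row i of A and the whole vector b."""
    return sum(a_ij * b_j for a_ij, b_j in zip(A[i], b))

def matvec_fine(A, b):
    # One independent task per output element; no edges in the task graph.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda i: row_task(A, b, i), range(len(A))))

print(matvec_fine([[1, 2], [3, 4]], [5, 6]))  # [17, 39]
```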
Example 2: Database Query Processing

• Executing the query:
Model = “civic” AND Year = “2001” AND (Color = “green” OR Color = “white”)
on the following database:

• Task: create the set of records that satisfy one (or several) criteria.
• Edge: the output of one task serves as input to the next.
• An alternate task-dependency graph for the query
• Different task decompositions lead to different degrees of parallelism
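To make the comparison concrete, the sketch below groups each graph's tasks into levels (a task's level is one more than the deepest task it depends on) and counts the tasks per level, a rough view of how many tasks are available together at each stage. The shape of the alternate graph is an assumption, since the figure is not reproduced: it is taken to be the chain (green OR white), then AND 2001, then AND civic. Task names are hypothetical labels.

```python
def level_sizes(deps):
    """Count tasks per level; deps maps each task to the tasks it depends on."""
    level = {}

    def depth(t):
        if t not in level:
            # Leaf tasks (no dependencies) get level 0.
            level[t] = 1 + max((depth(d) for d in deps[t]), default=-1)
        return level[t]

    for t in deps:
        depth(t)
    sizes = [0] * (max(level.values()) + 1)
    for lv in level.values():
        sizes[lv] += 1
    return sizes

first = {
    "civic": [], "2001": [], "green": [], "white": [],
    "civic&2001": ["civic", "2001"],
    "green|white": ["green", "white"],
    "result": ["civic&2001", "green|white"],
}
# Assumed shape of the alternate graph (figure not reproduced).
alternate = {
    "civic": [], "2001": [], "green": [], "white": [],
    "green|white": ["green", "white"],
    "2001&(green|white)": ["2001", "green|white"],
    "result": ["civic", "2001&(green|white)"],
}

print(level_sizes(first))      # [4, 2, 1]
print(level_sizes(alternate))  # [4, 1, 1, 1]
```

The alternate decomposition has the same leaf tasks but a longer chain after them, so fewer tasks can proceed together once the leaves finish.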
Granularity of Task Decomposition

• Fine-grained decomposition: large number of small tasks
• Coarse-grained decomposition: small number of large tasks
Matrix-vector multiplication example
-- coarse-grain: each task computes 3 elements of y[]
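A coarse-grained version of the matrix-vector decomposition can be sketched by giving each task a contiguous block of rows. The block size `rows_per_task = 3` matches the slide's example; the function names are hypothetical, and threads again only illustrate the task structure.

```python
from concurrent.futures import ThreadPoolExecutor

def block_task(A, b, start, stop):
    """Coarse-grained task: compute y[start:stop] (several rows of A at once)."""
    return [sum(a * x for a, x in zip(A[i], b)) for i in range(start, stop)]

def matvec_coarse(A, b, rows_per_task=3):
    n = len(A)
    # Row ranges [start, stop) of at most rows_per_task rows each.
    bounds = [(s, min(s + rows_per_task, n)) for s in range(0, n, rows_per_task)]
    with ThreadPoolExecutor() as pool:
        blocks = pool.map(lambda r: block_task(A, b, *r), bounds)
        # Concatenate the blocks in order to form y.
        return [y for block in blocks for y in block]

I6 = [[1 if i == j else 0 for j in range(6)] for i in range(6)]
print(matvec_coarse(I6, list(range(6))))  # [0, 1, 2, 3, 4, 5]
```

With 6 rows and 3 rows per task this creates 2 tasks instead of the 6 produced by the fine-grained decomposition.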

Degree of Concurrency

• Degree of Concurrency: number of tasks that can execute in parallel
-- maximum degree of concurrency: largest number of concurrent tasks at any point of the execution
-- average degree of concurrency: average number of tasks that can be executed concurrently over the whole execution
• Degree of Concurrency vs. Task Granularity
– Inverse relation: generally, the finer the granularity (smaller tasks), the higher the degree of concurrency, and vice versa
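For the row-wise matrix-vector decomposition, all tasks are independent, so the maximum degree of concurrency equals the number of tasks. The small calculation below, assuming a hypothetical n = 12 rows, shows the inverse relation: as each task takes more rows (coarser grain), fewer tasks can run at once.

```python
import math

def num_tasks(n, rows_per_task):
    """Tasks in a row-wise decomposition of an n-row matrix-vector product."""
    # Independent tasks, so this is also the maximum degree of concurrency.
    return math.ceil(n / rows_per_task)

print([num_tasks(12, g) for g in (1, 3, 6, 12)])  # [12, 4, 2, 1]
```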

Critical Path of Task Graph

• Critical path: the longest directed path between any pair of start node (a node with no incoming edges) and finish node (a node with no outgoing edges), where a path's length is the total weight of its nodes.
• Critical path length: the sum of the weights of the nodes along the critical path.
• Average degree of concurrency = total amount of work / critical path length
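A minimal sketch of these definitions, assuming node weights give each task's cost and `deps` lists each task's incoming edges. It computes, for every task, the heaviest path from any start node ending at that task; the largest such value is the critical path length. The four-task graph is hypothetical.

```python
from functools import lru_cache

def critical_path(weights, deps):
    """Return (critical path length, total work) of a task-dependency graph.

    weights: task -> cost; deps: task -> list of tasks it depends on.
    """
    @lru_cache(maxsize=None)
    def finish(t):
        # Weight of the heaviest path from a start node that ends at t (inclusive).
        return weights[t] + max((finish(d) for d in deps.get(t, [])), default=0)

    length = max(finish(t) for t in weights)
    return length, sum(weights.values())

# Hypothetical diamond-shaped graph: a -> b, a -> c, b -> d, c -> d.
weights = {"a": 2, "b": 3, "c": 5, "d": 1}
deps = {"b": ["a"], "c": ["a"], "d": ["b", "c"]}
length, work = critical_path(weights, deps)
print(length, work, work / length)  # 8 11 1.375
```

Here the critical path is a, c, d (2 + 5 + 1 = 8), so the average degree of concurrency is 11 / 8 = 1.375.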

Example: Critical Path Length

Task-dependency graphs of the query processing operation

Left graph:
Critical path length = 27
Average degree of concurrency = 63/27 ≈ 2.33
Right graph:
Critical path length = 34
Average degree of concurrency = 64/34 ≈ 1.88
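As a worked check of these numbers, write W for the total amount of work and L for the critical path length. The figure is not reproduced here, so the per-task weights below are an assumption, chosen to be consistent with the totals stated above: in both graphs the four leaf tasks weigh 10 each; in the left graph the two intermediate tasks weigh 6 and 9 and the final task weighs 8; in the right graph the three chained tasks weigh 6, 11 and 7.

```latex
\begin{aligned}
\text{Left: }  W &= 4 \times 10 + 6 + 9 + 8 = 63, & L &= 10 + 9 + 8 = 27, & \frac{W}{L} &= \frac{63}{27} \approx 2.33 \\
\text{Right: } W &= 4 \times 10 + 6 + 11 + 7 = 64, & L &= 10 + 6 + 11 + 7 = 34, & \frac{W}{L} &= \frac{64}{34} \approx 1.88
\end{aligned}
```

The right graph does slightly more work yet has a longer critical path, so its average degree of concurrency is lower.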
Limits on Parallelization

• Factors that bound parallel execution
– Task granularity cannot be made arbitrarily fine
• Matrix-vector multiplication (n × n matrix): at most O(n²) concurrent tasks
– Communication between tasks adds overhead
• Speedup = sequential execution time / parallel execution time
• Parallel efficiency = sequential execution time / (parallel execution time × processors used)
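The two formulas above translate directly into code. The timings in the usage line are hypothetical: 100 s sequentially and 30 s on 4 processors.

```python
def speedup(t_seq, t_par):
    """Speedup = sequential execution time / parallel execution time."""
    return t_seq / t_par

def efficiency(t_seq, t_par, p):
    """Parallel efficiency = sequential time / (parallel time x processors used)."""
    return t_seq / (t_par * p)

# Hypothetical timings: 100 s sequential, 30 s on 4 processors.
print(round(speedup(100, 30), 2), round(efficiency(100, 30, 4), 2))  # 3.33 0.83
```

Efficiency equals speedup divided by the number of processors, so a value of 1.0 would mean perfectly linear speedup.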
