Concurrency: Decomposition of Parallel Algorithms
Topics for Today
Decomposing Work for Parallel Execution
[Figure: task dependency graph over tasks T1–T17, arranged in levels by dependence]
Example: Dense Matrix-Vector Product
[Figure: matrix A (n rows) times vector b yields y; computing each element of y is an independent task (Task 1, Task 2, …)]
Example: Database Query Processing
• Task: compute set of elements that satisfy a predicate
— task result = table of entries that satisfy the predicate
• Edge: output of one task serves as input to the next
MODEL = "CIVIC" AND YEAR = 2001 AND
(COLOR = "GREEN" OR COLOR = "WHITE")
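One possible decomposition along these lines, sketched over a small assumed in-memory table (the rows, column names, and helper below are illustrative, not from the slide): each atomic predicate is an independent filtering task, and downstream tasks combine the intermediate tables.

```python
# Hypothetical in-memory table; rows and column names are assumptions.
CARS = [
    {"model": "CIVIC",   "year": 2001, "color": "GREEN"},
    {"model": "CIVIC",   "year": 2001, "color": "WHITE"},
    {"model": "CIVIC",   "year": 2000, "color": "GREEN"},
    {"model": "COROLLA", "year": 2001, "color": "WHITE"},
]

# Each leaf task filters on one atomic predicate; its result is a table.
civic = [r for r in CARS if r["model"] == "CIVIC"]   # Task 1
y2001 = [r for r in CARS if r["year"] == 2001]       # Task 2
green = [r for r in CARS if r["color"] == "GREEN"]   # Task 3
white = [r for r in CARS if r["color"] == "WHITE"]   # Task 4

def intersect(a, b):
    """AND of two intermediate tables (edge: outputs feed the next task)."""
    return [r for r in a if r in b]

green_or_white = green + [r for r in white if r not in green]  # Task 5: OR
civic_2001 = intersect(civic, y2001)                           # Task 6: AND
result = intersect(civic_2001, green_or_white)                 # Task 7: AND
```

Here the first four tasks are mutually independent, so they can run concurrently; only the combining tasks must wait for their inputs.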
Example: Database Query Processing
• Alternate task decomposition for query
MODEL = "CIVIC" AND YEAR = 2001 AND
(COLOR = "GREEN" OR COLOR = "WHITE")
Degree of Concurrency
— the number of tasks that can execute in parallel at a given point in the execution
— maximum degree of concurrency: the largest such number at any point
Critical Path
— the longest directed path between any pair of start and finish nodes in the task dependency graph
Critical Path Length
— critical path length = sum of the node weights along the critical path
Questions:
What is the maximum number of tasks possible?
What does a task dependency graph look like for this case?
What is the shortest parallel execution time for the graph?
How many processors are needed to achieve the minimum time?
What is the maximum degree of concurrency?
What is the average parallelism?
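Under a unit-work-per-task assumption, several of these quantities can be computed directly from the graph. A minimal sketch (the DAG below is an assumed example, not the graph on the slide):

```python
# Sketch: critical-path length and average parallelism of a task
# dependency graph. The DAG and unit task weights are assumed examples.
deps = {1: [3], 2: [3], 3: [4], 5: [4]}   # edge u -> v: v depends on u
tasks = {1, 2, 3, 4, 5}
work = {t: 1 for t in tasks}              # unit work per task

_memo = {}
def longest_path_to(t):
    """Total work on the longest path ending at task t."""
    if t not in _memo:
        preds = [u for u, vs in deps.items() if t in vs]
        _memo[t] = work[t] + max((longest_path_to(u) for u in preds), default=0)
    return _memo[t]

# Shortest possible parallel execution time (with enough processors):
critical_path = max(longest_path_to(t) for t in tasks)
# Average parallelism = total work / critical path length:
average_parallelism = sum(work.values()) / critical_path
```

For this assumed DAG the critical path is 1 → 3 → 4 with length 3, so the average parallelism is 5/3.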
Limits on Parallel Performance
Amdahl’s Law
• A hard limit on the speedup that can be obtained using
multiple CPUs
• Two expressions of Amdahl's law, where s is the serial (non-parallelizable) fraction
— execution time on N CPUs: T(N) = T(1) × (s + (1 − s)/N)
— speedup on N processors: S(N) = T(1)/T(N) = 1 / (s + (1 − s)/N)
https://en.wikipedia.org/wiki/Amdahl's_law
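A minimal sketch of both expressions, with s denoting the serial fraction of the work:

```python
def execution_time(t1, s, n):
    """Amdahl's law, time form: time on n CPUs given single-CPU time t1
    and serial fraction s."""
    return t1 * (s + (1.0 - s) / n)

def speedup(s, n):
    """Amdahl's law, speedup form: 1 / (s + (1 - s)/n)."""
    return 1.0 / (s + (1.0 - s) / n)
```

As n grows, speedup approaches the hard limit 1/s; e.g. with s = 0.1, no number of CPUs can deliver more than a 10x speedup.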
[Figure: Amdahl's law speedup curves vs. number of processors for several parallel fractions]
Task Interaction Graphs
— nodes represent tasks; edges connect tasks that share or exchange data
Task Interaction Graph Example
Sparse matrix-vector multiplication
• Computation of each result element = independent task
• Only non-zero elements of sparse matrix A participate
• If vector b is partitioned among tasks …
— structure of the task interaction graph = graph of the matrix A
(i.e. the graph for which A represents the adjacency structure)
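One way to see this correspondence: with one task per row of A, each task owning the matching entry of b, task i must obtain b[j] from task j for every nonzero A[i][j], so the interaction edges are exactly the nonzero off-diagonal positions of A. A sketch, using an assumed dictionary-of-rows sparse format:

```python
# Assumed sparse matrix, stored as {row: {col: value}} with only nonzeros.
A = {
    0: {0: 2.0, 2: 1.0},
    1: {1: 3.0},
    2: {0: 1.0, 2: 4.0},
}

def interaction_edges(sparse):
    """Edges of the task interaction graph: task i needs b[j] from task j
    whenever A[i][j] is a nonzero off-diagonal entry."""
    return {(i, j) for i, row in sparse.items() for j in row if i != j}

edges = interaction_edges(A)
```

For this A the graph has edges (0, 2) and (2, 0): tasks 0 and 2 interact, while task 1 (a row with only a diagonal entry) needs no communication.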
Interaction Graphs, Granularity, & Communication
• Assumptions:
— each node (task) takes unit time to process
— each interaction (edge) incurs unit-time overhead
• Generally
—# of tasks > # threads available
—parallel algorithm must map tasks to threads
• Why threads rather than CPU cores?
—aggregate tasks into threads
– thread = processing or computing agent that performs work
– assign collection of tasks and associated data to a thread
—operating system maps threads to physical cores
– operating systems often enable one to bind a thread to a core
– for multithreaded cores, the OS can bind multiple software threads
to distinct hardware threads associated with a core
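One common way to aggregate more tasks than threads is a block mapping of task ids onto a fixed pool of threads; a minimal sketch (the function name is illustrative):

```python
def block_map(num_tasks, num_threads):
    """Assign each task id a thread id in contiguous blocks, spreading any
    remainder over the first few threads."""
    base, extra = divmod(num_tasks, num_threads)
    mapping = []
    for thread in range(num_threads):
        size = base + (1 if thread < extra else 0)  # block size for this thread
        mapping += [thread] * size
    return mapping
```

For example, 10 tasks over 4 threads yields blocks of sizes 3, 3, 2, 2. A cyclic (round-robin) mapping is the usual alternative when per-task costs vary systematically.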
Tasks, Threads, and Mapping
Tasks, Threads, and Mapping Example
Decomposition Techniques
Recursive Decomposition
Suitable for problems solvable using divide-and-conquer
Steps
1. decompose a problem into a set of sub-problems
2. recursively decompose each sub-problem
3. stop decomposition when minimum desired granularity reached
Recursive Decomposition for Quicksort
Sort a vector v:
1. Select a pivot
2. Partition v around the pivot into v_left and v_right
3. In parallel, sort v_left and sort v_right
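The three steps above can be sketched as follows, assuming a granularity threshold below which the sort runs serially (the thread-pool use is illustrative):

```python
from concurrent.futures import ThreadPoolExecutor

THRESHOLD = 4  # assumed minimum granularity: stop decomposing below this

def quicksort(v):
    if len(v) <= THRESHOLD:
        return sorted(v)                      # serial base case
    pivot = v[len(v) // 2]                    # step 1: select a pivot
    left = [x for x in v if x < pivot]        # step 2: partition around pivot
    mid = [x for x in v if x == pivot]
    right = [x for x in v if x > pivot]
    with ThreadPoolExecutor(max_workers=2) as pool:
        f_left = pool.submit(quicksort, left)    # step 3: sort halves
        f_right = pool.submit(quicksort, right)  # as independent tasks
        return f_left.result() + mid + f_right.result()
```

The two recursive calls share no data, so they form independent tasks at every level of the recursion tree.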
Recursive Decomposition for Min
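A minimal sketch of this decomposition: split the vector in half, compute each half's minimum as an independent subtask, then combine the two results.

```python
def rec_min(v):
    """Recursive decomposition of a minimum reduction: the two recursive
    calls are independent tasks; min() combines their results."""
    if len(v) == 1:
        return v[0]
    mid = len(v) // 2
    return min(rec_min(v[:mid]), rec_min(v[mid:]))
```

The resulting task dependency graph is a binary tree, so the critical path length grows only logarithmically with the vector length.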
Data Decomposition
• Steps
1. identify the data on which computations are performed
2. partition the data across various tasks
– partitioning induces a decomposition of the problem
Decomposition Based on Input Data
Example: Decomposition Based on Input Data
Count the frequency of item sets in database transactions
— sum local count vectors for item sets to produce total count vector
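A sketch of this input-data decomposition over an assumed toy database (the item sets and transactions below are illustrative): each task owns a partition of the transactions, builds a local count vector, and the local vectors are summed.

```python
# Assumed item sets and transactions; each transaction is a set of items.
ITEMSETS = [("bread", "milk"), ("beer",)]
TRANSACTIONS = [
    {"bread", "milk", "beer"},
    {"bread", "milk"},
    {"beer", "chips"},
    {"milk"},
]

def local_counts(transactions):
    """One task's work: count each item set in its partition of the input."""
    return [sum(1 for t in transactions if set(s) <= t) for s in ITEMSETS]

# Two tasks, each owning half of the input data:
c1 = local_counts(TRANSACTIONS[:2])
c2 = local_counts(TRANSACTIONS[2:])
# Combine: sum the local count vectors into the total count vector.
total = [a + b for a, b in zip(c1, c2)]
```

The counting tasks are fully independent; only the final vector sum requires their results.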
Decomposition Based on Output Data
Example: dense matrix-vector multiply
[Figure: matrix A times vector b yields y; each of n tasks computes one element of the output vector y]
Output Data Decomposition: Example
• Matrix multiplication: C = A x B
• Computation of C can be partitioned into four tasks
Task 1: C1,1 = A1,1 B1,1 + A1,2 B2,1
Task 2: C1,2 = A1,1 B1,2 + A1,2 B2,2
Task 3: C2,1 = A2,1 B1,1 + A2,2 B2,1
Task 4: C2,2 = A2,1 B1,2 + A2,2 B2,2
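A sketch of this output-data decomposition, with scalars standing in for the 2×2 blocks: the loop body for each (i, j) is exactly one task, computing one block of C.

```python
def matmul_blocks(A, B):
    """Output-partitioned 2x2 block matrix multiply. A and B are 2x2 nested
    lists; scalars stand in for matrix blocks for brevity."""
    C = [[0, 0], [0, 0]]
    for i in range(2):
        for j in range(2):
            # Task (i, j): needs block row i of A and block column j of B,
            # writes only its own block of C.
            C[i][j] = A[i][0] * B[0][j] + A[i][1] * B[1][j]
    return C
```

Because each task writes a disjoint block of C, the four tasks can run concurrently without synchronization on the output.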
Intermediate Data Partitioning
Example: Intermediate Data Partitioning
Dense Matrix Multiply
Decomposition of intermediate data: yields 8 + 4 tasks
Stage I
D1,1,1 D1,1,2
D1,2,1 D1,2,2
D2,1,1 D2,1,2
D2,2,1 D2,2,2
Stage II
Task 01: D1,1,1 = A1,1 B1,1
Task 02: D2,1,1 = A1,2 B2,1
Task 03: D1,1,2 = A1,1 B1,2
Task 04: D2,1,2 = A1,2 B2,2
Task 05: D1,2,1 = A2,1 B1,1
Task 06: D2,2,1 = A2,2 B2,1
Task 07: D1,2,2 = A2,1 B1,2
Task 08: D2,2,2 = A2,2 B2,2
Task 09: C1,1 = D1,1,1 + D2,1,1
Task 10: C1,2 = D1,1,2 + D2,1,2
Task 11: C2,1 = D1,2,1 + D2,2,1
Task 12: C2,2 = D1,2,2 + D2,2,2
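The two stages can be sketched as follows, with scalars standing in for the matrix blocks: Stage I computes the eight intermediate products D as independent tasks, and Stage II sums pairs of them to form the blocks of C.

```python
def two_stage_matmul(A, B):
    """Intermediate-data decomposition of 2x2 block matmul. D[k][i][j]
    holds the product A[i][k] * B[k][j] (8 independent Stage I tasks);
    Stage II sums over k (4 independent addition tasks)."""
    D = [[[A[i][k] * B[k][j] for j in range(2)] for i in range(2)]
         for k in range(2)]
    C = [[D[0][i][j] + D[1][i][j] for j in range(2)] for i in range(2)]
    return C
```

Compared with the four-task output decomposition, exposing D as explicit intermediate data doubles the number of concurrent Stage I tasks at the cost of extra storage.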
Intermediate Data Partitioning: Example
[Figure: task dependency graph for Tasks 01–12; each Stage II addition task (09–12) depends on two Stage I product tasks]
Owner Computes Rule
— the task that owns a datum performs all computations involving it
— input-data decomposition: each task performs all computations that use the input data it owns
— output-data decomposition: each task computes all the output data it owns
Topics for Next Class