0% found this document useful (0 votes)
2 views13 pages

5 - ParallelAlgorithms

The document discusses parallel algorithms, specifically focusing on the problem of cutting a 100-inch sandwich for 8 people based on their individual preferences. It introduces the concept of prefix sums, explaining both inclusive and exclusive variants, and compares sequential and parallel implementations. Additionally, it highlights the efficiency improvements through a balanced binary tree approach and outlines various applications of prefix sums in parallel computing.

Uploaded by

humairabbas678
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
2 views13 pages

5 - ParallelAlgorithms

The document discusses parallel algorithms, specifically focusing on the problem of cutting a 100-inch sandwich for 8 people based on their individual preferences. It introduces the concept of prefix sums, explaining both inclusive and exclusive variants, and compares sequential and parallel implementations. Additionally, it highlights the efficiency improvements through a balanced binary tree approach and outlines various applications of prefix sums in parallel computing.

Uploaded by

humairabbas678
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
You are on page 1/ 13

CSC 334 – Parallel and Distributed Computing

Instructor: Ms. Muntha Amjad


Lecture# 05: Parallel Algorithms

1
Problem
• Assume we have a 100-inch sandwich to feed 8 people.

• We know how many inches each person wants.


• [6 4 16 10 16 14 2 8]

• How do we cut the sandwich quickly? How much will be left?

• Method 1: cut the sections sequentially: 6 inches first, 4 inches second, 16


inches third, etc.

• Method 2: calculate Prefix sum and cut in parallel


• [6, 10, 26, 36, 52, 66, 68, 76] (24 inches left) 2
Prefix Sum (Scan)
• Given an array X, compute output array Y as follows:
for

Using recursive definition

Y[0] = X[0];
• Implementation: for (i = 1; i < N; i++)
Y[i] = Y[i-1] + X[i];

• The above is an inclusive prefix sum since includes X.


• Fo an exclusive prefix sum perform the summation for . Y[i] excludes X[i], i.e.,
sum from index 0 to i - 1.
3

• It is easy to see that prefix sums can be computed sequentially in O(N) time
Naïve Inclusive Parallel Prefix Sum
• Assign one thread to calculate each element that adds up all X elements
needed for that Y element.

• Implementation:

#pragma omp parallel for


for (i = 0; i < N; i++){
Y[i] = 0;
for (j = 0; j <= i; j++)
Y[i] += X[j];
}
4
Naïve Inclusive Parallel Prefix Sum
10 26 36 52 66 68 76
6 4
10 16 26 10 36 16 52 14 66 2 68 8
6 4 52 14 66 2
10 16 26 10 36 16
6 4 36 16 52 14
10 16 26 10
6 4 26 10 36 16
10 16
6 4 10 26 10
16
6 4
10 16
6 4
5
Naïve Inclusive Parallel Prefix Sum
Observations:

6
Naïve Inclusive Parallel Prefix Sum
Best-Case Parallel Time:

The time is still O(N) — same as the sequential version! You used lots of processors, but got no
speedup. Parallel programming is easy as long as you don’t care about performance.
Parallel Prefix Sum
• How to improve efficiency?
• Build a balanced binary tree on the input data and sweep it to and from the root
• Traverse up from leaves to root building partial sums at internal nodes in the tree
• Root holds sum of all leaves
• Traverse back down the tree building the scan from the partial sums

8
Parallel Prefix Sum: Upward Sweep (Bottom-up)
76
for (i = 0; i < n - 1; i++)
Y[i] = X[i]; 36 40
for (d = 1; d < n; d *= 2) {
#pragma omp parallel for 10 26 30 10
for (int i = 0; i < n; i += 2 * d) {
Y[i + 2 * d - 1] += Y[i + d - 1];
} 6 4 16 10 16 14 2 8
}

Step 6 4 16 10 16 14 2 8
d =1 6 10 16 26 16 30 2 10
d=2 6 10 16 36 16 30 2 40
9
d=4 6 10 16 36 16 30 2 76
Parallel Prefix Sum: Downward Sweep (Top-down)
0
Y[n-1] = 0;
for (d = n / 2; d >= 1; d /= 2) {
#pragma omp parallel for 0 36
for (int i = 0; i < n; i += 2 * d) {
int left = i + d - 1;
int right = i + 2 * d - 1; 0 10 36 66
temp = Y[left];
Y[left] Y[right];
Y[right] temp; 0 6 10 26 36 52 66 68
}
}

Step 6 10 16 36 16 30 2 0
d=4 6 10 16 0 16 30 2 36
d=2 6 0 16 10 16 36 2 66 10

d=1 0 6 10 26 36 52 66 68
Parallel Prefix Sum: Inclusive
for (i = 0; i < n; i++) { X 6 4 16 10 16 14 2 8
Y[i] += X[i];
}
Y 0 6 10 26 36 52 66 68
Updated Y 6 10 26 36 52 66 68 76

11
Parallel Prefix Sum: Inclusive
Observations:

12
Applications of Prefix Sum
• Parallel Prefix Sum has several applications that go way beyond computing the
• sum of array elements. A key primitive in many parallel algorithms to convert
serial computation into parallel computation
• Radix Sort
• String Comparison
• Lexical Analysis
• Stream Compaction
• Polynomial Evaluation
• Solving Recurrences
• Tree Operations
• Histograms
• Assigning space in farmers market
13
• Allocating memory to parallel threads
• Allocating memory buffer for communication channels
• Frequently used for parallel work assignment and resource allocation

You might also like