Pap 3 Shared Memory Algos
Programming
Email: [email protected]
Website: tropars.github.io
References
The content of this lecture is inspired by:
Parallel algorithms (Chapter 1) by H. Casanova, Y. Robert, A. Legrand.
A survey of parallel algorithms for shared-memory machines by R.
Karp, V. Ramachandran.
Parallel Algorithms by G. Blelloch and B. Maggs.
Data Parallel Thinking by K. Fatahalian
Outline
The PRAM model
Need for a model
A parallel algorithm
Defines multiple operations to be executed in each step
Includes communication/coordination between the processing units
The problem
A wide variety of parallel architectures
Different number of processing units
Multiple network topologies
Parallel RAM
A shared central memory
A set of processing units (PUs)
Any PU can access any memory location in one unit of time
The number of PUs and the size of the memory is unbounded
Details about the PRAM model
Lock-step execution
A 3-phase cycle:
1. Read memory cells
2. Run local computations
3. Write to the shared memory
All PUs execute these steps synchronously
No need for explicit synchronization
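The lock-step cycle can be illustrated with a minimal Python sketch. The helper `pram_step` below is a hypothetical simulation (not part of the PRAM model itself): every PU reads the old memory state, computes its writes locally, and all writes are applied together at the end of the step.

```python
# Minimal sketch (hypothetical helper): simulating one PRAM lock-step
# cycle in Python. Each PU first reads, then computes, then writes; the
# write phase starts only after every PU has finished reading.

def pram_step(memory, pus, local_step):
    """Apply one synchronous read-compute-write cycle.

    memory: shared memory as a dict {address: value}
    pus: iterable of PU indices
    local_step: function (pu, snapshot) -> dict of writes {address: value}
    """
    # Phases 1+2: every PU reads the *old* memory and computes its writes.
    pending = [local_step(i, memory.copy()) for i in pus]
    # Phase 3: all writes are applied together (exclusive writes assumed).
    for writes in pending:
        memory.update(writes)
    return memory

# Example: every PU doubles its own cell in one synchronous step.
mem = {0: 1, 1: 2, 2: 3}
pram_step(mem, range(3), lambda i, read: {i: 2 * read[i]})
print(mem)  # {0: 2, 1: 4, 2: 6}
```

Because every PU computes against the same snapshot, no PU can observe a half-updated memory state, which is exactly what the synchronous model guarantees.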
About the CRCW model
Semantics of concurrent writes:
Arbitrary mode: select one value from the concurrent writes
Priority mode: select the value of the PU with the lowest index
Fusion mode: a commutative and associative operation is applied to the values (logical OR, AND, sum, maximum, etc.)
CRCW > CREW > EREW
A model is more powerful if there is one problem for which this model
allows implementing a strictly faster solution with the same number of PUs
Some shared-memory
algorithms
List ranking
Description of the problem
A linked list of n objects
Doubly-linked list
We want to compute the distance of each element to the end of the list
List ranking
# One pointer-jumping step written as a single assignment
# (next[] is read and written in the same statement):
forall i in parallel:
    next[i] = next[next[i]]

# The same step split into a read phase and a write phase,
# using a temporary array to separate reads from writes:
forall i in parallel:
    temp[i] = next[next[i]]
forall i in parallel:
    next[i] = temp[i]
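The full list-ranking algorithm combines pointer jumping with a distance array. A minimal Python sketch, simulated sequentially (the function name `list_rank` and the snapshot arrays are illustrative assumptions, not from the lecture):

```python
# Hedged sketch of list ranking by pointer jumping, simulated sequentially.
# next_[i] is the successor of element i (None at the tail); d[i] ends up
# holding the distance of element i to the end of the list.

def list_rank(next_):
    n = len(next_)
    next_ = list(next_)
    d = [0 if next_[i] is None else 1 for i in range(n)]
    # Each round emulates one synchronous PRAM step: read the old next/d,
    # then write the new values, so no PU sees a half-updated pointer.
    while any(p is not None for p in next_):
        old_next, old_d = list(next_), list(d)
        for i in range(n):  # "forall i in parallel"
            if old_next[i] is not None:
                d[i] = old_d[i] + old_d[old_next[i]]
                next_[i] = old_next[old_next[i]]
    return d

# List 0 -> 1 -> 2 -> 3 (3 is the tail):
print(list_rank([1, 2, 3, None]))  # [3, 2, 1, 0]
```

Each round halves the remaining pointer distance to the tail, so the algorithm terminates after O(log n) rounds.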
Comments on the previous algorithm
About the termination test
Note that the test in the while loop can be done in constant time only
in the CRCW model.
The problem is having all PUs share the result of their local test
(next[i] != None):
In a CW model, all PUs can write to the same variable and a fusion
operation can be used.
In an EW model, the results of the tests can only be aggregated two-by-two,
leading to a solution with a complexity in O(log n) for this operation.
Point to root
Description of the problem
A tree data structure
Each node should get a pointer to the root
PointToRoot(P):
    for k in 1..ceiling(log(sizeof(P))):
        forall i in parallel:
            P[i] = P[P[i]]
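A minimal Python simulation of PointToRoot (the convention that the root points to itself is an assumption made for this sketch):

```python
# Hedged sketch of PointToRoot, simulated sequentially. P[i] is the parent
# of node i; the root points to itself. After ceil(log2(n)) doubling rounds
# every node points directly to the root.
import math

def point_to_root(P):
    P = list(P)
    for _ in range(math.ceil(math.log2(len(P)))):
        old = list(P)                 # emulate the synchronous read phase
        for i in range(len(P)):       # "forall i in parallel"
            P[i] = old[old[i]]
    return P

# A chain 3 -> 2 -> 1 -> 0, with root 0:
print(point_to_root([0, 0, 1, 2]))  # [0, 0, 0, 0]
```

Each round doubles the distance that every pointer skips, which is why ceil(log2(n)) rounds suffice even for a degenerate chain.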
Divide and conquer
Split the problem into sub-problems that can be solved independently
Merge the solutions
Example: Mergesort
Mergesort(A):
    if sizeof(A) is 1:
        return A
    else:
        Do in parallel:
            L = Mergesort(A[0 .. sizeof(A)/2])
            R = Mergesort(A[sizeof(A)/2 .. sizeof(A)])
        return Merge(L, R)
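The recursive structure above can be written in plain Python. This is a sequential sketch for illustration; on a PRAM, the two recursive calls would execute in parallel:

```python
# Hedged sketch of Mergesort in Python. The two recursive calls are
# independent, so on a PRAM they can run in parallel; here they run
# sequentially for illustration.

def merge(L, R):
    out, i, j = [], 0, 0
    while i < len(L) and j < len(R):
        if L[i] <= R[j]:
            out.append(L[i]); i += 1
        else:
            out.append(R[j]); j += 1
    return out + L[i:] + R[j:]

def mergesort(A):
    if len(A) <= 1:
        return A
    mid = len(A) // 2
    # On a PRAM these two calls execute in parallel ("Do in parallel").
    L = mergesort(A[:mid])
    R = mergesort(A[mid:])
    return merge(L, R)

print(mergesort([5, 2, 4, 1, 3]))  # [1, 2, 3, 4, 5]
```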
Analysis of PRAM models
Comparison of PRAM models
CRCW vs CREW
To compare CRCW and CREW, we consider a reduce operation over n
elements with an associative operation.
Example: the sum of n elements
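In CRCW fusion mode, all n PUs can write their value to the same cell in a single step, so the reduce takes O(1) time; without concurrent writes, values must be combined pairwise. A minimal sketch of the pairwise (tree) reduction, simulated sequentially (the helper name `tree_reduce` is an assumption for this example):

```python
# Hedged sketch of a tree reduction without concurrent writes: values are
# combined pairwise, halving the number of active elements each round, so
# reducing n elements takes ceil(log2(n)) synchronous steps.

def tree_reduce(values, op):
    vals = list(values)
    while len(vals) > 1:
        # One synchronous round: PU k combines vals[2k] and vals[2k+1].
        pairs = [op(vals[i], vals[i + 1]) for i in range(0, len(vals) - 1, 2)]
        if len(vals) % 2 == 1:          # an odd leftover carries over
            pairs.append(vals[-1])
        vals = pairs
    return vals[0]

print(tree_reduce([1, 2, 3, 4, 5], lambda a, b: a + b))  # 15
```

This is the same two-by-two aggregation pattern as the termination test discussed earlier, which is why exclusive-write models pay an O(log n) cost where CRCW fusion mode pays O(1).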
Comparison of PRAM models
CREW vs EREW
To compare CREW and EREW, we consider the problem of determining
whether an element e belongs to a set (e1, ..., en).
Solution with CREW:
A boolean res is initialized to false and n PUs are used
PU k runs the test (ek == e )
If one PU finds e, it sets res to true
Solution with EREW:
Same algorithm except e cannot be read simultaneously by multiple
PUs
n copies of e should be created (broadcast)
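The broadcast proceeds by doubling: each PU that already holds a copy of e writes one new copy per round, so n copies exist after O(log n) steps with no cell read or written concurrently. A minimal sequential sketch (the function name `erew_broadcast` is an assumption for this example):

```python
# Hedged sketch of an EREW broadcast: the number of copies of e at most
# doubles each round, so n private copies are available after
# ceil(log2(n)) steps; each PU then reads only its own copy.

def erew_broadcast(e, n):
    copies = [e]
    while len(copies) < n:
        # One round: each PU holding a copy writes exactly one new copy,
        # so no memory cell is accessed concurrently.
        copies += copies[: n - len(copies)]
    return copies

copies = erew_broadcast(42, 5)
print(copies)  # [42, 42, 42, 42, 42]

# Membership test: PU k compares its private copy with element k.
x = [7, 42, 9, 1, 3]
print(any(copies[k] == x[k] for k in range(5)))  # True
```

The broadcast dominates the cost: the test itself is one parallel step, but EREW pays O(log n) to distribute e, whereas CREW reads e concurrently for free.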
Limits of the PRAM model
Unrealistic memory model
Constant-time access to all memory locations
Synchronous execution
Removes some flexibility
Study of Parallel scans
Scans (Prefix sums)
Description of the problem
Inputs:
A sequence of elements x1, x2, ..., xn
An associative operation *
Output:
A sequence of elements y1, y2, ..., yn such that yk = x1 * x2 * ... * xk
Scan(L):
    forall i in parallel:  # initialization
        y[i] = x[i]
    for k in 1..ceiling(log(sizeof(L))):
        forall i in parallel:
            if next[i] != None:
                y[next[i]] = y[i] * y[next[i]]
                next[i] = next[next[i]]
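A minimal Python simulation of this list-based scan, using addition as the operation * and storing the list in array order (next[i] = i+1); the snapshot arrays emulate the synchronous read-then-write discipline (the function name `list_scan` is an assumption for this sketch):

```python
# Hedged sketch of the list-based scan, simulated sequentially with
# addition as the associative operation. Each round applies the y-update
# and the pointer jump against the *old* state, as in a synchronous
# PRAM step.
import math

def list_scan(x):
    n = len(x)
    y = list(x)
    nxt = [i + 1 if i + 1 < n else None for i in range(n)]
    for _ in range(math.ceil(math.log2(n))):
        old_y, old_nxt = list(y), list(nxt)
        for i in range(n):               # "forall i in parallel"
            if old_nxt[i] is not None:
                y[old_nxt[i]] = old_y[i] + old_y[old_nxt[i]]
                nxt[i] = old_nxt[old_nxt[i]]
    return y

print(list_scan([1, 2, 3, 4]))  # [1, 3, 6, 10]
```

Each element's successor pointer is unique, so the writes to y are exclusive; only the reads of y are concurrent.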
Scans (Prefix sums)
Performance of this algorithm
Work: W(n) = O(n log n)
Depth: D(n) = O(log n)
Parallel scan with 2 processing units
Solution
Scan(L):
    # input: x; output: y
    # first phase: each PU sequentially scans one half of the input
    half = sizeof(L)/2
    for i in 0..1 in parallel:
        SequentialScan(x[half*i .. half*(i+1)-1])
    # second phase: add the total of the first half to the second half
    base = y[half-1]
    quarter = half / 2
    for i in 0..1 in parallel:
        add base to elems in y[half+quarter*i .. half+quarter*(i+1)-1]
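A minimal Python simulation of this two-phase scan for p = 2 PUs, using addition as the operation (the function name `two_pu_scan` is an assumption; the input length is assumed divisible by 4 for simplicity):

```python
# Hedged sketch of the two-phase scan for 2 PUs, simulated sequentially
# with addition as the operation. Assumes len(x) is divisible by 4.

def two_pu_scan(x):
    n = len(x)
    half = n // 2
    y = [0] * n
    # First phase: each PU sequentially scans one half of the input.
    for i in (0, 1):                      # "in parallel"
        acc = 0
        for j in range(half * i, half * (i + 1)):
            acc += x[j]
            y[j] = acc
    # Second phase: add the total of the first half (the last prefix of
    # the first half) to the second half, one quarter per PU.
    base = y[half - 1]
    quarter = half // 2
    for i in (0, 1):                      # "in parallel"
        for j in range(half + quarter * i, half + quarter * (i + 1)):
            y[j] += base
    return y

print(two_pu_scan([1, 2, 3, 4, 5, 6, 7, 8]))  # [1, 3, 6, 10, 15, 21, 28, 36]
```

Note the work split: the first phase is a pair of sequential scans of n/2 elements each, and the second phase is n/2 additions shared by the two PUs.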
The algorithm with a larger depth and less work per iteration
performs better with up to 16 PUs