Parallelization Principles
Sathish Vadhiyar
Parallel Programming and Challenges
Recall the advantages and motivation of
parallelism
But parallel programs incur overheads not
seen in sequential programs
Communication delay
Idling
Synchronization
Challenges
[Figure: execution timeline of two processes P0 and P1, showing time spent in computation, communication, synchronization, and idle time]
How do we evaluate a parallel program?
Execution time, Tp
Speedup, S
S(p, n) = T(1, n) / T(p, n)
Usually, S(p, n) < p
Sometimes S(p, n) > p (superlinear speedup)
Efficiency, E
E(p, n) = S(p, n)/p
Usually, E(p, n) < 1
Sometimes, greater than 1
Scalability – the limits on parallel performance, and how they relate to n and p
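For example, if T(1, n) = 100 s and T(4, n) = 30 s, then S(4, n) = 100/30 ≈ 3.33 and E(4, n) = 3.33/4 ≈ 0.83.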
Speedups and efficiency
[Figure: speedup S and efficiency E versus number of processors p – the ideal curves (S = p, E = 1) and the practical curves that fall below them]
Limitations on speedup – Amdahl’s law
Amdahl's law states that the performance
improvement to be gained from using some faster
mode of execution is limited by the fraction of
the time the faster mode can be used.
The overall speedup is expressed in terms of the fractions of computation time spent with and without the enhancement, and the speedup of the enhanced portion.
Places a limit on the speedup due to parallelism.
Speedup = 1 / (fs + fp/P), where fs is the serial (non-parallelizable) fraction, fp = 1 - fs is the parallelizable fraction, and P is the number of processors
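For example, with fs = 0.1 and P = 8, Speedup = 1/(0.1 + 0.9/8) ≈ 4.7; even as P → ∞, the speedup cannot exceed 1/fs = 10.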
Gustafson’s Law
Increase the problem size in proportion to the number of processors, so as to keep the overall time constant
The scaling keeping the problem size
constant (Amdahl’s law) is called strong
scaling
The scaling due to increasing problem size is
called weak scaling
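Under weak scaling, Gustafson's scaled speedup is Speedup = fs + P(1 - fs) = P - fs(P - 1); for example, with fs = 0.1 and P = 8, the scaled speedup is 7.3.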
PARALLEL PROGRAMMING
CLASSIFICATION AND STEPS
Programming Paradigms
Shared memory model – Threads, OpenMP,
CUDA
Message passing model – MPI
Parallelizing a Program
Given a sequential program/algorithm, how to
go about producing a parallel version
Four steps in program parallelization
1. Decomposition
Identifying parallel tasks with large extent of possible
concurrent activity; splitting the problem into tasks
2. Assignment
Grouping the tasks into processes with best load
balancing
3. Orchestration
Reducing synchronization and communication costs
4. Mapping
Mapping of processes to processors (if possible)
Steps in Creating a Parallel Program
[Figure: overall flow – the sequential computation is partitioned into tasks (decomposition), tasks are grouped into processes p0–p3 (assignment), the processes are coordinated (orchestration), and finally placed on processors P0–P3 (mapping)]
Decomposition and Assignment
Specifies how to group tasks together for a process
Balance workload, reduce communication and
management cost
In practice, the two steps are combined into one, answering the question: “What is the role of each parallel processing entity?”
Data Parallelism and Domain
Decomposition
The given data is divided across the processing entities
Each process owns and computes a portion
of the data – owner-computes rule
A multi-dimensional domain in a simulation is divided into as many subdomains as there are processing entities
This is called domain decomposition
Domain decomposition and Process
Grids
The given P processes are arranged in multiple dimensions, forming a process grid
The problem domain is divided among the process grid
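As an illustration, the sketch below (in C with MPI, the course's message-passing model) builds a 2-d process grid; MPI_Dims_create picks a balanced factorization of the process count.

#include <mpi.h>
#include <stdio.h>

/* Arrange the available processes into a 2-d process grid. */
int main(int argc, char **argv)
{
    int nprocs, rank;
    int dims[2] = {0, 0};      /* 0 lets MPI choose each extent */
    int periods[2] = {0, 0};   /* non-periodic grid */
    int coords[2];
    MPI_Comm grid;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
    MPI_Dims_create(nprocs, 2, dims);                      /* e.g., 12 -> 4 x 3 */
    MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, 0, &grid);
    MPI_Comm_rank(grid, &rank);
    MPI_Cart_coords(grid, rank, 2, coords);
    printf("process %d owns grid position (%d, %d) of %d x %d\n",
           rank, coords[0], coords[1], dims[0], dims[1]);
    MPI_Finalize();
    return 0;
}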
Illustrations
[Figures: example decompositions of a domain onto process grids]
Data Distributions
To divide the data along a dimension among the processes in that dimension of the grid, a data distribution scheme is followed
Common data distributions:
Block: for regular computations
Block-cyclic: when there is load imbalance across the space
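A minimal sketch of the two distributions, mapping a global index to its owning process (the function names and the block size b are illustrative, not from the slides):

#include <stdio.h>

/* Block: contiguous chunks of roughly n/p elements per process. */
int block_owner(int i, int n, int p)
{
    int b = (n + p - 1) / p;   /* block size, rounded up */
    return i / b;
}

/* Block-cyclic: fixed-size blocks of b elements dealt round-robin. */
int block_cyclic_owner(int i, int b, int p)
{
    return (i / b) % p;
}

int main(void)
{
    int n = 16, p = 4, b = 2;
    for (int i = 0; i < n; i++)
        printf("i = %2d: block -> P%d, block-cyclic -> P%d\n",
               i, block_owner(i, n, p), block_cyclic_owner(i, b, p));
    return 0;
}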
Task parallelism
Independent tasks identified
The tasks may or may not process different data
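A minimal OpenMP sketch of task parallelism; taskA and taskB are hypothetical independent routines standing in for the identified tasks:

#include <stdio.h>

void taskA(void) { printf("task A\n"); }   /* placeholder independent task */
void taskB(void) { printf("task B\n"); }   /* placeholder independent task */

int main(void)
{
    #pragma omp parallel
    #pragma omp single
    {
        #pragma omp task
        taskA();
        #pragma omp task
        taskB();
        #pragma omp taskwait   /* wait for both independent tasks */
    }
    return 0;
}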
Orchestration
Goals
Structuring communication
Synchronization
Challenges
Organizing data structures – packing
Small or large messages?
How to organize communication and synchronization?
Orchestration
Maximizing data locality
Minimizing volume of data exchange
Not communicating intermediate results – e.g., in a dot product, communicate only the accumulated partial sum, not the individual products
Minimizing frequency of interactions - packing
Minimizing contention and hot spots
Avoid having every process use the same communication pattern with the other processes at the same time; stagger the patterns to spread the load
Overlapping computations with interactions
Split computations into phases: those that depend on communicated data (type 1) and those that do not (type 2)
Initiate the communication needed for type 1; while it is in flight, perform the type 2 computations (see the sketch after this list)
Replicating data or computations
Balancing the extra computation or storage cost with
the gain due to less communication
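A minimal sketch of this overlap in C with nonblocking MPI, for a row-wise decomposition exchanging one ghost row with the neighbour above; compute_interior_rows and compute_boundary_rows are hypothetical stand-ins for the type 2 and type 1 phases:

#include <mpi.h>

void compute_interior_rows(void) { /* type 2: needs no communicated data */ }
void compute_boundary_rows(void) { /* type 1: uses the received ghost row */ }

void sweep_overlapped(float *top_ghost, float *top_row, int n,
                      int up, MPI_Comm comm)
{
    MPI_Request reqs[2];
    /* start the ghost-row exchange with the neighbour above */
    MPI_Irecv(top_ghost, n, MPI_FLOAT, up, 0, comm, &reqs[0]);
    MPI_Isend(top_row,   n, MPI_FLOAT, up, 0, comm, &reqs[1]);
    compute_interior_rows();   /* overlap: runs while messages are in flight */
    MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);
    compute_boundary_rows();   /* safe to use the ghost row now */
}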
Mapping
Which process runs on which particular
processor?
Can depend on network topology, communication
pattern of processes
On processor speeds in case of heterogeneous
systems
Grouping the tasks and assigning the groups to processors is called mapping
Two objectives:
Balance the groups
Minimize inter-group dependencies
The computation is represented as a task graph
The mapping problem is NP-hard
Based on Task Partitioning
[Figure: task-partitioning tree – the tasks are recursively split: first {0, 4}, then {0, 2, 4, 6}, then {0, 1, 2, 3, 4, 5, 6, 7}]
High-level Goals
Decomposition – mostly architecture-independent. Goal: expose enough concurrency, but not too much.
Assignment – mostly architecture-independent. Goals: balance the workload; reduce communication volume.
Orchestration – architecture-dependent. Goals: reduce noninherent communication via data locality; reduce communication and synchronization cost as seen by the processor; reduce serialization at shared resources; schedule tasks to satisfy dependences early.
Mapping – architecture-dependent. Goals: put related processes on the same processor if necessary; exploit locality in the network topology.
Example
Given a 2-d array of float values, repeatedly average each element with its immediate neighbours until the difference between two iterations is less than some tolerance value

[Figure: five-point stencil – A[i][j] and its neighbours A[i-1][j], A[i+1][j], A[i][j-1], A[i][j+1]]

do {
  diff = 0.0;
  for (i = 0; i < n; i++)
    for (j = 0; j < n; j++) {
      temp = A[i][j];
      A[i][j] = average(neighbours); /* average with the four neighbours */
      diff += abs(A[i][j] - temp);
    }
} while (diff > tolerance);
Assignment
[Figure: the grid partitioned into contiguous blocks of rows, one block per process]
Orchestration
Different for different programming
models/architectures
Shared address space
Naming: global address space
Synch. through barriers and locks
Distributed Memory /Message passing
Non-shared address space
Send-receive messages + barrier for synch.
SAS Version – Generating Processes
1. int n, nprocs; /* matrix: (n + 2-by-n + 2) elts.*/
2. float **A, diff = 0;
2a. LockDec (diff_lock);
2b. BarrierDec (barrier1);
3. main()
4. begin
5. read(n) ; /*read input parameter: matrix size*/
5a. Read (nprocs);
6. A ← g_malloc (a 2-d array of (n+2) x (n+2) doubles);
6a. Create (nprocs -1, Solve, A);
7. initialize(A); /*initialize the matrix A somehow*/
8. Solve (A); /*call the routine to solve equation*/
8a. Wait_for_End (nprocs-1);
9. end main
SAS Version -- Solve
10. procedure Solve (A) /*solve the equation system*/
11. float **A; /*A is an (n + 2)-by-(n + 2) array*/
12. begin
13. int i, j, pid, done = 0;
14. float temp;
14a. mybegin = 1 + (n/nprocs)*pid;
14b. myend = mybegin + (n/nprocs) - 1;
15. while (!done) do /*outermost loop over sweeps*/
16. diff = 0; /*initialize difference to 0*/
16a. Barrier (barrier1, nprocs);
17. for i ← mybegin to myend do /*sweep for all points of grid*/
18. for j ← 1 to n do
19. temp = A[i,j]; /*save old value of element*/
20. A[i,j] ← 0.2 * (A[i,j] + A[i,j-1] + A[i-1,j] +
21. A[i,j+1] + A[i+1,j]); /*compute average*/
22. diff += abs(A[i,j] - temp);
23. end for
24. end for
25. if (diff/(n*n) < TOL) then done = 1;
26. end while
27. end procedure
SAS Version -- Issues
SPMD program
Wait_for_end – all to one communication
How is diff accessed among processes?
Mutex to ensure diff is updated correctly.
A single lock is too much synchronization!
No need to synchronize at every grid point; each process can accumulate a local sum and update the shared diff just once per sweep.
What about access to A[i][j], especially the boundary
rows between processes?
Can loop termination be determined without any
synch. among processes?
Do we need any synchronization statement around the termination-condition check?
SAS Version -- Solve
10. procedure Solve (A) /*solve the equation system*/
11. float **A; /*A is an (n + 2)-by-(n + 2) array*/
12. begin
13. int i, j, pid, done = 0;
14. float mydiff, temp;
14a. mybegin = 1 + (n/nprocs)*pid;
14b. myend = mybegin + (n/nprocs) - 1;
15. while (!done) do /*outermost loop over sweeps*/
16. mydiff = diff = 0; /*initialize local and global difference to 0*/
16a. Barrier (barrier1, nprocs);
17. for i ← mybegin to myend do /*sweep for all points of grid*/
18. for j ← 1 to n do
19. temp = A[i,j]; /*save old value of element*/
20. A[i,j] ← 0.2 * (A[i,j] + A[i,j-1] + A[i-1,j] +
21. A[i,j+1] + A[i+1,j]); /*compute average*/
22. mydiff += abs(A[i,j] - temp);
23. end for
24. end for
24a. lock (diff_lock);
24b. diff += mydiff;
24c. unlock (diff_lock);
24d. Barrier (barrier1, nprocs);
25. if (diff/(n*n) < TOL) then done = 1;
25a. Barrier (barrier1, nprocs);
26. end while
27. end procedure
SAS Program
done condition evaluated redundantly by all
Code that does the update identical to
sequential program
each process has private mydiff variable
Most interesting special operations are for
synchronization
accumulations into shared diff have to be mutually
exclusive
Why the need for all the barriers? They separate the reset, the accumulation, and the test of the shared diff
No explicit ghost rows are needed – boundary rows of neighbours are read directly from shared memory
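For comparison, a compact OpenMP rendering of the same solver (a sketch, not from the slides): the per-process mydiff plus the locked update of the shared diff corresponds to a reduction(+:diff) clause, and the pragma's implicit barrier plays the role of barrier1. Like the pseudocode, neighbouring elements may be read before or after their update within a sweep.

#include <math.h>

void solve(float **A, int n, float TOL)
{
    int done = 0;
    while (!done) {
        float diff = 0.0f;
        /* reduction gives each thread a private diff, combined at the end */
        #pragma omp parallel for reduction(+: diff)
        for (int i = 1; i <= n; i++)
            for (int j = 1; j <= n; j++) {
                float temp = A[i][j];
                A[i][j] = 0.2f * (A[i][j] + A[i][j-1] + A[i-1][j]
                                  + A[i][j+1] + A[i+1][j]);
                diff += fabsf(A[i][j] - temp);
            }
        /* implicit barrier at the end of the parallel for */
        if (diff / ((float)n * n) < TOL)
            done = 1;
    }
}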
Data Layout and Orchestration
[Figure: message-passing data layout – each process stores only its own block of rows, plus ghost copies of the neighbouring boundary rows]
Message Passing Version – Generating
Processes
1. int n, nprocs; /* matrix: (n + 2-by-n + 2) elts.*/
2. float **myA;
3. main()
4. begin
5. read(n) ; /*read input parameter: matrix size*/
5a. read (nprocs);
/* 6. A ← g_malloc (a 2-d array of (n+2) x (n+2) doubles); */
6a. Create (nprocs -1, Solve, A);
/* 7. initialize(A); */ /*initialize the matrix A somehow*/
8. Solve (A); /*call the routine to solve equation*/
8a. Wait_for_End (nprocs-1);
9. end main
Message Passing Version – Array allocation
and Ghost-row Copying
10. procedure Solve (A) /*solve the equation system*/
11. float **A; /*A is an (n + 2)-by-(n + 2) array*/
12. begin
13. int i, j, pid, done = 0;
14. float mydiff, temp;
14a. myend = n/nprocs;
6. myA ← malloc (a 2-d array of (n/nprocs + 2) x (n+2) floats); /* local rows plus ghost rows */
7. initialize (myA); /* initialize myA LOCALLY */
15. while (!done) do /*outermost loop over sweeps*/
16. mydiff = 0; /*initialize local difference to 0*/
16a. if (pid != 0) then
SEND (&myA[1,0] , n*sizeof(float), (pid-1), row);
16b. if (pid != nprocs-1) then
SEND (&myA[myend,0], n*sizeof(float), (pid+1), row);
16c. if (pid != 0) then
RECEIVE (&myA[0,0], n*sizeof(float), (pid -1), row);
16d. if (pid != nprocs-1) then
RECEIVE (&myA[myend+1,0], n*sizeof(float), (pid+1), row);
Message Passing Version – Solver
12. begin
… … …
15. while (!done) do /*outermost loop over sweeps*/
… … …
17. for i ← 1 to myend do /*sweep for all points of grid*/
18. for j ← 1 to n do
19. temp = myA[i,j]; /*save old value of element*/
20. myA[i,j] ← 0.2 * (myA[i,j] + myA[i,j-1] + myA[i-1,j] +
21. myA[i,j+1] + myA[i+1,j]); /*compute average*/
22. mydiff += abs(myA[i,j] - temp);
23. end for
24. end for
24a. if (pid != 0) then
24b. SEND (mydiff, sizeof (float), 0, DIFF);
24c. RECEIVE (done, sizeof(int), 0, DONE);
24d. else
24e. for k ← 1 to nprocs-1 do
24f. RECEIVE (tempdiff, sizeof(float), k, DIFF);
24g. mydiff += tempdiff;
24h. endfor
24i. if (mydiff/(n*n) < TOL) then done = 1;
24j. for k ← 1 to nprocs-1 do
24k. SEND (done, sizeof(int), k, DONE);
24l. endfor
25. end while
26. end procedure
Notes on Message Passing Version
Data transfer is sender-initiated: a send moves the data, a receive does not
Unlike SAS, which is usually receiver-initiated (a load fetches the data)
Can there be a deadlock situation due to the sends?
Communication is done all at once, in whole rows at the beginning of each iteration, not grid point by grid point
Core similar, but indices/bounds in local rather
than global space
Synchronization through sends and receives
Update of global diff and event synch for done
condition – mutual exclusion occurs naturally
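In real MPI, the hand-rolled accumulation of mydiff at process 0 and the broadcast of done (lines 24a–24l) collapse into a single collective; a minimal sketch:

#include <mpi.h>

/* Every process passes its local mydiff; all get the same verdict. */
int converged(float mydiff, int n, float TOL, MPI_Comm comm)
{
    float diff;
    MPI_Allreduce(&mydiff, &diff, 1, MPI_FLOAT, MPI_SUM, comm);
    return diff / ((float)n * n) < TOL;
}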
Orchestration: Summary
Shared address space
Shared and private data explicitly separate
Communication implicit in access patterns
Orchestration: Summary
Message passing
Data distribution among local address spaces needed
No explicit shared structures (implicit in comm. patterns)
Communication is explicit
Grid Solver Program: Summary
Decomposition and Assignment similar in SAS and
message-passing
Orchestration is different
Data structures, data access/naming, communication,
synchronization
Performance?
Grid Solver Program: Summary
[Figure: the SAS and message-passing versions of the solver shown side by side]