
Introduction to Parallel Computing (CMSC416 / CMSC616)

Designing Parallel Programs


Abhinav Bhatele, Alan Sussman
Reminders / Announcements

• If you do not have a zaratan account, email: [email protected]

• When emailing, please mention your course and section number:


• Example: 416 / Section 0201

• Accommodations: please get the letters to the respective instructors soon

• Join piazza: https://piazza.com/umd/fall2024/cmsc416cmsc616

• Assignment 0 will be posted tonight (Sep 3, 11:59 pm), due on Sep 10, 11:59 pm

• Office hours have been posted on the website

Writing parallel programs

SPMD (Single Program Multiple Data) model
• Decide on the serial algorithm first

• Data: how to distribute data among threads/processes?


• Data locality: assignment of data to specific processes to minimize data movement

• Computation: how to divide work among threads/processes?

• Figure out how often communication will be needed
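
The sketch below is a minimal SPMD example in MPI (illustrative only; the variable names and block sizes are assumptions, not from the slides). Every process runs the same program and uses its rank to pick the portion of the data and computation it owns:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);    // which process am I?
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);  // how many processes are there?

    // Same program on every process; the rank decides which block of a
    // hypothetical N-element problem this process works on.
    int N = 1000000;
    int begin = rank * (N / nprocs);
    int end   = (rank == nprocs - 1) ? N : begin + (N / nprocs);
    printf("Rank %d of %d owns elements [%d, %d)\n", rank, nprocs, begin, end);

    MPI_Finalize();
    return 0;
}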

Conway’s Game of Life

• Two-dimensional grid of (square) cells

• Each cell can be in one of two states: live or dead

• Every cell only interacts with its eight nearest neighbors

• In every generation (or iteration or time step), there are some rules that decide if a cell will continue to live, die, or be born (dead ➜ live)
[Figure credit: By Lev Kalmykov - Own work, CC BY-SA 4.0,
https://en.wikipedia.org/wiki/Conway's_Game_of_Life
https://commons.wikimedia.org/w/index.php?curid=43448735]
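
As a concrete illustration of these rules, here is a minimal sketch of one generation (a serial version; the array and function names are hypothetical, not from the slides). A dead cell with exactly three live neighbors is born, a live cell with two or three live neighbors survives, and every other cell dies or stays dead:

#define N 64   // hypothetical grid dimension

// Count the live cells among the eight nearest neighbors of (i, j).
int count_live_neighbors(int grid[N][N], int i, int j) {
    int count = 0;
    for (int di = -1; di <= 1; di++)
        for (int dj = -1; dj <= 1; dj++)
            if (di != 0 || dj != 0)
                count += grid[i + di][j + dj];
    return count;
}

// Compute the next generation into `next` from the current `grid`
// (interior cells only, to keep the sketch short).
void step(int grid[N][N], int next[N][N]) {
    for (int i = 1; i < N - 1; i++)
        for (int j = 1; j < N - 1; j++) {
            int n = count_live_neighbors(grid, i, j);
            if (grid[i][j])
                next[i][j] = (n == 2 || n == 3);  // live cell survives
            else
                next[i][j] = (n == 3);            // dead cell is born
        }
}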



Two-dimensional stencil computation
2D 5-point Stencil

• Commonly found kernel in computational codes

• Heat diffusion, Jacobi method, Gauss-Seidel method

A[i, j] = (A[i, j] + A[i−1, j] + A[i+1, j] + A[i, j−1] + A[i, j+1]) / 5
[Figure: 3D 7-point stencil]
Serial code

Why do we keep two copies of A?

for (int t = 0; t < num_steps; t++) {
    ...
    for (int i = 1; i < n - 1; i++)       // n x n grid; update interior cells
        for (int j = 1; j < n - 1; j++)
            A_new[i][j] = (A[i][j] + A[i-1][j] + A[i+1][j] + A[i][j-1] + A[i][j+1]) * 0.2;

    // copy contents of A_new into A before the next time step
    for (int i = 1; i < n - 1; i++)
        for (int j = 1; j < n - 1; j++)
            A[i][j] = A_new[i][j];
    ...
}

For correctness, we have to ensure that elements of A are not written into before they are read in the same time step / iteration.
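
A common alternative to the copy loop (not shown on the slide, so treat this as an assumption about one possible implementation) is to allocate A and A_new on the heap and simply swap the two pointers at the end of each time step, avoiding the full-grid copy:

// A and A_new are heap-allocated arrays accessed through pointers.
double *tmp = A;
A = A_new;      // the values just computed become the "old" grid
A_new = tmp;    // the previous grid becomes scratch space for the next step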



2D stencil computation in parallel

• 1D decomposition
• Divide rows (or columns) among processes

• Each process has to communicate with two neighbors (above and below)

[Figure: 1D decomposition with ghost cells]

• 2D decomposition
• Divide both rows and columns (2D blocks) among processes

• Each process has to communicate with four neighbors
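
Below is a minimal sketch of the ghost-cell exchange for the 1D row decomposition, assuming each process stores its local_rows owned rows plus one ghost row above and one below in a flat (local_rows + 2) x n array (the function and variable names are illustrative, not from the slides):

#include <mpi.h>

// Exchange ghost rows with the neighbors above and below.
// Row 0 and row local_rows + 1 of A are the ghost rows.
void exchange_ghost_rows(double *A, int local_rows, int n,
                         int rank, int nprocs, MPI_Comm comm) {
    int up   = (rank > 0)          ? rank - 1 : MPI_PROC_NULL;
    int down = (rank < nprocs - 1) ? rank + 1 : MPI_PROC_NULL;

    // Send my first owned row up; receive my bottom ghost row from below.
    MPI_Sendrecv(&A[1 * n],                n, MPI_DOUBLE, up,   0,
                 &A[(local_rows + 1) * n], n, MPI_DOUBLE, down, 0,
                 comm, MPI_STATUS_IGNORE);

    // Send my last owned row down; receive my top ghost row from above.
    MPI_Sendrecv(&A[local_rows * n],       n, MPI_DOUBLE, down, 1,
                 &A[0 * n],                n, MPI_DOUBLE, up,   1,
                 comm, MPI_STATUS_IGNORE);
}

With a 2D decomposition the same idea applies, but each process exchanges ghost rows and ghost columns with its four neighbors.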



Prefix sum
• Calculate sums of prefixes (running totals) of elements (numbers) in an array

• Also called a “scan” sometimes

pSum[0] = A[0];

for (int i = 1; i < N; i++) {
    pSum[i] = pSum[i-1] + A[i];
}

A:      1   2   3    4    5    6   …
pSum:   1   3   6   10   15   21   …

Parallel prefix sum
Processes/threads:   0    1    2    3    4    5    6    7
Input:               2    8    3    5    7    4    1    6

Stride 1:            2   10   11    8   12   11    5    7

Stride 2:            2   10   13   18   23   19   17   18

Stride 4:            2   10   13   18   25   29   30   36
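
The stride-doubling pattern above can be written compactly as follows (a serial sketch of the parallel algorithm; in a real shared-memory version each process/thread owns one element, and a barrier plus a copy of the previous round's values is needed so that reads at each stride finish before the writes):

#include <stdlib.h>
#include <string.h>

// Stride-doubling (Hillis-Steele) inclusive scan over pSum[0..n-1].
// At stride s, element i (for i >= s) adds the value s positions to its
// left, taken from the previous round's snapshot in `prev`.
void stride_doubling_scan(int *pSum, int n) {
    int *prev = malloc(n * sizeof(int));
    for (int s = 1; s < n; s *= 2) {
        memcpy(prev, pSum, n * sizeof(int));  // previous round's values
        for (int i = s; i < n; i++)
            pSum[i] = prev[i] + prev[i - s];
    }
    free(prev);
}

Running this on the input 2 8 3 5 7 4 1 6 reproduces the rows shown above for strides 1, 2, and 4.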



In practice

• You have N numbers and p processes, N >> p

• Assign an N/p block to each process


• Do the serial prefix sum calculation locally on the block owned by each process

• Then do the parallel algorithm with partial prefix sums (using the last element from each local block)

• The last element from the sending process is added to all elements in the receiving process' sub-block
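
One way to implement this with MPI is sketched below, using the MPI_Exscan collective in place of the hand-written stride-doubling exchange on the previous slide (the function and variable names are illustrative, not from the slides):

#include <mpi.h>

// Distributed prefix sum: every process holds its N/p block in local[].
void distributed_prefix_sum(int *local, int count, MPI_Comm comm) {
    // Step 1: serial prefix sum on the local block.
    for (int i = 1; i < count; i++)
        local[i] += local[i - 1];

    // Step 2: exclusive scan of the block totals (the last local element).
    // After MPI_Exscan, offset on rank r holds the sum of the blocks on
    // ranks 0..r-1 (the result is undefined on rank 0, so we zero it).
    int block_total = local[count - 1];
    int offset = 0;
    MPI_Exscan(&block_total, &offset, 1, MPI_INT, MPI_SUM, comm);

    int rank;
    MPI_Comm_rank(comm, &rank);
    if (rank == 0) offset = 0;

    // Step 3: add the offset to every element of the local block.
    for (int i = 0; i < count; i++)
        local[i] += offset;
}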

Load balance and grain size

• Load balance: try to balance the amount of work (computation) assigned to different threads/processes
• Bring ratio of maximum to average load as close to 1.0 as possible

• Secondary consideration: also load balance amount of communication

• Grain size: ratio of computation-to-communication


• Coarse-grained (more computation) vs. fine-grained (more communication)
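
As a small illustration of the load-balance metric above (illustrative code, not from the slides), the imbalance ratio can be computed from per-process load measurements; a value of 1.0 means perfectly balanced:

// Load imbalance ratio = maximum load / average load (ideal value: 1.0).
double imbalance(const double *load, int nprocs) {
    double max = load[0], sum = 0.0;
    for (int p = 0; p < nprocs; p++) {
        if (load[p] > max) max = load[p];
        sum += load[p];
    }
    return max / (sum / nprocs);
}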
