Lecture 2
Parallel algorithms
Performance measure and load balancing
1 Performance tools
Definition
Let
• $t_s$: sequential execution time;
• $t_p(n)$: execution time on $n$ computing units.
Speed-up is defined as:
$$S(n) = \frac{t_s}{t_p(n)} \qquad (1)$$
Remark
The sequential algorithm is often different from the parallel algorithm, and in this case the speed-up measurement is not straightforward. In particular, the following questions must be asked, among others:
• Is the sequential algorithm optimal in complexity?
• Is the sequential algorithm well optimized?
• Does the sequential algorithm make the best use of the cache memory?
This law (Gustafson's law) is useful to find a reasonable number of computing cores to use for an application. With $t_s$ the sequential fraction and $t_p$ the parallelizable fraction of the execution time, normalized so that $t_s + t_p = 1$:
$$S(n) = \frac{t_s + n\,t_p}{t_s + t_p} = n + \frac{(1 - n)\,t_s}{t_s + t_p} = n + (1 - n)\,t_s$$
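For instance, assuming a sequential fraction $t_s = 0.1$ (hence $t_p = 0.9$) on $n = 16$ computing units:
$$S(16) = 16 + (1 - 16) \times 0.1 = 16 - 1.5 = 14.5$$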
Definition
For a parallel program, scalability is the behaviour of the speed-up as the number of processes and/or the amount of input data increases.
Ratio between computing intensity and quantity of data exchanged between processes
Definition
Load balancing: all processes execute a given computation section of the code in the same duration.
• The speed-up is badly degraded if some parts of the code are far from being load balanced;
• Example 1: A function takes $t$ seconds for half of the processes, and $t/2$ seconds for the other half. The maximal speed-up for this function will be:
$$S(n) = \frac{\frac{n}{2}\,t + \frac{n}{2}\,\frac{t}{2}}{t} = \frac{3}{4}\,n$$
• Example 2: A function takes $t/2$ seconds for $n-1$ processes, and $t$ seconds for one process. The maximal speed-up for this function will be:
$$S(n) = \frac{(n-1)\,\frac{t}{2} + t}{t} = \frac{n-1}{2} + 1 = \frac{n+1}{2}$$
Remark: The longer a badly load-balanced function takes to execute, the greater the penalty. Don't worry about load balancing for functions taking a very small time to execute.
2 Embarrassingly parallel algorithm
Property
• In a distributed parallel context, no data needs to be exchanged between processes to compute the results;
• In a shared-memory environment, parallelization is straightforward, but beware of memory-bound computations;
• In a distributed environment, the memory-bound limitation is not an issue;
• If the data are contiguous and the algorithm is vectorizable, a GPGPU can be ideal for performance.
Vector addition
$$w = u + v, \qquad u,\, v,\, w \in \mathbb{R}^n$$
Ideas
• For load balancing, scatter the vectors in equal parts among the threads or processes
• Each process/thread computes a part of the vector addition
• In distributed memory, the result is scattered among processes !
Some properties
• Memory accesses and computing operations have the same complexity: on shared memory, the memory bandwidth limits the performance;
• On distributed memory, each process uses its own physical memory and no data needs to be exchanged: the speed-up may be linear in the number of processes (if the computing intensity is sufficient), as sketched below.
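As an illustration, here is a minimal MPI sketch of the distributed vector addition (not the course's reference code; the dimension, the values, and the divisibility of n by the number of processes are arbitrary assumptions):

```cpp
// Minimal sketch: distributed vector addition w = u + v with MPI.
#include <mpi.h>
#include <vector>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, nbp;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nbp);

    const int n = 1000000;      // global dimension, assumed divisible by nbp
    const int n_loc = n / nbp;  // size of the slice owned by this process

    // Each process owns contiguous slices of u and v (arbitrary values here).
    std::vector<double> u_loc(n_loc, 1.0), v_loc(n_loc, 2.0), w_loc(n_loc);

    // Purely local loop: no data is exchanged, the result stays scattered.
    for (int i = 0; i < n_loc; ++i)
        w_loc[i] = u_loc[i] + v_loc[i];

    MPI_Finalize();
    return 0;
}
```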
Block-diagonal matrix product: let
$$A = \begin{pmatrix} A_{11} & 0 & \cdots & 0 \\ 0 & A_{22} & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & A_{nn} \end{pmatrix}, \qquad B = \begin{pmatrix} B_{11} & 0 & \cdots & 0 \\ 0 & B_{22} & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & B_{nn} \end{pmatrix}.$$
Problematic
The product of two block-diagonal matrices is close to the vector addition case, but:
• The dimensions $d_i$ of the diagonal blocks are inhomogeneous, and for each diagonal block the computation complexity is $d_i^3$;
• How should the diagonal blocks be distributed among the processes to obtain a nearly optimal load balancing?
Remark: All processes compute the distribution of the diagonal blocks. It is better to have all processes do the same computation than to have process 0 compute the distribution and send it to the other processes (a greedy sketch is given below).
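A minimal sketch of such a distribution, assuming a greedy heuristic (largest block first, assigned to the least-loaded process) and the $d_i^3$ cost model from above; distribute_blocks is a hypothetical helper, not part of the course material:

```cpp
// Greedy distribution of the diagonal blocks among nbp processes.
#include <algorithm>
#include <cstdint>
#include <numeric>
#include <vector>

// d[i] is the dimension of diagonal block i; returns owner[i] = owning rank.
std::vector<int> distribute_blocks(const std::vector<int>& d, int nbp) {
    auto cost = [](std::int64_t di) { return di * di * di; };  // d_i^3 model
    std::vector<std::size_t> order(d.size());
    std::iota(order.begin(), order.end(), std::size_t{0});
    // Consider the blocks by decreasing cost.
    std::sort(order.begin(), order.end(),
              [&](std::size_t a, std::size_t b) { return cost(d[a]) > cost(d[b]); });
    std::vector<std::int64_t> load(nbp, 0);
    std::vector<int> owner(d.size());
    for (std::size_t i : order) {
        // Give the block to the currently least-loaded process.
        int p = static_cast<int>(std::min_element(load.begin(), load.end()) - load.begin());
        owner[i] = p;
        load[p] += cost(d[i]);
    }
    return owner;  // identical on every process: deterministic computation
}
```

Because the computation is deterministic, every process obtains the same `owner` array without any communication, which is exactly the point of the remark above.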
Some definitions
The Syracuse series of an integer $u_0$ is defined by $u_{n+1} = u_n/2$ if $u_n$ is even, and $u_{n+1} = 3u_n + 1$ if $u_n$ is odd.
• Length of the flight: number of iterations for the series to reach the value 1;
• Height of the flight: maximal value reached by the series.
The goal of the program: compute the length and the height of the flight for a lot of (odd) values of $u_0$.
Problematic
• Each process computes the length and the height for a subset of initial values $u_0$;
• The computation intensity depends on the length of each Syracuse series;
• It is impossible to know the computation complexity of a series prior to computing it;
• The problem is not naturally well balanced;
⇒ Use a dynamic algorithm on a "root" process (the "master" process) to distribute the series among the other processes (the "slaves"), as sketched below.
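A minimal master/slave sketch with MPI, assuming a simple protocol where a slave sends its last result to request the next value of $u_0$ (the tags, message layout, and bounds are arbitrary choices, not a fixed protocol):

```cpp
// Dynamic master/slave distribution of Syracuse series with MPI.
#include <mpi.h>
#include <cstdint>

// Length and height of the flight of one (odd) initial value u0.
static void flight(std::int64_t u0, std::int64_t res[2]) {
    std::int64_t u = u0, length = 0, height = u0;
    while (u != 1) {
        u = (u % 2 == 0) ? u / 2 : 3 * u + 1;
        if (u > height) height = u;
        ++length;
    }
    res[0] = length; res[1] = height;
}

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, nbp;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nbp);
    const std::int64_t first = 3, last = 100001;  // odd values in [3, last)

    if (rank == 0) {   // master: hands out one u0 at a time, on demand
        std::int64_t next = first, res[2];
        int busy = nbp - 1;
        MPI_Status st;
        while (busy > 0) {
            MPI_Recv(res, 2, MPI_INT64_T, MPI_ANY_SOURCE, 0, MPI_COMM_WORLD, &st);
            std::int64_t work = (next < last) ? next : -1;  // -1 = stop signal
            MPI_Send(&work, 1, MPI_INT64_T, st.MPI_SOURCE, 0, MPI_COMM_WORLD);
            if (work >= 0) next += 2; else --busy;
        }
    } else {           // slave: reports its result and asks for more work
        std::int64_t u0, res[2] = {0, 0};
        while (true) {
            MPI_Send(res, 2, MPI_INT64_T, 0, 0, MPI_COMM_WORLD);
            MPI_Recv(&u0, 1, MPI_INT64_T, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            if (u0 < 0) break;
            flight(u0, res);
        }
    }
    MPI_Finalize();
    return 0;
}
```

The fast slaves naturally come back for work more often than the slow ones, which is what rebalances the load at run time.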
3 Nearly embarrassingly parallel algorithm
Definition
Independent computation for each process with a final communication to finalize the computation.
Examples
• Dot product of two vectors in $\mathbb{R}^n$ (see the sketch below);
• Compute an integral ;
• Matrix-vector product ;
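For instance, a minimal sketch of the distributed dot product, assuming each process already holds matching local slices of the two vectors:

```cpp
// Local partial sums plus one final MPI_Allreduce: the single collective
// call is the "final communication" of the definition above.
#include <mpi.h>
#include <vector>

double dot(const std::vector<double>& x_loc, const std::vector<double>& y_loc) {
    double s_loc = 0.0;                 // local partial sum
    for (std::size_t i = 0; i < x_loc.size(); ++i)
        s_loc += x_loc[i] * y_loc[i];
    double s = 0.0;
    MPI_Allreduce(&s_loc, &s, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
    return s;                           // full dot product, on every process
}
```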
Integral computation
• Integral computation based on Gauss quadrature formulae:
$$\int_a^b f(x)\,dx \approx \sum_{i=1}^{N_g} \omega_i\, f(g_i) \qquad (2)$$
• The interval is split into $N$ sub-intervals:
$$I = \int_a^b f(x)\,dx = \sum_{i=1}^{N} \int_{a_i}^{b_i} f(x)\,dx = \sum_{i=1}^{N} I_i \quad \text{where } a_1 = a,\ a_{N+1} = b_N = b \text{ and } a_i < b_i = a_{i+1}$$
Main ideas
• Scatter the sub-intervals among the processes; each process computes a partial sum over its subset $P$ of sub-intervals:
$$S_p = \sum_{[a_i, b_i] \in P} \int_{a_i}^{b_i} f(x)\,dx$$
• A final sum reduction yields the result:
$$S = \sum_{p=1}^{\mathrm{nbp}} S_p$$
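A minimal sketch of this scheme, assuming a cyclic distribution of the sub-intervals and a midpoint rule standing in for the Gauss quadrature (2); the integrand and bounds are arbitrary choices:

```cpp
// Each process integrates its own sub-intervals, then a sum reduction
// assembles S on process 0.
#include <mpi.h>
#include <cmath>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, nbp;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nbp);

    const double a = 0.0, b = std::acos(-1.0);  // integrate sin over [0, pi]
    const int N = 1000;                          // number of sub-intervals
    const double h = (b - a) / N;

    double S_p = 0.0;  // partial sum over the sub-intervals owned by this rank
    for (int i = rank; i < N; i += nbp)
        S_p += h * std::sin(a + (i + 0.5) * h);

    double S = 0.0;    // final sum reduction on process 0 (expected: 2)
    MPI_Reduce(&S_p, &S, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    MPI_Finalize();
    return 0;
}
```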
$$v = A\,u \in \mathbb{R}^n \qquad \text{where} \qquad v_i = \sum_{j=1}^{m} A_{ij}\, u_j$$
Row-wise decomposition: let
$$A = \begin{pmatrix} A_1 \\ A_2 \\ \vdots \\ A_I \\ \vdots \\ A_N \end{pmatrix} \qquad \text{where } \forall I \in \{1, 2, \ldots, N\},\ A_I \in \mathbb{R}^{\frac{n}{N} \times m}.$$
Algorithm
• Each process has some rows of A and all of u;
• Each process computes a part of v: process I computes $V_I = A_I\,u \in \mathbb{R}^{\frac{n}{N}}$;
• To compute another matrix-vector product with the new vector, we need to gather the vector on all processes (only necessary for the distributed parallel algorithm), as sketched below.
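A minimal MPI sketch of this row-wise algorithm, assuming n and m are divisible by the number of processes (sizes and values are arbitrary):

```cpp
// Matrix-vector product with A distributed by row blocks.
#include <mpi.h>
#include <vector>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, nbp;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nbp);

    const int n = 512, m = 512, n_loc = n / nbp;
    std::vector<double> A_loc(n_loc * m, 1.0);  // local row block A_I
    std::vector<double> u(m, 1.0);              // every process holds all of u
    std::vector<double> v_loc(n_loc, 0.0), v(n);

    // Local product: V_I = A_I . u
    for (int i = 0; i < n_loc; ++i)
        for (int j = 0; j < m; ++j)
            v_loc[i] += A_loc[i * m + j] * u[j];

    // Gather the pieces of v on all processes (needed to iterate the product).
    MPI_Allgather(v_loc.data(), n_loc, MPI_DOUBLE,
                  v.data(), n_loc, MPI_DOUBLE, MPI_COMM_WORLD);
    MPI_Finalize();
    return 0;
}
```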
Column-wise decomposition: let
$$A = (A_1 \,|\, A_2 \,|\, \cdots \,|\, A_I \,|\, \cdots \,|\, A_N) \quad \text{and} \quad u = \begin{pmatrix} U_1 \\ U_2 \\ \vdots \\ U_I \\ \vdots \\ U_N \end{pmatrix} \quad \text{where } \forall I \in \{1, 2, \ldots, N\},\ A_I \in \mathbb{R}^{n \times \frac{m}{N}} \text{ and } U_I \in \mathbb{R}^{\frac{m}{N}}.$$
Algorithm
• Each process has some columns of A and some rows of u;
• Each process computes a partial sum for v: process I computes $V_I = A_I\,U_I \in \mathbb{R}^n$;
• Finally, a sum reduction is done to get the final result: $v = \sum_{I=1}^{N} V_I$ (see the sketch below).
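A minimal MPI sketch of this column-wise variant, under the same divisibility assumptions; the final MPI_Allreduce implements the sum reduction:

```cpp
// Matrix-vector product with A distributed by column blocks: each process
// computes a full-length partial vector V_I = A_I . U_I.
#include <mpi.h>
#include <vector>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, nbp;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nbp);

    const int n = 512, m = 512, m_loc = m / nbp;
    std::vector<double> A_loc(n * m_loc, 1.0);  // local column block A_I
    std::vector<double> u_loc(m_loc, 1.0);      // local rows U_I of u
    std::vector<double> v_loc(n, 0.0), v(n);

    // Partial sum over the columns owned by this process.
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < m_loc; ++j)
            v_loc[i] += A_loc[i * m_loc + j] * u_loc[j];

    // v = sum over I of V_I, available on every process.
    MPI_Allreduce(v_loc.data(), v.data(), n, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
    MPI_Finalize();
    return 0;
}
```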
$$\begin{cases} z_0 = 0, \\ z_{n+1} = z_n^2 + c, \end{cases} \qquad \text{where } c \in \mathbb{C} \text{ is chosen.}$$
Figure – Mandelbrot (left) and Buddha (right) sets
Algorithm
• Draw N random values of c in the disk D for which the corresponding series diverges;
• Compute the orbit of this series until divergence and increment the intensity of the pixel representing each value of the orbit (a sketch is given below).
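A sequential sketch of the accumulation loop (the image size, sample count, escape radius, and bounds are arbitrary choices); since each value of c is independent, the samples can be scattered among processes like the other embarrassingly parallel examples:

```cpp
// Accumulate orbit intensities for randomly drawn values of c.
#include <complex>
#include <random>
#include <vector>

int main() {
    const int W = 800, H = 800, N = 100000, max_iter = 1000;
    std::vector<unsigned> intensity(W * H, 0);
    std::mt19937 gen(42);
    std::uniform_real_distribution<double> dis(-2.0, 2.0);

    for (int s = 0; s < N; ++s) {
        const std::complex<double> c(dis(gen), dis(gen));
        // First pass: keep c only if the series diverges within max_iter.
        std::complex<double> z(0.0, 0.0);
        int it = 0;
        while (it < max_iter && std::norm(z) <= 4.0) { z = z * z + c; ++it; }
        if (it == max_iter) continue;   // did not diverge: not drawn

        // Second pass: increment the pixel of each value of the orbit.
        z = std::complex<double>(0.0, 0.0);
        while (std::norm(z) <= 4.0) {
            z = z * z + c;
            const int px = static_cast<int>((z.real() + 2.0) / 4.0 * W);
            const int py = static_cast<int>((z.imag() + 2.0) / 4.0 * H);
            if (px >= 0 && px < W && py >= 0 && py < H)
                ++intensity[py * W + px];
        }
    }
    return 0;
}
```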