Slides
David Head
University of Leeds
McCool et al., Structured Parallel Programming (Morgan Kaufmann, 2012).
Vector addition
a = ( a_1, a_2, a_3, ..., a_n )
b = ( b_1, b_2, b_3, ..., b_n )
c = a + b = ( a_1 + b_1, a_2 + b_2, a_3 + b_3, ..., a_n + b_n )

Or: c_i = a_i + b_i,  i = 1 ... n.
#define n 100

int main()
{
    float a[n], b[n], c[n];

    ...                       // Initialise a[n] and b[n]

    int i;
    for( i=0; i<n; i++ )
        c[i] = a[i] + b[i];

    return 0;
}
Note that indices start at 0 in most languages, but at 1 in the usual mathematical notation (and in Fortran and MATLAB).

The loop is parallelised by placing the OpenMP directive #pragma omp parallel for immediately before it. This only parallelises this one loop, not any later ones!
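A minimal sketch of the vector addition with the directive applied (the initial values and the printf are only there to make the example self-contained; compile with OpenMP enabled, e.g. gcc -fopenmp):

#include <stdio.h>

#define n 100

int main()
{
    float a[n], b[n], c[n];

    // Initialise a and b with some arbitrary values.
    for( int i=0; i<n; i++ ) { a[i] = i; b[i] = 2*i; }

    // Split the iterations of the next loop across all available threads;
    // the loop counter is private to each thread.
    #pragma omp parallel for
    for( int i=0; i<n; i++ )
        c[i] = a[i] + b[i];

    printf( "c[%d] = %g\n", n-1, c[n-1] );

    return 0;
}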
Fork-and-join
When execution reaches #pragma omp parallel for, multiple threads are spawned.
Each thread computes part of the loop.
The extra threads are destroyed at the end of the loop.
This is known as a fork-join construct:
[Diagram: fork-join. The main thread runs serially until it reaches #pragma omp parallel for, where it forks, spawning worker threads; the main and worker threads then perform the loop together, and the workers are joined at the end of the loop.]
Worker thread 1:

// CREATED BY MAIN ('fork')
// Perform second 1/4 of loop.
for( i=n/4; i<n/2; i++ ) c[i] = a[i] + b[i];
// FINISH ('join')

Worker thread 2:

// CREATED BY MAIN ('fork')
// Perform third 1/4 of loop.
for( i=n/2; i<3*n/4; i++ ) c[i] = a[i] + b[i];
// FINISH ('join')

Worker thread 3:

// CREATED BY MAIN ('fork')
// Perform final 1/4 of loop.
for( i=3*n/4; i<n; i++ ) c[i] = a[i] + b[i];
// FINISH ('join')
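The main thread's own share is not listed above; by analogy with the quarters assigned to the three workers (an assumption based on the ranges shown), it performs the first 1/4 of the loop and then waits for the workers at the join:

Main thread:

// Issues the 'fork', then takes part in the loop itself.
// Perform first 1/4 of loop.
for( i=0; i<n/4; i++ ) c[i] = a[i] + b[i];
// Wait for all workers to finish ('join').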
Notes
The four threads are not being executed one after the other:
Each thread runs concurrently, hopefully on separate cores,
i.e., in parallel.
Cannot be understood in terms of serial programming
concepts.
The total loop range was evenly divided between all threads.
This happens as soon as #pragma omp parallel for is reached.
The trip count (i.e. loop range) must be known at the start
of the loop.
The start, end and stride must be constant.
Cannot break from the loop.
Cannot be applied to 'while' or 'do ... while' loops.
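To illustrate these restrictions, here is a sketch (reusing the vector-addition arrays from earlier; not taken from the slides): the first loop has the required form, whereas the second cannot be parallelised because it may break out early, so its trip count is not known at the start.

// Allowed: start, end and stride are fixed when the loop begins.
#pragma omp parallel for
for( int i=0; i<n; i++ )
    c[i] = a[i] + b[i];

// Not valid for #pragma omp parallel for: the loop can exit early,
// so the number of iterations is not known in advance.
for( int i=0; i<n; i++ )
{
    if( c[i] < 0.0f ) break;   // early exit prevents parallelisation
    c[i] = a[i] + b[i];
}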
Code snippet
Represented more concisely as complex numbers c and z [with e.g. z_x = ℜ(z)], the iteration is just z → z^2 + c.
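The per-pixel code itself is not reproduced here; the following is only a sketch of what such a routine might look like using C99 complex arithmetic. The image size, the iteration cut-off maxIters, and the mapping from pixel (i,j) to the constant c are all assumptions, not values from the slides.

#include <complex.h>

#define numPixels_x 1024   // assumed image size
#define numPixels_y 1024
#define maxIters    1000   // assumed iteration cut-off

// Sketch of a per-pixel routine implementing the iteration z -> z^2 + c.
void setPixelColour( int i, int j )
{
    float complex c = ( -2.0f + 3.0f*i/numPixels_x )
                    + ( -1.5f + 3.0f*j/numPixels_y )*I;   // assumed mapping
    float complex z = 0.0f;
    int iter = 0;

    while( cabsf(z) < 2.0f && iter < maxIters )
    {
        z = z*z + c;       // the iteration z -> z^2 + c
        iter++;
    }

    // Colour pixel (i,j) according to iter (details omitted).
}

Because the number of iterations varies from pixel to pixel, different pixels require different amounts of work, which is what leads to the load-balancing issues discussed below.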
This (parallelising the inner i-loop, with the directive inside the j-loop) works, but is not much faster than serial – and may even be slower (check on your system).
There are several possible reasons for this:
The fork-join is inside the j-loop, so threads are created and destroyed numPixels_y times, which incurs an overhead.
This problem suffers from poor load balancing; see later.
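For concreteness, a sketch of the arrangement being described (the directive on the inner loop; this is inferred from the points above, not copied from the slides):

int i, j;
for( j=0; j<numPixels_y; j++ )
{
    // A fork and a join occur here, once for every row of pixels.
    #pragma omp parallel for
    for( i=0; i<numPixels_x; i++ )
        setPixelColour( i, j );
}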
Parallelise only the outer loop, so there is only a single fork event
and a single join event.
int i, j;
#pragma omp parallel for
for( j=0; j<numPixels_y; j++ )
    for( i=0; i<numPixels_x; i++ )
    {
        setPixelColour( i, j );
    }
The same variable i is used as the inner loop counter by every thread, i.e. it is shared:
When one thread completes a calculation, it increments i.
Therefore other threads will skip at least one pixel.
Threads do not calculate the full line of pixels.
One possible fix is sketched below.
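A sketch of such a fix (the slides' actual correction is not shown here): declare the loop counters inside the loops, so that the inner counter is local to each thread.

#pragma omp parallel for
for( int j=0; j<numPixels_y; j++ )
{
    for( int i=0; i<numPixels_x; i++ )   // i is now private to each thread
    {
        setPixelColour( i, j );
    }
}

An equivalent alternative is to keep the original declarations and add a private(i) clause to the directive.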
The code now works, but is still not much faster than serial.
The primary overhead is poor load balancing. We will look at this briefly next lecture, and in detail in Lecture 13.
Notice that the incorrect images were slightly different each time:
[Two example incorrect output images.]
Our serial code was deterministic, i.e. produced the same results
each time it was run.
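The same nondeterminism can be reproduced with a much smaller example (a sketch, not from the slides): a shared counter incremented by all threads without any synchronisation usually ends with a different, incorrect value on each run.

#include <stdio.h>

#define N 1000000

int main()
{
    int count = 0;

    // Every thread increments the shared variable count with no
    // synchronisation: a data race, so updates are lost.
    #pragma omp parallel for
    for( int i=0; i<N; i++ )
        count++;

    printf( "count = %d (expected %d)\n", count, N );

    return 0;
}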