
OPENMP: TIPS, TRICKS AND GOTCHAS

Mark Bull
EPCC, University of Edinburgh (and OpenMP ARB)
[email protected]

Directives

• Mistyping the sentinel (e.g. !OMP or #pragma opm) typically raises no error message.
• Be careful!
• Extra nasty if it is e.g. #pragma opm atomic – a race condition! (see the sketch below)
• Write a script to search your code for your common typos.
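• A minimal C sketch of the atomic case (counter is a hypothetical shared variable): the unrecognised pragma is silently ignored, so the second update is unprotected.

int counter = 0;

#pragma omp parallel
{
    #pragma omp atomic   /* correct: the update is protected */
    counter++;

    #pragma opm atomic   /* typo: silently ignored, so this  */
    counter++;           /* update races with other threads  */
}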



Writing code that works without OpenMP too


• The macro _OPENMP is defined if code is compiled with the OpenMP switch.
• You can use this to conditionally compile code so that it works with and without OpenMP enabled (see the sketch below).
• If you want to link dummy OpenMP library routines into sequential code, there is code in the standard you can copy (Appendix A in 4.0).
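• A minimal sketch of conditional compilation on _OPENMP:

#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#endif

int main(void)
{
#ifdef _OPENMP
    printf("OpenMP enabled: up to %d threads\n", omp_get_max_threads());
#else
    printf("compiled without OpenMP\n");
#endif
    return 0;
}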

Parallel regions
• The overhead of executing a parallel region is typically in the tens of microseconds range
• depends on compiler, hardware, no. of threads
• The sequential execution time of a section of code has to be several times this to make it worthwhile parallelising.
• If a code section is only sometimes long enough, use the if clause to decide at runtime whether to go parallel or not (see the sketch below).
• Overhead on one thread is typically much smaller (<1µs).
• You can use the EPCC OpenMP microbenchmarks to do detailed measurements of overheads on your system.
• Download from www.epcc.ed.ac.uk/research/computing/performance-characterisation-and-benchmarking
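• A sketch of the if clause (THRESHOLD is a hypothetical tuning value, found by measurement):

/* go parallel only when there is enough work to amortise the overhead */
#pragma omp parallel for if(n > THRESHOLD)
for (i = 0; i < n; i++) {
    a[i] = 2.0 * b[i];
}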

Is my loop parallelisable?
• Quick and dirty test for whether the iterations of a loop are independent.
• Run the loop in reverse order!! (see the sketch below)
• Not infallible, but counterexamples are quite hard to construct.
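• For example (a, b and N hypothetical): here the reversed loop gives different answers, revealing the carried dependence.

/* original loop */
for (i = 1; i < N; i++) {
    a[i] = a[i-1] + b[i];
}

/* reversed: different results, so the iterations are not independent */
for (i = N - 1; i >= 1; i--) {
    a[i] = a[i-1] + b[i];
}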

Loops and nowait

#pragma omp parallel
{
  #pragma omp for schedule(static) nowait
  for(i=0;i<N;i++){
    a[i] = ....
  }
  #pragma omp for schedule(static)
  for(i=0;i<N;i++){
    ... = a[i]
  }
}

• This is safe so long as the number of iterations in the two loops and the schedules are the same (must be static, but you can specify a chunksize).
• Guaranteed to get the same mapping of iterations to threads.

Default schedule
• Note that the default schedule for loops with no schedule clause is implementation defined.
• Doesn’t have to be STATIC.
• In practice, in all implementations I know of, it is.
• Nevertheless you should not rely on this!
• Also note that SCHEDULE(STATIC) does not completely specify the distribution of loop iterations.
• don’t write code that relies on a particular mapping of iterations to threads

Tuning the chunksize


• Tuning the chunksize for static or dynamic schedules can be tricky because the optimal chunksize can depend quite strongly on the number of threads.
• It’s often more robust to tune the number of chunks per thread and derive the chunksize from that (see the sketch below).
• The chunksize expression does not have to be a compile-time constant.
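• A sketch of this approach (chunks_per_thread and do_work are hypothetical; n is the iteration count):

#include <omp.h>

int chunks_per_thread = 8;   /* tune this, not the chunksize itself */
int chunksize = n / (omp_get_max_threads() * chunks_per_thread);
if (chunksize < 1) chunksize = 1;

#pragma omp parallel for schedule(dynamic, chunksize)
for (i = 0; i < n; i++) {
    do_work(i);
}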

SINGLE or MASTER?
• Both constructs cause a code block to be executed by one thread only, while the others skip it: which should you use?
• MASTER has lower overhead (it’s just a test, whereas SINGLE requires some synchronisation).
• But beware that MASTER has no implied barrier!
• If you expect some threads to arrive before others, use SINGLE; otherwise use MASTER (see the sketch below).
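• A sketch of the difference (setup and setup2 are hypothetical):

#pragma omp parallel
{
    #pragma omp master   /* cheap, but no implied barrier:        */
    setup();             /* add one if others need the result     */
    #pragma omp barrier

    #pragma omp single   /* first thread to arrive does the work; */
    setup2();            /* implied barrier at the end            */
}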

Data sharing attributes


• Don’t forget that private variables are uninitialised on entry to parallel regions!
• Can use firstprivate, but it’s more likely to be an error.
• Use cases for firstprivate are surprisingly rare.

Default(none)
• The default behaviour for parallel regions and worksharing constructs is default(shared).
• This is extremely dangerous - it makes it far too easy to accidentally share variables.
• Possibly the worst design decision in the history of OpenMP!
• Always, always use default(none) (see the sketch below).
• I mean always. No exceptions!
• Everybody suffers from “variable blindness”.
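• A minimal sketch (a, b, n and temp are hypothetical): every variable must be given an explicit attribute, so nothing is shared by accident.

double temp;
#pragma omp parallel for default(none) shared(a, b, n) private(temp)
for (int i = 0; i < n; i++) {
    temp = b[i];
    a[i] = temp * temp;
}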

Spot the bug!


#pragma omp parallel for private(temp)
for(i=0;i<N;i++){
for (j=0;j<M;j++){
temp = b[i]*c[j];
a[i][j] = temp * temp + d[i];
}
}

• May always get the right result with sufficient compiler optimisation!
• The bug: the inner loop index j is declared outside the loop and is missing from the private clause, so it is shared by default and threads race on it.

Private global variables


/* file 1 */
double foo;

#pragma omp parallel private(foo)
{
   foo = ....
   a = somefunc();
}

/* file 2 */
extern double foo;

double somefunc(void){
   ... = foo;
}

• Unspecified whether the reference to foo in somefunc is to the original storage or the private copy.
• Unportable and therefore unusable!
• If you want access to the private copy, pass it through the argument list (or use threadprivate - see the sketch below).
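• A hedged sketch of the threadprivate alternative (driver is hypothetical):

#include <omp.h>

double foo;
#pragma omp threadprivate(foo)   /* each thread keeps its own copy */

double somefunc(void)
{
    return foo;                  /* sees the calling thread’s copy */
}

void driver(double *a)
{
    #pragma omp parallel
    {
        foo = 1.0 + omp_get_thread_num();
        a[omp_get_thread_num()] = somefunc();
    }
}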

Huge long loops


• What should I do in this situation? (typical old-fashioned Fortran style)

do i=1,n
   ..... several pages of code referencing 100+ variables
end do

• Determining the correct scope (private/shared/reduction) for all those variables is tedious, error prone and difficult to test adequately.

• Refactor the sequential code to

do i=1,n
   call loopbody(......)
end do

• Make all loop temporary variables local to loopbody.
• Pass the rest through the argument list.
• Much easier to test for correctness!
• Then parallelise......
• C/C++ programmers can declare temporaries in the scope of the loop body (see the sketch below).
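• A minimal C sketch (arrays hypothetical): temporaries declared inside the loop body are automatically private, so they need no scope clause.

#pragma omp parallel for
for (int i = 0; i < n; i++) {
    double temp = b[i] * c[i];
    a[i] = temp * temp + d[i];
}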

Reduction race trap


#pragma omp parallel shared(sum, b)
{
  sum = 0.0;
  #pragma omp for reduction(+:sum)
  for(i=0;i<n;i++) {
    sum += b[i];
  }
  .... = sum;
}

• There is a race between the initialisation of sum and the updates to it at the end of the loop (one possible fix is sketched below).
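• One possible fix: initialise sum before the parallel region, so no thread can overwrite another’s contribution.

sum = 0.0;
#pragma omp parallel shared(sum, b)
{
  #pragma omp for reduction(+:sum)
  for(i=0;i<n;i++) {
    sum += b[i];
  }
  /* implicit barrier here: the reduction is complete */
  .... = sum;
}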

Missing SAVE or static


• Compiling my sequential code with the OpenMP flag caused it to break: what happened?
• You may have a bug in your code which is assuming that the contents of a local variable are preserved between function calls.
• Compiling with the OpenMP flag forces all local variables to be stack allocated rather than statically allocated.
• Might also cause stack overflow.
• Need to use SAVE or static correctly (see the sketch below).
• But these variables are then shared by default.
• May need to make them threadprivate.
• “First time through” code may need refactoring (e.g. execute it before the parallel region).
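• A hedged C sketch of the threadprivate fix for “first time through” code (init_tables is hypothetical):

void compute(void)
{
    static int first = 1;             /* persists between calls...   */
    #pragma omp threadprivate(first)  /* ...with one copy per thread */

    if (first) {
        init_tables();                /* one-off setup, per thread   */
        first = 0;
    }
    /* ... rest of the work ... */
}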

Stack size
• If you have large private data structures, it is possible to run out of stack space.
• The stack size of every thread apart from the master thread can be controlled by the OMP_STACKSIZE environment variable (e.g. OMP_STACKSIZE=512M).
• The size of the master thread’s stack is controlled in the same way as for a sequential program (e.g. compiler switch or using ulimit).
• OpenMP can’t control this as by the time the runtime is called it’s too late!

Critical and atomic


• You can’t protect updates to a shared variable in one place with atomic and in another with critical, if they might contend (a fix is sketched below).
• There is no mutual exclusion between these constructs.
• critical protects code, atomic protects memory locations.

#pragma omp parallel
{
  #pragma omp critical
  a+=2;
  #pragma omp atomic
  a+=3;
}
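• One fix: protect every contending update to the same location in the same way, e.g. both with atomic.

#pragma omp parallel
{
  #pragma omp atomic
  a+=2;
  #pragma omp atomic
  a+=3;
}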

Allocating storage based on number of threads


• Sometimes you want to allocate some storage whose size is determined by the number of threads.
• But how do you know how many threads the next parallel region will use?
• Can call omp_get_max_threads(), which returns the value of the nthreads-var ICV. The number of threads used for the next parallel region will not exceed this
• except if a num_threads clause is used.
• Note that the implementation can always deliver fewer threads than this value
• if your code depends on there actually being a certain number of threads, you should always call omp_get_num_threads() to check (see the sketch below).
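• A minimal sketch (partial and work are hypothetical):

#include <stdlib.h>
#include <omp.h>

int maxt = omp_get_max_threads();   /* upper bound on the team size */
double *partial = malloc(maxt * sizeof(double));

#pragma omp parallel
{
    int nt = omp_get_num_threads(); /* team size actually delivered */
    partial[omp_get_thread_num()] = work(nt);
}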

Environment for performance


• There are some environment variables you should set to maximise performance.
• don’t rely on the defaults for these!

OMP_WAIT_POLICY=active
• Encourages idle threads to spin rather than sleep
OMP_DYNAMIC=false
• Don’t let the runtime deliver fewer threads than you asked for
OMP_PROC_BIND=true
• Prevents threads migrating between cores

Debugging tools
• Traditional debuggers such as DDT or Totalview have support for OpenMP.
• This is good, but they are not much help for tracking down race conditions
• the debugger changes the timing of events on different threads
• Race detection tools work in a different way
• capture all the memory accesses during a run, then analyse this data for races which might have occurred.
• Intel Inspector XE
• Oracle Solaris Studio Thread Analyzer

Timers
• Make sure your timer actually does measure wall clock time!
• Do use omp_get_wtime()! (see the sketch below)
• Don’t use clock(), for example
• it measures CPU time accumulated across all threads
• no wonder you don’t see any speedup......
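• A minimal sketch (run_parallel_work is hypothetical):

#include <stdio.h>
#include <omp.h>

double t0 = omp_get_wtime();    /* wall clock time, in seconds */
run_parallel_work();
double t1 = omp_get_wtime();
printf("elapsed: %f s\n", t1 - t0);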
