Open MP2362 HHDHD
Directives
• Be careful!
Parallel regions
• The overhead of executing a parallel region is typically in the
tens of microseconds range
• depends on the compiler, the hardware and the number of threads
• The sequential execution time of a section of code has to be
several times this to make it worthwhile parallelising.
• If a code section is only sometimes long enough, use the if
clause to decide at runtime whether to go parallel or not.
• Overhead on one thread is typically much smaller (<1µs).
• You can use the EPCC OpenMP microbenchmarks to do
detailed measurements of overheads on your system.
• Download from www.epcc.ed.ac.uk/research/computing/performance-characterisation-and-benchmarking
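As an illustration of the if clause, here is a minimal C sketch. The function name scale and the threshold PAR_THRESHOLD are hypothetical; the right threshold depends on your compiler, hardware and thread count and should be found by measurement.

```c
/* Hypothetical threshold: below this many elements we assume the loop is
   too short to repay the parallel-region overhead. Tune by measurement. */
#define PAR_THRESHOLD 10000

/* Scale x[0..n-1] by a. The if clause is evaluated at runtime: when it is
   false the region runs on a single thread, avoiding most of the overhead. */
void scale(double *x, int n, double a)
{
    #pragma omp parallel for if(n > PAR_THRESHOLD)
    for (int i = 0; i < n; i++) {
        x[i] *= a;
    }
}
```

The result is the same either way; only the decision to go parallel changes with n.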
OpenMPCon 2015 5
Is my loop parallelisable?
• Quick and dirty test for whether the iterations of a loop are
independent.
• Run the loop in reverse order and check whether the results change!
• Not infallible, but counterexamples are quite hard to construct.
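The reverse-order test can be sketched in C as below. The helper same_in_reverse and the two example loop bodies are hypothetical names made up for this illustration; the test itself is sequential and needs no OpenMP.

```c
#include <stdlib.h>
#include <string.h>

typedef void (*loop_body)(double *a, int i);

/* Reads a[i-1]: a loop-carried dependence, so iterations are not independent. */
void dependent_body(double *a, int i)   { a[i] += a[i - 1]; }

/* Touches only a[i]: iterations are independent. */
void independent_body(double *a, int i) { a[i] *= 2.0; }

/* Runs iterations 1..n-1 forwards and backwards on identical input.
   Returns 1 if the results match, 0 if they differ, -1 on allocation failure. */
int same_in_reverse(loop_body body, int n)
{
    double *fwd = malloc(n * sizeof *fwd);
    double *rev = malloc(n * sizeof *rev);
    if (!fwd || !rev) { free(fwd); free(rev); return -1; }

    for (int i = 0; i < n; i++) fwd[i] = rev[i] = 1.0;
    for (int i = 1; i < n; i++)      body(fwd, i);   /* original order */
    for (int i = n - 1; i >= 1; i--) body(rev, i);   /* reverse order  */

    int same = memcmp(fwd, rev, n * sizeof *fwd) == 0;
    free(fwd);
    free(rev);
    return same;
}
```

A mismatch proves the loop is order dependent; a match is only evidence of independence, in line with the "not infallible" caveat.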
Default schedule
• Note that the default schedule for loops with no schedule
clause is implementation defined.
• Doesn’t have to be STATIC.
• In practice, in all implementations I know of, it is.
• Nevertheless you should not rely on this!
• Also note that SCHEDULE(STATIC) does not completely
specify the distribution of loop iterations.
• don’t write code that relies on a particular mapping of iterations to
threads
• It’s often more robust to tune the number of chunks per thread
and derive the chunksize from that.
• chunksize expression does not have to be a compile-time constant
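One way to derive the chunksize from a chunks-per-thread setting is sketched below in C. The names chunk_size and sum_squares are hypothetical, and the choice of schedule(dynamic, ...) is an assumption made for the example; the same idea applies to other schedule kinds that take a chunksize.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Chunk size that gives each thread roughly chunks_per_thread chunks. */
int chunk_size(int n, int nthreads, int chunks_per_thread)
{
    int nchunks = nthreads * chunks_per_thread;
    int c = (n + nchunks - 1) / nchunks;   /* ceiling division */
    return c > 0 ? c : 1;                  /* chunksize must be positive */
}

/* Sum of i*i for i in [0, n), with the chunksize computed at runtime. */
double sum_squares(int n, int chunks_per_thread)
{
    int nthreads = 1;                      /* serial build: one thread */
#ifdef _OPENMP
    nthreads = omp_get_max_threads();
#endif
    int chunk = chunk_size(n, nthreads, chunks_per_thread);
    double sum = 0.0;

    #pragma omp parallel for schedule(dynamic, chunk) reduction(+:sum)
    for (int i = 0; i < n; i++) {
        sum += (double)i * i;
    }
    return sum;
}
```

Tuning chunks_per_thread rather than a raw chunksize means the setting stays reasonable when the loop length or thread count changes.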
SINGLE or MASTER?
• Both constructs cause a code block to be executed by one
thread only, while the others skip it: which should you use?
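As a reminder of how the two constructs behave, here is a minimal C sketch; the function count_executions is a hypothetical name used only for this illustration. In OpenMP, master is always executed by thread 0 and has no implied barrier, while single is executed by whichever thread reaches it first and has an implied barrier at its end unless nowait is given.

```c
/* Counts how often each block runs inside one parallel region.
   The two blocks write to different variables, so there is no race. */
void count_executions(int *by_master, int *by_single)
{
    *by_master = 0;
    *by_single = 0;

    #pragma omp parallel
    {
        #pragma omp master
        {
            (*by_master)++;   /* thread 0 only; other threads do not wait */
        }

        #pragma omp single
        {
            (*by_single)++;   /* any one thread; others wait at the end */
        }
    }
}
```

Both counters end up at 1 whatever the number of threads.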
Default(none)
• The default behaviour for parallel regions and worksharing
constructs is default(shared)
do i=1,n
   ..... several pages of code referencing 100+ variables
end do
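For comparison, a minimal C sketch of a loop written with default(none) follows; the function dot is a hypothetical example. With this clause the compiler rejects any variable used in the region whose data-sharing attribute has not been stated, which is what makes it useful on loops that reference many variables.

```c
/* Dot product of x[0..n-1] and y[0..n-1]. */
double dot(const double *x, const double *y, int n)
{
    double sum = 0.0;

    /* Every variable used in the region must be listed explicitly.
       The loop index i is declared in the loop and is private. */
    #pragma omp parallel for default(none) shared(x, y, n) reduction(+:sum)
    for (int i = 0; i < n; i++) {
        sum += x[i] * y[i];
    }
    return sum;
}
```

Dropping sum from the reduction clause here would produce a compile-time error instead of a silent race.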
Stack size
• If you have large private data structures, it is possible to run
out of stack space.
• The stack size of every thread other than the master thread can be
controlled by the OMP_STACKSIZE environment variable.
• The size of the master thread’s stack is controlled in the same
way as for a sequential program (e.g. a compiler switch or
ulimit).
• OpenMP can’t control this as by the time the runtime is called it’s too
late!
OMP_WAIT_POLICY=active
• Encourages idle threads to spin rather than sleep
OMP_DYNAMIC=false
• Don’t let the runtime deliver fewer threads than you asked for
OMP_PROC_BIND=true
• Prevents threads migrating between cores
Debugging tools
• Traditional debuggers such as DDT or Totalview have support
for OpenMP
• This is good, but they are not much help for tracking down
race conditions
• a debugger changes the timing of events on different threads
Timers
• Make sure your timer actually does measure wall clock time!
• Do use omp_get_wtime()!
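A minimal C sketch of timing a loop with omp_get_wtime() is shown below. The names wall_time and time_sum are hypothetical, and the timespec_get fallback for builds without OpenMP is an assumption added so the example also compiles serially. On many systems clock() reports CPU time accumulated over all threads, so it can overstate the elapsed time of a parallel loop.

```c
#include <time.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Wall-clock time in seconds. */
double wall_time(void)
{
#ifdef _OPENMP
    return omp_get_wtime();
#else
    /* Assumed fallback for serial builds (C11). */
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + 1e-9 * (double)ts.tv_nsec;
#endif
}

/* Computes sum of 1/(i+1) for i in [0, n), stores it in *result,
   and returns the elapsed wall-clock time in seconds. */
double time_sum(int n, double *result)
{
    double t0 = wall_time();
    double sum = 0.0;

    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; i++) {
        sum += 1.0 / (i + 1.0);
    }

    *result = sum;
    return wall_time() - t0;
}
```

Taking both timestamps outside the parallel region means the measurement includes the region's start-up and barrier overhead.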