OpenMP

OpenMP is a standard for shared memory parallel programming using compiler directives. It uses a fork-join model where the master thread forks additional threads to execute a parallel region. OpenMP consists of work-sharing constructs like parallel loops to distribute work, synchronization constructs to coordinate threads, and data environment constructs to specify data scope. It supports both loop-level and task-based parallelism and can be used to write hybrid MPI/OpenMP programs.

Shared Memory Parallelism - OpenMP

Sathish Vadhiyar
Credits/Sources:
OpenMP C/C++ standard (openmp.org)
OpenMP tutorial (http://www.llnl.gov/computing/tutorials/openMP/#Introduction)
OpenMP SC99 tutorial presentation (openmp.org)
Dr. Eric Strohmaier (University of Tennessee, CS594 class, Feb 9, 2000)
Introduction

 A portable programming model and standard for shared memory programming using compiler directives
 Directives: constructs or statements in the program that apply some action to a block of code
 A specification for a set of compiler directives, library routines, and environment variables – standardizing pragmas
 Easy to program: a developer can convert a sequential program to a parallel one incrementally by adding directives
 First version in 1997; development over the years up to the latest version, 4.5, in 2015
Fork-Join Model
 Begins as a single thread called the master thread
 Fork: when a parallel construct is encountered, a team of threads is created
 Statements in the parallel region are executed in parallel
 Join: at the end of the parallel region, the team threads synchronize and terminate
OpenMP consists of…

 Work-sharing constructs
 Synchronization constructs
 Data environment constructs
 Library calls, environment variables
Introduction
 Mainly supports loop-level parallelism
 Specifies parallelism for a region of code: fine-grained parallelism
 The number of threads can be varied from one region to another – dynamic parallelism
 Speedup is limited by Amdahl's law – sequential portions remain in the code
 Applications have varying phases of parallelism
 Also supports
 Coarse-grained parallelism – sections and tasks
 Execution on accelerators
 SIMD vectorization
 Task-core affinity
parallel construct

#pragma omp parallel [clause [, clause] …] new-line
    structured-block

 Clauses include if, num_threads, default, private, firstprivate, shared, reduction, copyin
 Can support nested parallelism
Parallel construct - Example
#include <omp.h>
#include <stdio.h>

int main() {
    int nthreads, tid;

    #pragma omp parallel private(nthreads, tid)
    {
        tid = omp_get_thread_num();              /* id of this thread */
        printf("Hello World from thread %d\n", tid);

        if (tid == 0) {                          /* only the master thread */
            nthreads = omp_get_num_threads();
            printf("Number of threads = %d\n", nthreads);
        }
    }
    return 0;
}
Work sharing construct

 For distributing the execution among the threads that encounter it
 3 types of work sharing constructs – loops, sections, single
for construct

 For distributing the loop iterations among the threads

#pragma omp for [clause [, clause] …] new-line
    for-loop

 Clauses include schedule, private, firstprivate, lastprivate, reduction, ordered, nowait
for construct

 Restrictions on the structure of the for loop so that the compiler can determine the number of iterations – e.g. no branching out of the loop
 The assignment of iterations to threads depends on the schedule clause
 Implicit barrier at the end of the for construct unless nowait is specified
schedule clause

1. schedule(static, chunk_size) – iterations divided into chunks of chunk_size, distributed to threads in round-robin order
2. schedule(dynamic, chunk_size) – same chunks, but distributed dynamically as threads become idle
3. schedule(runtime) – decision deferred to runtime (e.g. via the OMP_SCHEDULE environment variable); implementation dependent
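
A small sketch comparing the schedule kinds on the same loop (the work() function, N, and the chunk size of 4 are illustrative choices, not from the original slides):

#include <omp.h>
#include <stdio.h>
#define N 16

static void work(int i) { printf("iter %2d on thread %d\n", i, omp_get_thread_num()); }

int main() {
    int i;
    #pragma omp parallel private(i)
    {
        /* static: 4-iteration chunks assigned round-robin, fixed at loop entry */
        #pragma omp for schedule(static, 4)
        for (i = 0; i < N; i++) work(i);

        /* dynamic: an idle thread grabs the next 4-iteration chunk */
        #pragma omp for schedule(dynamic, 4)
        for (i = 0; i < N; i++) work(i);

        /* runtime: schedule taken from the OMP_SCHEDULE environment variable */
        #pragma omp for schedule(runtime)
        for (i = 0; i < N; i++) work(i);
    }
    return 0;
}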
for - Example

#include <omp.h>
#define CHUNKSIZE 100
#define N 1000

int main() {
    int i, chunk;
    float a[N], b[N], c[N];

    /* Some initializations */
    for (i = 0; i < N; i++)
        a[i] = b[i] = i * 1.0;

    chunk = CHUNKSIZE;

    #pragma omp parallel shared(a,b,c,chunk) private(i)
    {
        #pragma omp for schedule(dynamic,chunk) nowait
        for (i = 0; i < N; i++)
            c[i] = a[i] + b[i];
    } /* end of parallel region */

    return 0;
}
Coarse-grained parallelism – sections and tasks
 sections
 tasks – a dynamic mechanism for creating units of work
 depend clause for task
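
A minimal sketch of both constructs (variable names and output strings are illustrative): sections give each block to one thread of the team, while tasks with depend clauses order dynamically created units of work.

#include <omp.h>
#include <stdio.h>

int main() {
    int x = 0;                      /* shared in the parallel regions below */

    /* sections: each section block is executed once, by some thread of the team */
    #pragma omp parallel sections
    {
        #pragma omp section
        printf("section A on thread %d\n", omp_get_thread_num());
        #pragma omp section
        printf("section B on thread %d\n", omp_get_thread_num());
    }

    /* tasks: created by one thread, executed by any thread of the team;
       depend clauses order tasks that touch the same data */
    #pragma omp parallel
    #pragma omp single
    {
        #pragma omp task depend(out: x)     /* producer */
        x = 10;
        #pragma omp task depend(in: x)      /* consumer: runs only after the producer */
        printf("x = %d\n", x);
    }   /* tasks are complete at the implicit barrier of single */
    return 0;
}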


Synchronization directives
flush directive

 A point at which a consistent view of memory is provided among the threads
 Thread-visible variables (global variables, shared variables, etc.) are written back to memory
 If a var-list is given, only the variables in the list are flushed
flush – Example
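
A minimal producer/consumer sketch of the flush directive (the data and flag variable names are illustrative):

#include <omp.h>
#include <stdio.h>

int main() {
    int data = 0, flag = 0;

    #pragma omp parallel sections num_threads(2) shared(data, flag)
    {
        #pragma omp section
        {                                    /* producer */
            data = 42;
            #pragma omp flush(data)          /* make data visible before the flag is set */
            flag = 1;
            #pragma omp flush(flag)
        }
        #pragma omp section
        {                                    /* consumer */
            int ready = 0;
            while (!ready) {
                #pragma omp flush(flag)      /* re-read the flag from memory */
                ready = flag;
            }
            #pragma omp flush(data)          /* ensure data is up to date before use */
            printf("data = %d\n", data);
        }
    }
    return 0;
}

Newer OpenMP code would additionally mark the flag accesses with atomic constructs.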
Data Scope Attribute Clauses

 Most variables are shared by default
 Data scopes are explicitly specified with data scope attribute clauses
 Clauses:
1. private
2. firstprivate
3. lastprivate
4. shared
5. default
6. reduction
7. copyin
8. copyprivate
threadprivate
 Global variables in the variable-list are made private to each thread
 Each thread gets its own copy
 The copies persist between different parallel regions

#include <omp.h>
#include <stdio.h>

int alpha[10], beta[10], i;
#pragma omp threadprivate(alpha)

int main() {
    /* Explicitly turn off dynamic threads */
    omp_set_dynamic(0);

    /* First parallel region */
    #pragma omp parallel private(i,beta)
    for (i = 0; i < 10; i++) alpha[i] = beta[i] = i;

    /* Second parallel region: alpha persists (threadprivate), beta does not */
    #pragma omp parallel
    printf("alpha[3]= %d and beta[3]= %d\n", alpha[3], beta[3]);

    return 0;
}
private, firstprivate & lastprivate

 private (variable-list)
 variable-list is private to each thread
 A new object with automatic storage duration is allocated for the construct

 firstprivate (variable-list)
 The new private object is initialized with the value of the original object that existed prior to the construct

 lastprivate (variable-list)
 The value of the private object from the last iteration or the last section is assigned back to the original object
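
A short sketch of firstprivate and lastprivate together (variable names are illustrative):

#include <omp.h>
#include <stdio.h>

int main() {
    int i, x = 10, last = -1;

    /* firstprivate: each thread's private x starts at 10;
       lastprivate: after the loop, last holds the value from the final iteration */
    #pragma omp parallel for firstprivate(x) lastprivate(last)
    for (i = 0; i < 8; i++) {
        x += i;            /* updates the thread's private copy only */
        last = i;          /* the value from iteration 7 is copied back to the original */
    }

    printf("x = %d (original unchanged), last = %d\n", x, last);
    return 0;
}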
shared, default, reduction

 shared(variable-list)

 default(shared | none)
 Specifies the sharing behavior of all of the variables visible in the
construct

 reduction(op: variable-list)
 Private copies of the variables are made for each thread
 The final value of the original object at the end of the region is the combination (using op) of all the private copies
default - Example
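
A minimal sketch combining default(none) with a reduction clause (array and variable names are illustrative):

#include <omp.h>
#include <stdio.h>
#define N 1000

int main() {
    int i;
    double a[N], sum = 0.0;

    for (i = 0; i < N; i++) a[i] = 1.0;

    /* default(none) forces the scope of every variable to be stated explicitly;
       reduction(+:sum) gives each thread a private sum, combined with + at the end */
    #pragma omp parallel for default(none) shared(a) private(i) reduction(+:sum)
    for (i = 0; i < N; i++)
        sum += a[i];

    printf("sum = %f\n", sum);
    return 0;
}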
Library Routines (API)

 Querying functions (number of threads, etc.)
 General purpose locking routines
 Setting the execution environment (dynamic threads, nested parallelism, etc.)
API

 omp_set_num_threads(num_threads)
 omp_get_num_threads()
 omp_get_max_threads()
 omp_get_thread_num()
 omp_get_num_procs()
 omp_in_parallel()
 omp_set_dynamic(dynamic_threads)
 omp_get_dynamic()
 omp_set_nested(nested)
 omp_get_nested()
API(Contd..)
 omp_init_lock(omp_lock_t *lock)
 omp_init_nest_lock(omp_nest_lock_t *lock)
 omp_destroy_lock(omp_lock_t *lock)
 omp_destroy_nest_lock(omp_nest_lock_t *lock)
 omp_set_lock(omp_lock_t *lock)
 omp_set_nest_lock(omp_nest_lock_t *lock)
 omp_unset_lock(omp_lock_t *lock)
 omp_unset_nest_lock(omp_nest_lock_t *lock)
 omp_test_lock(omp_lock_t *lock)
 omp_test_nest_lock(omp_nest_lock_t *lock)

 omp_get_wtime()
 omp_get_wtick()

 omp_get_thread_num()
 omp_get_num_procs()
 omp_get_num_devices()
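
A short sketch using some of the query and timing routines above (the thread count of 4 is an arbitrary choice):

#include <omp.h>
#include <stdio.h>

int main() {
    double t0 = omp_get_wtime();            /* wall-clock time in seconds */

    omp_set_num_threads(4);                 /* request 4 threads for the next region */

    #pragma omp parallel
    {
        if (omp_get_thread_num() == 0)      /* only the master thread reports */
            printf("%d threads on %d processors\n",
                   omp_get_num_threads(), omp_get_num_procs());
    }

    printf("elapsed: %f s (timer resolution %g s)\n",
           omp_get_wtime() - t0, omp_get_wtick());
    return 0;
}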
Lock details

 Simple locks and nestable locks
 A simple lock may not be set again while it is already in the locked state
 A nestable lock may be set multiple times by the same thread
 Simple locks are available if they are unlocked
 Nestable locks are available if they are unlocked or owned by the calling thread
Example – Nested lock
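
A minimal sketch of a nestable lock (the add/add_twice functions and the counter are illustrative):

#include <omp.h>
#include <stdio.h>

omp_nest_lock_t lck;
int count = 0;

void add(int n) {
    omp_set_nest_lock(&lck);      /* nesting count goes from 1 to 2 when called from add_twice */
    count += n;
    omp_unset_nest_lock(&lck);
}

void add_twice(int n) {
    omp_set_nest_lock(&lck);
    add(n);                       /* re-acquiring the same nestable lock is legal for the owner */
    add(n);
    omp_unset_nest_lock(&lck);
}

int main() {
    omp_init_nest_lock(&lck);

    #pragma omp parallel
    add_twice(1);                 /* each thread adds 2 to count under the lock */

    omp_destroy_nest_lock(&lck);
    printf("count = %d\n", count);
    return 0;
}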
Example 1: Jacobi Solver
Example 2: BFS Version 1 (Nested Parallelism)
Example 3: BFS Version 3 (Using Task Construct)
Hybrid Programming – Combining MPI and OpenMP benefits
 MPI
- explicit parallelism, no synchronization problems
- suitable for coarse grain
 OpenMP
- easy to program, dynamic scheduling allowed
- only for shared memory, data synchronization problems
 MPI/OpenMP Hybrid
- can combine MPI data placement with OpenMP fine-grain parallelism
- suitable for clusters of SMPs ("clumps")
- can implement a hierarchical model
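
A minimal hybrid sketch (assumes an MPI installation and compilation with, e.g., mpicc -fopenmp; the output format is illustrative):

#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int provided, rank;

    /* request a threading level that allows OpenMP threads inside each rank */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* coarse grain across MPI ranks, fine grain within a rank via OpenMP */
    #pragma omp parallel
    printf("rank %d, thread %d of %d\n",
           rank, omp_get_thread_num(), omp_get_num_threads());

    MPI_Finalize();
    return 0;
}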
 END
Definitions

 Construct – a statement containing a directive and a structured block
 Directive – based on C #pragma directives

#pragma <omp id> <other text>
#pragma omp directive-name [clause [, clause] …] new-line

Example:
#pragma omp parallel default(shared) private(beta,pi)
Parallel construct

 A parallel region is executed by multiple threads
 If the num_threads clause, omp_set_num_threads(), or the OMP_NUM_THREADS environment variable is not used, the number of threads created is implementation dependent
 The number of physical processors hosting the threads is also implementation dependent
 Threads are numbered from 0 to N-1
 Nested parallelism by embedding one parallel construct inside another (see the sketch below)
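
A minimal sketch of nested parallelism (team sizes of 2 are arbitrary):

#include <omp.h>
#include <stdio.h>

int main() {
    omp_set_nested(1);                        /* enable nested parallelism */

    #pragma omp parallel num_threads(2)
    {
        int outer = omp_get_thread_num();

        /* each outer thread forks its own inner team */
        #pragma omp parallel num_threads(2)
        printf("outer thread %d, inner thread %d\n", outer, omp_get_thread_num());
    }
    return 0;
}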
