
Parallel and Distributed Computing

CS3006

Lecture 9
OpenMP-II
4th April 2022

Dr. Rana Asif Rehman



Review of OpenMP Library Functions

 Controlling Number of Threads and Processors
 void omp_set_num_threads (int num_threads);
 int omp_get_num_threads ( );
 int omp_get_max_threads ( );
 int omp_get_thread_num ( );
 int omp_get_num_procs ( );
 int omp_in_parallel ( );
 Controlling and Monitoring Thread Creation
 void omp_set_dynamic (int dynamic_threads);
 int omp_get_dynamic ( );
 void omp_set_nested (int nested);
 int omp_get_nested ();

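A minimal sketch (not from the slides) exercising several of these calls; compile with gcc -fopenmp. The exact output and its interleaving will vary from run to run.

#include <stdio.h>
#include <omp.h>

int main(void)
{
    omp_set_num_threads(4);                 /* request a team of 4 threads */
    printf("procs: %d, max threads: %d, in parallel? %d\n",
           omp_get_num_procs(), omp_get_max_threads(), omp_in_parallel());

    #pragma omp parallel
    {
        /* inside the region: team size and this thread's id */
        printf("thread %d of %d\n",
               omp_get_thread_num(), omp_get_num_threads());
    }
    return 0;
}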


OpenMP

#pragma omp directive [clause list]



OpenMP Directives

#pragma omp parallel


#pragma omp for



One more thing to note

Difference between omp for and omp parallel for


#pragma omp parallel
{
    #pragma omp for
    for (i = 0; i < n; i++) {
        /* omp for schedules/distributes the iterations among the threads */
        /* body of parallel for loop */
    }
}

is the same as

#pragma omp parallel for
for (i = 0; i < n; i++) {
    /* body of parallel for loop */
}
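
To see the distribution concretely, here is a small runnable sketch (an assumption, not from the slides) that prints which thread executes each iteration:

#include <stdio.h>
#include <omp.h>

int main(void)
{
    int i, n = 8;
    #pragma omp parallel num_threads(4)
    {
        #pragma omp for   /* iteration variable i is made private automatically */
        for (i = 0; i < n; i++)
            printf("iteration %d run by thread %d\n", i, omp_get_thread_num());
    }
    return 0;
}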
Some Useful Clauses in OpenMP
 A clause is an optional, additional component of a pragma

Private: The private clause directs the compiler to make one or more variables private

int k = 3;
#pragma omp parallel for default(shared) private(j) shared(k)
for (i = 0; i < n; i++)
    for (j = 0; j < n; j++)
        a[i][j] = MIN(a[i][j], a[i][k] + tmp);

Comments:
 Here the private variable j is undefined:
 when this parallel construct is entered
 when this parallel construct is exited

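A compilable version of the fragment above, as a hedged sketch: MIN, tmp, N, and the matrix a are assumptions filled in here, since the slide shows only an excerpt.

#include <omp.h>
#define N 100
#define MIN(x, y) ((x) < (y) ? (x) : (y))   /* assumed definition */

int a[N][N];

void relax(int n, int k, int tmp)            /* hypothetical wrapper, n <= N */
{
    int i, j;
    /* j must be private: each thread runs its own inner loop and
       would otherwise race on a shared j */
    #pragma omp parallel for default(shared) private(j) shared(k)
    for (i = 0; i < n; i++)
        for (j = 0; j < n; j++)
            a[i][j] = MIN(a[i][j], a[i][k] + tmp);
}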


Some Useful Clauses in OpenMP

 firstprivate: directs the compiler to create private variables having initial values identical to the value of the master thread's copy of the variable as the loop is entered.

s = complex_function();
#pragma omp parallel for firstprivate(s) num_threads(2)
for (i = 0; i < n; i++) {
    s = s * omp_get_thread_num();
    printf("s is %d at thread #%d\n", s, omp_get_thread_num());
}



Some useful clauses

lastprivate:
consider the following code

s = complex_function();
#pragma omp parallel for firstprivate(s)
for (i = 0; i < n; i++) {
    s += 1;
}
printf("s after join: %d\n", s); // undefined value: the private copies of s are discarded at the join



Some Useful Clauses in OpenMP

 lastprivate: used to copy the private copy of the variable from the thread that executed the last iteration back to the master thread's copy of the variable.

s = complex_function();
#pragma omp parallel for firstprivate(s) lastprivate(s)
for (i = 0; i < n; i++) {
    s += 1;
}
printf("s after join: %d\n", s); // s has the value it had after the last iteration of the loop



Reduction clause

 Reductions are so common that OpenMP provides support for them
 A reduction clause may be added to the parallel for pragma
 Specify the reduction operation and the reduction variable
 OpenMP takes care of storing partial results in private variables and combining the partial results after the loop
 The reduction clause has this syntax:
reduction(<op>:<variable>)

 Operators
 + Sum
 * Product
 & Bitwise and
 | Bitwise or
 ^ Bitwise exclusive or
 && Logical and
 || Logical or
Reduction clause

double area, pi, x;
int i, n;
...
area = 0.0;
#pragma omp parallel for private(x) reduction(+:area)
for (i = 0; i < n; i++) {
    x = (i + 0.5) / n;
    area += 4.0 / (1.0 + x * x);
}
pi = area / n;



Conditional Parallelism Clause

 if clause: gives us the ability to direct the compiler to insert code that determines at run-time whether the loop should be executed in parallel or not.
 The clause has this syntax: if (<scalar expression>)

double area, pi, x;
int i, n;
...
area = 0.0;
#pragma omp parallel for private(x) reduction(+:area) if(n > 5000)
for (i = 0; i < n; i++) {
    x = (i + 0.5) / n;
    area += 4.0 / (1.0 + x * x);
}
pi = area / n;
Scheduling Loops (a clause)

 Scheduling a loop means dividing its iterations among the threads.
 Syntax of the schedule clause:
schedule(<type>[, <chunk>])

 The schedule type is required, but the chunk size is optional
 A chunk is a contiguous range of iterations
 Increasing the chunk size reduces scheduling overhead and may increase the cache hit rate [due to operations on contiguous memory locations]
 Decreasing the chunk size allows finer balancing of workloads



Scheduling Loops

1. Static: schedule(static[, chunk-size])

 Splits the iteration space into equal chunks of size chunk-size and assigns them to threads in a round-robin fashion.
 When no chunk-size is specified, the iteration space is split into as many chunks as there are threads (i.e., each of size n/total threads) and one chunk is assigned to each thread.
 The decision about work division is made before the code actually executes.
 This results in lower scheduling overhead, but can cause load imbalance if the processors are not all of the same compute capability, or if iterations differ in cost.
Scheduling Loops

1. Static: schedule(static[, chunk-size])

An example where reducing the chunk size improves load balancing: the inner loop shrinks as i grows, so early iterations cost more than late ones, and schedule(static, 1) interleaves them across threads.

#pragma omp parallel for private(j) schedule(static, 1)
for (i = 0; i < n; i++)
    for (j = i; j < n; j++)
        a[i][j] = complex_func(i, j);



Scheduling Loops

2. Dynamic: schedule(dynamic[, chunk-size])

 The iteration space is partitioned into chunks of size chunk-size
 Initially, every thread is assigned a single chunk; the remaining chunks are handed out at run-time
 This means a chunk is assigned to a thread as it becomes idle
 This takes care of the temporal imbalances resulting from static scheduling
 If no chunk-size is specified, it defaults to a single iteration per chunk
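
A hedged sketch of the clause in use; work() is a hypothetical routine whose cost grows with i, the uneven-cost situation dynamic scheduling is meant for.

#include <omp.h>

/* hypothetical routine whose cost grows with i (illustrative only) */
static double work(int i)
{
    double s = 0.0;
    for (int k = 0; k < i; k++) s += k * 0.5;
    return s;
}

void run(double *a, int n)
{
    int i;
    /* idle threads grab the next 4-iteration chunk at run-time */
    #pragma omp parallel for schedule(dynamic, 4)
    for (i = 0; i < n; i++)
        a[i] = work(i);
}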
Scheduling Loops

3. Guided:
 schedule(guided, C): dynamic allocation of chunks to threads using a guided self-scheduling heuristic. Initial chunks are bigger, later chunks are smaller; the minimum chunk size is C.
 schedule(guided): guided self-scheduling with minimum chunk size 1

4. schedule(runtime): the schedule is chosen at run-time based on the value of the OMP_SCHEDULE environment variable.
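
A minimal sketch of schedule(runtime): the same binary can be re-timed without recompiling, e.g. by running OMP_SCHEDULE="dynamic,4" ./a.out. The function and array names here are illustrative.

#include <omp.h>

void scale(double *a, int n)
{
    int i;
    /* schedule picked at run-time from the OMP_SCHEDULE environment variable */
    #pragma omp parallel for schedule(runtime)
    for (i = 0; i < n; i++)
        a[i] *= 2.0;
}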



Scheduling Loops (Summary)

 static: iterations divided among threads before execution; lowest overhead, least adaptive
 dynamic: chunks handed to threads as they become idle; absorbs uneven iteration costs
 guided: like dynamic, but chunk sizes shrink over time, down to the given minimum
 runtime: the decision is deferred to the OMP_SCHEDULE environment variable


The nowait Clause

 Used to avoid the implicit barrier at the end of a work-sharing construct
 A thread can move on to the next statement as soon as it has completed its assigned iterations, without waiting for the other threads (see the sketch below)

#pragma omp for nowait  // on the for construct inside an enclosing parallel region; nowait is not permitted on the combined parallel for

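A minimal sketch, assuming two independent loops over arrays a and b: with nowait there is no barrier after the first loop, so a thread that finishes its share of a starts on b immediately.

#include <omp.h>

void update(double *a, double *b, int n)
{
    int i;
    #pragma omp parallel
    {
        #pragma omp for nowait     /* no barrier at the end of this loop */
        for (i = 0; i < n; i++)
            a[i] += 1.0;

        #pragma omp for            /* safe: independent of the first loop */
        for (i = 0; i < n; i++)
            b[i] += 1.0;
    }
}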


Functional / Task Parallelism in OpenMP

Functional/Task Parallelism

#pragma omp sections [clause list]



Functional/Task Parallelism
 If your code is based on different segments or sections that can be executed in parallel
 Also known as task parallelism

v = alpha();
w = beta();
x = gamma(v, w);
y = delta();
printf("%6.2f\n", epsilon(x, y));

[Dependence graph: gamma depends on alpha and beta; epsilon depends on gamma and delta]

• alpha, beta, and delta can execute in parallel
• The remaining calls execute sequentially, according to the dependences



parallel sections, section pragmas
#pragma omp parallel sections
{
#pragma omp section // [optional for 1st block]
v = alpha();
#pragma omp section
w = beta();
#pragma omp section
y = delta();
}
x = gamma(v, w);
printf("%6.2f\n", epsilon(x, y));

 #pragma omp parallel sections creates a team of threads that executes the sections in the region in parallel
 Sections that can be executed in parallel are each preceded by an omp section pragma



Functional Parallelism
Another approach

v = alpha();
w = beta();
x = gamma(v, w);
y = delta();
printf("%6.2f\n", epsilon(x, y));

• Execute alpha and beta in parallel
• Then execute gamma and delta in parallel; epsilon runs last



omp sections pragma
 Appears inside a parallel block of code
 This pragma distributes the enclosed sections among the threads in the team

 The difference between omp parallel sections and omp sections is that
 omp parallel sections creates its own team of threads,
 while the plain omp sections pragma uses the existing team of threads and distributes the sections among them

 Placing multiple sections pragmas inside one parallel block may reduce fork/join costs



sections pragma
#pragma omp parallel num_threads(2)
{
    #pragma omp sections
    {
        #pragma omp section // optional
        v = alpha();
        #pragma omp section
        w = beta();
    } // here an implicit barrier exists

    #pragma omp sections
    {
        x = gamma(v, w);
        #pragma omp section
        y = delta();
    }
}
printf("%6.2f\n", epsilon(x, y));
Questions



