
Parallel and Distributed Computing

CS3006

Lecture 9
OpenMP-II
4th April 2022

Dr. Rana Asif Rehman



Review of OpenMP Library Functions

 Controlling Number of Threads and Processors
 void omp_set_num_threads (int num_threads);
 int omp_get_num_threads ( );
 int omp_get_max_threads ( );
 int omp_get_thread_num ( );
 int omp_get_num_procs ( );
 int omp_in_parallel ( );
 Controlling and Monitoring Thread Creation
 void omp_set_dynamic (int dynamic_threads);
 int omp_get_dynamic ( );
 void omp_set_nested (int nested);
 int omp_get_nested ();

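A minimal sketch (not from the slides) exercising several of these calls; compile with gcc -fopenmp. The exact output and its interleaving will vary from run to run.

#include <stdio.h>
#include <omp.h>

int main(void)
{
    omp_set_num_threads(4);                 /* request a team of 4 threads */
    printf("procs: %d, max threads: %d, in parallel? %d\n",
           omp_get_num_procs(), omp_get_max_threads(), omp_in_parallel());

    #pragma omp parallel
    {
        /* inside the region: team size and this thread's id */
        printf("thread %d of %d\n",
               omp_get_thread_num(), omp_get_num_threads());
    }
    return 0;
}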


OpenMP

#pragma omp directive [clause list]



OpenMP Directives

#pragma omp parallel


#pragma omp for



One more thing to note

Difference between omp for and omp parallel for


#pragma omp parallel
{
    #pragma omp for
    for (i = 0; i < n; i++) {
        /* omp for schedules/distributes the iterations among the threads */
        /* body of parallel for loop */
    }
}

is the same as

#pragma omp parallel for
for (i = 0; i < n; i++) {
    /* body of parallel for loop */
}
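
To see the distribution concretely, here is a small runnable sketch (an assumption, not from the slides) that prints which thread executes each iteration:

#include <stdio.h>
#include <omp.h>

int main(void)
{
    int i, n = 8;
    #pragma omp parallel num_threads(4)
    {
        #pragma omp for   /* iteration variable i is made private automatically */
        for (i = 0; i < n; i++)
            printf("iteration %d run by thread %d\n", i, omp_get_thread_num());
    }
    return 0;
}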
Some Useful Clauses in OpenMP
 A clause is an optional, additional component of a pragma

Private: The private clause directs the compiler to make one or more variables private

int k = 3;
#pragma omp parallel for default(shared) private(j) shared(k)
for (i = 0; i < n; i++)
    for (j = 0; j < n; j++)
        a[i][j] = MIN(a[i][j], a[i][k] + tmp);

Comments:
 Here the private variable j is undefined:
 when this parallel construct is entered
 when this parallel construct is exited

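A compilable version of the fragment above, as a hedged sketch: MIN, tmp, N, and the matrix a are assumptions filled in here, since the slide shows only an excerpt.

#include <omp.h>
#define N 100
#define MIN(x, y) ((x) < (y) ? (x) : (y))   /* assumed definition */

int a[N][N];

void relax(int n, int k, int tmp)            /* hypothetical wrapper, n <= N */
{
    int i, j;
    /* j must be private: each thread runs its own inner loop and
       would otherwise race on a shared j */
    #pragma omp parallel for default(shared) private(j) shared(k)
    for (i = 0; i < n; i++)
        for (j = 0; j < n; j++)
            a[i][j] = MIN(a[i][j], a[i][k] + tmp);
}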


Some Useful Clauses in OpenMP

 firstprivate: directs the compiler to create private variables having initial values identical to the value of the master thread's copy of the variable as the loop is entered.

s = complex_function();
#pragma omp parallel for firstprivate(s) num_threads(2)
for (i = 0; i < n; i++) {
    s = s * omp_get_thread_num();
    printf("s is %d at thread #%d\n", s, omp_get_thread_num());
}



Some useful clauses

lastprivate:
consider the following code

s = complex_function();
#pragma omp parallel for firstprivate(s)
for (i = 0; i < n; i++) {
    s += 1;
}
printf("s after join: %d\n", s); // undefined value: the private copies of s are discarded at the join



Some Useful Clauses in OpenMP

 lastprivate: used to copy the private copy of the variable from the thread that executed the last iteration back to the master thread's copy of the variable.

s = complex_function();
#pragma omp parallel for firstprivate(s) lastprivate(s)
for (i = 0; i < n; i++) {
    s += 1;
}
printf("s after join: %d\n", s); // s has the value it had after the last iteration of the loop



Reduction clause

 Reductions are so common that OpenMP provides support for them
 A reduction clause may be added to the parallel for pragma
 Specify the reduction operation and the reduction variable
 OpenMP takes care of storing partial results in private variables and combining the partial results after the loop
 The reduction clause has this syntax:
reduction(<op>:<variable>)

 Operators
 + Sum
 * Product
 & Bitwise and
 | Bitwise or
 ^ Bitwise exclusive or
 && Logical and
 || Logical or
Reduction clause

double area, pi, x;
int i, n;
...
area = 0.0;
#pragma omp parallel for private(x) reduction(+:area)
for (i = 0; i < n; i++) {
    x = (i + 0.5) / n;
    area += 4.0 / (1.0 + x * x);
}
pi = area / n;



Conditional Parallelism Clause

 if clause: gives us the ability to direct the compiler to insert code that determines at run-time whether the loop should be executed in parallel or not.
 The clause has this syntax: if (<scalar expression>)

double area, pi, x;
int i, n;
...
area = 0.0;
#pragma omp parallel for private(x) reduction(+:area) if(n > 5000)
for (i = 0; i < n; i++) {
    x = (i + 0.5) / n;
    area += 4.0 / (1.0 + x * x);
}
pi = area / n;
Scheduling Loops (a clause)

 Scheduling a loop means dividing its iterations among the threads.
 Syntax of the schedule clause:
schedule(<type>[, <chunk>])

 The schedule type is required, but the chunk size is optional
 A chunk is a contiguous range of iterations
 Increasing the chunk size reduces scheduling overhead and may increase the cache hit rate [due to operations on contiguous memory locations]
 Decreasing the chunk size allows finer balancing of workloads



Scheduling Loops

1. Static: schedule(static[, chunk-size])

 Splits the iteration space into equal chunks of size chunk-size and assigns them to threads in a round-robin fashion.
 When no chunk-size is specified, the iteration space is split into as many chunks as there are threads (i.e., each of size n/total threads) and one chunk is assigned to each thread.
 The decision about work division is made before the code actually executes.
 This results in lower scheduling overhead, but can cause load imbalance if the processors are not all of the same compute capability, or if iterations differ in cost.
Scheduling Loops

1. Static: schedule(static[, chunk-size])

An example where reducing the chunk size improves load balancing: the inner loop shrinks as i grows, so early iterations cost more than late ones, and schedule(static, 1) interleaves them across threads.

#pragma omp parallel for private(j) schedule(static, 1)
for (i = 0; i < n; i++)
    for (j = i; j < n; j++)
        a[i][j] = complex_func(i, j);



Scheduling Loops

2. Dynamic: schedule(dynamic[, chunk-size])

 The iteration space is partitioned into chunks of size chunk-size
 Initially, every thread is assigned a single chunk; the remaining chunks are handed out at run-time
 This means a chunk is assigned to a thread as it becomes idle
 This takes care of the temporal imbalances resulting from static scheduling
 If no chunk-size is specified, it defaults to a single iteration per chunk
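
A hedged sketch of the clause in use; work() is a hypothetical routine whose cost grows with i, the uneven-cost situation dynamic scheduling is meant for.

#include <omp.h>

/* hypothetical routine whose cost grows with i (illustrative only) */
static double work(int i)
{
    double s = 0.0;
    for (int k = 0; k < i; k++) s += k * 0.5;
    return s;
}

void run(double *a, int n)
{
    int i;
    /* idle threads grab the next 4-iteration chunk at run-time */
    #pragma omp parallel for schedule(dynamic, 4)
    for (i = 0; i < n; i++)
        a[i] = work(i);
}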
Scheduling Loops

3. Guided:
 schedule(guided, C): dynamic allocation of chunks to threads using a guided self-scheduling heuristic. Initial chunks are bigger, later chunks are smaller; the minimum chunk size is C.
 schedule(guided): guided self-scheduling with minimum chunk size 1

4. schedule(runtime): the schedule is chosen at run-time based on the value of the OMP_SCHEDULE environment variable.
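
A minimal sketch of schedule(runtime): the same binary can be re-timed without recompiling, e.g. by running OMP_SCHEDULE="dynamic,4" ./a.out. The function and array names here are illustrative.

#include <omp.h>

void scale(double *a, int n)
{
    int i;
    /* schedule picked at run-time from the OMP_SCHEDULE environment variable */
    #pragma omp parallel for schedule(runtime)
    for (i = 0; i < n; i++)
        a[i] *= 2.0;
}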



Scheduling Loops (Summary)

 static: iterations divided among threads before execution; lowest overhead, least adaptive
 dynamic: chunks handed to threads as they become idle; absorbs uneven iteration costs
 guided: like dynamic, but chunk sizes shrink over time, down to the given minimum
 runtime: the decision is deferred to the OMP_SCHEDULE environment variable


The nowait Clause

 Used to avoid the implicit barrier at the end of a work-sharing construct
 A thread can move on to the next statement as soon as it has completed its assigned iterations, without waiting for the other threads (see the sketch below)

#pragma omp for nowait  // on the for construct inside an enclosing parallel region; nowait is not permitted on the combined parallel for

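A minimal sketch, assuming two independent loops over arrays a and b: with nowait there is no barrier after the first loop, so a thread that finishes its share of a starts on b immediately.

#include <omp.h>

void update(double *a, double *b, int n)
{
    int i;
    #pragma omp parallel
    {
        #pragma omp for nowait     /* no barrier at the end of this loop */
        for (i = 0; i < n; i++)
            a[i] += 1.0;

        #pragma omp for            /* safe: independent of the first loop */
        for (i = 0; i < n; i++)
            b[i] += 1.0;
    }
}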


Functional / Task Parallelism in OpenMP

Functional/Task Parallelism

#pragma omp sections [clause list]



Functional/Task Parallelism
 If your code is based on different segments or sections that can be executed in parallel
 Also known as task parallelism

v = alpha();
w = beta();
x = gamma(v, w);
y = delta();
printf("%6.2f\n", epsilon(x, y));

[Dependence graph: gamma depends on alpha and beta; epsilon depends on gamma and delta]

• alpha, beta, and delta can execute in parallel
• The remaining calls execute sequentially, according to the dependences



parallel sections, section pragmas
#pragma omp parallel sections
{
#pragma omp section // [optional for 1st block]
v = alpha();
#pragma omp section
w = beta();
#pragma omp section
y = delta();
}
x = gamma(v, w);
printf("%6.2f\n", epsilon(x, y));

 #pragma omp parallel sections creates a team of threads that executes the sections in the region in parallel
 Sections that can be executed in parallel are each preceded by an omp section pragma



Functional Parallelism
Another approach

v = alpha();
w = beta();
x = gamma(v, w);
y = delta();
printf("%6.2f\n", epsilon(x, y));

• Execute alpha and beta in parallel
• Then execute gamma and delta in parallel; epsilon runs last



omp sections pragma
 Appears inside a parallel block of code
 This pragma distributes the enclosed sections among the threads in the team

 The difference between omp parallel sections and omp sections is that
 omp parallel sections creates its own team of threads,
 while the plain omp sections pragma uses the existing team of threads and distributes the sections among them

 Placing multiple sections pragmas inside one parallel block may reduce fork/join costs



sections pragma
#pragma omp parallel num_threads(2)
{
    #pragma omp sections
    {
        #pragma omp section // optional
        v = alpha();
        #pragma omp section
        w = beta();
    } // here an implicit barrier exists

    #pragma omp sections
    {
        x = gamma(v, w);
        #pragma omp section
        y = delta();
    }
}
printf("%6.2f\n", epsilon(x, y));
Questions



