
DS3202:

PARALLEL PROGRAMMING
MODULE – 2
Basics of OpenMP

6th SEM
B.Tech
DSE
Basic Architecture

2
Sequential Programming

3
Introduction to OpenMP
• OpenMP (Open Multi-Processing) is an application programming
interface (API) designed for parallel and multi-threaded programming.
• “Open” refers to the open, vendor-neutral specification; OpenMP itself is a standard, not an open-source project.
• It provides a set of directives and functions that allow developers to
write parallel code.
• OpenMP is particularly well-suited for shared-memory parallel
programming on multi-core processors.
• In 1991, the Parallel Computing Forum (PCF) group defined a set of
directives for specifying loop parallelism in Fortran programs.
• In 1997, the first version of OpenMP for Fortran was defined by
OpenMP Architecture Review Board.
• Binding for C/C++ was introduced later.
4
• API components:
– Compiler directives
– Runtime library routines
– Environment variables
• Portability
– API is specified for C/C++ and Fortran
– Implementations on almost all platforms including Unix/Linux
and Windows

5
6
Overview of OpenMP
• OpenMP API consists of :

• Compiler Directives
• Runtime Subroutines/Functions
• Environment Variables

7
OpenMP - Compiler Directives
• Compiler directive is a statement that causes the compiler to take a
specific action during compilation
• OpenMP compiler directives begin with #pragma omp in C/C++ .
• “Pragma”: stands for pragmatic information.
• A pragma is a way to communicate the information to the compiler.
• A directive consists of a directive name followed by clauses
• Example: #pragma omp parallel default (shared) private (var1,var2)
• Other directives include sections, single, atomic, flush, etc.

8
• The #pragma omp parallel directive explicitly instructs the compiler
to parallelize the chosen block of code; in other words, it guides the
compiler to generate parallelized code.
• They define parallel regions, specify work-sharing constructs, and
control data scope within parallel regions.

9

OpenMP - Runtime Subroutines/Functions


• The OpenMP functions/subroutines are declared in the header file
omp.h
• Runtime functions are used to query information about the current
thread, the number of threads, control parallel regions dynamically,
and perform other runtime operations.
• These functions are included in the OpenMP runtime library.
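A minimal sketch (an illustration, not from the slides) showing the most common runtime routines, assuming a compiler with OpenMP support such as gcc -fopenmp:

#include <stdio.h>
#include <omp.h>

int main() {
    omp_set_num_threads(4);              // request 4 threads for later parallel regions
    #pragma omp parallel
    {
        int tid = omp_get_thread_num();  // ID of the calling thread (0 .. N-1)
        int n = omp_get_num_threads();   // number of threads in the current team
        printf("Thread %d of %d\n", tid, n);
    }
    return 0;
}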

10
OpenMP - Environment Variables
• Environment variables are settings that influence the behavior of
OpenMP programs at runtime.
• These variables can be set externally, affecting the execution of
parallelized code without modifying the source.
• Environment variables, such as OMP_NUM_THREADS, control the
number of threads used in parallel regions, adjust thread affinity, and
configure other aspects of the runtime environment.
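As an illustration (not from the slides), the same program can be run with different thread counts purely by changing the environment:

// Run as, e.g.:  OMP_NUM_THREADS=4 ./a.out   -- no source change needed
#include <stdio.h>
#include <omp.h>

int main() {
    // omp_get_max_threads() reports how many threads a parallel region would use,
    // which reflects OMP_NUM_THREADS when it is set in the environment.
    printf("Max threads: %d\n", omp_get_max_threads());
    #pragma omp parallel
    {
        #pragma omp single
        printf("Team size: %d\n", omp_get_num_threads());
    }
    return 0;
}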

11
More Environment Variables

12
OpenMP Programming Model

• Shared memory, thread-based parallelism


– OpenMP is based on the existence of multiple threads in the shared memory
programming paradigm.
– A shared memory process consists of multiple threads.
• Explicit Parallelism
– Programmer has full control over parallelization. OpenMP is not an automatic
parallel programming model.
• Compiler directive based
– Most OpenMP parallelism is specified through the use of compiler directives
which are embedded in the source code.

13
OpenMP Thread vs Core

• A thread is an independent sequence of execution of program code.
• A thread can be viewed as a block of code with one entry and one exit.
• OpenMP threads are mapped onto physical cores.
• It is possible to map more than one thread onto a core.
• In practice, a one-to-one mapping works best.

14
• Each process starts with one main thread, called the master thread in OpenMP.
• For a particular block of code, we create multiple threads alongside this master
thread. These extra threads are called slave (worker) threads.
• OpenMP is called a shared-memory model because the multiple threads it creates
share the memory of the main process.
• OpenMP is called a fork-join model because all slave threads join back to the
master thread after they finish executing.
• That is, the process starts with a single master thread that executes sequentially
until a parallel region is encountered, and it ends with a single master thread.
• When a parallel region is encountered, a team of parallel threads is created
(FORK). When the team completes the parallel region, the threads synchronize
and terminate, leaving only the master thread, which continues sequentially (JOIN).

15
16
Creating Threads
• We create threads in OpenMP with the parallel construct.
• A simple example to create threads:

#pragma omp parallel
{
    printf("Hello World");
}

• Suppose we add this code to a C program.
• This snippet will create multiple threads, each of which prints "Hello World".
• By default, the number of threads created equals the number of processor cores.

17
• How to create the required number of threads?

#pragma omp parallel num_threads(7)
{
    printf("Hello World");
}

• If you want to create a specific number of threads, use the num_threads()
clause and pass the desired number of threads as its argument.
• In the example above, seven threads will be created.
• Each one will print "Hello World".

18
Hello World 4 times
• The num_threads(4) clause in the #pragma omp parallel directive
specifies that only 4 threads are created.
• The #pragma omp single directive ensures that the enclosed block of
code is executed by only one thread.
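The code for this slide is an image; a minimal sketch matching the description (the message text is an assumption) would be:

#include <stdio.h>
#include <omp.h>

int main() {
    #pragma omp parallel num_threads(4)   // exactly 4 threads are created
    {
        printf("Hello World\n");          // printed 4 times, once per thread
        #pragma omp single
        printf("Printed by one thread only\n");   // printed exactly once
    }
    return 0;
}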

19
What happens at run time

20
• How to create multiple threads using a "for" loop?
We can create multiple threads for a "for" block. Check the following
snippet:

#pragma omp parallel for
for (i = 0; i < 6; i++)
{
    printf("Hello World");
}

• In the code snippet above, since we do not specify the number of threads,
the number of threads will equal the number of cores.
• The loop's six iterations are divided among those threads.

21
• How to allocate different work to different threads?
In OpenMP, we can allocate different work to different threads by using "sections".

#pragma omp parallel sections num_threads(3)
{
    #pragma omp section
    {
        printf("Hello World One");
    }

    #pragma omp section
    {
        printf("Hello World Two");
    }

    #pragma omp section
    {
        printf("Hello World Three");
    }
}

• In the example above, we create three threads by specifying num_threads(3). The first thread
will print "Hello World One", the second "Hello World Two", and the third "Hello World Three".
Thread Synchronization
• Synchronization is used to impose order constraints and to protect
access to shared data.
• Synchronization gives you control over thread execution.

High-level synchronization:
• critical
• atomic
• barrier
• ordered
• master
• single

Low-level synchronization:
• flush
• locks (both simple and nested)
High Level Synchronization
1. Barrier
• OMP barrier is a very important concept in synchronization
• Sometimes you may need all threads to wait at a certain point of your
code before moving on.
• Barrier provides a synchronization point at which threads in a parallel
region will wait until all other threads in that section reach the same
point.
• For example, if you build up a data structure in parallel and then you want
to perform some operations on said data structure, you need to ensure all
of the threads have finished the first stage before the second can begin.
• To do this, enter a barrier into your code as follows:
• #pragma omp barrier
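A minimal sketch of the two-stage pattern described above (array contents and sizes are assumptions, not from the slides):

#include <stdio.h>
#include <omp.h>

int main() {
    int data[4];
    #pragma omp parallel num_threads(4)
    {
        int tid = omp_get_thread_num();
        data[tid] = tid * 10;            // stage 1: each thread fills its own slot

        #pragma omp barrier              // wait until every thread has finished stage 1

        // stage 2: now it is safe to read slots written by other threads
        printf("Thread %d sees data[%d] = %d\n", tid, (tid + 1) % 4, data[(tid + 1) % 4]);
    }
    return 0;
}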
24
• There are two kinds of barrier synchronization in OpenMP:
• Explicit
• Implicit
• Explicit synchronization is done with a specific OpenMP construct that
creates a barrier:
• #pragma omp barrier
• Implicit synchronization happens in two situations:
• At the end of a parallel region
• Eg: #pragma omp parallel {}
• At the end of some OpenMP constructs
• Eg: #pragma omp for
#pragma omp sections
• If the implicit barrier is not needed, the user can specify the "nowait"
clause on the work-sharing pragma directive.
Nowait Clause
• Overrides any implied barrier at the end of a loop construct in a
region.
• That is, the nowait clause removes ("breaks") the implicit barrier.
• #pragma omp for nowait

29
Example without nowait clause

30
Because of the implicit barrier in "pragma omp for" region, the second
"for-loop" won't run until the threads in first "for-loop" all finish.

31
Now add nowait clause in first "for-loop" construct:
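The code for these slides is an image; a minimal sketch of the idea (loop bounds and messages are assumptions) is:

#include <stdio.h>
#include <omp.h>

int main() {
    #pragma omp parallel num_threads(4)
    {
        #pragma omp for nowait           // no implicit barrier: threads fall through
        for (int i = 0; i < 8; i++)
            printf("first loop, i=%d, thread %d\n", i, omp_get_thread_num());

        // Without nowait above, no thread could start this loop until
        // all threads had finished the first loop.
        #pragma omp for
        for (int j = 0; j < 8; j++)
            printf("second loop, j=%d, thread %d\n", j, omp_get_thread_num());
    }
    return 0;
}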

32
33
2. Atomic & Critical
• Within a parallel region you may want to execute some code that only
one thread should do at a time (eg. updating a shared variable). In these
cases, you should use an atomic or critical region. It ensures that only
one thread is able to enter a critical region at a time.
• These define blocks of code within a parallel region that will only be
executed by one thread at a time.
• It is important to note that all threads will eventually run the code
within the atomic/critical block.
• Use an atomic block if you are executing a single simple statement
(such as an update of one variable); an atomic operation has much lower overhead.
• Use a critical region for lengthier blocks of code; it has higher
overhead.
34
Example
#pragma omp parallel shared(x)
{
    ...

    #pragma omp atomic
    x++;

    ...

    #pragma omp critical
    {
        // lengthier code involving variable x
    }
}
35
3. Ordered
• The ordered region executes in the sequential order
• Parallel threads execute concurrently until they encounter the ordered
block, which is then executed sequentially in the same order as it
would be executed in a serial loop.
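The example slides here are images; a minimal sketch of the ordered construct (the loop body is an assumption) is:

#include <stdio.h>
#include <omp.h>

int main() {
    #pragma omp parallel for ordered
    for (int i = 0; i < 8; i++) {
        int sq = i * i;                   // this part runs concurrently
        #pragma omp ordered
        printf("i=%d, i*i=%d\n", i, sq);  // this block prints in loop order 0, 1, 2, ...
    }
    return 0;
}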

36
4. Master & Single Sections
• Within a parallel region it is also possible that you have a block of
code that should only be executed once.
• You can do this with a single block (#pragma omp single): the enclosed
block is executed only once, by the first thread to reach it (which may
or may not be the master thread). An implicit barrier follows the single block.
• A master block (#pragma omp master) does the same but is executed only
by the master thread (thread ID 0), and there is no implied barrier.
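A minimal sketch contrasting the two constructs (messages are assumptions, not from the slides):

#include <stdio.h>
#include <omp.h>

int main() {
    #pragma omp parallel num_threads(4)
    {
        #pragma omp master
        printf("master block: always thread 0, no implied barrier\n");

        #pragma omp single
        printf("single block: whichever thread arrives first, implied barrier at the end\n");
    }
    return 0;
}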

41
Low Level Synchronization
Flush
• The flush operation does not actually synchronize different threads.
• It just ensures that a thread’s values are made consistent with main
memory.
• Flush forces data to be updated in memory so that other threads see the
most recent value.
Locks
• Locks can be used to protect resources.
• A lock implies a memory fence (a “flush”) of all thread variables.
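A minimal sketch of the simple lock routines (the protected counter is an assumption, not from the slides):

#include <stdio.h>
#include <omp.h>

int main() {
    omp_lock_t lock;
    int counter = 0;
    omp_init_lock(&lock);                // create the lock

    #pragma omp parallel num_threads(4)
    {
        omp_set_lock(&lock);             // only one thread at a time past this point
        counter++;                       // protected update of the shared resource
        omp_unset_lock(&lock);           // release the lock; other threads may proceed
    }

    omp_destroy_lock(&lock);
    printf("counter = %d\n", counter);   // always 4
    return 0;
}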

42
Performance Analysis

• For parallel programs, performance is the primary concern.


• The relative ease of using OpenMP is a mixed blessing.
• In this section, we briefly introduce some factors that affect
performance when programming, and some “best practices” to
improve performance.

43
Performance Considerations
Factors Impacting (Affecting) Performance

• Performance of the serial regions


• Proportion of the serial parts and the parallel parts.
• Overhead of thread management
• Memory access

44
Factors Impacting (Affecting) Performance
• An OpenMP program includes serial regions and parallel regions.
• First, the performance of the serial regions affects the performance of
the whole program, so they need optimization.
• There are many optimization methods for serial programs, such as
eliminating data dependencies, constant folding, copy propagation,
removing dead code, etc.
• Although the compiler makes some optimization effort for serial code,
in practice the effect of this optimization alone is often minimal.
• A simpler and more effective method is to parallelize and vectorize
the parts that can be parallelized or vectorized.
45
• The second factor is the proportion of the serial parts and the
parallel parts.
• It is well understood that the more parts that can be parallelized in a
program, the greater the room for program performance
improvement.
• Amdahl’s law gives the theoretical speedup in latency of the
execution of a task at a fixed workload that can be expected of a
system whose resources are improved.
• This formula is often applied to parallel processing systems.
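For reference, Amdahl's law can be written as Speedup(N) = 1 / ((1 − p) + p / N), where p is the fraction of the program that can be parallelized and N is the number of processors. For example, with p = 0.9 and N = 8 the speedup is 1 / (0.1 + 0.9/8) ≈ 4.7, well below 8.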

46
• The third is the overhead of thread management.
• Thread management usually includes creating, resuming, managing,
suspending, destroying and synchronizing. The management
overhead of threads is often large compared to computation,
especially synchronization.
• Critical sections and atomic sections serialize the execution and
eliminate the concurrent execution of threads.
• If used unwisely, OpenMP code can be worse than serial code
because of all the thread overhead.

47
• The fourth aspect is memory access
• Usually, memory is the main limiting factor in the performance of
shared memory programs.
• Problems such as memory access conflicts will seriously affect the
performance of the program.
• The access location of data is the key to the latency and bandwidth of
memory access.
• The most commonly used method is to optimize data locality.

48
Ideas for Improving the performance

• Based on the factors which affect the performance, there are several
performance-enhancing efforts that can be considered:
• Thread load balancing
• Minimizing synchronization.
• Using more efficient directives. For example, using PARALLEL DO/FOR
instead of worksharing DO/FOR directives in parallel regions.
• Expansion or merging of parallel regions
• Improving cache hit rate
• Optimization of round-robin scheduling
• Variable property optimization
• Optimization to reduce false sharing problem
49
OPENMP LANGUAGE
FEATURES
OpenMP Language Constructs

51
52
Parallel Construct
• A directive that creates a parallel region in the code.
• The parallel region is executed by a team of threads, and the work inside the
region is divided among them.
Working
• When a thread encounters a parallel construct, a team of threads is created to
execute the parallel region
• The thread that encountered the parallel construct becomes the master
thread of the new team, with a thread number of zero for the duration of the
new parallel region.
• All threads in the new team, including the master thread, execute the region.
• Once the team is created, the number of threads in the team remains
constant for the duration of that parallel region.

53
Parallel Region Construct

54
Syntax of Parallel Construct
• #pragma omp parallel: This is the directive that indicates the start of
a parallel region. It tells the compiler to create a team of threads to
execute the enclosed block of code in parallel.
• [clause [...]] : Optional clauses that can specify additional information
such as the number of threads to use (num_threads clause) or the
private variables (private clause).

55
OpenMP clauses
• OpenMP clauses are specified as part of a directive-specification.
• Clauses are optional and, thus, may be omitted from a directive-
specification unless otherwise specified.
• The order in which clauses appear on directives is not significant
unless otherwise specified.

56
Typical Clauses in OpenMP (Examples)
• copyin
• copyprivate
• default
• firstprivate
• if (OpenMP)
• lastprivate
• nowait
• num_threads
• ordered
• private
• reduction
• schedule
• shared
57
Example of Parallel Construct
• The #pragma omp parallel directive creates a parallel region.
• The omp_get_thread_num() function is used to obtain the thread ID,
and each thread prints a message with its ID.
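The code for this slide is an image; a minimal sketch matching the description (the message text is an assumption) is:

#include <stdio.h>
#include <omp.h>

int main() {
    #pragma omp parallel                 // creates a team of threads
    {
        // every thread in the team executes this block and reports its own ID
        printf("Hello from thread %d\n", omp_get_thread_num());
    }
    return 0;
}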

58
59
Setting Threads

60
Examples (3 ways)

• Set the OMP_NUM_THREADS environment variable before running the program.

• Call omp_set_num_threads( num ); before the parallel region.

• Use the num_threads clause on the directive:

#pragma omp parallel num_threads(7)
{
    printf("Hello World");
}
61
Work Sharing Construct
• Used to assign independent work to threads.
• Facilitates the parallelization of loops and other regions of code, where each
thread takes responsibility for a specific subset of the iterations or workload.
• Work-sharing constructs help exploit parallelism in loops, sections, and other
parts of the code, distributing the workload among the available threads to
improve overall performance.
• Organization of work sharing:
• for or do: concurrent loop iterations.
• sections: concurrent tasks.
• single: the block of code is executed by only one thread; a barrier is
implied at the end.

62
Features of work sharing constructs
• Work-sharing constructs do not create new threads.
• Work-sharing construct must be enclosed dynamically within a parallel region.
• A work-sharing construct distributes the execution of the corresponding
region among the members of the team that encounters it.
• Work-sharing constructs must be encountered by all members of a team or
none at all.
• A work-sharing region has no barrier on entry; however, an implied barrier
exists at the end of the work-sharing region, unless a nowait clause is
specified.
• If a nowait clause is present, an implementation may omit the barrier at the
end of the work-sharing region.
• In this case, threads that finish early may proceed straight to the instructions
that follow the work-sharing region without waiting for the other members of
the team to finish the work-sharing region.
“For” directive in work sharing construct
• The #pragma omp parallel directive creates a team of threads, and within
that parallel region the #pragma omp for directive partitions the loop
iterations across the threads.
• do is the analogous directive in Fortran.
• The actual number of threads and the order in which they print the
messages vary depending on the system.

64
65
Example of “for” directive

#include <stdlib.h>
#include <stdio.h>
#include <omp.h>

int main() {
    int tid;
    omp_set_num_threads(3);
    #pragma omp parallel private(tid)
    {
        int i;
        tid = omp_get_thread_num();
        printf("Hello world from (%d)\n", tid);
        #pragma omp for
        for (i = 0; i <= 4; i++) {
            printf("Iteration %d by %d\n", i, tid);
        }
    }   // all threads join the master thread and terminate
    return 0;
}
Explanation of “for” directive
• The "Hello world from" messages are printed by each thread.
• The loop is parallelized using #pragma omp for, and each thread
executes a subset of the loop iterations.
• The "Iteration" messages within the loop show which iteration is
being processed by each thread.

67
“Section” directive in work sharing
construct
• Divides the code into sections, and each section is executed in parallel
by a team of threads.
• Each section can be considered a distinct unit of work.

68
Example of Section Directive
#include <stdio.h>
#include <omp.h>

int main() {
    #pragma omp parallel sections
    {
        #pragma omp section
        {
            printf("Thread %d executes section 1\n", omp_get_thread_num());
        }
        #pragma omp section
        {
            printf("Thread %d executes section 2\n", omp_get_thread_num());
        }
    }
    return 0;
}
Output of Example – Section Directive
• Two threads are created within the parallel region, and each thread
executes one of the sections.
• The omp_get_thread_num() function is used to retrieve the thread
ID, and each thread prints a message indicating which section it is
executing.
• The output can vary because the order of section execution is not
deterministic.

70
• How to allocate different work to different threads which calculate addition,
subtraction and multiplication of two numbers?
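The slide's code is an image; a minimal sketch answering the question (the input values are assumptions) is:

#include <stdio.h>
#include <omp.h>

int main() {
    int a = 12, b = 4;                        // hypothetical input values
    #pragma omp parallel sections num_threads(3)
    {
        #pragma omp section
        printf("Sum: %d\n", a + b);           // computed by one thread

        #pragma omp section
        printf("Difference: %d\n", a - b);    // computed by another thread

        #pragma omp section
        printf("Product: %d\n", a * b);       // computed by a third thread
    }
    return 0;
}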

71
Single directive in work sharing construct
• Specifies a block of code that is executed by a single thread.
• Used for tasks that should be performed by only one thread, such as
initialization.

72
Example of Single Directive
#include <stdio.h>
#include <omp.h>

int main() {
    #pragma omp parallel
    {
        // Code executed by all threads

        #pragma omp single
        {
            // Code executed by a single thread
            printf("Thread %d executes a single task\n", omp_get_thread_num());
        }
    }
    return 0;
}
Output of Example – Single Directive
• In this output, the single thread (in this case, with ID 0) that
encounters the single construct is the one executing the block of code
within the single construct.
• The other threads in the parallel region do not execute this particular
block of code.

74
Single directive performs the task of initialization and additional tasks are done by other threads

75
Parallel and Work-sharing Clauses
• After an OpenMP sentinel, the parallel directive name can be
followed by one or more clauses to modify the parallel construct.
• The available clauses for the OpenMP parallel construct are:

76
If Clause
• Specifies whether a loop should be executed in parallel or in serial.
• The if clause, if present, helps to determine whether the parallel
construct will be used.
• There are two situations where the parallel construct would not be
used:
• if the scalar expression in this clause evaluates to false or
• if the parallel construct appears while another parallel construct is
active

77
Syntax:
• #pragma omp parallel if (scalar-expression)
{
    ...
}

• Only execute in parallel if the expression evaluates to true


• Otherwise, execute serially
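The example slides that follow are images; a minimal sketch of the if clause (the threshold and variable name are assumptions) is:

#include <stdio.h>
#include <omp.h>

int main() {
    int n = 2000;                            // hypothetical problem size
    // Parallelize only when the work is large enough to pay for the thread overhead.
    #pragma omp parallel if (n > 1000)
    {
        printf("Executed by thread %d of %d\n",
               omp_get_thread_num(), omp_get_num_threads());
    }
    // If n were <= 1000, the region would run serially on a single thread (ID 0).
    return 0;
}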

78
Num_threads Clause
• The num_threads clause is one of the three ways to specify the
number of threads to use when creating parallel regions.
• The num_threads clause specifies, during execution, the number of
threads to use when creating the parallel region from the directive it
is part of.
• Sets the number of threads in a thread team.

#pragma omp parallel num_threads(7)
{
    printf("Hello World");
}

81
Data Scope Attribute Clause
• OpenMP programs have two basic types of memory: private and shared.
In a parallel section, variables can be private or shared:
• SHARED : The data within a parallel region is shared, which means
visible and accessible by all threads simultaneously. Any update
command will modify data for all threads. By default, all variables in
the work sharing region are shared except the loop iteration
counter. Shared variables must be used with care because they cause
race conditions.
• PRIVATE : Specifies that each thread should have its own instance of a
variable. Private variables are local to each thread, and each thread
gets its own private copy of the specified variables. A private variable
is not initialized, and the value is not maintained for use outside the
parallel region. By default, the loop iteration counters in the OpenMP
loop constructs are private.
Shared Clause
• The shared clause can be used to list variables that are shared among
all threads.
• This isn't usually necessary because most variables are shared by
default.
• A shared variable has the same address in the execution context of
every thread.
• All threads have access to shared variables.

83
Example of SHARED Clause
#include <stdio.h>
#include <omp.h>

int main() {
    int x = 10;

    #pragma omp parallel shared(x)
    {
        // All threads share the same copy of x
        x += omp_get_thread_num();
        printf("Thread %d: x = %d\n", omp_get_thread_num(), x);
    }

    // x outside the parallel region is affected by the threads
    printf("Outside parallel region: x = %d\n", x);
    return 0;
}
84
Output of SHARED code example
• The variable x is declared outside the parallel region and is shared
among all threads by default.
• Inside the parallel region, each thread adds its own thread number to
the shared variable x.
• As a result, the value of x is modified by each thread, and the changes
made by the threads are visible outside the parallel region as well since
it is sharing the same address space.

85
Private Clause
• Direct the compiler to make one or more variables private.
• #pragma omp parallel private (var)
• #pragma omp parallel private (var1, var2,.....)
• The private clause is followed by a list of variables that will be instantiated separately for
each thread in the parallel region.
• The type of each private variable is determined by its type in the enclosing context, but
its value is undefined at the beginning of the parallel region.
• Certain global variables cannot be listed as private because their status can't be changed
to undefined.
• Each thread has its own private copy of variable, and modifications inside the parallel
region do not affect the original variable outside the parallel region.
• Using the private clause lets every thread get its own copy of the variable, but the
initial value is still unspecified.
• The actual output may differ depending on the order in which threads execute.
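Examples 1 and 2 below are images in the original slides; a minimal sketch of the private clause (variable names and values are assumptions) is:

#include <stdio.h>
#include <omp.h>

int main() {
    int x = 100;
    #pragma omp parallel private(x) num_threads(4)
    {
        // x here is a separate, uninitialized copy for each thread (not 100)
        x = omp_get_thread_num();
        printf("Thread %d: private x = %d\n", omp_get_thread_num(), x);
    }
    printf("After the region: x = %d\n", x);  // still 100, unaffected by the threads
    return 0;
}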
86
Example 1

87
88
Example 2

89
90
• The two special cases of private clause are:
• Firstprivate clause
• Lastprivate clause

91
FIRSTPRIVATE Clause
• Used to declare and initialize variables within a parallel region that should be
private to each thread.
• The firstprivate clause contains a list of variables that, like the private variables,
are instantiated separately for each thread in the parallel region.
• The one difference is that instead of being undefined at the beginning of the
parallel region, the value of each variable is initialized to the value it had when
the parallel region was entered. Ie, the firstprivate clause initializes the private
copies of these variables with the values of the original variables before the
parallel region.
• The firstprivate clause is particularly useful when you want each thread to work
with a private copy of a variable and initialize it with a specific value from the
original variable before entering the parallel region.
• If you specify just private(x), x inside the parallel loop is not initialized,
so it will hold a garbage value. On the other hand, firstprivate(x) allows x
to start with the specific value it was given before the parallel region.
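Example 1 below is an image in the original slides; a minimal sketch of the firstprivate clause (values are assumptions) is:

#include <stdio.h>
#include <omp.h>

int main() {
    int x = 100;
    #pragma omp parallel firstprivate(x) num_threads(4)
    {
        // each thread's private copy of x starts at 100
        x += omp_get_thread_num();
        printf("Thread %d: x = %d\n", omp_get_thread_num(), x);
    }
    printf("After the region: x = %d\n", x);  // the original x is still 100
    return 0;
}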
Example 1

93
94
LASTPRIVATE Clause
• It performs finalization of private variables.
• The value that is kept is the one from the iteration that would occur last if
the loop were executed sequentially.
• For each thread, a private copy of the specified variable is created.
• These private copies are used for computations within the parallel region.
• After the loop is executed, the value of the variable from the last iteration
is shared among all threads.
• The shared value is the one produced by the thread that executed the
last iteration of the loop.
• This clause is commonly used when the variable inside the loop represents
an accumulation or result that needs to be shared among threads after the
loop completes.
• That is, the lastprivate clause assigns the value from the last sequential
iteration back to the shared variable.
Suppose we wanted to keep the last value of x after the parallel region.
This can be achieved with lastprivate. Replace private(x) with
lastprivate(x) in the example 2 of private clause and this is the result.
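The code for this slide is an image; a minimal sketch consistent with the description (the loop body is an assumption chosen so that the kept value is 10) is:

#include <stdio.h>
#include <omp.h>

int main() {
    int x = -1;
    #pragma omp parallel for lastprivate(x)
    for (int i = 0; i <= 5; i++) {
        x = 2 * i;                        // private copy inside the loop
    }
    // x now holds the value from the sequentially last iteration (i == 5),
    // i.e. 10, even if some other iteration happened to execute last in time.
    printf("x = %d\n", x);
    return 0;
}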

96
Output:
(Notice that it is 10 and not 8. That is to say, it is the last iteration which is
kept, not the last operation.)

97
REDUCTION CLAUSE
• An operation that “combines” multiple elements to form a single result, such
as a summation, is called a reduction operation.
• A variable that accumulates this result is called a reduction variable.
• A private copy for each list variable is created for each thread.
• At the end of the loop, the reduction operation is applied to all private copies
of the shared variable, and the final result is written to the global shared
variable.
• The reduction clause lists variables upon which a reduction operation will be
done at the end of the parallel region.
• The clause also specifies the operator for the reduction.
• It performs a reduction on the variables subject to given operator.
98
• A private copy of each reduction variable is created for each thread and is
initialized as described in the table in the next slide.
• The private copies of the variables are updated as the threads execute and then
the private copies from all threads are combined at the end of the parallel region.
• The reduction operations and the corresponding initial values are given in the
next slide.
Format
• reduction(operator:list)

99
Reduction operator initial values

100
Reduction Operator Names

101
Example

102
103
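The example slide is an image; a minimal sketch (loop and values are assumptions chosen so the final sum is 110, matching the output explanation below) is:

#include <stdio.h>
#include <omp.h>

int main() {
    int sum = 0;
    // Each thread gets a private sum initialized to 0 (the identity value of +);
    // the private sums are combined into the shared sum at the end of the loop.
    #pragma omp parallel for reduction(+:sum)
    for (int i = 1; i <= 10; i++) {
        sum += 2 * i;                     // 2 + 4 + ... + 20 = 110
    }
    printf("sum = %d\n", sum);            // prints 110
    return 0;
}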
Output Explanation
• From the output, we can see every thread gets a local copy of sum
(different addresses), but it is initialized to 0 (this is according to the +
operation), not an unspecified value.
• After this parallel region finishes, the local sum of every thread is
reduced to one value and combined into the original shared sum
defined in the main function (the value is 110).

104
Shorthand Assignment Operators
• Reduction supports shorthand operations like +=, -=, *=, /= etc.
• Operations like sum=sum+a[i] can be written as a shorthand operation.
• It is similar for subtraction, multiplication, division and other operations.
• Associativity of shorthand operator is always right to left.

105
Data Copying Clauses

• There are two clauses:


• The copyin clause (allowed on the parallel construct and combined
parallel work-sharing constructs)
• The copyprivate clause (allowed on the single construct).
• These clauses support the copying of data values from private or
threadprivate variables on one implicit task or thread to the
corresponding variables on other implicit tasks or threads in the team.
• The clauses accept a comma-separated list of list items.
• All list items appearing in a clause must be visible, according to the
scoping rules of the base language.
• Clauses may be repeated as needed, but a list item that specifies a
given variable may not appear in more than one clause on the same
directive.
• Before introducing these two clauses, we need to introduce a new
data-sharing attribute, named threadprivate.
• The difference between private variables and threadprivate variables
can be briefly described as follows:
• Private variables are local to a region and are placed on the stack most of the
time. The lifetime of a private variable is the duration defined by the data
scope clause. Every thread, including the main thread, makes a private copy
of the original variable, and the new variable is no longer storage-associated
with the original.
• Threadprivate variables persist across regions and are most likely placed in
the heap or in thread-local storage, which can be seen as memory local
to the thread. The main thread uses the original variable and other threads
make private copies of the original variable. The host variable is still
storage-associated with the original variable.

107
Copyin Clause
• The copyin clause is the one of two data copying clauses that can be used
on the parallel construct or combined parallel worksharing constructs.
• The copyin clause causes the listed variables to be copied from the
primary thread to all other threads in the team immediately after the
threads have been created and before they do any other work.
• The variables in the list must also be threadprivate because if they were
shared, copying would be meaningless.
• Specifies that the value of the master thread's threadprivate variable or
list of threadprivate variables is copied to the threadprivate variable of
the other threads in the parallel region.
• Format
• copyin(list) (or) copyin(var)
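A minimal sketch of copyin with a threadprivate variable (the variable name and values are assumptions):

#include <stdio.h>
#include <omp.h>

int counter = 10;                         // file-scope variable made threadprivate below
#pragma omp threadprivate(counter)

int main() {
    counter = 42;                         // set by the master thread before the region
    #pragma omp parallel copyin(counter) num_threads(4)
    {
        // every thread's threadprivate copy starts at the master's value, 42
        printf("Thread %d: counter = %d\n", omp_get_thread_num(), counter);
    }
    return 0;
}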
108
Copyprivate Clause
• The copyprivate clause which is only allowed on the single construct.
• Specifies that the value of a variable or list of variables acquired by one thread is
shared with all other threads.
• The typical usage is to have one thread read or initialize private data that is
subsequently used by the other threads as well.
• After the single construct has ended, but before the threads have left the associated
barrier, the values of variables specified in the associated list are copied to the other
threads.
• Do not use copyprivate in combination with the nowait clause
• The copyprivate clause provides a mechanism to use a private variable to broadcast a
value from the data environment of one implicit task to the data environments of the
other implicit tasks that belong to the parallel region.
• To avoid data races, concurrent reads or updates of the list item must be synchronized
with the update of the list item that occurs as a result of the copyprivate clause.
Format
• copyprivate (list)
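A minimal sketch of copyprivate on a single construct (the variable name and value are assumptions):

#include <stdio.h>
#include <omp.h>

int main() {
    #pragma omp parallel num_threads(4)
    {
        int value;                               // private to each thread
        #pragma omp single copyprivate(value)
        {
            value = 99;                          // initialized by one thread only
        }
        // after the single construct, 'value' has been broadcast to every thread's copy
        printf("Thread %d: value = %d\n", omp_get_thread_num(), value);
    }
    return 0;
}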
Default Clause
• The default clause sets the sharing status of any variables whose
sharing status is not explicitly determined.
• The default clause determines the implicit data-sharing attribute of
certain variables that are referenced in the construct
• The private option is available only in Fortran.
• The none option requires that every variable referenced in the parallel
region have a sharing attribute.
Format
• default (shared | none)
• #pragma omp parallel default(none) shared(n,x) reduction(+: sum)
num_threads(8)

110
• Default clause specifies the scope of variables within the parallel
region that are not explicitly scoped by another clause. The following
options are valid modes for this clause:
• shared—Sets all variables in the parallel region whose sharing
attributes are implicitly determined to shared. That is, in effect if
the default clause is unspecified, means that any variable in a
parallel region will be treated as if it were specified with the
shared clause.
• none—Requires each variable in the parallel region be scoped by
an OpenMP clause. This means that any variables used in a parallel
region that aren't scoped with the private, shared, reduction,
firstprivate, or lastprivate clause will cause a compiler error.

111
End of Module 2
(THANK YOU)

112
