CS-3006 (Lecture 5): Using OpenMP - Shared Memory Programming

The document provides an overview of OpenMP, a standard for parallel programming in shared memory systems, emphasizing its goals, syntax, and execution model. It discusses the use of compiler directives for parallelism, the management of shared and private data, and various constructs for work-sharing and synchronization. Additionally, it highlights the importance of thread management and the critical section problem in concurrent programming.

Shared Memory Parallel Systems - OpenMP

Dr. Muhammad Mateen Yaqoob,

Department of AI & DS,


National University of Computer & Emerging Sciences,
Islamabad Campus
System Architecture
Sequential Program Execution
Parallel computing: Shared Memory Model
OpenMP
Goals
• Standardization
– Provide a standard among a variety of shared
memory architectures (platforms)
• High-level interfaces to thread programming
• Multi-vendor support
• Multi-OS support (Unix, Windows, Mac, etc.)
• The MP in OpenMP is for Multi-Processing
• Don’t confuse OpenMP with Open MPI! :)
Release History
Programming Shared Memory Systems
• Explicit Parallelism
– For example, pthreads

• Programmer Directed
– For example, OpenMP
Shared Memory Programming

• Pthreads
• C++ Threads
• OpenMP
pthreads
C++ threads
OpenMP
Compiling
Intel (icc, ifort, icpc)
-openmp
PGI (pgcc, pgf90, …)
-mp
GNU (gcc, gfortran, g++)
-fopenmp
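
For example, a typical compile-and-run sequence with the GNU toolchain might look like the following (the file and binary names are illustrative, not from the slides):

gcc -fopenmp myprogram.c -o myprogram    # compile with OpenMP support enabled
export OMP_NUM_THREADS=4                 # request 4 threads at run time
./myprogram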
OpenMP - User Interface Model
• Shared Memory with thread based parallelism

• Not a new language

• Compiler directives, library calls, and environment variables extend the base language
  – f77, f90, f95, C, C++

• Not automatic parallelization
  – User explicitly specifies parallelism
  – NOTE: the compiler applies user directives even if they are wrong
OpenMP - Syntax
• Parallelism is highlighted using compiler directives
or pragmas

• For C and C++, the pragmas take the form:


#pragma omp construct [clause [clause]…]

• Any compiler (even if it does not have OpenMP support) can compile the program (with no parallelism though)
Fork*/Join Execution Model
• An OpenMP program starts as a single thread (master thread).
• Additional threads (Team) are created when the master hits a
parallel region.
• When all threads finish the parallel region, the additional threads are returned to the runtime or operating system

*Not to be confused with fork() system call


Using OpenMP
• OpenMP is usually used to parallelize loops:
– Find most time consuming loops
– Split them among threads

Split-up this loop between multiple threads


Sequential program:

void main()
{
    double Res[1000];
    for (int i = 0; i < 1000; i++) {
        do_huge_comp(Res[i]);
    }
}

Parallel program:

void main()
{
    double Res[1000];
    #pragma omp parallel for
    for (int i = 0; i < 1000; i++) {
        do_huge_comp(Res[i]);
    }
}
OpenMP Directives
OpenMP - Directives
• OpenMP compiler directives are used for various
purposes:
– Spawning a parallel region
– Dividing blocks of code among threads
– Distributing loop iterations between threads
–…

sentinel directive-name [clause, ...]

#pragma omp parallel private(var)


Supported Clauses for the Parallel Construct

Valid Clauses:
if (logical expression)
num_threads (integer)
private (list of variables)
firstprivate (list of variables)
shared (list of variables)
default (none|shared|private *fortran only*)
copyin (list of variables)
reduction (operator: list)
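
As a rough sketch (variable names and values are illustrative, not from the slides), several of these clauses can be combined on a single parallel construct:

#include <omp.h>
#include <stdio.h>

int main(void)
{
    int n = 1000;
    int sum = 0;
    #pragma omp parallel if(n > 100) num_threads(4) default(none) shared(n) reduction(+: sum)
    {
        sum += omp_get_thread_num();   // each thread adds its ID; the reduction combines the private copies
    }
    printf("n = %d, sum of thread IDs = %d\n", n, sum);
    return 0;
}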

OpenMP Constructs
• OpenMP constructs can be divided into 5
categories:
1. Parallel Regions
2. Work-sharing
3. Data Environment
4. Synchronization
5. Runtime functions/environment variables
OpenMP: Parallel Regions
• You create threads in OpenMP with “omp parallel” pragma
• For example: a 4-thread based Parallel region:
int A[10];
omp_set_num_threads(4);
#pragma omp parallel
{
    int ID = omp_get_thread_num();
    fun1(ID, A);
}

Demo: helloFun.c

• Implicit barrier at the end of parallel block


• Each thread calls fun1(ID,A) for ID = 0 to 3
• Each thread executes the same code within the block
Credits: University of Houston
The parallel directive
• A parallel region is a block of code that will be executed by
multiple threads
• When (in a serial program) a PARALLEL directive is encountered, a team of threads is created and the main thread (serial execution thread) becomes the master of the team
• Master thread has id or number 0 (within that team)
• The code is duplicated and all threads will execute that
code
• There is an implicit barrier at the end of a parallel region
• Master thread continues execution after this point
The parallel directive
• Some common clauses include:
– if (expression)
– private (list)
– shared (list)
– num_threads (integer-expression)
How Many Threads?
• The number of threads in a parallel region is determined
by the following factors, in order of precedence:
1. Evaluation of the if clause
2. Setting of the NUM_THREADS clause
3. Use of the omp_set_num_threads( ) library function
4. Setting of the OMP_NUM_THREADS environment
variable
5. Implementation default: Usually the number of CPUs on
a node

• Threads are numbered from 0 (master thread) to N-1


IF clause

• Execute in parallel if expression is true


• Otherwise serial execution
NUM_THREADS clause

#pragma omp parallel if(np > 1) num_threads(np)
{
    . . .
}

• Execute in parallel if expression is true


• Executes using np number of threads
omp_set_num_threads( ) function
#define TOTAL_THREADS 8

int main()
{
    omp_set_num_threads(TOTAL_THREADS);
    #pragma omp parallel
    {
        . . .
    }
    . . .
}

• Execute in parallel using 8 threads


OMP_NUM_THREADS – Environment Variable

$ export OMP_NUM_THREADS=4
$ echo $OMP_NUM_THREADS

• Sets and displays the value of the environment variable OMP_NUM_THREADS
Execution Status in Parallel Region

int omp_in_parallel()

• Returns non-zero: if execution is in parallel region


• Returns zero: if execution in non-parallel region

Demo: PRegion.c
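
A minimal sketch (not the PRegion.c demo itself) of how omp_in_parallel() might be used:

#include <omp.h>
#include <stdio.h>

int main(void)
{
    printf("Before the region, omp_in_parallel() = %d\n", omp_in_parallel());   // prints 0
    #pragma omp parallel num_threads(2)
    {
        if (omp_get_thread_num() == 0)
            printf("Inside the region, omp_in_parallel() = %d\n", omp_in_parallel());   // non-zero
    }
    return 0;
}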
Shared and Private Data
• Shared data are accessible by all threads
  – A reference a[5] to a shared array accesses the same address in all threads

• Private data are accessible only by a thread
  – Each thread has its own copy

• The default is shared

Shared and Private Data
int main(int argc, char* argv[])
{
int threadData = 10;

// Beginning of parallel region


#pragma omp parallel private(threadData)
{
threadData =200;
}

// Ending of parallel region


printf("Value: %d\n", threadData);
}

Demo: SPData.c
Shared and Private Data
#pragma omp parallel shared(list)

• Default behavior
• List will be shared
• Each thread accesses the same memory location
• The initial value (seen by the first thread) is the same as before the region
• The final value is whatever the last thread to update it writes before leaving the region
• Problem: data races
Shared and Private Data
#pragma omp parallel private(list)

• Data local to each thread
• You should not rely on any initial value (before) or terminal value (after execution of the parallel region)
• Separate "Stack Memory" for each thread's private data
• No storage associated with the original object (even though the data items have the same name)

• Use firstprivate and/or lastprivate clause to override


Shared and Private Data

firstprivate (list)
• Variables in list are private
• Initialized with the value the variable had before entering the construct

lastprivate (list)
• Used in "for" loops
• Variables in list are private
• The value from the thread that executes the final iteration of the loop is copied back to the original variable
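
A small sketch (illustrative variable names, not from the slides) showing both clauses on one loop:

#include <omp.h>
#include <stdio.h>

int main(void)
{
    int x = 5;        // firstprivate: every thread starts with its own copy initialized to 5
    int last = -1;    // lastprivate: the value from the last iteration is copied back

    #pragma omp parallel for firstprivate(x) lastprivate(last)
    for (int i = 0; i < 8; i++) {
        last = i + x;             // for the sequentially last iteration (i == 7) this is 12
    }

    printf("last = %d\n", last);  // prints 12
    return 0;
}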
Shared and Private Data
#pragma omp parallel default (private) shared(list)
#pragma omp parallel default (shared) private(list)
#pragma omp parallel default (none) private(list) shared(list)

• Alter the default behavior


• To implement customized access behavior
Shared and Private Data – Example (1/4)

Demo: SPDE1.c
Shared and Private Data – Example (2/4)

Demo: SPDE1.c
Shared and Private Data – Example (3/4)

Demo: SPDE1.c
Shared and Private Data – Example (4/4)

Demo: SPDE1.c
Getting ID of Current Thread
int main(int argc, char* argv[])
{
    int iam, nthreads;
    #pragma omp parallel private(iam, nthreads) num_threads(2)
    {
        iam = omp_get_thread_num();
        nthreads = omp_get_num_threads();
        printf("ThreadID %d, out of %d threads\n", iam, nthreads);
        if (iam == 0)
            printf("Here is the Master Thread.\n");
        else
            printf("Here is another thread.\n");
    }
}
Demo: CTID.c
Work-Sharing Constructs
• If all the threads are doing the same thing, what is the advantage then?

• Within each "Team", threads are assigned IDs, with the master thread assigned ID 0
  – omp_get_thread_num() // to get the thread number

• Can we use this to distribute tasks amongst the "Team" members?

• Work-sharing constructs distribute the specified work to all threads within the current team
For Work-Sharing Construct
• for shares iterations of a loop across the team

#pragma omp for [clause ...] newline

• There is an implicit synchronization (barrier) at the end of a #pragma omp for loop
Loop work-sharing visualized
For Work-Sharing Construct
• The SCHEDULE clause describes how iterations of the loop are divided among the threads in the team

schedule(static, chunk): chunks of the specified size are assigned to threads round-robin

schedule(dynamic, chunk): chunks of the specified size are assigned as each thread finishes its previous chunk (work-stealing mechanism)
Do/For Work-Sharing Construct
int main(int argc, char* argv[])
{
    int i, a[10];
    #pragma omp parallel num_threads(2)
    {
        #pragma omp for schedule(static, 2)
        for (i = 0; i < 10; i++)
            a[i] = omp_get_thread_num();
    }

    for (i = 0; i < 10; i++)
        printf("%d", a[i]);
}

Demo: ForConst.c
Do/For Work-Sharing Construct
int main(int argc, char* argv[])
{
    int sum = 0, counter, inputList[6] = {11, 45, 3, 5, 12, -3};
    #pragma omp parallel num_threads(2)
    {
        #pragma omp for schedule(static, 3)
        for (counter = 0; counter < 6; counter++) {
            printf("%d adding %d to the sum\n", omp_get_thread_num(), inputList[counter]);
            sum += inputList[counter];   // unsynchronized update to a shared variable
        } // end of for
    } // end of parallel section

    printf("The summed up Value: %d", sum);
}
Demo: ForConst2.c
For Work-Sharing – Synchronized
For Work-Sharing – Non-Synchronized
Problems with Static Scheduling
• What happens if loop iterations do not take the same
amount of time?
 Load imbalance
Dynamic Scheduling
• Fixed size chunks assigned on the fly
• Work-stealing mechanism

• Disadvantage: more overhead as compared to Static

Demo: LoopSched.c
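
A short sketch (workload and sleep times are made up for illustration) where dynamic scheduling helps with imbalanced iterations:

#include <omp.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    #pragma omp parallel num_threads(2)
    {
        #pragma omp for schedule(dynamic, 1)
        for (int i = 0; i < 8; i++) {
            usleep(i * 10000);    // later iterations take longer: an imbalanced loop
            printf("thread %d ran iteration %d\n", omp_get_thread_num(), i);
        }
    }
    return 0;
}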
Guided Schedule
• Each thread also executes a chunk, and when a thread
finishes a chunk, it requests another one.
• However, in a guided schedule, as chunks are completed
the size of the new chunks decreases.
• If no chunksize is specified, the size of the chunks
decreases down to 1.
• If chunksize is specified, it decreases down to chunksize,
with the exception that the very last chunk can be smaller
than chunksize.

Credits: Copyright © 2010, Elsevier Inc.
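
A usage sketch (thread count, chunk size, and iteration count are illustrative):

#include <omp.h>
#include <stdio.h>

int main(void)
{
    // With guided scheduling, early chunks are large and later chunks shrink,
    // here down to the specified minimum chunksize of 2 iterations.
    #pragma omp parallel for schedule(guided, 2) num_threads(4)
    for (int i = 0; i < 32; i++) {
        printf("iteration %2d on thread %d\n", i, omp_get_thread_num());
    }
    return 0;
}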


Ordered Clause
• Must appear within the context of:
  – omp for
  – omp parallel for

• The code in an ordered region is executed in the same order in which the iterations would execute in a sequential loop

Demo: Guided.c
Credits: Copyright © 2010, Elsevier Inc.
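
A minimal sketch (not the demo file) combining the ordered clause with an ordered region:

#include <omp.h>
#include <stdio.h>

int main(void)
{
    #pragma omp parallel for ordered schedule(dynamic) num_threads(4)
    for (int i = 0; i < 8; i++) {
        int tid = omp_get_thread_num();   // work outside the ordered region runs in parallel
        #pragma omp ordered
        printf("iteration %d (thread %d)\n", i, tid);   // printed in sequential order 0..7
    }
    return 0;
}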
Threads share Global variables!
#include <pthread.h>
#include <iostream>
#include <unistd.h>
using namespace std;
#define NUM_THREADS 10

int sharedData = 0;
void* incrementData(void* arg) {
sharedData++;
pthread_exit(NULL);
}

int main()
{
pthread_t threadID;
for (int counter=0; counter<NUM_THREADS;counter++) {
pthread_create(&threadID, NULL, incrementData, NULL);
}
cout << "ThreadCount:" << sharedData <<endl;
pthread_exit(NULL);
}
The output for the pthread version?

>./globalData
ThreadCount:10

>./globalData
ThreadCount:8
ThreadCount: A better implementation
#include <pthread.h>
#include <iostream>
#include <unistd.h>
using namespace std;
#define NUM_THREADS 100
int sharedData = 0;
void* incrementData(void* arg)
{
sharedData++;
pthread_exit(NULL); }

int main()
{
pthread_t threadID[NUM_THREADS];
for (int counter=0; counter<NUM_THREADS;counter++) {
pthread_create(&threadID[counter], NULL, incrementData, NULL);
}
//waiting for all threads
int statusReturned;
for (int counter=0; counter<NUM_THREADS;counter++) {
pthread_join(threadID[counter], NULL);
}
cout << "ThreadCount:" << sharedData <<endl;
pthread_exit(NULL);
}
Is the problem solved?
• Unfortunately, not yet :(
• The output from running it with 1000 threads is as below:

>./6join
ThreadCount:990
>./6join
ThreadCount:978
>./6join
ThreadCount:1000
>

• Reasons?
• What can be done?
ThreadCount: OpenMP Implementation
int main(int argc, char* argv[])
{
    int threadCount = 0;
    #pragma omp parallel num_threads(100)
    {
        int myLocalCount = threadCount;   // read the shared counter
        sleep(1);                         // widen the race window
        myLocalCount++;                   // increment the local copy
        threadCount = myLocalCount;       // write back: updates from other threads can be lost
    }
    printf("Total Number of Threads: %d\n", threadCount);
}

Demo: TCount1.c
Critical-Section (CS) Problem
 n processes all competing to use some shared data
 Each process has a code segment, called the critical section, in which the shared data is accessed

 Problem (ensure that):
– Two processes are not allowed to execute in their critical sections at the same time
– Access to the critical section must be an atomic action
Critical Section
[Figure: timeline of two threads. Thread A enters and later leaves its critical section; Thread B attempts to enter while A is inside, is blocked, enters only after A leaves, and later leaves itself (times T1-T4).]
Mutual Exclusion: at any given time, only one thread is in the critical section.
…back to threads counting
int sharedData = 0;
pthread_mutex_t mutexIncrement;

void* incrementData(void* arg)
{
    pthread_mutex_lock(&mutexIncrement);
    sharedData++;
    pthread_mutex_unlock(&mutexIncrement);
    pthread_exit(NULL);
}

int main()
{
pthread_mutex_init(&mutexIncrement, NULL);

pthread_t threadID[NUM_THREADS];
for (int counter=0; counter<NUM_THREADS;counter++) {
pthread_create(&threadID[counter], NULL, incrementData, NULL);
}
//waiting for all threads
int statusReturned;
for (int counter=0; counter<NUM_THREADS;counter++) {
pthread_join(threadID[counter], NULL);
}
cout << "ThreadCount:" << sharedData <<endl;
pthread_exit(NULL);
}
OpenMP - Synchronization Constructs
• The CRITICAL directive specifies a region of code that must be executed by only one thread at a time

• If a thread is currently executing inside a CRITICAL region and another thread attempts to execute it, it will block until the first thread exits that CRITICAL region

#pragma omp critical [ name ]



… back to threadCount
int main(int argc, char* argv[])
{
    int threadCount = 0;
    #pragma omp parallel num_threads(5)
    {
        #pragma omp critical
        {
            int myLocalCount = threadCount;
            sleep(1);
            myLocalCount++;
            threadCount = myLocalCount;
        }
    }
    printf("Total Number of Threads: %d\n", threadCount);
}

Demo: TCount2.c
OpenMP - Synchronization Constructs
• The MASTER directive specifies a region that is to
be executed only by the master thread of the
team

• All other threads on the team skip this section of code

#pragma omp master


Demo: MasterOnly.c
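
A minimal sketch (not the MasterOnly.c demo) of the MASTER directive:

#include <omp.h>
#include <stdio.h>

int main(void)
{
    #pragma omp parallel num_threads(4)
    {
        printf("thread %d is in the parallel region\n", omp_get_thread_num());  // all threads run this

        #pragma omp master
        printf("only the master (thread 0) prints this line\n");  // other threads skip it; no implied barrier
    }
    return 0;
}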
OpenMP - Synchronization Constructs
• When a BARRIER directive is reached, a thread will wait
at that point until all other threads have reached that
barrier

• All threads then resume executing, in parallel, the code that follows the barrier

#pragma omp barrier



Barrier Synchronization

[Figure: threads arriving at a barrier and waiting until all are present ("all here?").]
Demo: Barrier.c
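
A minimal sketch (not the Barrier.c demo) of an explicit barrier:

#include <omp.h>
#include <stdio.h>

int main(void)
{
    #pragma omp parallel num_threads(4)
    {
        printf("before the barrier: thread %d\n", omp_get_thread_num());

        #pragma omp barrier   // no thread continues until all four have printed the line above

        printf("after the barrier: thread %d\n", omp_get_thread_num());
    }
    return 0;
}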
Reduction (Data-sharing Attribute Clause)
• The REDUCTION clause performs a reduction operation on
the variables that appear in the list
• A private copy for each list variable is created and initialized
for each thread
• At the end of the region, all private copies are combined using the reduction operator and the final result is written to the shared variable

#pragma omp parallel reduction(operator : list)

operator can be +, -, *, &&, ||, max, min, ...
Reduction (Data-sharing Attribute Clause)
int main(int argc, char* argv[])
{
    srand(time(NULL));
    int winner = 0;
    #pragma omp parallel reduction(max: winner) num_threads(10)
    {
        winner = (rand() % 1000) + omp_get_thread_num();
        printf("Thread: %d has Chosen: %d\n", omp_get_thread_num(), winner);
    }
    printf("Winner: %d\n", winner);
}
Demo: Reduction.c
Practice
• The task construct: each encountering thread creates a task
  – Packages code and data environment
– Can be nested
• Inside parallel regions
• Inside other tasks
• Inside worksharing
• An OpenMP barrier (implicit or explicit):
– All tasks created by any thread of the current team are guaranteed to be completed
at barrier exit.
• Data Scope Clauses
– Shared, private, default, firstprivate, lastprivate
• Task Synchronization
– Barrier, atomic
• Task barrier (taskwait):
– Encountering thread suspends until all child tasks it has generated are complete.
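
As a sketch of how these pieces fit together (this is the standard task-based Fibonacci formulation, not necessarily the code shown on the following slides), tasks plus taskwait can parallelize the recursive computation:

#include <omp.h>
#include <stdio.h>

long fib(int n)
{
    long x, y;
    if (n < 2)
        return n;

    #pragma omp task shared(x)    // child task computes fib(n-1); x must be shared so the parent sees it
    x = fib(n - 1);

    #pragma omp task shared(y)    // child task computes fib(n-2)
    y = fib(n - 2);

    #pragma omp taskwait          // suspend until both child tasks are complete
    return x + y;
}

int main(void)
{
    long result = 0;

    #pragma omp parallel
    {
        #pragma omp single        // one thread builds the task tree; the whole team executes the tasks
        result = fib(10);
    }

    printf("fib(10) = %ld\n", result);
    return 0;
}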
Serial PI program
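
The slide's code did not survive extraction; a common serial formulation of this exercise (numerical integration of 4/(1+x^2) over [0,1]; the step count is illustrative and not necessarily the slide's) looks like:

#include <stdio.h>

#define NUM_STEPS 100000

int main(void)
{
    double step = 1.0 / (double) NUM_STEPS;
    double sum = 0.0;

    for (int i = 0; i < NUM_STEPS; i++) {
        double x = (i + 0.5) * step;    // midpoint of the i-th interval
        sum += 4.0 / (1.0 + x * x);     // integrand value at the midpoint
    }

    double pi = step * sum;             // midpoint-rule approximation of pi
    printf("pi = %.10f\n", pi);
    return 0;
}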
Practice
Serial Computation of Fibonacci
Any Questions?
