CS-3006_8_UsingOpenMP_SharedMemoryProgramming
Systems - OpenMP
• Programmer Directed
– For example, OpenMP
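Hello World – pthreads based version
#include <pthread.h>
#include <stdio.h>
// thrfunc is not shown on the slide; this body is an assumed minimal version
void *thrfunc(void *arg) {
    printf("Hello from thread %d\n", *(int *)arg);
    return NULL;
}
int main(void) {
    pthread_t thread[4];
    pthread_attr_t attr;
    int arg[4] = {0,1,2,3};
    int i;
    // setup joinable threads
    pthread_attr_init(&attr);
    pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_JOINABLE);
    // create 4 threads, each receiving a pointer to its own ID
    for (i = 0; i < 4; i++)
        pthread_create(&thread[i], &attr, thrfunc, (void *)&arg[i]);
    // wait for all threads, then release the attribute object
    for (i = 0; i < 4; i++)
        pthread_join(thread[i], NULL);
    pthread_attr_destroy(&attr);
    return 0;
}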
Demo: hello.c
Compiling
• Intel (icc, ifort, icpc): -openmp (recent releases use -qopenmp)
• PGI (pgcc, pgf90, …): -mp
• GNU (gcc, gfortran, g++): -fopenmp
$ gcc -fopenmp hello.c -o hello
OpenMP - User Interface Model
• Shared Memory with thread based parallelism
$ export OMP_NUM_THREADS=4
$ echo $OMP_NUM_THREADS
int omp_in_parallel()
Demo: PRegion.c
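A minimal sketch of how omp_in_parallel() and the team size behave, assuming the helper names team_size and report_region (both hypothetical, not taken from PRegion.c). The #else branch is an assumed serial fallback so the file also builds without an OpenMP flag.

```c
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#else
/* Assumed serial fallbacks so the sketch also builds without OpenMP. */
static int omp_in_parallel(void) { return 0; }
static int omp_get_num_threads(void) { return 1; }
#endif

/* Hypothetical helper: team size observed inside a parallel region. */
int team_size(void)
{
    int size = 0;
    #pragma omp parallel
    {
        /* One thread records the team size; the others wait at the
           implicit barrier that ends the single construct. */
        #pragma omp single
        size = omp_get_num_threads();
    }
    return size;
}

/* Hypothetical helper: reports what serial code sees. */
int report_region(void)
{
    int outside = omp_in_parallel();   /* 0 when called from serial code */
    printf("in parallel (serial part): %d, team size: %d\n",
           outside, team_size());
    return outside;
}
```

With OMP_NUM_THREADS=4 exported as above, team_size() would normally return 4, although the runtime is allowed to provide fewer threads.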
Shared and Private Data
• Shared data are accessible by all threads
Demo: SPData.c
Shared and Private Data
#pragma omp parallel shared(list)
• Default behavior
• Variables in list are shared
• Every thread accesses the same memory location
• On entry, the value is the same as before the region
• On exit, the variable holds whatever value was written last by any thread
• Problem: data races when threads update it without synchronization
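The difference between shared and private data can be sketched as follows. The function name record_private_values, the array seen and the constant NTHREADS are illustrative assumptions, and the #else branch is an assumed serial fallback. Each thread reads the shared base, computes into its own private copy of mine, and stores the result in a distinct array element, so no data race occurs.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Assumed serial fallbacks so the sketch also builds without OpenMP. */
static int omp_get_thread_num(void) { return 0; }
static int omp_get_num_threads(void) { return 1; }
#endif

#define NTHREADS 4

/* Hypothetical helper: stores in seen[id] the value thread id computed
   in its private variable, and returns the actual team size. */
int record_private_values(int seen[NTHREADS])
{
    int base = 100;   /* shared: all threads read the same location */
    int mine;         /* private below: one uninitialised copy per thread */
    int team = 1;

    /* seen is shared by default, like base and team */
    #pragma omp parallel shared(base, team) private(mine) num_threads(NTHREADS)
    {
        int id = omp_get_thread_num();
        mine = base + id;    /* each thread writes only its own copy */
        seen[id] = mine;     /* distinct elements, so no race */
        #pragma omp single
        team = omp_get_num_threads();
    }
    return team;
}
```

Had mine been left shared, all threads would write the same location and seen[id] could pick up another thread's value.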
Shared and Private Data – Example (1/4 to 4/4)
Demo: SPDE1.c
Getting ID of Current Thread
#include <stdio.h>
#include <omp.h>
int main(int argc, char* argv[])
{
    int iam, nthreads;
    #pragma omp parallel private(iam,nthreads) num_threads(2)
    {
        iam = omp_get_thread_num();
        nthreads = omp_get_num_threads();
        printf("ThreadID %d, out of %d threads\n", iam, nthreads);
        if (iam == 0)
            printf("Here is the Master Thread.\n");
        else
            printf("Here is another thread.\n");
    }
    return 0;
}
Demo: CTID.c
Work-Sharing Constructs
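• If all the threads execute the same code on the same data, what is the advantage?
• Work-sharing constructs split the work (for example, loop iterations) among the threads of the team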
Demo: ForConst.c
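Do/For Work-Sharing Construct
#include <stdio.h>
#include <omp.h>
int main(int argc, char* argv[])
{
    int sum = 0, counter, inputList[6] = {11,45,3,5,12,-3};
    #pragma omp parallel num_threads(2)
    {
        // static chunks of 3: with 2 threads, thread 0 gets iterations 0-2, thread 1 gets 3-5
        #pragma omp for schedule(static, 3)
        for (counter=0; counter<6; counter++) {
            printf("%d adding %d to the sum\n", omp_get_thread_num(),
                   inputList[counter]);
            // sum is shared, so the update must be protected
            #pragma omp critical
            sum+=inputList[counter];
        } //end of for
    } //end of parallel section
    printf("sum = %d\n", sum);
    return 0;
}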
Demo: ForConst2.c
For Work-Sharing – Synchronized
For Work-Sharing – Non Synchronized
Problems with Static Scheduling
• What happens if loop iterations do not take the same
amount of time?
▪ Load imbalance
Dynamic Scheduling
• Fixed-size chunks are assigned to threads on the fly
• A thread that finishes its chunk takes the next unassigned one (similar in effect to work stealing)
Demo: LoopSched.c
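A minimal sketch of dynamic scheduling under load imbalance. The names uneven_work and run_dynamic, the chunk size 2 and the cost model are assumptions for illustration and are not taken from LoopSched.c. Later iterations cost more, so a static split would leave the thread holding the last chunk busy while the others idle. With schedule(dynamic, 2), each free thread takes the next chunk of two iterations.

```c
#define N 16

/* Simulated uneven work: iteration i costs roughly i * 1000 steps. */
static long uneven_work(int i)
{
    long acc = 0;
    for (long k = 0; k < (long)i * 1000; k++)
        acc += k % 7;
    return acc;
}

/* Hypothetical helper: fills result[i] for every iteration i. */
void run_dynamic(long result[N])
{
    int i;
    /* Chunks of 2 iterations are handed out as threads become free. */
    #pragma omp parallel for schedule(dynamic, 2) num_threads(4)
    for (i = 0; i < N; i++)
        result[i] = uneven_work(i);   /* distinct elements, so no race */
}
```

Replacing dynamic with static in the clause keeps the results identical and changes only which thread executes which iterations.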
ThreadCount: OpenMP Implementation
#include <stdio.h>
#include <unistd.h>   // sleep()
int main(int argc, char* argv[])
{
    int threadCount=0;
    #pragma omp parallel num_threads(100)
    {
        // Deliberately unsynchronized: every thread reads and writes the
        // shared threadCount, so the printed total is usually wrong
        int myLocalCount = threadCount;
        threadCount++;
        sleep(1);
        myLocalCount++;
        threadCount = myLocalCount;
    }
    printf("Total Number of Threads: %d\n", threadCount);
    return 0;
}
Demo: TCount1.c
Critical-Section (CS) Problem
⮚ n processes all competing to use some shared data
⮚ Each process has a code segment, called critical section,
in which the shared data is accessed
[Figure: timeline T1 to T4 for Process A and Process B; labels include "B attempts to enter critical section" and "B leaves critical section"]
Mutual Exclusion
At any given time, only one process is in the critical section
OpenMP - Synchronization Constructs
• The CRITICAL directive specifies a region of code that
must be executed by only one thread at a time
Demo: TCount2.c
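A minimal sketch of counting threads safely with the CRITICAL directive. The function name count_threads is an assumption and the code is not taken from TCount2.c. Only one thread at a time executes the increment, so no update is lost.

```c
/* Hypothetical helper: returns how many threads ran the parallel region. */
int count_threads(int requested)
{
    int threadCount = 0;   /* shared by all threads */

    #pragma omp parallel num_threads(requested)
    {
        /* Read-modify-write on shared data: one thread at a time. */
        #pragma omp critical
        {
            threadCount++;
        }
    }
    return threadCount;
}
```

The result equals the actual team size, which may be smaller than requested if the runtime provides fewer threads.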
OpenMP - Synchronization Constructs
• The MASTER directive specifies a region that is to
be executed only by the master thread of the
team
Demo: MasterOnly.c
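A minimal sketch of the MASTER directive, assuming the function name master_only_calls (not taken from MasterOnly.c). The block is executed by thread 0 only, and the other threads skip it without waiting, since MASTER has no implied barrier.

```c
/* Hypothetical helper: counts how often the master block runs. */
int master_only_calls(void)
{
    int calls = 0;

    #pragma omp parallel num_threads(4)
    {
        #pragma omp master
        {
            calls++;   /* only the master thread gets here, so no race */
        }
        /* The other threads continue immediately: no barrier here. */
    }
    return calls;
}
```

Recent OpenMP versions (5.1 onward) introduce masked as the successor of master; the sketch keeps master to match the slides.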
OpenMP - Synchronization Constructs
• When a BARRIER directive is reached, a thread will wait
at that point until all other threads have reached that
barrier
Demo: Barrier.c
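A minimal two-phase sketch of the BARRIER directive. The names neighbour_check, stage1 and NT are assumptions, the code is not taken from Barrier.c, and the #else branch is an assumed serial fallback. Each thread writes its own slot, waits at the barrier, and only then reads its neighbour's slot.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Assumed serial fallbacks so the sketch also builds without OpenMP. */
static int omp_get_thread_num(void) { return 0; }
static int omp_get_num_threads(void) { return 1; }
#endif

#define NT 4

/* Hypothetical helper: returns 1 if every thread saw its neighbour's
   phase-1 value after the barrier, 0 otherwise. */
int neighbour_check(void)
{
    int stage1[NT] = {0};
    int ok = 1;

    #pragma omp parallel num_threads(NT)
    {
        int id = omp_get_thread_num();
        int n  = omp_get_num_threads();
        int right;

        stage1[id] = id + 1;            /* phase 1: write own slot */

        #pragma omp barrier             /* wait until all slots are written */

        right = stage1[(id + 1) % n];   /* phase 2: read neighbour's slot */
        if (right != (id + 1) % n + 1) {
            #pragma omp critical
            ok = 0;
        }
    }
    return ok;
}
```

Without the barrier, a thread could read its neighbour's slot before it has been written and see the initial 0.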
Reduction (Data-sharing Attribute Clause)
• The REDUCTION clause performs a reduction operation on
the variables that appear in the list
• A private copy of each list variable is created for each thread and initialized to the operator's identity value (for example, 0 for +)
• At the end of the region, the private copies are combined with the reduction operator and the result is written to the shared variable
Demo: Reduction.c
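A minimal sketch of the REDUCTION clause applied to a sum. The function name parallel_sum is an assumption and the code is not taken from Reduction.c. Each thread accumulates into its own private sum, and the private copies are added into the shared sum when the loop ends.

```c
/* Hypothetical helper: sums n integers with a + reduction. */
int parallel_sum(const int *list, int n)
{
    int sum = 0;
    int i;

    /* Each thread gets a private sum initialised to 0; the partial
       sums are combined into the shared sum at the end of the loop. */
    #pragma omp parallel for reduction(+:sum)
    for (i = 0; i < n; i++)
        sum += list[i];

    return sum;
}
```

For the list {11, 45, 3, 5, 12, -3} the function returns 73 regardless of the number of threads, with no CRITICAL section needed.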
Any Questions?