Parallel and Distributed Computing
CS3006 (BCS-6E/6F)
Lecture 13
Instructor: Dr. Syed Mohammad Irteza
Assistant Professor, FAST School of Computing
NUCES Lahore
10 March, 2025
Previous Lecture: OpenMP
• Scheduling in OpenMP
• static, dynamic, guided, runtime
• Useful OpenMP clauses:
• private, firstprivate, lastprivate, shared
• Conditional parallelism: the if clause
• OpenMP task construct
Parallelizing linked lists
[omp task illustration]
CS3006 Fall 2022
Environment Variables
• OpenMP provides additional environment variables that help control
execution of parallel programs
• OMP_NUM_THREADS
• OMP_DYNAMIC
• OMP_SCHEDULE
• OMP_NESTED
CS3006 Spring 2025
Environment Variables: OMP_NUM_THREADS
• Specifies the default number of threads created upon entering a parallel
region.
• The number of threads can be changed during run-time using:
• omp_set_num_threads(int threads) routine [OR]
• num_threads clause → num_threads(int threads)
• Setting OMP_NUM_THREADS to 4 using Linux/Windows:
(Linux bash) → export OMP_NUM_THREADS=4
(Windows PowerShell) → $env:OMP_NUM_THREADS=4
(Windows Command Line) → set OMP_NUM_THREADS=4
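The ways of choosing the thread count listed above can be compared in a small sketch. The helper names team_size and team_size_after_set are hypothetical, and the stubs under #else are an assumption that only lets the file build serially when OpenMP is not enabled (for example, without gcc's -fopenmp flag). The runtime may still hand out fewer threads than requested.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Fallback stubs (assumption): allow a serial build without OpenMP. */
static void omp_set_num_threads(int n) { (void) n; }
static int omp_get_num_threads(void) { return 1; }
#endif

/* Hypothetical helper: size of the team created for a parallel region.
 * requested > 0 : the num_threads clause asks for that many threads
 * requested == 0: the current default applies (OMP_NUM_THREADS or the
 *                 last omp_set_num_threads() call) */
int team_size(int requested)
{
    int size = 0;
    if (requested > 0) {
        #pragma omp parallel num_threads(requested)
        {
            #pragma omp single
            size = omp_get_num_threads();
        }
    } else {
        #pragma omp parallel
        {
            #pragma omp single
            size = omp_get_num_threads();
        }
    }
    return size;
}

/* Hypothetical helper: change the default for later parallel regions,
 * then measure a region that has no num_threads clause. */
int team_size_after_set(int n)
{
    omp_set_num_threads(n);
    return team_size(0);
}
```

Calling team_size(0) before any omp_set_num_threads() call reports the default taken from OMP_NUM_THREADS, so running the program with different values of that variable shows its effect directly.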
Environment Variables: OMP_DYNAMIC
• When set to TRUE, the runtime is allowed to adjust the number of threads used
for each parallel region, so that system resources are used more effectively.
• In case of TRUE, the number of threads in a team may be smaller than the number
requested by using the omp_set_num_threads() function or the
num_threads clause.
• In case of FALSE, a parallel region normally gets exactly the number of threads
requested by omp_set_num_threads() or the num_threads clause
• OpenMP routines for setting/getting dynamic status:
• void omp_set_dynamic(int flag); // enables if flag is non-zero, disables if flag=0
• Should be called from outside of a parallel region
• int omp_get_dynamic(); // returns non-zero if dynamic adjustment is enabled
Environment Variables
OMP_DYNAMIC [dynamic.c]
• The omp_get_max_threads routine returns an upper bound on the number of threads that could be used to form
a new team if a parallel construct without a num_threads clause were encountered after execution returns from
this routine
workers = omp_get_max_threads();   // omp_get_num_procs() can be used instead
printf("%d maximum allowed threads\n", workers);
printf("total number of allocated cores are:%d\n", omp_get_num_procs());
omp_set_dynamic(1);                // allow the runtime to adjust the team size
omp_set_num_threads(8);
printf("total number of requested when dynamic is true are:%d\n", 8);
#pragma omp parallel
{
    #pragma omp single nowait
    printf("total threads in parallel region1=%d:\n", omp_get_num_threads());
    #pragma omp for
    for (i = 0; i < mult; i++)
    { a = complex_func(); }
}
Environment Variables
OMP_DYNAMIC [dynamic.c]
omp_set_dynamic(0);                // disable dynamic adjustment
omp_set_num_threads(8);
printf("total number of requested when dynamic is false are:%d\n", 8);
#pragma omp parallel
{
    #pragma omp single nowait
    printf("total threads in parallel region2=%d:\n", omp_get_num_threads());
    #pragma omp for
    for (i = 0; i < mult; i++)
    { a = complex_func(); }
}
Environment Variables
OMP_SCHEDULE
• Controls how loop iterations are assigned to threads for those for directives that
use the runtime scheduling class, i.e. schedule(runtime)
• Possible values: static, dynamic, guided, and auto
• Can also be used along with chunk-size [optional]
• If chunk-size is not specified, dynamic and guided use a default chunk-size of 1,
while static divides the iterations into roughly equal chunks, one per thread.
• Setting OMP_SCHEDULE to guided with minimum chunk-size of 4 using
Ubuntu-based terminal:
export OMP_SCHEDULE="guided,4"
Windows PowerShell:
$env:OMP_SCHEDULE="guided,4"
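OMP_SCHEDULE only affects loops that ask for the runtime schedule. A minimal sketch of such a loop follows; the function name sum_of_squares is hypothetical and the workload is only a placeholder.

```c
/* Sum of squares 0^2 + 1^2 + ... + (n-1)^2.
 * schedule(runtime) defers the choice of schedule and chunk size
 * to the OMP_SCHEDULE environment variable. */
long sum_of_squares(long n)
{
    long sum = 0;
    #pragma omp parallel for schedule(runtime) reduction(+:sum)
    for (long i = 0; i < n; i++) {
        sum += i * i;
    }
    return sum;
}
```

The result is the same for every schedule; only the distribution of iterations over threads changes. Running the same binary after export OMP_SCHEDULE="static" and again after export OMP_SCHEDULE="dynamic,4" makes it possible to compare timings without recompiling.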
Environment Variables
OMP_NESTED
• Default value is FALSE
• When a parallel pragma is nested inside another, the inner region is executed by
the encountering thread alone (a team of one thread); no new team of threads is created.
• When TRUE
• Enables nested parallelism
• When a parallel pragma is nested inside another, each thread that encounters the
inner region creates a new team of threads for executing it.
• Use omp_set_nested(int val) with non-zero value to set this
variable to TRUE.
• When called with 0 as argument, it sets the variable to FALSE
• Note: OMP_NESTED and omp_set_nested() are deprecated since OpenMP 5.0;
OMP_MAX_ACTIVE_LEVELS and omp_set_max_active_levels() replace them.
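Since OpenMP 5.0 deprecates omp_set_nested(), the same behaviour can be sketched with omp_set_max_active_levels(), which limits how many nested parallel regions may be active at once. The helper name inner_team_size is hypothetical, and the stubs under #else are an assumption that only permits a serial build without OpenMP.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Fallback stubs (assumption): allow a serial build without OpenMP. */
static void omp_set_max_active_levels(int n) { (void) n; }
static int omp_get_num_threads(void) { return 1; }
#endif

/* Hypothetical helper: size of an inner team when at most max_levels
 * nested parallel regions may be active.
 * max_levels = 1 corresponds to nesting disabled,
 * max_levels = 2 allows one level of nesting. */
int inner_team_size(int max_levels)
{
    int inner = 0;
    omp_set_max_active_levels(max_levels);
    #pragma omp parallel num_threads(2)
    {
        #pragma omp parallel num_threads(4)
        {
            int n = omp_get_num_threads();   /* size of this inner team */
            /* many threads store a value, so make the store atomic */
            #pragma omp atomic write
            inner = n;
        }
    }
    return inner;
}
```

With max_levels set to 1 each inner region runs on a single thread, so the function returns 1. With max_levels set to 2 it returns the size of one of the inner teams, which is at most 4 and may be smaller if the runtime cannot supply that many threads.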
Environment Variables
OMP_NESTED [nested.c]
omp_set_nested(0);   // nesting disabled
#pragma omp parallel num_threads(2)
{
    #pragma omp single
    printf("Level 1: number of threads in the team : %d\n", omp_get_num_threads());
    // each outer thread runs this inner region alone (team of 1)
    #pragma omp parallel num_threads(4)
    {
        #pragma omp single
        printf("Level 2: number of threads in the team : %d\n", omp_get_num_threads());
    }
}
Environment Variables
OMP_NESTED [nested.c]
omp_set_nested(1);   // nesting enabled
#pragma omp parallel num_threads(2)
{
    #pragma omp single
    printf("Level 1: number of threads in the team : %d\n", omp_get_num_threads());
    // each outer thread creates its own inner team of up to 4 threads
    #pragma omp parallel num_threads(4)
    {
        #pragma omp single
        printf("Level 2: number of threads in the team : %d\n", omp_get_num_threads());
    }
}
Computing Pi using Monte Carlo method
Preliminary Idea:
$$\pi \approx 4 \times \frac{\text{points in circle}}{\text{points in square}}$$
Condition for a point $(x, y)$ to lie inside the circle: $(x - a)^2 + (y - b)^2 < r^2$
Here $a = 0.5$, $b = 0.5$ and $r = 0.5$
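The factor of 4 comes from the ratio of the two areas. A short derivation, assuming the random points are spread uniformly over the square of side $2r$ that encloses the circle:

```latex
\frac{\text{points in circle}}{\text{points in square}}
\approx \frac{\text{area of circle}}{\text{area of square}}
= \frac{\pi r^2}{(2r)^2}
= \frac{\pi}{4}
\quad\Longrightarrow\quad
\pi \approx 4 \times \frac{\text{points in circle}}{\text{points in square}}
```

With $r = 0.5$ the enclosing square is the unit square $[0, 1] \times [0, 1]$, so both coordinates of a point can be drawn directly as random numbers between 0 and 1.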
Computing Pi using Monte Carlo method
Steps
For all the random points
1. Count the points that fall inside the circle
2. Divide the number of points in the circle by the number of points in the square
• The total number of points is also the number of points inside the square
3. Multiply this fraction by 4
As the number of random points increases, the estimate approaches the true value of Pi (i.e., 3.14159...)
Computing Pi using Monte Carlo method
Sequential Implementation
int niter = 100000000;   // 100 million random points
count = 0;
srand(time(NULL));       // seed the generator used by rand()
for (i = 0; i < niter; ++i) {
    // get a random point in the unit square
    x = (double) rand() / RAND_MAX;
    y = (double) rand() / RAND_MAX;
    // squared distance from the centre of the circle (0.5, 0.5)
    z = ((x-0.5)*(x-0.5)) + ((y-0.5)*(y-0.5));
    // the point is inside the circle if z < r^2 = 0.25
    if (z < 0.25) {
        ++count;
    }
}
pi = ((double) count / (double) niter) * 4.0;   // pi = 4(m/n)
printf("Seq_Pi: %f\n", pi);
Computing Pi using the Monte Carlo method (Parallel construct [parallel_pi.c])
// seed must be an unsigned int for rand_r()
count = 0;
#pragma omp parallel shared(niter) private(i, x, y, z, chunk_size, num_threads, seed) reduction(+:count)
{
    num_threads = omp_get_num_threads();
    chunk_size = niter / num_threads;   // points handled by each thread
    seed = omp_get_thread_num();        // a different seed for each thread
    #pragma omp master
    { printf("chunk_size=%ld\n", chunk_size); }
    // each thread has a private count that starts at 0 (reduction clause)
    for (i = 0; i < chunk_size; i++) {
        // get a random point in the unit square
        x = (double) rand_r(&seed) / (double) RAND_MAX;
        y = (double) rand_r(&seed) / (double) RAND_MAX;
        z = ((x-0.5)*(x-0.5)) + ((y-0.5)*(y-0.5));
        // the point is inside the circle if z < r^2 = 0.25
        if (z < 0.25) {
            ++count;
        }
    }
}
// the private counts are added into count at the end of the region
pi = ((double) count / (double) niter) * 4.0;
[Output screenshots: total points = 10 million and total points = 100 million]
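The same estimate can also be written with the for worksharing construct instead of manual chunking. The sketch below swaps in a different random number technique from the one on the slide: instead of rand_r() with a per-thread seed, each point is derived from its loop index with a SplitMix64-style mixing function, so the result does not depend on the number of threads. The names mix64, to_unit and estimate_pi are hypothetical helpers, not part of OpenMP or the C library.

```c
#include <stdint.h>

/* Hypothetical helper: mix a counter into 64 pseudo-random bits
 * (SplitMix64-style finalizer). */
static uint64_t mix64(uint64_t z)
{
    z += 0x9E3779B97F4A7C15ULL;
    z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
    z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
    return z ^ (z >> 31);
}

/* Hypothetical helper: map 64 random bits to a double in [0, 1). */
static double to_unit(uint64_t bits)
{
    return (double) (bits >> 11) / 9007199254740992.0;   /* 2^53 */
}

/* Estimate pi from niter points in the unit square. */
double estimate_pi(long niter)
{
    long count = 0;
    #pragma omp parallel for reduction(+:count)
    for (long i = 0; i < niter; i++) {
        /* point i uses counters 2i and 2i+1, so no state is shared */
        double x = to_unit(mix64(2 * (uint64_t) i));
        double y = to_unit(mix64(2 * (uint64_t) i + 1));
        double z = (x - 0.5) * (x - 0.5) + (y - 0.5) * (y - 0.5);
        if (z < 0.25) {   /* inside the circle of radius 0.5 */
            ++count;
        }
    }
    return 4.0 * (double) count / (double) niter;
}
```

Because the loop covers all niter iterations, no points are lost when niter is not a multiple of the thread count, which can happen with the niter / num_threads chunking.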
More Detailed Discussion
• Full Example Online:
http://www.umsl.edu/~siegelj/cs4790/openmp/pimonti_omp.c.HTML
• Further Reading (optional):
• https://1drv.ms/p/s!Apc0G8okxWJ12jlUANaQsYO-JVdx?e=VixgYX (just slides 1-9)
• https://passlab.github.io/CSCE569/notes/lecture04-07_OpenMP.pdf
• https://www3.nd.edu/~zxu2/acms60212-40212/Lec-12-OpenMP.pdf