
Parallel Programming

with OPENMP
OPENMP: Motivation
A sequential program uses a single core/processor while all other processors sit idle.
Adding OpenMP pragmas lets a program use all available processors in parallel.
OPENMP: Motivation
https://www.openmp.org/about/whos-using-openmp/
Parallel Gaussian Elimination using OpenMP on 4 processors.
Matlab – TSA & NaN toolbox:
OpenMP is used in two core functions (sumskipnan_mex & covm_mex), which compute the sum and the covariance matrix and count the samples that are not NaN. A speedup of 11 has been observed on multicore machines with 12 cores.
OPENMP: Overview
• A collection of compiler directives and library functions for creating parallel programs for shared-memory computers.
• The "MP" in OpenMP stands for "multi-processing" (shared-memory parallel computing).
• Combined with C, C++, or Fortran to create a multithreading programming language in which all threads are assumed to share a single address space.
• Based on the fork/join programming model: all programs start as a single (master) thread, fork additional threads where parallelism is desired (the parallel region), then join back together.
• Version 1.0 was released for Fortran in 1997, with C and C++ supported thereafter; the current version is 5.0 (2018).
OpenMP: Goals
• Standardization: Provide a standard among a variety of shared-memory architectures/platforms.
• Lean and Mean: Establish a simple and limited set of directives for programming shared-memory machines. Significant parallelism can be implemented by using just 3 or 4 directives.
• Ease of Use: Provide the capability to incrementally parallelize a serial program, and to implement both coarse-grained and fine-grained parallelism.
• Portability: Supports Fortran (77, 90, 95…), C, and C++. Public forum for the API and membership.
OpenMP: Core Elements
OPENMP #pragma
Pragmas are special preprocessor instructions, typically added to allow behaviors that aren't part of the basic C specification. Compilers that don't support a given pragma simply ignore it.
#pragma omp parallel
How many threads? (OMP_NUM_THREADS)

#pragma omp parallel
{
    Parallel Region
}

Code within the parallel region is executed in parallel on all processors/threads. Each thread (P0 – Thread 0, P1 – Thread 1, P2 – Thread 2, P3 – Thread 3) has its own private stack and private variables, while all threads read and write the shared variables.
OpenMP - #pragma
Hello World - OpenMP
The master thread executes sequentially until the first parallel region is encountered.
Fork: the master thread creates a team of parallel threads, numbered from 0 to N-1, which execute the structured block of code in the parallel region.
Join: the team of threads complete the statements in the parallel region, synchronize and terminate; there is an implicit barrier at the end of a parallel section.
Parallelism is added incrementally until performance goals are met.
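A minimal sketch of the Hello World program this slide describes (the exact output format is illustrative):

    #include <stdio.h>
    #include <omp.h>

    int main(void)
    {
        /* Fork: the master thread creates a team; each thread runs the block below. */
        #pragma omp parallel
        {
            int rank = omp_get_thread_num();    /* this thread's number, 0..N-1 */
            int size = omp_get_num_threads();   /* number of threads in the team */
            printf("Hello World from thread %d of %d\n", rank, size);
        }   /* Join: implicit barrier, then only the master thread continues. */
        return 0;
    }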
OPENMP: Basic functions
Each thread has its own stack, so it will have its own private (local) variables.
Each thread gets its own rank - omp_get_thread_num().
The number of threads in the team - omp_get_num_threads().
In OpenMP, stdout is shared among the threads, so each thread can execute the printf statement.
There is no scheduling of access to stdout, so the output is non-deterministic.
OPENMP: Run Time Functions
Create a 4-thread parallel region:
Statements in the program that are enclosed by the parallel region construct are executed in parallel among the various team threads.
Each thread calls pooh(ID, A) for ID = 0 to 3.
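A minimal sketch of the code this slide describes; pooh() is assumed to be a user-defined work function and A some shared array, so the body below is only a stand-in:

    #include <stdio.h>
    #include <omp.h>

    void pooh(int ID, double *A) { printf("pooh called by thread %d\n", ID); }  /* stand-in body */

    double A[1000];

    int main(void)
    {
        omp_set_num_threads(4);      /* request a team of 4 threads */
        #pragma omp parallel
        {
            int ID = omp_get_thread_num();
            pooh(ID, A);             /* each thread calls pooh with its own ID (0..3) */
        }
        return 0;
    }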


OpenMP Run Time Functions
Modify/check/get info about the number of threads:
• omp_get_num_threads() // number of threads in use
• omp_get_thread_num() // tells which thread you are
• omp_get_max_threads() // max threads that can be used
Are we in a parallel region? omp_in_parallel()
How many processors in the system? omp_get_num_procs()
Set explicit locks, and several more...
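A short sketch querying these functions (the printed labels are illustrative):

    #include <stdio.h>
    #include <omp.h>

    int main(void)
    {
        printf("procs=%d  max threads=%d  in parallel?=%d\n",
               omp_get_num_procs(), omp_get_max_threads(), omp_in_parallel());

        #pragma omp parallel
        {
            if (omp_get_thread_num() == 0)    /* report once, from thread 0 */
                printf("inside region: %d threads, in parallel?=%d\n",
                       omp_get_num_threads(), omp_in_parallel());
        }
        return 0;
    }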

OpenMP Environment Variables
Use the SET command in Windows or the printenv command in Linux to see the current environment variables.
OMP_NUM_THREADS: Sets the maximum number of threads in the parallel region, unless overridden by omp_set_num_threads or a num_threads clause.
OpenMP parallel regions

Branching in or out of a structured block is not allowed!


OpenMP parallel regions
Serial code – variable declarations, functions, etc.

int a, b, c = 0;
float x = 1.0;

#pragma omp parallel num_threads(8) private(a) …
{
    /* My parallel region (piece of code) */
    int i = 5;
    int j = 10;
    int a = threadNumber;
}

Clauses on the parallel directive answer:
• When should I execute this code in parallel? – the if clause
• Which variables are local to each thread? – the private clause
• Which variables are shared across all threads? – the shared clause
• How many threads, i.e., copies of the parallel region, should execute? – the num_threads clause
These clauses are combined in the sketch below.
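A short sketch combining these clauses (the threshold n > 1000 and the function name work are illustrative):

    #include <omp.h>

    void work(int n)
    {
        int a = 0;
        /* Execute in parallel only when n is large enough (if clause),
           with 8 threads (num_threads) and a per-thread copy of a (private). */
        #pragma omp parallel if(n > 1000) num_threads(8) private(a)
        {
            a = omp_get_thread_num();   /* each thread writes its own private a */
        }
    }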
OPENMP: Variable Scope
• In OpenMP, scope refers to the set of threads that can see
a variable in a parallel block.
• A general rule is that any variable declared outside of a
parallel region has a shared scope. In some sense, the
“default” variable scope is shared.
• When a variable can be seen/read/written by all threads
in a team, it is said to have shared scope;
• A variable that can be seen by only one thread is said to
have private scope. Each thread has a copy of the private
variable.
• Loop variables in an omp for are private
• Local variables in the parallel region are private
• Change default behavior by using the clause
default(shared) or default(private)
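A small sketch of these default scoping rules (the variable names are illustrative):

    #include <stdio.h>
    #include <omp.h>

    int main(void)
    {
        int n = 100;                            /* declared outside: shared by default */
        #pragma omp parallel
        {
            int local = omp_get_thread_num();   /* declared inside: private to each thread */
            printf("thread %d sees shared n = %d\n", local, n);
        }
        return 0;
    }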
OpenMP: Data Scoping
Challenge in shared-memory parallelization => managing the data environment (scoping).
OpenMP shared variable: can be read/written by all threads in the team.
OpenMP private variable: each thread has its own local copy of this variable.
Loop variables in an omp for are private; local variables in the parallel region are private.

int i;
int j;
#pragma omp parallel private(j)
{
    int k;
    i = …   /* shared  */
    j = …   /* private */
    k = …   /* private */
}

Alter the default behaviour with the default clause:
#pragma omp parallel default(shared) private(x)
{ ... }
#pragma omp parallel default(private) shared(matrix)
{ ... }
OpenMP: private Clause
• Reproduces the private variable for each thread.
• Private variables are not initialized.
• The value that Thread 1 stores in x is different from the value Thread 2 stores in x.
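A minimal sketch of the private clause (the variable x and the values are illustrative):

    #include <stdio.h>
    #include <omp.h>

    int main(void)
    {
        int x = 42;                       /* value of the shared x outside the region */
        #pragma omp parallel private(x)
        {
            /* x here is a new, uninitialized private copy, not 42 */
            x = omp_get_thread_num();     /* each thread stores its own value */
            printf("thread %d: x = %d\n", omp_get_thread_num(), x);
        }
        printf("after region: x = %d\n", x);   /* still 42 */
        return 0;
    }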
OpenMP: firstprivate Clause
• Creates a private memory location of iper for each thread.
• Copies the value from the master thread into each private copy.
• While the initial value is the same, it can be changed by the threads, so Thread 0, Thread 1, Thread 2, … may subsequently hold different values of the firstprivate variable.
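A short sketch of firstprivate, keeping the variable name iper from the slide:

    #include <stdio.h>
    #include <omp.h>

    int main(void)
    {
        int iper = 10;                         /* set by the master thread */
        #pragma omp parallel firstprivate(iper)
        {
            /* every thread starts with its private iper == 10 ... */
            iper += omp_get_thread_num();      /* ...but may change it independently */
            printf("thread %d: iper = %d\n", omp_get_thread_num(), iper);
        }
        return 0;
    }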
OpenMP: Clauses & Data Scoping
Schedule clause
Data sharing/scope

Matrix Vector Multiplication

#pragma omp parallel num_threads(4)
for (i = 0; i < SIZE; i++)
{
    y[i] = 0.0;
    for (j = 0; j < SIZE; j++)
        y[i] += (A[i][j] * x[j]);
}

Is this reasonable? (Note: the directive is plain parallel, not parallel for, so every thread executes the entire loop.)
Matrix Vector Multiplication

#pragma omp parallel shared(A,x,y,SIZE) \
        private(tid,i,j,istart,iend)
{
    tid = omp_get_thread_num();
    int nid = omp_get_num_threads();
    istart = tid * SIZE / nid;
    iend   = (tid + 1) * SIZE / nid;

    for (i = istart; i < iend; i++)
    {
        for (j = 0; j < SIZE; j++)
            y[i] += (A[i][j] * x[j]);

        printf(" thread %d did row %d\t y[%d]=%.2f\t", tid, i, i, y[i]);
    }
} /* end of parallel construct */

Matrix rows = N (= 8)
Number of threads = T (= 4)
Number of rows processed by each thread = N/T
Thread 0 => rows 0, 1, 2, 3, … (N/T – 1)
Thread 1 => rows N/T, N/T + 1, … 2*N/T – 1
Thread t => rows t*N/T, t*N/T + 1, … ((t+1)*N/T – 1)
Matrix Vector Multiplication

omp_set_num_threads(4);
#pragma omp parallel shared(A,x,y,SIZE)
{
    #pragma omp for
    for (int i = 0; i < SIZE; i++)
    {
        for (int j = 0; j < SIZE; j++)
            y[i] += (A[i][j] * x[j]);
    }
} /* end of parallel construct */

#pragma omp for must be inside a parallel region (#pragma omp parallel).
No new threads are created; the threads already created in the enclosing parallel region are used.
The system automatically parallelizes the for loop by dividing the iterations of the loop among the threads.
The user can control how the loop iterations are divided among the threads with the schedule clause, as sketched below.

Matrix rows = N (= 8)
Number of threads = T (= 4)
Number of rows processed by each thread = N/T
Thread 0 => rows 0, 1, 2, 3, … (N/T – 1)
Thread 1 => rows N/T, N/T + 1, … 2*N/T – 1
Thread t => rows t*N/T, t*N/T + 1, … ((t+1)*N/T – 1)
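A minimal sketch of the schedule clause (the chunk size of 2 and the function name matvec_scheduled are illustrative):

    #include <omp.h>
    #define SIZE 8

    double A[SIZE][SIZE], x[SIZE], y[SIZE];

    void matvec_scheduled(void)
    {
        #pragma omp parallel shared(A, x, y)
        {
            /* hand out iterations in round-robin chunks of 2 rows per thread */
            #pragma omp for schedule(static, 2)
            for (int i = 0; i < SIZE; i++)
                for (int j = 0; j < SIZE; j++)
                    y[i] += A[i][j] * x[j];
        }
        /* schedule(dynamic, 2) would instead let idle threads grab the next
           chunk at run time, useful when iterations have uneven cost. */
    }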

#pragma omp for  /  #pragma omp parallel for
User-controlled variable scope.
OpenMP takes care of partitioning the iteration space for you.
Threads are assigned independent sets of iterations.
There is no implied barrier upon entry to a work-sharing construct.
There is an implied barrier at the end of a work-sharing construct, as illustrated below.
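A sketch of the implied barrier and of the nowait clause that removes it when two loops are independent (the array and function names are illustrative):

    #include <omp.h>
    #define N 1000

    double a[N], b[N];

    void two_loops(void)
    {
        #pragma omp parallel
        {
            #pragma omp for nowait     /* no barrier here: threads move on immediately */
            for (int i = 0; i < N; i++)
                a[i] = i * 0.5;

            #pragma omp for            /* implied barrier at the end of this loop */
            for (int i = 0; i < N; i++)
                b[i] = 2.0 * i;
        }   /* implicit barrier at the end of the parallel region */
    }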
OpenMP: Work Sharing
Data parallelism
• A large number of data elements, where each data element (or possibly a subset of elements) needs to be processed to produce a result. When this processing can be done in parallel, we have data parallelism (for loops).
Task parallelism
• A collection of tasks that need to be completed. If these tasks can be performed in parallel, you are faced with a task-parallel job.
Work Sharing: omp for
Computing π by the method of numerical integration
Since the integral of 4/(1+x^2) over [0,1] equals π, divide the interval [0,1] on the x axis into N parts.
The area of each rectangle is Δx * y, where Δx = 1/N and y = 4/(1+x^2), i.e. [1/N] * 4/(1+x^2).
x is approximated as the midpoint of the interval, (x_i + x_{i+1})/2, before computing y.
Serial Code

static long num_steps = 100000;
double step;
void main ()
{
    int i; double x, pi, sum = 0.0;
    step = 1.0 / (double) num_steps;
    for (i = 0; i < num_steps; i++)
    {
        x = (i + 0.5) * step;             /* task 1: area of an individual rectangle */
        sum = sum + 4.0 / (1.0 + x*x);    /* task 2: add the areas of the rectangles */
    }
    pi = step * sum;
}

1. Computation of the areas of individual rectangles.
2. Adding the areas of the rectangles.
There is no communication among the tasks in the first collection, but each task in the first collection communicates with task 2.
Computing π by the method of Numerical Integration

Serial Code

static long num_steps = 100000;
double step;
void main ()
{
    int i; double x, pi, sum = 0.0;
    step = 1.0 / (double) num_steps;
    for (i = 0; i < num_steps; i++) {
        x = (i + 0.5) * step;
        sum = sum + 4.0 / (1.0 + x*x);
    }
    pi = step * sum;
}

Parallel Code

#include <omp.h>
#define NUM_THREADS 4
static long num_steps = 100000;
double step;
void main ()
{
    int i; double x, pi, sum = 0.0;
    step = 1.0 / (double) num_steps;
    omp_set_num_threads(NUM_THREADS);
    #pragma omp parallel for shared(sum) private(x)
    for (i = 0; i < num_steps; i++) {
        x = (i + 0.5) * step;
        sum = sum + 4.0 / (1.0 + x*x);
    }
    pi = step * sum;
}


Race Condition

#pragma omp parallel for shared(global_result) private(x, myresult)
for (i = 0; i < num_steps; i++) {
    x = (i + 0.5) * step;
    myresult = 4.0 / (1.0 + x*x);
    global_result += myresult;
}

Unpredictable results occur when two (or more) threads attempt to execute simultaneously:
    global_result += myresult;
Handling Race Conditions
Mutual exclusion: only one thread at a time executes the statement.
Thread 0:  global_result += 2
Thread 1:  global_result += 3
Thread 2:  global_result += 4
Pretty much sequential.
Handling Race Conditions

omp_set_num_threads(NUM_THREADS);
#pragma omp parallel for shared(sum) private(x)
for (i = 0; i < num_steps; i++) {
    x = (i + 0.5) * step;
    #pragma omp critical
    sum = sum + 4.0 / (1.0 + x*x);
}

Mutual exclusion: only one thread at a time executes the statement
    sum = sum + 4.0 / (1.0 + x*x);

Use synchronization to protect data conflicts:
    Mutual exclusion (#pragma omp critical)
    Mutual exclusion (#pragma omp atomic)
Synchronization can be expensive, so change how data is accessed to minimize the need for synchronization, for example as in the atomic sketch below.
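A minimal sketch of #pragma omp atomic, which protects a single memory update at lower cost than a critical section; accumulating into a thread-private partial sum first is one common way to reduce synchronization (the function name compute_pi is illustrative):

    #include <omp.h>

    static long num_steps = 100000;

    double compute_pi(void)
    {
        double step = 1.0 / (double) num_steps;
        double sum = 0.0;

        #pragma omp parallel
        {
            double local = 0.0;                  /* per-thread partial sum */
            #pragma omp for
            for (long i = 0; i < num_steps; i++) {
                double x = (i + 0.5) * step;
                local += 4.0 / (1.0 + x * x);
            }
            #pragma omp atomic                   /* one protected update per thread */
            sum += local;
        }
        return step * sum;
    }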
OpenMP: Reduction

sum = 0;
omp_set_num_threads(8);
#pragma omp parallel for reduction(+:sum)
for (int i = 0; i < 16; i++)
{
    sum += a[i];
}

Thread 0 => iterations 0 & 1
Thread 1 => iterations 2 & 3
………
Each thread accumulates into a thread-local/private copy of sum.
One or more variables that are private to each thread are the subject of a reduction operation at the end of the parallel region.
#pragma omp for reduction(operator : var)
Operator: + , * , - , & , | , && , || , ^
Combines the multiple local copies of var from the threads into a single copy at the master.
Computing π by the method of Numerical Integration

Serial Code

static long num_steps = 100000;
double step;
void main ()
{
    int i; double x, pi, sum = 0.0;
    step = 1.0 / (double) num_steps;
    for (i = 0; i < num_steps; i++) {
        x = (i + 0.5) * step;
        sum = sum + 4.0 / (1.0 + x*x);
    }
    pi = step * sum;
}

Parallel Code

#include <omp.h>
#define NUM_THREADS 4
static long num_steps = 100000;
double step;
void main ()
{
    int i; double x, pi, sum = 0.0;
    step = 1.0 / (double) num_steps;
    omp_set_num_threads(NUM_THREADS);
    #pragma omp parallel for reduction(+:sum) private(x)
    for (i = 0; i < num_steps; i++) {
        x = (i + 0.5) * step;
        sum += 4.0 / (1.0 + x*x);
    }
    pi = step * sum;
}


omp for Parallelization

Can all loops be parallelized? Loop iterations have to be independent.

Simple test: if the results differ when the loop is executed backwards, the loop cannot be parallelized!

for (int i = 2; i < 10; i++)
{
    x[i] = a * x[i-1] + b;    /* loop-carried dependence on x[i-1] */
}

Between 2 synchronization points, if at least 1 thread writes to a memory location that at least 1 other thread reads from, the result is non-deterministic.
Recap
What is OpenMP?
Fork/join programming model
OpenMP core elements
    #pragma omp parallel (the parallel construct)
    run time functions
    environment variables
    data scoping (private, shared, …)
    work-sharing constructs
        #pragma omp for
        sections
        tasks
        schedule clause
    synchronization
Compile and run an OpenMP program in C++ and Fortran (see the commands below).
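A sketch of how such a program is typically built and run with the GNU compilers (the file names are illustrative):

    gcc -fopenmp hello.c -o hello           (C)
    g++ -fopenmp hello.cpp -o hello         (C++)
    gfortran -fopenmp hello.f90 -o hello    (Fortran)
    OMP_NUM_THREADS=4 ./hello               (run with 4 threads)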
