
Programming OpenMP

Christian Terboven
Michael Klemm

Agenda

Programming OpenMP
An Overview Of OpenMP

Christian Terboven
Michael Klemm

History

• De-facto standard for Shared-Memory Parallelization: http://www.OpenMP.org

• 1997: OpenMP 1.0 for FORTRAN
• 1998: OpenMP 1.0 for C and C++
• 1999: OpenMP 1.1 for FORTRAN
• 2000: OpenMP 2.0 for FORTRAN
• 2002: OpenMP 2.0 for C and C++
• 2005: OpenMP 2.5 now includes both programming languages
• 05/2008: OpenMP 3.0
• 07/2011: OpenMP 3.1
• 07/2013: OpenMP 4.0
• 11/2015: OpenMP 4.5
• 11/2018: OpenMP 5.0
• 11/2020: OpenMP 5.1

What is OpenMP?

• Parallel Region & Worksharing

• Tasking

• SIMD / Vectorization

• Accelerator Programming

• …

Get your C/C++ and Fortran Reference Guide!
Covers all of OpenMP 5.0!

Recent Books About OpenMP

• A printed copy of the 5.0 specification, 2019
• A book that covers all of the OpenMP 4.5 features, 2017
• A new book about the OpenMP Common Core, 2019

Programming OpenMP
Parallel Region

Christian Terboven
Michael Klemm

OpenMP's machine model

• OpenMP: Shared-Memory Parallel Programming Model.

– All processors/cores access a shared main memory.
– Real architectures are more complex, as we will see later / as we have seen.
– Parallelization in OpenMP employs multiple threads.

[Figure: four processors (Proc), each with its own cache, connected via a crossbar/bus to the shared main memory]

The OpenMP Memory Model

• All threads have access to the same, globally shared memory.

• Data in private memory is only accessible by the thread owning this memory.

• No other thread sees the change(s) in private memory.

• Data transfer is through shared memory and is 100% transparent to the application.

[Figure: threads (T) on processing units (PU), each with a private memory, attached to the shared memory; an accelerator with its own threads and private memories is attached as well]

The OpenMP Execution Model

• OpenMP programs start with just one thread: the Master.

• Worker threads are spawned at Parallel Regions; together with the Master they form the Team of threads.

• In between Parallel Regions the Worker threads are put to sleep. The OpenMP Runtime takes care of all thread management work.

• Concept: Fork-Join.
• Allows for an incremental parallelization!

[Figure: alternating Serial Parts (Master thread only) and Parallel Regions (Master plus Worker threads)]

Parallel Region and Structured Blocks

• The parallelism has to be expressed explicitly.

C/C++
#pragma omp parallel
{
   ... structured block ...
}

Fortran
!$omp parallel
   ... structured block ...
!$omp end parallel

• Structured Block
– Exactly one entry point at the top
– Exactly one exit point at the bottom
– Branching in or out is not allowed
– Terminating the program is allowed (abort / exit)

◼ Specification of the number of threads:
– Environment variable: OMP_NUM_THREADS=…
– Or via the num_threads clause: add num_threads(num) to the parallel construct

Starting OpenMP Programs on Linux

• From within a shell, global setting of the number of threads:


export OMP_NUM_THREADS=4
./program

• From within a shell, one-time setting of the number of threads:


OMP_NUM_THREADS=4 ./program

Demo

Hello OpenMP World
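The demo source itself is not part of this handout; a minimal sketch of what a Hello OpenMP World program can look like (file and variable names are ours):

#include <stdio.h>
#include <omp.h>

int main(void)
{
    #pragma omp parallel
    {
        // every thread of the team executes this structured block
        printf("Hello OpenMP World from thread %d of %d\n",
               omp_get_thread_num(), omp_get_num_threads());
    }
    return 0;
}

Compile and run, e.g., with gcc -fopenmp hello.c -o hello and OMP_NUM_THREADS=4 ./hello.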

Programming OpenMP
Worksharing

Christian Terboven
Michael Klemm

For Worksharing

• If only the parallel construct is used, each thread executes the Structured Block.
• Program Speedup: Worksharing
• OpenMP's most common Worksharing construct: for

C/C++
int i;
#pragma omp for
for (i = 0; i < 100; i++)
{
   a[i] = b[i] + c[i];
}

Fortran
INTEGER :: i
!$omp do
DO i = 0, 99
   a(i) = b(i) + c(i)
END DO

– Distribution of loop iterations over all threads in a Team.
– Scheduling of the distribution can be influenced.

• Loops often account for most of a program's runtime!

Worksharing illustrated

Pseudo-Code (here: 4 threads)

Serial:
do i = 0, 99
   a(i) = b(i) + c(i)
end do

Thread 1:
do i = 0, 24
   a(i) = b(i) + c(i)
end do

Thread 2:
do i = 25, 49
   a(i) = b(i) + c(i)
end do

Thread 3:
do i = 50, 74
   a(i) = b(i) + c(i)
end do

Thread 4:
do i = 75, 99
   a(i) = b(i) + c(i)
end do

[Figure: the shared arrays A(0)…A(99), B(0)…B(99), C(0)…C(99) in memory]

The Barrier Construct

• OpenMP barrier (implicit or explicit)
– Threads wait until all threads of the current Team have reached the barrier
C/C++
#pragma omp barrier

• All worksharing constructs contain an implicit barrier at the end
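As a sketch of how an explicit barrier separates two phases within one parallel region (names are ours; assumes at most 64 threads so each thread has a slot):

#include <stdio.h>
#include <omp.h>

static double work(int tid) { return 2.0 * tid; }   // stand-in for real per-thread work

int main(void)
{
    double a[64], b[64];
    #pragma omp parallel
    {
        int tid = omp_get_thread_num();
        int nth = omp_get_num_threads();
        a[tid] = work(tid);              // phase 1: every thread writes its own slot
        #pragma omp barrier              // wait until all slots are written
        b[tid] = a[(tid + 1) % nth];     // phase 2: safely read a neighbor's slot
        printf("thread %d read %f\n", tid, b[tid]);
    }
    return 0;
}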

The Single Construct
C/C++
#pragma omp single [clause]
... structured block ...

Fortran
!$omp single [clause]
... structured block ...
!$omp end single

• The single construct specifies that the enclosed structured block is executed by only one thread of the team.
– It is up to the runtime which thread that is.

• Useful for:
– I/O
– Memory allocation and deallocation, etc. (in general: setup work)
– Implementation of the single-creator parallel-executor pattern as we will see later…
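A sketch of the setup-work use case (allocation size and names are ours):

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    double *data = NULL;
    int n = 1000;

    #pragma omp parallel shared(data)
    {
        #pragma omp single
        {
            data = malloc(n * sizeof(double));       // one thread does the setup
            for (int i = 0; i < n; i++) data[i] = i;
        }   // implicit barrier: all threads wait here for the setup to finish

        #pragma omp for
        for (int i = 0; i < n; i++)
            data[i] *= 2.0;                          // then the whole team works on it
    }
    printf("data[10] = %f\n", data[10]);
    free(data);
    return 0;
}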

The Master Construct
C/C++
#pragma omp master [clause]
... structured block ...

Fortran
!$omp master [clause]
... structured block ...
!$omp end master

• The master construct specifies that the enclosed structured block is executed only by the master thread of a team.

• Note: The master construct is not a worksharing construct and does not contain an implicit barrier at the end.
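Because of the missing implicit barrier, other threads must not rely on the master's work without explicit synchronization; a minimal sketch (the flag is ours):

#include <stdio.h>
#include <omp.h>

int main(void)
{
    int flag = 0;   // shared

    #pragma omp parallel
    {
        #pragma omp master
        {
            flag = 1;            // only the master thread (thread 0) runs this
        }
        #pragma omp barrier      // no implicit barrier after master, so add one
        printf("thread %d sees flag = %d\n", omp_get_thread_num(), flag);
    }
    return 0;
}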

Demo

Vector Addition
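The demo source is not reproduced here; a minimal sketch of a parallel vector addition (sizes and names are ours):

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int n = 1000000;
    double *a = malloc(n * sizeof(double));
    double *b = malloc(n * sizeof(double));
    double *c = malloc(n * sizeof(double));
    for (int i = 0; i < n; i++) { b[i] = i; c[i] = 2.0 * i; }

    // parallel region and for worksharing combined in a single directive
    #pragma omp parallel for
    for (int i = 0; i < n; i++)
        a[i] = b[i] + c[i];

    printf("a[n-1] = %f\n", a[n - 1]);
    free(a); free(b); free(c);
    return 0;
}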

Influencing the For Loop Scheduling / 1

• for-construct: OpenMP allows you to influence how the iterations are scheduled among the threads of the team, via the schedule clause:

– schedule(static [, chunk]): Iteration space divided into blocks of chunk size, blocks are assigned to
threads in a round-robin fashion. If chunk is not specified: #threads blocks.

– schedule(dynamic [, chunk]): Iteration space divided into blocks of chunk (not specified: 1) size,
blocks are scheduled to threads in the order in which threads finish previous blocks.

– schedule(guided [, chunk]): Similar to dynamic, but block size starts with implementation-defined
value, then is decreased exponentially down to chunk.

• Default is schedule(static).
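A sketch contrasting two of the clauses (the loop body is purely illustrative):

#include <stdio.h>
#include <omp.h>

int main(void)
{
    int n = 16;

    // chunks of 4 iterations, assigned round-robin up front
    #pragma omp parallel for schedule(static, 4)
    for (int i = 0; i < n; i++)
        printf("static:  thread %d runs i = %d\n", omp_get_thread_num(), i);

    // chunks of 2 handed to whichever thread asks next
    #pragma omp parallel for schedule(dynamic, 2)
    for (int i = 0; i < n; i++)
        printf("dynamic: thread %d runs i = %d\n", omp_get_thread_num(), i);

    return 0;
}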

Influencing the For Loop Scheduling / 2

◼ Static Schedule
→ schedule(static [, chunk])
→ Decomposition depending on chunksize
→ Equal parts of size 'chunksize' distributed in round-robin fashion

◼ Pros?
→ No/low runtime overhead

◼ Cons?
→ No dynamic workload balancing

Influencing the For Loop Scheduling / 3

• Dynamic schedule
– schedule(dynamic [, chunk])
– Iteration space divided into blocks of chunk size
– Threads request a new block after finishing the previous one
– Default chunk size is 1
• Pros?
– Workload distribution
• Cons?
– Runtime Overhead
– Chunk size essential for performance
– No NUMA optimizations possible

Synchronization Overview

• Can all loops be parallelized with for-constructs? No!
– Simple test: If the results differ when the code is executed backwards, the loop iterations are not independent. BUT: This test alone is not sufficient:

C/C++
int i, s = 0;
#pragma omp parallel for
for (i = 0; i < 100; i++)
{
   s = s + a[i];
}

• Data Race: If, between two synchronization points, at least one thread writes to a memory location from which at least one other thread reads, the result is not deterministic (race condition).

Synchronization: Critical Region

• A Critical Region is executed by all threads, but by only one thread simultaneously (Mutual Exclusion).

C/C++
#pragma omp critical (name)
{
... structured block ...
}

• Do you think this solution scales well?


C/C++
int i, s = 0;
#pragma omp parallel for
for (i = 0; i < 100; i++)
{
#pragma omp critical
{ s = s + a[i]; }
}

Programming OpenMP
Scoping

Christian Terboven
Michael Klemm

Scoping Rules

• Managing the Data Environment is the challenge of OpenMP.

• Scoping in OpenMP: dividing variables into shared and private:
– private-list and shared-list on Parallel Region
– private-list and shared-list on Worksharing constructs
– General default is shared for Parallel Region, firstprivate for Tasks (Tasks are introduced later).
– Loop control variables on for-constructs are private.
– Non-static variables local to Parallel Regions are private.
– private: A new uninitialized instance is created for the task or for each thread executing the construct
• firstprivate: Initialization with the value before encountering the construct
• lastprivate: Value of the last loop iteration is written back to the Master
– Static variables are shared.
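A short sketch of the firstprivate/lastprivate semantics (variable names and values are ours):

#include <stdio.h>

int main(void)
{
    int init = 42, last = -1;

    // every thread gets a copy of init (initialized) and last (written back)
    #pragma omp parallel for firstprivate(init) lastprivate(last)
    for (int i = 0; i < 8; i++) {
        last = i + init;   // operates on private copies, no race
    }

    printf("last = %d\n", last);   // value of the sequentially last iteration: 7 + 42 = 49
    return 0;
}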

Privatization of Global/Static Variables

• Global / static variables can be privatized with the threadprivate directive
– One instance is created for each thread
• Before the first parallel region is encountered
• Instance exists until the program ends
• Does not work (well) with nested Parallel Regions
– Based on thread-local storage (TLS)
• TlsAlloc (Win32 threads), pthread_key_create (POSIX threads), keyword __thread (GNU extension)

C/C++
static int i;
#pragma omp threadprivate(i)

Fortran
INTEGER, SAVE :: i
!$omp threadprivate(i)
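A usage sketch (the counter is ours; its values persist between regions as long as the thread count does not change):

#include <stdio.h>
#include <omp.h>

static int counter = 0;
#pragma omp threadprivate(counter)

int main(void)
{
    #pragma omp parallel
    { counter = omp_get_thread_num(); }   // each thread writes its own instance

    #pragma omp parallel
    { printf("thread %d still sees counter = %d\n",
             omp_get_thread_num(), counter); }
    return 0;
}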

Back to our example

C/C++
int i, s = 0;
#pragma omp parallel for
for (i = 0; i < 100; i++)
{
#pragma omp critical
{ s = s + a[i]; }
}

5 OpenMP Tutorial
Members of the OpenMP Language Committee
It's your turn: Make It Scale!

#pragma omp parallel
{
   #pragma omp for
   for (i = 0; i < 100; i++)
   {
      s = s + a[i];
   }
} // end parallel

Intended decomposition (here: 4 threads):

Serial:
do i = 0, 99
   s = s + a(i)
end do

Thread 1:
do i = 0, 24
   s = s + a(i)
end do

Thread 2:
do i = 25, 49
   s = s + a(i)
end do

Thread 3:
do i = 50, 74
   s = s + a(i)
end do

Thread 4:
do i = 75, 99
   s = s + a(i)
end do

Note that all threads update the shared variable s: this is the data race you are asked to fix.

(done)

#pragma omp parallel
{
   double ps = 0.0; // private variable

   #pragma omp for
   for (i = 0; i < 100; i++)
   {
      ps = ps + a[i];
   }

   #pragma omp critical
   {
      s += ps;
   }
} // end parallel

Intended decomposition (here: 4 threads, each with a private partial sum):

Thread 1:
do i = 0, 24
   s1 = s1 + a(i)
end do
s = s + s1

Thread 2:
do i = 25, 49
   s2 = s2 + a(i)
end do
s = s + s2

Thread 3:
do i = 50, 74
   s3 = s3 + a(i)
end do
s = s + s3

Thread 4:
do i = 75, 99
   s4 = s4 + a(i)
end do
s = s + s4

The Reduction Clause

• In a reduction operation the operator is applied to all variables in the list. The variables have to be shared.
– reduction(operator:list)
– The result is provided in the associated reduction variable

C/C++
int i, s = 0;
#pragma omp parallel for reduction(+:s)
for (i = 0; i < 100; i++)
{
   s = s + a[i];
}

– Possible reduction operators with initialization value:
+ (0), * (1), - (0), & (~0), | (0), && (1), || (0), ^ (0),
min (largest number), max (least number)
– Remark: OpenMP also supports user-defined reductions (not covered in detail here; see the sketch below)
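A minimal sketch of a user-defined reduction (the struct type, the identifier cadd, and the values are ours; requires OpenMP 4.0 or later):

#include <stdio.h>

typedef struct { double re, im; } cplx;

// combiner adds the incoming private copy (omp_in) into omp_out;
// the initializer gives each private copy a neutral starting value
#pragma omp declare reduction(cadd : cplx : \
        omp_out.re += omp_in.re, omp_out.im += omp_in.im) \
        initializer(omp_priv = (cplx){0.0, 0.0})

int main(void)
{
    cplx s = {0.0, 0.0};

    #pragma omp parallel for reduction(cadd:s)
    for (int i = 0; i < 100; i++) {
        s.re += i;         // accumulates into the thread-private copy
        s.im += 0.5 * i;
    }

    printf("s = (%.1f, %.1f)\n", s.re, s.im);
    return 0;
}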

Example

PI

Example: Pi (1/2)

Computes: \pi = \int_0^1 \frac{4}{1+x^2} \, dx

double f(double x)
{
   return (4.0 / (1.0 + x*x));
}

double CalcPi (int n)
{
   const double fH = 1.0 / (double) n;
   double fSum = 0.0;
   double fX;
   int i;

   #pragma omp parallel for
   for (i = 0; i < n; i++)
   {
      fX = fH * ((double)i + 0.5);
      fSum += f(fX);
   }
   return fH * fSum;
}

Note: as written, fX and fSum are shared, so this version contains data races; the next slide fixes this.

Example: Pi (2/2)

Computes: \pi = \int_0^1 \frac{4}{1+x^2} \, dx

double f(double x)
{
   return (4.0 / (1.0 + x*x));
}

double CalcPi (int n)
{
   const double fH = 1.0 / (double) n;
   double fSum = 0.0;
   double fX;
   int i;

   #pragma omp parallel for private(fX,i) reduction(+:fSum)
   for (i = 0; i < n; i++)
   {
      fX = fH * ((double)i + 0.5);
      fSum += f(fX);
   }
   return fH * fSum;
}

Programming OpenMP
OpenMP Tasking Introduction

Christian Terboven
Michael Klemm

What is a Task in OpenMP?
◼ Tasks are work units whose execution
→ may be deferred or …
→ … can be executed immediately

◼ Tasks are composed of
→ code to execute, a data environment (initialized at creation time), internal control variables (ICVs)

◼ Tasks are created …
→ when reaching a parallel region: implicit tasks are created (per thread)
→ when encountering a task construct: an explicit task is created
→ when encountering a taskloop construct: explicit tasks per chunk are created (see the sketch below)
→ when encountering a target construct: a target task is created
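A minimal taskloop sketch (array, size, and grainsize are ours; requires OpenMP 4.5 or later):

#include <stdio.h>

int main(void)
{
    double a[1024];
    for (int i = 0; i < 1024; i++) a[i] = i;

    #pragma omp parallel
    #pragma omp single
    {
        // the loop is chopped into tasks of roughly 64 iterations each;
        // any thread of the team may execute them
        #pragma omp taskloop grainsize(64)
        for (int i = 0; i < 1024; i++)
            a[i] = 2.0 * a[i];
    }

    printf("a[10] = %f\n", a[10]);
    return 0;
}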

Tasking Execution Model
◼ Supports unstructured parallelism
→ unbounded loops:

while ( <expr> ) {
   ...
}

→ recursive functions:

void myfunc( <args> )
{
   ...; myfunc( <newargs> ); ...;
}

◼ Example (unstructured parallelism):

#pragma omp parallel
#pragma omp master
while (elem != NULL) {
   #pragma omp task
   compute(elem);
   elem = elem->next;
}

◼ Several scenarios are possible:
→ single creator, multiple creators, nested tasks (tasks & WS)

◼ All threads in the team are candidates to execute tasks

[Figure: the Parallel Team feeding a Task pool]

OpenMP Tasking Idiom
◼ OpenMP programmers need a specific idiom to kick off task-parallel execution: parallel master
→ OpenMP version 5.0 introduced the parallel master construct
→ With OpenMP version 5.1 this becomes parallel masked

Variant with master:

int main(int argc, char* argv[])
{
   [...]
   #pragma omp parallel
   {
      #pragma omp master
      {
         start_task_parallel_execution();
      }
   }
   [...]
}

Variant with single:

int main(int argc, char* argv[])
{
   [...]
   #pragma omp parallel
   {
      #pragma omp single
      {
         start_task_parallel_execution();
      }
   }
   [...]
}

Fibonacci Numbers (in a Stupid Way ☺)
int main(int argc, char* argv[])
{
   [...]
   #pragma omp parallel
   {
      #pragma omp master
      {
         fib(input);
      }
   }
   [...]
}

int fib(int n) {
   if (n < 2) return n;
   int x, y;
   #pragma omp task shared(x)
   {
      x = fib(n - 1);
   }
   #pragma omp task shared(y)
   {
      y = fib(n - 2);
   }
   #pragma omp taskwait
   return x + y;
}

◼ Only one thread enters fib() from main().
◼ That thread creates the two initial work tasks and starts the parallel recursion.
◼ The taskwait construct is required to wait for the results of x and y before the task can sum them up.
◼ T1 enters fib(4)
◼ T1 creates tasks for fib(3) and fib(2)
◼ T1 and T2 execute tasks from the queue
◼ T1 and T2 create 4 new tasks
◼ T1 – T4 execute tasks

[Figure: task tree of fib(4) with children fib(3) and fib(2); their children fib(2), fib(1) and fib(1), fib(0); the Task Queue holds the newly created tasks]

◼ T1 enters fib(4)
◼ T1 creates tasks for fib(3) and fib(2)
◼ T1 and T2 execute tasks from the queue
◼ T1 and T2 create 4 new tasks
◼ T1 – T4 execute tasks
◼ …

[Figure: the fully expanded task tree of fib(4), down to the fib(1) and fib(0) leaves]

Programming OpenMP
Using OpenMP Compilers

Christian Terboven
Michael Klemm

Production Compilers w/ OpenMP Support
◼ GCC
◼ clang/LLVM
◼ Intel Classic and Next-gen Compilers
◼ AOCC, AOMP, ROCmCC
◼ IBM XL
◼ … and many more

◼ See https://www.openmp.org/resources/openmp-compilers-tools/ for a list

Compiling OpenMP
◼ Enable OpenMP via the compiler’s command-line switches
→ GCC: -fopenmp

→ clang: -fopenmp

→ Intel: -fopenmp or -qopenmp (classic) or -fiopenmp (next-gen)

→ AOCC, AOCL, ROCmCC: -fopenmp

→ IBM XL: -qsmp=omp


◼ Switches have to be passed to both compiler and linker:
$ gcc [...] -fopenmp -o matmul.o -c matmul.c
$ gcc [...] -fopenmp -o matmul matmul.o
$ ./matmul 1024
Sum of matrix (serial): 134217728.000000, wall time 0.413975, speed-up 1.00
Sum of matrix (parallel): 134217728.000000, wall time 0.092162, speed-up 4.49

Programming OpenMP
Hands-on Exercises

Christian Terboven
Michael Klemm

Webinar Exercises
◼ We have implemented a series of small hands-on examples that you can use and play with.
→ Download: git clone https://github.com/cterboven/OpenMP-tutorial-PRACE.git
→ Build: make
→ You can then find the compiled code in the "bin" folder to run it
→ We use the GCC compiler mostly; some examples require Intel's Math Kernel Library

◼ Each hands-on exercise has a folder "solution"
→ It shows the OpenMP directive that we have added
→ You can use it to cheat ☺, or to check if you came up with the same solution

