
Programming shared memory systems:

OpenMP
 Various methods to program shared memory systems
 Introduction to OpenMP
Programming shared memory systems

 Create multiple threads to use multiple cores


o Use a specialized programming language designed for shared memory multi-
thread programming, e.g., Cilk Plus. These are not mainstream yet.
o Use library support for shared memory multi-threading, e.g., the Python
multiprocessing shared memory module.
o Use a parallelizing compiler.
o Use compiler directives that supplement a sequential programming language,
e.g., OpenMP (with C/C++ or Fortran).
o Use processes.
o Use threads.
Programming shared memory systems

 Although there are different ways for shared memory programming,


issues in programming shared memory systems are similar. Any
shared memory programming (or parallel programming) needs to
address two issues.
o How to create multiple execution threads, mostly the fork-join model.
o How to share data and coordinate – e.g. avoiding race condition

 We will review processes and threads, but focus on OpenMP


o OpenMP is widely supported: GCC supports OpenMP
o OpenMP is relatively easy to understand and use
Use processes

 A topic in the UG OS class.


 Heavyweight
o Process creation, termination, and coordination are expensive.
 UNIX system calls: fork(), exit(), wait().
o Inter-process communication is somewhat awkward because processes are designed
for isolation, not sharing.
 Explicit shared memory operations: shmget(), shmat(), and shmctl()

 Due to this overhead, processes can be used for shared memory
programming, but they are rarely used in practice.
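
A minimal sketch (not from the lecture) of two processes sharing a counter through System V shared memory; the segment size and the printed value are illustrative:

#include <stdio.h>
#include <stdlib.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/wait.h>
#include <unistd.h>

int main() {
    /* create a shared memory segment large enough for one int and attach it */
    int shmid = shmget(IPC_PRIVATE, sizeof(int), IPC_CREAT | 0600);
    int *counter = (int *) shmat(shmid, NULL, 0);
    *counter = 0;

    if (fork() == 0) {              /* child: write to the shared counter */
        *counter = 42;
        shmdt(counter);
        exit(0);
    }
    wait(NULL);                     /* parent: wait for the child to finish */
    printf("counter = %d\n", *counter);    /* prints 42 */

    shmdt(counter);
    shmctl(shmid, IPC_RMID, NULL);  /* detach and remove the segment */
    return 0;
}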
Use threads

 A topic in the UG OS class


 A thread is a lightweight process: creation, termination, and communication
are all much cheaper than they are for a process.
 Pthreads: POSIX-compliant threads
o Thread creation, termination, and coordination: pthread_create(), pthread_exit(),
and pthread_join().
o Communication is through shared memory. Pthread routines for coordinating memory
access: pthread_mutex_lock(), pthread_cond_wait(), pthread_cond_signal().
 Very easy to write wrong programs. Some legacy system routines are still not “thread-safe.”
o Programmer needs to explicitly manage the threads.
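
A minimal Pthreads sketch (an assumed example, not one of the lecture files) in which several threads increment a shared counter protected by a mutex; compile with gcc -pthread:

#include <pthread.h>
#include <stdio.h>

#define NTHREADS 4

int counter = 0;                                  /* shared by all threads */
pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

void *work(void *arg) {
    pthread_mutex_lock(&lock);                    /* protect the shared update */
    counter++;
    pthread_mutex_unlock(&lock);
    return NULL;
}

int main() {
    pthread_t tid[NTHREADS];
    for (int i = 0; i < NTHREADS; i++)
        pthread_create(&tid[i], NULL, work, NULL);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(tid[i], NULL);               /* wait for every thread */
    printf("counter = %d\n", counter);            /* prints 4 */
    return 0;
}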
OpenMP
 What does OpenMP stand for?
o Open specifications for Multi Processing via collaborative work between
interested parties from the hardware and software industry, government
and academia.
 OpenMP is an Application Program Interface (API) that may be used
to explicitly direct multi-threaded, shared memory parallelism.
o API components: compiler directives, runtime library routines, and
environment variables
 OpenMP is a directive-based method to invoke parallel computations on shared-
memory multiprocessors
OpenMP

 OpenMP API is specified for C/C++ and Fortran.


 OpenMP is not intrusive to the original serial code: directives
appear as comment statements in Fortran and as pragmas in C/C++.
 OpenMP website: http://www.openmp.org
o Materials in this lecture are taken from various OpenMP tutorials in the
website and other places.
Why OpenMP

 OpenMP is portable: It is jointly defined and endorsed by a group of


major hardware and software vendors.
o Hoping for an ANSI standard

 OpenMP can be implemented incrementally, one function or even


one loop at a time.
o A nice way to get a parallel program from a sequential program.
How to compile and run OpenMP programs?

 To compile: use gcc with flag -fopenmp


o Example: gcc -fopenmp a.c
o Try lect11/helloworld.cpp, lect11/helloworld1.cpp – see how easy it is to write a wrong
shared memory program.
 To run: ./a.out
o To change the number of threads (the default is the number of cores on the machine):
 setenv OMP_NUM_THREADS 4 (tcsh) or export OMP_NUM_THREADS=4 (bash)
 Putting it together, on linprog:
 gcc -fopenmp helloworld1.cpp
 export OMP_NUM_THREADS=4
 ./a.out -- run the program with 4 threads
OpenMP basic syntax

 Most of the constructs in OpenMP are compiler directives


#pragma omp construct [clause [clause] …]
o Example
#pragma omp parallel num_threads(8)
 OpenMP constructs usually apply to a structured block (one
statement).
 Function prototypes are in
#include <omp.h>
OpenMP “Hello world” program
#include <omp.h>        // OpenMP include file
#include <iostream>
#include <sstream>

using namespace std;

int main()
{
  stringstream ss[100];
  int nprocs;

  // OpenMP directive: parallel region with the default number of threads
  #pragma omp parallel
  {
    // OpenMP library routines that return the number of threads
    // and the thread ID, respectively
    int total_threads = omp_get_num_threads();
    int myid = omp_get_thread_num();
    if (myid == 0) nprocs = total_threads;
    ss[myid] << "Hello world, I am No." << myid << " out of "
             << total_threads << " Members\n";
  }
  for (int i = 0; i < nprocs; i++) {
    string a; while (ss[i] >> a) cout << a << " "; cout << "\n";
  }
  return 0;
}
OpenMP execution model

 OpenMP uses the fork-join model of parallel execution.


o All OpenMP programs begin with a single master thread.
o The master thread executes sequentially until a parallel region is encountered, at which
point it creates a team of parallel threads (FORK).
o When the team threads complete the parallel region, they synchronize and terminate,
leaving only the master thread that executes sequentially (JOIN).
OpenMP general code structure
#include <omp.h>
main () {
int var1, var2, var3;
Serial code
...
/* Beginning of parallel section. Fork a team of threads. Specify variable scoping*/
#pragma omp parallel private(var1, var2) shared(var3)
{
/* Parallel section executed by all threads */
...
/* All threads join master thread and disband*/
}
Resume serial code
...
}
An example lect11/simple_multithread.cpp
 If variables are not specified as shared or private, all variables declared outside the parallel
region are shared variables (accessible by all threads)
 What about variables declared inside the parallel block?
 What if the ‘nprocs = total_threads;’ statement in the program is moved before the if
statement?
 Easy for OpenMP to create threads
o Deterministic multi-threaded program: no race condition – no two threads access the same
memory location with at least one of them writing to it!
o Non-deterministic multi-threaded programs are mostly incorrect programs.
o Need to be mindful when making system/library routine calls.
Data model

• Private and shared variables


• Variables in the global data space are accessed
by all parallel threads (shared variables).
• Variables in a thread’s private space can only be
accessed by the thread (private variables)
• several variations, depending on the initial values and
whether the results are copied outside the region.

• OpenMP allows for specification of private and
shared variables before entering each parallel region.
for (i = 0; i < ARRAYSIZE; i++) {
  for (privindx = 0; privindx < 16; privindx++) {
    privdbl = ((double) privindx) / 16;
    y[i] = sin(exp(cos(-exp(sin(x[i]))))) + cos(privdbl);
  }
}

Can this loop run in parallel? On what condition?
If one has a choice, should one parallelize the inner or the outer loop?

#pragma omp parallel for private(privindx, privdbl)
for (i = 0; i < ARRAYSIZE; i++) {
  for (privindx = 0; privindx < 16; privindx++) {
    privdbl = ((double) privindx) / 16;
    y[i] = sin(exp(cos(-exp(sin(x[i]))))) + cos(privdbl);
  }
}

The parallel for loop index variable is private by default.
OpenMP directives

 Format:
#pragma omp directive-name [clause,..] newline
(use ‘\’ for multiple lines)
 Example:
#pragma omp parallel default(shared) private(beta,pi)

 Scope of a directive is one block of statements { …}


 More detail in the specification at http://www.openmp.org/ (OpenMP
5.0 at https://www.openmp.org/spec-html/5.0/openmp.html)
Parallel region construct
 A block of code that will be executed by multiple threads.
#pragma omp parallel [clause …]
{
……
} (implied barrier)

Clauses: if([parallel :] expression), num_threads(integer-expression), private(list),
shared(list), default(shared | none), reduction([reduction-modifier,]
reduction-identifier: list), firstprivate(list), copyin(list), proc_bind(master | close | spread),
allocate([allocator :] list)

o private(list): everything private and local (no relation with variables outside the
block).
o shared(list): data accessed by all threads
o default (none| shared)
The reduction clause (see
lect11/reduction.cpp)
sum = 0.0;
#pragma omp parallel for default(none) shared(n, x) private(i) reduction(+ : sum)
for (i = 0; i < n; i++) sum = sum + x[i];

 Updating sum must avoid a race condition


 With the reduction clause, OpenMP generates code such that the
race condition is avoided.
o See also lect11/reduction1.cpp and lect11/reduction2.cpp.
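
A sketch of what the reduction clause effectively arranges (an assumed illustration, not the contents of reduction1.cpp or reduction2.cpp): each thread accumulates into its own partial sum, and the partial sums are combined into the shared variable without a race.

double sum = 0.0;
#pragma omp parallel
{
    double local_sum = 0.0;          /* private partial sum for this thread */
    #pragma omp for
    for (int i = 0; i < n; i++)
        local_sum += x[i];
    #pragma omp critical             /* combine the partial sums safely */
    sum += local_sum;
}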
Clauses
 firstprivate(list): variables are private, but are initialized with the value
before entering the block (see the sketch after this list).
 copyin(list): The copyin clause provides a mechanism to copy the value of
a threadprivate variable of the master thread to the threadprivate variable
of each other member of the team that is executing the parallel region.
 proc_bind(master | close | spread): thread affinity related;
more on this in the next class (OpenMP for NUMA
architectures)
 allocate([allocator :] list): memory allocation related; more on this
in the next class (OpenMP for NUMA architectures)
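
A small sketch (an assumed example, not one of the lecture files) of the difference between private and firstprivate:

int offset = 10;
#pragma omp parallel firstprivate(offset)
{
    /* each thread gets its own copy of offset, initialized to 10;
       with private(offset) the copy would start uninitialized */
    offset += omp_get_thread_num();
    printf("thread %d: offset = %d\n", omp_get_thread_num(), offset);
}
/* the original offset is still 10 here */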
Work-sharing constructs
 #pragma omp for [clause …]
 #pragma omp sections [clause …]
 #pragma omp single [clause …]

 These must be enclosed in a parallel region


 No implied barrier on entry, implied barrier on exit (unless specified otherwise)
 “omp for”: loop iterations are distributed over the number of threads.
 “omp sections”: each section is assigned to one thread
 “omp single”: only one thread runs the block
The omp for directive: example
omp for shares loop iterations among the threads. It can have a schedule clause:
schedule (static | dynamic | guided [, chunk])
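
A small sketch of the schedule clause (the chunk size and the loop body, expensive_work(), are illustrative):

/* distribute iterations in chunks of 4; dynamic lets an idle thread grab
   the next chunk, which helps when iterations have uneven cost */
#pragma omp parallel for schedule(dynamic, 4)
for (int i = 0; i < n; i++)
    a[i] = expensive_work(i);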
The omp sections directive – example (see also
lect11/simple_multithread1.cpp)
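
A minimal sketch of an omp sections region (an assumed example, not necessarily the contents of the lecture file; do_task_A and do_task_B are placeholders):

#pragma omp parallel
{
    #pragma omp sections
    {
        #pragma omp section
        { do_task_A(); }      /* one thread runs task A */
        #pragma omp section
        { do_task_B(); }      /* another thread runs task B */
    }   /* implied barrier at the end of the sections construct */
}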
Omp single construct

 The single construct specifies that the associated structured block is


executed by only one of the threads in the team (not necessarily the
master thread), in the context of its implicit task. The other threads
in the team, which do not execute the block, wait at an implicit
barrier at the end of the single construct unless a nowait clause is
specified.
 See lect11/omp_single.cpp
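
A minimal sketch of the single construct (an assumed example, not necessarily the contents of lect11/omp_single.cpp; read_input() is a placeholder):

#pragma omp parallel
{
    #pragma omp single
    {
        read_input();   /* only one thread executes this block */
    }   /* the other threads wait here unless nowait is specified */
    /* ... all threads continue here ... */
}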
Some merged constructs
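
The common merged forms combine a parallel region and a work-sharing construct in a single directive, for example (a sketch; do_task_A and do_task_B are placeholders):

/* parallel region + for in one directive */
#pragma omp parallel for private(j)
for (i = 0; i < n; i++)
    for (j = 0; j < n; j++)
        a[i][j] = 0;

/* parallel region + sections in one directive */
#pragma omp parallel sections
{
    #pragma omp section
    { do_task_A(); }
    #pragma omp section
    { do_task_B(); }
}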
Synchronization: barrier

for (i = 0; i < N; i++)
  a[i] = b[i] + c[i];
for (i = 0; i < N; i++)
  d[i] = a[i] + b[i];

 Both loops are in the parallel region with no synchronization in between.
What is the problem?
 The second loop should not start before all of the first loop finishes:

for (i = 0; i < N; i++)
  a[i] = b[i] + c[i];
#pragma omp barrier
for (i = 0; i < N; i++)
  d[i] = a[i] + b[i];
Critical section

for (i = 0; i < N; i++) {
  ……
  sum += a[i];
  ……
}

 Cannot be parallelized if sum is shared. See lect11/reduction1.cpp.
 What if we comment out ‘#pragma omp critical’ in the program?

for (i = 0; i < N; i++) {
  ……
  #pragma omp critical
  {
    sum += a[i];
  }
  ……
}
OpenMP environment variables

 OMP_NUM_THREADS
 OMP_SCHEDULE
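
OMP_SCHEDULE sets the schedule of loops declared with schedule(runtime). A small sketch (the chunk size and the loop are illustrative):

export OMP_SCHEDULE="dynamic,4"

/* in the program: the schedule is read from OMP_SCHEDULE at run time */
#pragma omp parallel for schedule(runtime)
for (int i = 0; i < n; i++)
    a[i] = 2 * a[i];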
OpenMP runtime environment

 omp_get_num_threads()
 omp_get_thread_num()
 omp_in_parallel()
 Routines related to locks
 ……
OpenMP examples

 lect11/piomp.c
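
A typical OpenMP pi computation uses a parallel for with a reduction; a sketch under the assumption that piomp.c follows this pattern:

#include <omp.h>
#include <stdio.h>

int main() {
    long n = 100000000;
    double step = 1.0 / n, sum = 0.0;

    /* integrate 4/(1+x^2) over [0,1]; the reduction avoids a race on sum */
    #pragma omp parallel for reduction(+ : sum)
    for (long i = 0; i < n; i++) {
        double x = (i + 0.5) * step;
        sum += 4.0 / (1.0 + x * x);
    }
    printf("pi = %.10f\n", sum * step);
    return 0;
}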
Sequential Matrix Multiply

for (i = 0; i < n; i++)
  for (j = 0; j < n; j++) {
    c[i][j] = 0;
    for (k = 0; k < n; k++)
      c[i][j] = c[i][j] + a[i][k] * b[k][j];
  }
OpenMP Matrix Multiply (lect11/mm_omp.c)

#pragma omp parallel for private(j, k)
for (i = 0; i < n; i++)
  for (j = 0; j < n; j++) {
    c[i][j] = 0;
    for (k = 0; k < n; k++)
      c[i][j] = c[i][j] + a[i][k] * b[k][j];
  }
OpenMP Summary
o OpenMP provides a compact, yet powerful programming model for shared memory
programming
 It is very easy to use OpenMP to create parallel programs.
o OpenMP preserves the sequential version of the program
o Developing an OpenMP program:
 Start from a sequential program
 Identify the code segment that takes most of the time.
 Determine whether the important loops can be parallelized
 The loops may have critical sections, reduction variables, etc.
 Determine the shared and private variables.
 Add directives
