This document provides an overview of OpenMP, a programming model for parallel programming on shared memory architectures. It discusses key OpenMP concepts like parallel constructs, work sharing directives, synchronization constructs, and memory models. Examples are provided for parallelizing a matrix multiplication using OpenMP pragmas and scheduling clauses. Overall, the document introduces the basic syntax and execution model of OpenMP for parallelizing loops and distributing work across threads.


CSL 860: Modern Parallel Computation

Hello OpenMP
#pragma omp parallel                  // parallel construct
{
    // I am now thread i of n
    switch (omp_get_thread_num()) {
    case 0: /* blah1.. */ break;
    case 1: /* blah2.. */ break;
    }
}
// Back to normal (serial) execution

• Extremely simple to use and incredibly powerful


• Fork-Join model
• Every thread has its own execution context
• Variables can be declared shared or private
Execution Model
• Encountering thread creates a team:
– Itself (master) + zero or more additional threads.
• Applies to structured block immediately following
– Each thread executes a copy of the code in {}
• But, also see: Work-sharing constructs
• There’s an implicit barrier at the end of block
• Only master continues beyond the barrier
• May be nested
– Sometimes disabled by default
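A minimal sketch of this fork-join behaviour (the team size of 4 and the messages are illustrative):

#include <stdio.h>
#include <omp.h>

int main(void) {
    printf("serial part: %d thread\n", omp_get_num_threads());   // prints 1

    #pragma omp parallel num_threads(4)    // encountering thread forks a team
    {
        // every member of the team executes a copy of this block
        printf("hello from thread %d of %d\n",
               omp_get_thread_num(), omp_get_num_threads());
    }                                      // implicit barrier at the end

    // only the master continues beyond the barrier
    printf("back to serial execution\n");
    return 0;
}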
Memory Model
• Notion of temporary view of memory
– Allows local caching
– Need to flush memory
– T1 writes -> T1 flushes -> T2 flushes -> T2 reads
– All threads see flushes in the same order
• Supports threadprivate memory
• Variables declared before parallel construct:
– Shared by default
– May be designated as private
– n-1 copies of the original variable are created
• May not be initialized by the system
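A hedged sketch of the threadprivate memory mentioned above (the variable name and values are illustrative; persistence between regions assumes dynamic threads are disabled and the team size is unchanged):

#include <stdio.h>
#include <omp.h>

int counter = 0;                      // one persistent copy per thread
#pragma omp threadprivate(counter)

int main(void) {
    omp_set_dynamic(0);               // keep the team size stable

    #pragma omp parallel
    counter = omp_get_thread_num();   // each thread writes its own copy

    #pragma omp parallel
    printf("thread %d still sees counter = %d\n",
           omp_get_thread_num(), counter);
    return 0;
}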
Shared Variables
• Heap allocated storage
• Static data members
• const-qualified (no mutable members)
• Private:
– Variables declared in a scope inside the construct
– Loop variable in for construct
• private to the construct
• Others are shared unless declared private
– You can change default
• Arguments passed by reference inherit from original
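As a small sketch of changing the default (the scale routine is hypothetical): default(none) forces the sharing of every referenced variable to be stated explicitly.

void scale(double *a, int n, double factor) {
    // a, n, factor must be listed; i is the loop variable, private by rule
    #pragma omp parallel for default(none) shared(a, n, factor)
    for (int i = 0; i < n; i++)
        a[i] *= factor;
}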
Beware of Compiler Re-ordering

a = b = 0

thread 1                      thread 2
b = 1;                        a = 1;
flush(b); flush(a);           flush(a); flush(b);
if (a == 0) {                 if (b == 0) {
    critical section              critical section
}                             }
Beware more of Compiler Re-ordering

// Parallel construct
{
    int b = initialSalary;
    printf("Initial Salary was %d\n", initialSalary);

    Book_keeping();   // No read of b or write of initialSalary

    if (b < 10000) {
        raiseSalary(500);
    }
}
Thread Control
Environment Variable   Ways to modify value    Way to retrieve value   Initial value

OMP_NUM_THREADS *      omp_set_num_threads     omp_get_max_threads     Implementation defined
OMP_DYNAMIC            omp_set_dynamic         omp_get_dynamic         Implementation defined
OMP_NESTED             omp_set_nested          omp_get_nested          false
OMP_SCHEDULE *                                                         Implementation defined
* Also see construct clause: num_threads, schedule
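A hedged sketch of the corresponding runtime calls (requesting 4 threads is illustrative):

#include <stdio.h>
#include <omp.h>

int main(void) {
    omp_set_dynamic(0);        // OMP_DYNAMIC
    omp_set_nested(0);         // OMP_NESTED
    omp_set_num_threads(4);    // OMP_NUM_THREADS

    printf("max threads = %d\n", omp_get_max_threads());

    #pragma omp parallel
    {
        #pragma omp single
        printf("team size = %d\n", omp_get_num_threads());
    }
    return 0;
}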


Parallel Construct
#pragma omp parallel \
    if(boolean) \
    private(var1, var2, var3) \
    firstprivate(var1, var2, var3) \
    default(shared | none) \
    shared(var1, var2) \
    copyin(var2) \
    reduction(operator: list) \
    num_threads(n)
{
}
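A sketch combining a few of these clauses (the routine, variable names, and the threshold of 1000 are illustrative, not part of any particular program):

#include <omp.h>

void work(double *result, int n, double seed) {
    double localSum = 0.0;
    // Parallelize only for large n; each thread starts from its own
    // initialized copy of seed; private copies of localSum are summed.
    #pragma omp parallel if(n > 1000) num_threads(8) \
            default(shared) firstprivate(seed) reduction(+: localSum)
    {
        localSum += seed + omp_get_thread_num();
    }
    *result = localSum;
}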
Parallel Loop
#pragma omp parallel for
for (i= 0; i < N; ++i) {
blah …
}
• Number of iterations must be known when the
construct is encountered
– Must be the same for each thread
• Compiler puts a barrier at the end of parallel for
– But see nowait
Parallel For
#pragma omp for \
    private(var1, var2, var3) \
    firstprivate(var1, var2, var3) \
    lastprivate(var1, var2) \
    reduction(operator: list) \
    ordered \
    schedule(kind[, chunk_size]) \
    nowait
Canonical For Loop
• No break out of the loop
Schedule(kind[, chunk_size])
• Divide iterations into contiguous sets, chunks
– chunks are assigned transparently to threads
• static: iterations are divided among threads in a round-robin
fashion
– When no chunk_size is specified, approximately equal chunks are
made
• dynamic: iterations are assigned to threads in ‘request order’
– When no chunk_size is specified, it defaults to 1.
• guided: like dynamic, the size of each chunk is proportional to the
number of unassigned iterations divided by the number of threads
– If chunk_size =k, chunks have at least k iterations (except the last)
– When no chunk_size is specified, it defaults to 1.
• runtime: kind and chunk size taken from the OMP_SCHEDULE environment variable
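A short sketch of the schedule clause (expensive(), a, b, and n are placeholders):

// chunks of 4 iterations handed out on demand
#pragma omp parallel for schedule(dynamic, 4)
for (int i = 0; i < n; i++)
    a[i] = expensive(i);          // per-iteration cost varies

// kind and chunk size read at run time, e.g. OMP_SCHEDULE="guided,8"
#pragma omp parallel for schedule(runtime)
for (int i = 0; i < n; i++)
    b[i] = expensive(i);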
Single
#pragma omp parallel
{
#pragma omp for
for( int i=0; i<N; i++ ) a[i] = f0(i);
#pragma omp single
x = f1(a);
#pragma omp for
for(int i=0; i<N; i++ ) b[i] = x * f2(i);
}
• Only one of the threads executes
• Other threads wait for it
– unless NOWAIT is specified
• Hidden complexity
– Threads may be at different instructions
Sections
#pragma omp sections
{
#pragma omp section
{
// do this …
}
#pragma omp section
{
// do that …
}
// …
}
• The omp section directives must be closely nested in a sections construct,
where no other work-sharing construct may appear.
Private Variables
#pragma omp parallel for private(size, …)
for (int i = 0; i < numThreads; i++) {
    int size = numTasks / numThreads;
    int extra = numTasks - numThreads * size;
    if (i < extra) size++;
    doTask(i, size, numThreads);
}

void doTask(int start, int count, int stride)
{   // Each thread's instance has its own activation record
    for (int i = 0, t = start; i < count; i++, t += stride)
        doit(t);
}
Firstprivate and Lastprivate
• Initial value of private variable is unspecified
– firstprivate initializes copies with the original
– Once per thread (not once per iteration)
– Original exists before the construct
• Only the original copy is retained after the construct
• lastprivate forces sequential-like behavior
– thread executing the sequentially last iteration (or last
listed section) writes to the original copy
Firstprivate and Lastprivate
#pragma omp parallel for firstprivate( simple )
for (int i = 0; i < N; i++) {
    simple += a[f1(i, omp_get_thread_num())];
    f2(simple);
}

#pragma omp parallel for lastprivate( doneEarly )
for (i = 0; i < N && !doneEarly; i++) {
    doneEarly = f0(i);
}
Other Synchronization Directives
#pragma omp master
{
}
– binds to the innermost enclosing parallel region
– Only the master executes
– No implied barrier
Master Directive
#pragma omp parallel
{
    #pragma omp for
    for (int i = 0; i < 100; i++) a[i] = f0(i);

    #pragma omp master    // Only master executes. No synchronization.
    x = f1(a);
}
Critical Section
#pragma omp critical (accessBankBalance)
{
}
– A single thread at a time
– Applies to all threads
– The name is optional; no name implies global
critical region
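A minimal sketch of a named critical section (balance and computeDeposit are illustrative):

double balance = 0.0;                 // shared

#pragma omp parallel
{
    double deposit = computeDeposit(omp_get_thread_num());

    // serializes only against other regions named accessBankBalance
    #pragma omp critical (accessBankBalance)
    balance += deposit;
}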
Barrier Directive
#pragma omp barrier
– Stand-alone
– Binds to inner-most parallel region
– All threads in the team must execute
• they will all wait for each other at this instruction
• Dangerous:
    if (!ready)
        #pragma omp barrier
  – The same sequence of work-sharing and barrier regions must be
    encountered by the entire team
Ordered Directive
#pragma omp ordered
{
}
• Binds to the innermost enclosing loop
• The structured block is executed in sequential iteration order
• The loop must declare the ordered clause
• Each iteration may encounter at most one ordered region
Flush Directive
#pragma omp flush (var1, var2)
– Stand-alone, like barrier
– Only directly affects the encountering thread
– With a variable list, only the listed variables are flushed;
  accesses to them may not be re-ordered across the flush
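A hedged sketch of the write -> flush -> flush -> read pattern from the memory-model slide (the two-thread producer/consumer split and the value 42 are illustrative; busy-waiting is shown only to demonstrate flush):

int data = 0, ready = 0;                  // shared

#pragma omp parallel num_threads(2)
{
    if (omp_get_thread_num() == 0) {      // producer
        data = 42;
        #pragma omp flush (data, ready)   // publish data before the flag
        ready = 1;
        #pragma omp flush (ready)
    } else {                              // consumer
        int seen = 0;
        while (!seen) {
            #pragma omp flush (ready)     // re-read the flag
            seen = ready;
        }
        #pragma omp flush (data)          // data is now guaranteed visible
        // ... use data ...
    }
}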
Atomic Directive
#pragma omp atomic
i++;

• Light-weight critical section


• Only for some expressions
  – x binop= expr (no mutual exclusion on the evaluation of expr)
  – x++
  – ++x
  – x--
  – --x
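A small sketch of atomic on a shared counter (test() and n are placeholders):

int hits = 0;                     // shared

#pragma omp parallel for
for (int i = 0; i < n; i++) {
    if (test(i)) {
        #pragma omp atomic
        hits++;                   // mutual exclusion on the update only
    }
}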
Reductions
• Reductions are so common that OpenMP provides
support for them
• May add reduction clause to parallel for
pragma
• Specify reduction operation and reduction variable
• OpenMP takes care of storing partial results in
private variables and combining partial results after
the loop
reduction Clause
• reduction (<op> :<variable>)
– + Sum
– * Product
– & Bitwise and
– | Bitwise or
– ^ Bitwise exclusive or
– && Logical and
– || Logical or
• Add to parallel for
– OpenMP creates a loop to combine copies of the
variable
– The resulting loop may not be parallel
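A sketch of a dot product using the + reduction (arrays a, b and length n are illustrative):

double dot = 0.0;

// each thread accumulates into a private copy of dot;
// OpenMP combines the copies with + after the loop
#pragma omp parallel for reduction(+: dot)
for (int i = 0; i < n; i++)
    dot += a[i] * b[i];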
Nesting Restrictions
• A work-sharing region may not be closely nested inside a
work-sharing, critical, ordered, or master region.
• A barrier region may not be closely nested inside a work-
sharing, critical, ordered, or master region.
• A master region may not be closely nested inside a work-
sharing region.
• An ordered region may not be closely nested inside a
critical region.
• An ordered region must be closely nested inside a loop
region (or parallel loop region) with an ordered clause.
• A critical region may not be nested (closely or otherwise)
inside a critical region with the same name. Note that this
restriction is not sufficient to prevent deadlock
EXAMPLES
OpenMP Matrix Multiply
#pragma omp parallel for
for(int i=0; i<n; i++ )
for( int j=0; j<n; j++ ) {
c[i][j] = 0.0;
for(int k=0; k<n; k++ )
c[i][j] += a[i][k]*b[k][j];
}
• a, b, c are shared
• i, j, k are private
OpenMP Matrix Multiply: Triangular
#pragma omp parallel for schedule (dynamic, 1 )
for( int i=0; i<n; i++ )
for( int j=i; j<n; j++ ) {
c[i][j] = 0.0;
for(int k=i; k<n; k++ )
c[i][j] += a[i][k]*b[k][j];
}
• This multiplies upper-triangular matrix A with B
• Unbalanced workload
– Schedule improves this
OpenMP Jacobi
for some number of timesteps/iterations {
    #pragma omp parallel for
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            temp[i][j] = 0.25 *
                ( grid[i-1][j] + grid[i+1][j]
                + grid[i][j-1] + grid[i][j+1] );

    #pragma omp parallel for
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            grid[i][j] = temp[i][j];
}
• This could be improved by using just one parallel region
• Implicit barrier after loops eliminates race on grid
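A hedged sketch of the single-parallel-region variant suggested above (numSteps is a placeholder; the bounds are restricted to interior points, which the slide's version leaves implicit):

#pragma omp parallel
{
    for (int step = 0; step < numSteps; step++) {
        #pragma omp for
        for (int i = 1; i < n-1; i++)
            for (int j = 1; j < n-1; j++)
                temp[i][j] = 0.25 *
                    ( grid[i-1][j] + grid[i+1][j]
                    + grid[i][j-1] + grid[i][j+1] );
        // implicit barrier of the first omp for prevents a race on grid
        #pragma omp for
        for (int i = 1; i < n-1; i++)
            for (int j = 1; j < n-1; j++)
                grid[i][j] = temp[i][j];
    }
}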
OpenMP Jacobi
for some number of timesteps/iterations {
    #pragma omp parallel for
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++) {
            temp[i][j] = 0.25 *
                ( grid[i-1][j] + grid[i+1][j]
                + grid[i][j-1] + grid[i][j+1] );
            #pragma omp barrier
            grid[i][j] = temp[i][j];
        }
}

• Is barrier sufficient?
• What change to the code is needed?
– Recall barrier is per-team
