
Parallel Computing
Tien-Hsiung Weng (翁添雄)
Department of Computer Science and Information Engineering, Da-Yeh University
[email protected]

Lecture 4
Programming Shared-Address Space with OpenMP

Lecture 4
Topics
• Introduction to OpenMP
• OpenMP directives
– specifying concurrency
• parallel regions
• loops, task parallelism
• Synchronization directives
– reductions, barrier, critical, ordered
• Data handling clauses
– shared, private, firstprivate, lastprivate
• Library primitives
• Environment variables
• Example of SPMD style OpenMP program
Introduction to OpenMP
• Open specifications for Multi Processing
• An API for explicit multi-threaded, shared memory parallelism
• Three components
– compiler directives
– runtime library routines
– environment variables
• Higher-level programming model than Pthreads
– support for concurrency, synchronization, and data handling
– rather than explicit mutexes, condition variables, data scoping, and thread initialization
• Portable
– API is specified for C/C++ and Fortran
– implementations on many platforms (most Unix, Windows NT)
• Standardized
Introduction to OpenMP
• Parallelism is explicit
– it is not an automatic parallel programming model
– the programmer has full control (and responsibility) over parallelization
• No data locality control
– not guaranteed to make the most efficient use of shared memory
• Not necessarily implemented identically by all vendors
• Designed for shared-address-space machines
– not for distributed-memory parallel systems (by itself)
Introduction to OpenMP
• Advantages
– Ease of use
– Enables incremental parallelization of a
serial program
– Supports both coarse-grain and fine-grain
parallelism
– Portable
– Standard
OpenMP: Fork-Join Parallelism
• OpenMP program begins execution as a single
master thread
• Master thread executes sequentially until 1st parallel
region
• When a parallel region is encountered, master thread
– creates a group of threads
– becomes the master of this group of threads
– is assigned the thread id 0 within the group
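For illustration, here is a minimal sketch (not from the slides) of the fork-join model in C; the printed messages are just placeholders:

#include <stdio.h>
#include <omp.h>

int main(void)
{
    /* only the master thread is running here */
    printf("master thread, id = %d\n", omp_get_thread_num());

    #pragma omp parallel          /* fork: a team of threads is created */
    {
        printf("hello from thread %d of %d\n",
               omp_get_thread_num(), omp_get_num_threads());
    }                             /* join: implicit barrier, team disbands */

    /* back to the master thread alone */
    printf("master thread again\n");
    return 0;
}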
OpenMP Directive Format
• C and C++ use compiler directives
– prefix: #pragma omp …
• Fortran uses significant comments
– prefixes: !$OMP, C$OMP, *$OMP
• A directive consists of a directive name
followed by clauses
• C:
– #pragma omp parallel default(shared) private(i,j)
• Fortran:
– !$OMP PARALLEL DEFAULT(SHARED)
PRIVATE(i,j)
OpenMP parallel Region
Directives
#pragma omp parallel [clause list]

Possible clauses in [clause list]


• Conditional parallelization
– if (scalar expression)
• determines whether the parallel construct creates threads
• Degree of concurrency
– num_threads(integer expression)
• Specifies the number of threads to create
• Data Handling
– private (variable list)
• specifies variables local to each thread
– firstprivate (variable list)
• similar to private, but the private copies are initialized to the value the variable had before the parallel directive
– shared (variable list)
• specifies that variables are shared across all the threads
Interpreting an OpenMP Parallel
Directive
#pragma omp parallel if (is_parallel==1) num_threads(8) \
private (a) shared (b) firstprivate(c) default(none)
{
/* structured block */
}

Meaning
• if (is_parallel== 1) num_threads(8)
– If the value of the variable is_parallel is one, create 8 threads
• private (a) shared (b)
– each thread gets private copies of variables a and c
– each thread shares a single copy of variable b
• firstprivate(c)
– each private copy of c is initialized with the value of c in main thread when the parallel
directive is encountered
• default(none)
– default state of a variable is specified as none (rather than shared)
– signals error if not all variables are specified as shared or private
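As a small, hedged illustration of how these clauses interact, the following self-contained program (the variable values and the printed output are invented for the example) can be compiled and run:

#include <stdio.h>
#include <omp.h>

int main(void)
{
    int is_parallel = 1;
    int a = 1, b = 2, c = 3;

    #pragma omp parallel if (is_parallel == 1) num_threads(4) \
        private(a) shared(b) firstprivate(c) default(none)
    {
        a = omp_get_thread_num();   /* private: each thread has its own (initially undefined) a */
        c = c + a;                  /* firstprivate: every private copy of c starts at 3 */
        #pragma omp critical
        printf("thread %d: a=%d c=%d shared b=%d\n", a, a, c, b);
    }
    /* the outer a and c are untouched by the private copies */
    printf("after the region: a=%d b=%d c=%d\n", a, b, c);
    return 0;
}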
OpenMP Programming Model
[Figure: a sample OpenMP program shown alongside the Pthreads translation that an OpenMP compiler might perform]
Reduction Clause in OpenMP
• The reduction clause specifies how multiple local copies
of a variable at different threads are combined into a
single copy at the master when threads exit.
• The usage of the reduction clause is reduction (operator:
variable list).
• The variables in the list are implicitly specified as being
private to threads.
• The operator can be one of +, *, -, &, |, ^, &&, and ||.
#pragma omp parallel reduction(+: sum) num_threads(8)
{
/* compute local sums here */
}
/* sum here contains the sum of all local instances of sum */
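A complete, minimal example of the reduction clause might look as follows (the loop bound and the choice of 8 threads are arbitrary for illustration):

#include <stdio.h>
#include <omp.h>

int main(void)
{
    int i, sum = 0;

    /* each thread accumulates into its own private copy of sum;
       the copies are combined with + when the threads join */
    #pragma omp parallel for reduction(+: sum) num_threads(8)
    for (i = 1; i <= 100; i++)
        sum += i;

    printf("sum = %d\n", sum);   /* prints 5050 */
    return 0;
}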
OpenMP Programming: Example
/* ******************************************************
An OpenMP version of a threaded program to compute PI.
****************************************************** */
#pragma omp parallel default(private) shared (npoints) \
reduction(+: sum) num_threads(8)
{
num_threads = omp_get_num_threads();
sample_points_per_thread = npoints / num_threads;
sum = 0;
for (i = 0; i < sample_points_per_thread; i++) {
rand_no_x =(double)(rand_r(&seed))/(double)((2<<14)-1);
rand_no_y =(double)(rand_r(&seed))/(double)((2<<14)-1);
if (((rand_no_x - 0.5) * (rand_no_x - 0.5) +
(rand_no_y - 0.5) * (rand_no_y - 0.5)) < 0.25)
sum ++;
}
}
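Note that in C/C++ the older OpenMP specifications accept only default(shared) and default(none), so default(private) as written above will be rejected by most C compilers. A self-contained sketch of the same computation is shown below; declaring the per-thread variables inside the region, the per-thread seed, the RAND_MAX scaling, and the final pi calculation are assumptions not shown on the slide:

#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

int main(void)
{
    int npoints = 1000000;        /* assumed divisible by the thread count */
    int sum = 0;

    #pragma omp parallel reduction(+: sum) num_threads(8)
    {
        int i, num_threads, sample_points_per_thread;
        unsigned int seed = omp_get_thread_num();   /* one seed per thread */
        double rand_no_x, rand_no_y;

        num_threads = omp_get_num_threads();
        sample_points_per_thread = npoints / num_threads;

        for (i = 0; i < sample_points_per_thread; i++) {
            /* the slide scales by (2<<14)-1, which assumes a 15-bit rand_r */
            rand_no_x = (double)rand_r(&seed) / (double)RAND_MAX;
            rand_no_y = (double)rand_r(&seed) / (double)RAND_MAX;
            if (((rand_no_x - 0.5) * (rand_no_x - 0.5) +
                 (rand_no_y - 0.5) * (rand_no_y - 0.5)) < 0.25)
                sum++;
        }
    }
    printf("pi is approximately %f\n", 4.0 * sum / npoints);
    return 0;
}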
Worksharing DO/for Directive
• The for directive partitions loop iterations across threads
• DO is the analogous directive for Fortran
• Usage:
#pragma omp for [clause list]
/* for loop */
• Possible clauses in [clause list]
– private, firstprivate, lastprivate
– reduction
– schedule, nowait, and ordered
• Implicit barrier at end of for loop
Using Worksharing for Directive
#pragma omp parallel default(private) shared (npoints) \
reduction(+: sum) num_threads(8)
{
sum = 0;
#pragma omp for   /* the worksharing for divides the loop iterations among threads */
for (i = 0; i < npoints; i++) {
rand_no_x =(double)(rand_r(&seed))/(double)((2<<14)-1);
rand_no_y =(double)(rand_r(&seed))/(double)((2<<14)-1);
if (((rand_no_x - 0.5) * (rand_no_x - 0.5) +
(rand_no_y - 0.5) * (rand_no_y - 0.5)) < 0.25)
sum ++;
}
} /* implicit barrier at the end of the for loop */
Example: Matrix Multiply
#pragma omp parallel for private(j, k)
for (i=0; i<n; i++)
for (j=0; j<n; j++) {
c[i][j] = 0.0;
for (k=0; k<n; k++)
c[i][j] += a[i][k] * b[k][j];
}

a, b, c are shared
i, j, k are private (i implicitly as the loop index; j and k via the private clause)
Private Variables
#pragma omp parallel for private(list)

• Compiler sets up a private copy of each variable in the list for each thread
• Our examples use OpenMP for and DO
• But these apply to other region and worksharing directives
• In the implementation, each thread has its own stack, so private copies can be kept there
Example: Private Variables
for (i=0; i<n; i++) {
tmp = a[i];
a[i] = b[i];
b[i] = tmp;
}

Swaps the values of a and b


Loop-carried dependence on tmp
Easily fixed by privatizing tmp
Example: Private Variables
#pragma omp parallel for private(tmp)
for (i=0; i<n; i++) {
tmp = a[i];
a[i] = b[i];
b[i] = tmp;
}

Removes dependence on tmp


Would be more difficult to do in Pthreads
Example: Private Variables

for (i=0; i<n; i++) {
tmp[i] = a[i];
a[i] = b[i];
b[i] = tmp[i];
}

Requires a change to the sequential program (tmp becomes an array)
Wasteful in space: O(n) instead of O(p)
Example: Private Variables

void F()
{ int tmp; /* local: allocated on each thread's stack */
for (i=0; i<n; i++) {
tmp = a[i];
a[i] = b[i];
b[i] = tmp;
}
}

So tmp is local to each thread
Firstprivate and Lastprivate
The initial and final values of private variables are unspecified

A firstprivate variable is private, and the private copies are initialized using its value before the loop

A lastprivate variable is private, and the thread executing the {sequentially last iteration / lexically last section} updates the version of the object outside the parallel region
Example: Firstprivate and
Lastprivate
for (i=0; (i<n) && b[i]; i++)
a[i] = b[i];
for (j=i; j<n; j++)
a[j] = 1.0;

Sets elements of a to the value of the corresponding element in b, up to the first zero value in b
Sets all further elements of a to 1.0
Example: Firstprivate and
Lastprivate
#pragma omp parallel for lastprivate(i)
for (i=0; (i<n) && b[i]; i++)
a[i] = b[i];
#pragma omp parallel for firstprivate(i)
for (j=i; j<n; j++)
a[j] = 1.0;

Sets elements of a to the value of the corresponding element in b, up to the first zero value in b
Sets all further elements of a to 1.0
Data Environment Directives
Private
Firstprivate
Lastprivate
Reduction
Threadprivate
Copyin

For good performance, OpenMP code should use private variables whenever possible
Reduces cache problems
However, this could waste a lot of memory
Use of reductions is also extremely important
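threadprivate and copyin are listed above but only used much later in the SPMD example; a minimal sketch of the two (the counter variable and its values are invented for illustration):

#include <stdio.h>
#include <omp.h>

int counter = 0;                       /* one copy per thread, persisting across parallel regions */
#pragma omp threadprivate(counter)

int main(void)
{
    counter = 100;                     /* sets the master thread's copy */

    /* copyin initializes every thread's copy from the master's copy */
    #pragma omp parallel copyin(counter) num_threads(4)
    {
        counter += omp_get_thread_num();
        printf("thread %d: counter = %d\n", omp_get_thread_num(), counter);
    }
    printf("master copy after the region: %d\n", counter);   /* still 100 */
    return 0;
}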
Mapping Iterations to Threads
schedule clause of the for directive
• Recipe for mapping iterations to threads
• Usage: schedule(scheduling_class[, parameter]).
• Four scheduling classes
– static: work partitioned at compile time
• iterations statically divided into pieces of size chunk
• statically assigned to threads
– dynamic: work partitioned at run time
• iterations are divided into pieces of size chunk
• chunks dynamically scheduled among the threads
• when a thread finishes one chunk, it is dynamically assigned another
• default chunk size is 1
– guided: guided self-scheduling
• chunk size is exponentially reduced with each dispatched piece of work
• the default chunk size is 1
– runtime:
• scheduling decision from environment variable OMP_SCHEDULE
• illegal to specify a chunk size for this clause.
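A sketch contrasting the four scheduling classes on the same loop (the work() routine and the chunk sizes are hypothetical placeholders):

#include <stdio.h>
#include <omp.h>
#define N 1000

/* stand-in for per-iteration work of varying cost */
static double work(int i) { return (double)i * (double)i; }

int main(void)
{
    static double result[N];
    int i;

    #pragma omp parallel
    {
        /* static: iterations divided into chunks of 50 and handed out round-robin */
        #pragma omp for schedule(static, 50)
        for (i = 0; i < N; i++) result[i] = work(i);

        /* dynamic: a thread grabs the next chunk of 10 whenever it finishes one */
        #pragma omp for schedule(dynamic, 10)
        for (i = 0; i < N; i++) result[i] = work(i);

        /* guided: chunk sizes start large and shrink as the loop progresses */
        #pragma omp for schedule(guided)
        for (i = 0; i < N; i++) result[i] = work(i);

        /* runtime: class and chunk size are taken from OMP_SCHEDULE */
        #pragma omp for schedule(runtime)
        for (i = 0; i < N; i++) result[i] = work(i);
    }
    printf("result[N-1] = %f\n", result[N - 1]);
    return 0;
}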
Statically Mapping Iterations to
Threads
/* static scheduling of matrix multiplication loops */
#pragma omp parallel default(private) shared(a, b, c, dim) \
num_threads(4)
#pragma omp for schedule(static)
for (i = 0; i < dim; i++) {
for (j = 0; j < dim; j++) {
c[i][j] = 0;
for (k = 0; k < dim; k++) {
c[i][j] += a[i][k] * b[k][j];
}
}
}

static schedule maps iterations to threads at compile time
Avoiding Unwanted Synchronization

• Default: worksharing for loops end with an implicit barrier
• Often, less synchronization is appropriate
– e.g., a series of independent for-directives within a parallel construct
• nowait clause
– modifies a for directive
– avoids the implicit barrier at the end of the for
Avoiding Synchronization with
nowait
#pragma omp parallel
{
#pragma omp for nowait
for (i = 0; i < nmax; i++)
if (isEqual(name, current_list[i]))
processCurrentName(name);
#pragma omp for
for (i = 0; i < mmax; i++)
if (isEqual(name, past_list[i]))
processPastName(name);
}

any thread can begin the second loop immediately without waiting for other threads to finish the first loop
Using the sections Directive
#pragma omp parallel
{
#pragma omp sections
{
#pragma omp section
{ taskA();
}
#pragma omp section
{ taskB();
}
#pragma omp section
{ taskC();
}
}
}

the parallel region encloses all the parallel work
sections: task parallelism
three concurrent tasks
Synchronization Constructs in
OpenMP
#pragma omp barrier
#pragma omp single [clause list]
structured block
#pragma omp master
structured block
Use MASTER instead of SINGLE wherever possible
MASTER = IF-statement with no implicit BARRIER
equivalent to IF(omp_get_thread_num() == 0) {...}
SINGLE: implemented like other worksharing constructs
keeping track of which thread reached SINGLE first adds
overhead
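A small example combining master, barrier, and single (the printed messages are placeholders; which thread executes the single block is unspecified):

#include <stdio.h>
#include <omp.h>

int main(void)
{
    #pragma omp parallel num_threads(4)
    {
        /* master: only thread 0 executes this, with no implied barrier */
        #pragma omp master
        printf("master (thread 0) prints this\n");

        /* barrier: all threads wait here */
        #pragma omp barrier

        /* single: exactly one (unspecified) thread executes this;
           there is an implicit barrier at the end unless nowait is given */
        #pragma omp single
        printf("one thread, id %d, prints this\n", omp_get_thread_num());
    }
    return 0;
}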
Synchronization Constructs in
OpenMP
#pragma omp critical [(name)]
structured block
#pragma omp ordered
structured block

Similar to Pthreads mutex locks

critical section: like a named lock


for loops with carried dependences
Example Using critical

#pragma omp parallel shared(best_cost)
{
#pragma omp for nowait
for (i = 0; i < nmax; i++) {
int my_cost;
…
#pragma omp critical
{ if (best_cost < my_cost)
best_cost = my_cost;
}
…
}
}

critical ensures mutual exclusion when accessing shared state
Example Using ordered

#pragma omp parallel
{
#pragma omp for ordered
for (k = 1; k < nmax; k++) {
…
#pragma omp ordered
{ a[k] = a[k-1] + …;
}
…
}
}

ordered ensures the loop-carried dependence does not cause a data race
OpenMP Library Functions

Processor count
int omp_get_num_procs(); /* number of processors currently available */
int omp_in_parallel(); /* determine whether running in a parallel region */

Thread count and identity
/* set max # threads for the next parallel region; call only in a serial region */
void omp_set_num_threads(int num_threads);
int omp_get_num_threads(); /* # threads currently active */
int omp_get_max_threads(); /* max # concurrent threads */
int omp_get_thread_num(); /* thread id */
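A short program exercising these routines might look like this (the request for 4 threads is an arbitrary choice):

#include <stdio.h>
#include <omp.h>

int main(void)
{
    omp_set_num_threads(4);            /* request 4 threads for the next parallel region */
    printf("procs = %d, max threads = %d, in parallel? %d\n",
           omp_get_num_procs(), omp_get_max_threads(), omp_in_parallel());

    #pragma omp parallel
    {
        #pragma omp critical
        printf("thread %d of %d, in parallel? %d\n",
               omp_get_thread_num(), omp_get_num_threads(), omp_in_parallel());
    }
    return 0;
}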
OpenMP Environment Variables

• OMP_NUM_THREADS
– specifies the default number of threads for a
parallel region
• OMP_DYNAMIC
– specifies whether the number of threads can be dynamically adjusted
• OMP_NESTED
– enables nested parallelism
• OMP_SCHEDULE
– specifies scheduling of for-loops if the clause
specifies runtime
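A sketch of how these variables are typically used together with schedule(runtime); the shell commands in the comment and the loop are illustrative assumptions, not part of the slides:

/* Typical shell settings before running the program:
 *   export OMP_NUM_THREADS=4
 *   export OMP_DYNAMIC=FALSE
 *   export OMP_NESTED=TRUE
 *   export OMP_SCHEDULE="dynamic,4"
 */
#include <stdio.h>
#include <omp.h>

int main(void)
{
    int i, sum = 0;

    /* schedule(runtime): the scheduling class and chunk size are read
       from OMP_SCHEDULE when the loop executes */
    #pragma omp parallel for schedule(runtime) reduction(+: sum)
    for (i = 0; i < 1000; i++)
        sum += i;

    printf("sum = %d (thread count taken from OMP_NUM_THREADS)\n", sum);
    return 0;
}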
OpenMP SPMD Style

• SPMD (Single Program Multiple Data)
• The same program runs on each CPU but accesses different data
OpenMP SPMD Style
#include <omp.h>
#include <stdio.h>
main()
{ long int i;
static long int A[1000000];   /* static: too large to place on the stack */
static float B[1000000];
static float c[1000000];      /* declared on the slide but not used */
printf("omp_get_num_procs = %4d \n", omp_get_num_procs());
printf("omp_get_max_threads = %4d \n", omp_get_max_threads());
#pragma omp parallel
{
#pragma omp for
for (i=0; i<1000000; i++)
{ A[i] = i;
B[i] = A[i] * 2.3;
}
}
for (i=10; i<=100; i++) printf(" %7ld ", A[i]);
printf("\n");
}
OpenMP SPMD Example
#include <omp.h>
#include <stdio.h>
long int A[1000000];
float B[1000000];
int mystart, myend;
void mywork(int, int);
#pragma omp threadprivate(mystart, myend)

main()
{ long int i, iam;
int N, nthreads, chunk, temp;
N = 999999;   /* N is not set on the original slide; it must stay below 1000000 since indexing starts at 1 */
#pragma omp parallel private(iam, nthreads, chunk, temp)
{ nthreads = omp_get_num_threads();
iam = omp_get_thread_num();
chunk = (N + nthreads - 1) / nthreads;
mystart = iam * chunk + 1;
temp = (iam + 1) * chunk;
myend = (temp <= N) ? temp : N;
mywork(mystart, myend);   /* each thread works on its own [mystart, myend] range */
}
for (i=1; i<=100; i++) printf(" %7ld ", A[i]);
printf("\n");
}

void mywork(int mystart, int myend)
{ int i;
for (i=mystart; i<=myend; i++)
{ A[i] = i;
B[i] = A[i] * 2.3;
}
}
