
Lecture 7-OpenMP-Basics

The document outlines key concepts in high-performance computing and parallel programming, focusing on processes, threads, and the OpenMP directives used for parallel execution. It details the fork-join execution model, data scope attributes, and how to manage thread creation and data sharing in OpenMP. Additionally, it emphasizes the importance of structured code blocks for efficient parallelization and addresses potential issues like race conditions.


Applied High-Performance Computing and Parallel Programming

Presenter: Liangqiong Qu
Assistant Professor
The University of Hong Kong


Administration

• Send your UID, email address, and name for HPC account generation for our
assignment.

- Send your UID, email address (with HKU email), and name to Mr. Changbing Zhang
([email protected]) for account opening before Feb. 21, Friday, 11:59 PM!

- No late submissions will be accepted!


Review of Lecture 6 – Thread and Process
▪ Process: A process is an independent program that runs in its own memory space
and has its own resources. Breaking it down, it is a running program that consists of
the program code, data, and the current activity. It is isolated, meaning processes
do not interfere with each other.
▪ Thread: A thread is the smallest unit of execution. It lives inside a process, can
exist in groups within a process, and can be managed individually by the OS
scheduler. Every process has a main thread, which runs the program sequentially,
one statement after another. As long as a process has at least one active thread,
the process is alive. Multiple threads can exist within one process and share its
resources.
Review of Lecture 6: Fork-Join Execution Model
Program start: one process (master thread) is running. The
master thread executes sequentially until the first parallel
region construct is encountered.

Parallel region: (1) When a parallel region is encountered, the
master thread creates a group of threads by FORK, becomes the
master of this group of threads, and is assigned thread id 0
within the group. (2) The statements in the program that are
enclosed by the parallel region construct are then executed in
parallel among these threads. (3) JOIN: when the threads
complete executing the statements in the parallel region
construct, they synchronize and terminate, leaving only the
master thread.

Serial region: only the master thread executes.

(Figure: fork-join timeline, with threads numbered 0 to 4.)
Review of Lecture 6: OpenMP Directives
▪ A series of directives and clauses in OpenMP identify code blocks as parallel regions.
Programmers only need to insert these directives into the code, so OpenMP is defined as
a kind of directive based language.

▪ In C/C++, the directive is based on the sentinel (#pragma omp) construct, and the basic
format of OpenMP directive in C/C++ is as follows (Compiler directive):
#pragma omp directive-name [clause[ [,] clause]...]
structured block of code
• The #pragma omp is a so-called sentinel, which starts an OpenMP directive. For example:

#pragma omp parallel private(beta, pi)

Here #pragma omp is the sentinel (required), parallel is the directive-name (required), and private(beta, pi) is a clause (optional).
Review of Lecture 6: What is the Structured Block of Code in OpenMP Directives?

• A “structured block of code” refers to a block of code that has a single entry point and a
single exit point. This means the code block must be well-defined and cannot have jumps
(like goto, break, or return) that cause the control flow to exit the block prematurely.

• Structured blocks are easier for the compiler to parallelize and optimize.
Unstructured code complicates the compiler's job and can lead to inefficient or
incorrect parallel execution.

❌ None of the above is a structured block of code, due to multiple entries/exits.
Review of Lecture 6: OpenMP Directives

▪ In C/C++, the directive is based on the sentinel (#pragma omp) construct, and the basic
format of OpenMP directive in C/C++ is as follows (Compiler directive):
#pragma omp directive-name [clause[ [,] clause]...]
structured block of code
• The #pragma omp is a so-called sentinel, which starts an OpenMP directive

#pragma omp parallel: the sentinel and directive start an OpenMP parallel region.

The structured block of code is the section of the program that is affected by the
preceding directive.
Review of Lecture 6: Set Number of Threads
▪ Several ways to set the number of threads in OpenMP
• Using environment variables: OMP_NUM_THREADS=…
• Linux: From within a shell, global setting of the number of threads:
export OMP_NUM_THREADS=4
• From within a shell, one-time setting of the number of threads:
OMP_NUM_THREADS=4 ./a.out

• Use num_threads clause: add num_threads(num) to the parallel construct

• Use the omp_set_num_threads() function in the code to set the number of threads.

• The precedence usually follows (lowest to highest): OMP_NUM_THREADS environment
variable < omp_set_num_threads() function < num_threads clause
Review of Lecture 6: Get Number of Threads
▪ omp_get_num_threads function returns the number of threads in the current team.
• Syntax: int omp_get_num_threads(void);
The omp_get_num_threads routine returns the number of threads in the team that is
executing the parallel region to which the routine region binds. If called from the
sequential part of a program, this routine returns 1.

▪ omp_get_thread_num routine returns the thread number, within the current team, of the
calling thread. Threads are numbered from 0 (master thread) to N-1.
• Syntax: int omp_get_thread_num(void);

▪ omp_get_max_threads routine returns an upper bound on the number of threads that
could be used to form a new team if a parallel construct without a num_threads clause
were encountered after execution returns from this routine.
• Syntax: int omp_get_max_threads(void);
Review of Lecture 6: Data Scope
Data in a parallel region can be:
▪ private to each executing thread
• each thread has its own local copy
of data
• modifications to the private variable
will not affect the private copies of
other threads.

▪ shared between threads
• there is only one instance of the data, available to all threads
• this does not mean that the instance is always visible to all threads!
Review of Lecture 6: Shared vs. Private Data
▪ If you do not explicitly specify the shared or private attribute of a variable,
OpenMP will automatically infer it based on its rules:

• Because OpenMP is based upon the shared memory programming model, most
variables are shared by default. Thus, by default, all data in OpenMP, including
data in a parallel region, is shared.

• Exceptions (the following are private):
  • Non-static variables declared within the parallel region are private
  • Loop variables of parallel loops are private

void f() {
    int a;
    float x, y;
    ...
    #pragma omp parallel
    {
        int i;      // private: declared inside the region
        float y;    // masks the shared y
        ...
    }
}
Review of Lecture 6: Race Condition
▪ What happens if a variable is unintentionally shared?
• Nothing if it is just read
• Possibly hazardous if at least one thread writes to it

• Race condition: when multiple threads access and modify the shared variable A
concurrently, there may be race conditions.

• Race conditions can lead to unpredictable results, as the order of execution
and of updates to `A` by different threads is not guaranteed.

• We will learn how to avoid race conditions next lecture.
Review of Lecture 6: Race Condition
▪ What happens if a variable is unintentionally shared?
• Nothing if it is just read
• Possibly hazardous if at least one thread writes to it

Different runs get different results! The results are not guaranteed!
Review of Lecture 6: Data Scope Shared vs. Private Data
▪ Because OpenMP is based upon the shared memory programming model, most variables
are shared by default. If you do not explicitly specify the shared or private attribute of a
variable, OpenMP will automatically infer it based on its rules.
▪ We can also use OpenMP Data Scope Attribute Clauses to explicitly define how
variables should be scoped. They include:
• PRIVATE
• FIRSTPRIVATE
• LASTPRIVATE
• SHARED

• Data Scope Attribute Clauses are used in conjunction with several directives
(PARALLEL, DO/for, and SECTIONS) to control the scoping of enclosed variables.
• Data Scope Attribute Clauses are effective only within their lexical/static extent,
meaning they are only applicable within the specific scope they are defined in.

#pragma omp parallel private(a) shared(b)


Review of Lecture 6: OpenMP Data Scope Attribute Clauses Private
• PRIVATE: private(var) creates a local copy of var for each thread. A new uninitialized
instance is created for each thread.
• The value is uninitialized
• Private copy is not storage-associated with the same variable outside the region


Review of Lecture 6: OpenMP Data Scope Attribute Clauses Firstprivate Clause

• firstprivate: firstprivate is a special case of private. firstprivate(var) also creates a local


copy of var for each thread, but initializes each private copy with the corresponding value
from the master thread. The firstprivate is destroyed outside the region.
Review of Lecture 6: OpenMP Data Scope Attribute Clauses Lastprivate Clause

▪ Lastprivate: lastprivate is a special case of private. The lastprivate(var) clause
causes var to be private on each thread, and causes the corresponding original list
item on the master thread to be updated after the end of the region. Note that the
lastprivate var remains uninitialized on entry.

• If you use the lastprivate clause with PARALLEL FOR, the last value is the value
of the variable after the last sequential iteration of the loop.

• If the last iteration of the loop or last section of the construct does not define
a lastprivate variable, the variable is undefined after the loop or construct.

(Figure caption: the shared value is set to the value of the variable after the last
sequential iteration of the loop.)
Review of Lecture 6: OpenMP Data Scope Attribute Clauses Lastprivate Clause

• A lastprivate variable is uninitialized on entry; it takes a random initial value
if it is not explicitly assigned.
Review of Lecture 6: OpenMP Data Scope Attribute Clauses Lastprivate Clause

• If you use the lastprivate clause with parallel for, the last value is the value
of the variable after the last sequential iteration of the loop.

What will we get from the printf function here?
Review of Lecture 6: OpenMP Data Scope Attribute Clauses Lastprivate Clause

• If the last iteration of the loop or last section of the construct does not define
a lastprivate variable, the variable is undefined after the loop or construct.
Review of Lecture 6: OpenMP Data Scope Attribute Clauses Lastprivate Clause

• A lastprivate variable is uninitialized on entry; it takes a random initial value
if it is not explicitly assigned.

Question: How can we initialize A in the for loop with the corresponding value from
the master thread in line 69, and also update the master thread's A in line 79 with
the last value of A from the for loop?
Review of Lecture 6: OpenMP Data Scope Attribute Clauses Lastprivate Clause

• If you want to initialize and assign values to a variable upon entry and exit,
respectively, you can use both `firstprivate` and `lastprivate` for the same variable.
Review of Lecture 6: OpenMP Data Scope Attribute Clauses Lastprivate Clause

• If you want to initialize and assign values to a variable upon entry and exit,
respectively, you can use both `firstprivate` and `lastprivate` for the same variable.

Question: Is A now updated as expected? Why?
Review of Lecture 6: OpenMP Data Scope Attribute Clauses

• Summary: the `firstprivate` and `lastprivate` clauses are used, respectively, to
initialize private copies of variables on entry and to assign their values back to
shared variables on exit. The `private` clause creates private copies of variables
that are not associated with the shared originals.

• If you want to initialize and assign values to a variable upon entry and exit,
respectively, you can use both `firstprivate` and `lastprivate` for the same
variable.

• But you cannot use the `private` clause and the `firstprivate`/`lastprivate`
clauses together for the same variable.
Thank you very much for choosing this course!

Give us your feedback!

https://forms.gle/zDdrPGCkN7ef3UG5A
