A Tutorial On Parallel Computing On Shared Memory Systems

This document provides an overview of parallel computing on shared memory systems using OpenMP. It discusses what OpenMP is, its history, and its components. OpenMP uses a fork-join model of parallelism in which the master thread creates a team of threads to execute parallel regions. The document describes OpenMP directives, runtime routines, environment variables, and how OpenMP handles data scoping. It also discusses work sharing constructs such as parallel loops and sections. Features like scheduling and critical sections, and potential problems with parallelization, are covered at a high level.

A tutorial on parallel computing on shared memory systems

OpenMP, compiler-level parallelization and Fortran 2008 co-arrays
What OpenMP is (source: https://computing.llnl.gov/tutorials/openMP/)
OpenMP Is:              
•An Application Program Interface (API) that may be used to explicitly direct multi-threaded, shared memory parallelism.
•Comprised of three primary API components:
•Compiler Directives
•Runtime Library Routines
•Environment Variables

OpenMP Is Not:

•Meant for distributed memory parallel systems (by itself)


•Necessarily implemented identically by all vendors
•Guaranteed to make the most efficient use of shared memory
•Required to check for data dependencies, data conflicts, race conditions, deadlocks, or code sequences that cause a program to be
classified as non-conforming
•Designed to handle parallel I/O. The programmer is responsible for synchronizing input and output.

History:

•In the early 1990s, vendors of shared-memory machines supplied similar, directive-based Fortran programming extensions.
•The user would augment a serial Fortran program with directives specifying which loops were to be parallelized.
•The compiler would be responsible for automatically parallelizing such loops across the shared-memory processors. Implementations were all functionally similar but were diverging.
•The first attempt at a standard was ANSI X3H5, in 1994. It was never adopted, largely because distributed memory machines were becoming popular.
•However, not long after this, newer shared memory machine architectures started to become prevalent, and interest resumed.
•The OpenMP standard specification started in the spring of 1997, taking over where ANSI X3H5 had left off.
Shared Memory Model
• OpenMP is designed for multi-processor/core, shared memory
machines. The underlying architecture can be shared memory Uniform
Memory Access or Non-uniform Memory Access.

(Diagrams: Uniform Memory Access (UMA) and Non-uniform Memory Access (NUMA) architectures)


• The idealized use of OpenMP in HPC on modern clusters is to increase node-level efficiency; codes that combine OpenMP within nodes with MPI across nodes are called hybrid codes.
Hybrid OpenMP - MPI
• OpenMP parallelization within each node
• MPI parallelization between nodes

OpenMP parallelization model
• OpenMP uses a fork-join model of thread-level parallelism.

• All OpenMP programs begin with a single master thread.


• Given a suitable OpenMP directive to create a parallel region, the master thread then
creates a team of parallel threads through a FORK.
• The statements in the program within the parallel region construct are executed in
parallel among the threads.
• JOIN: When the team threads complete the statements in the parallel region construct,
they synchronize and terminate, leaving only the master thread.
• The number of parallel regions and the threads that comprise them are arbitrary.
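A minimal sketch of this fork-join structure (not from the slides; the program name and messages are illustrative):

      PROGRAM FORK_JOIN_DEMO
      USE OMP_LIB
      IMPLICIT NONE
!     Serial part: only the master thread runs here
      PRINT *, 'Before the parallel region: one thread'
!     FORK: the master thread creates a team of threads
!$OMP PARALLEL
      PRINT *, 'Hello from thread', OMP_GET_THREAD_NUM(), 'of', OMP_GET_NUM_THREADS()
!     JOIN: the team synchronizes and terminates at END PARALLEL
!$OMP END PARALLEL
      PRINT *, 'After the parallel region: only the master thread remains'
      END PROGRAM FORK_JOIN_DEMO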
Components of OpenMP
• The OpenMP API is comprised of three distinct components:
1) Compiler Directives
2) Runtime Library Routines
3) Environment Variables
• Compiler directives are for:
1) Spawning a parallel region
2) Dividing blocks of code among threads
3) Distributing loop iterations between threads
4) Serializing sections of code
5) Synchronization of work among threads
• Directive format: sentinel directive-name [clause, ...]

• Example: !$OMP PARALLEL DEFAULT(SHARED) PRIVATE(BETA,PI)


Run-time Library Routines and
Environment variables
• Run-time library routines (provided by each vendor's OpenMP implementation), among other things, are used to:
1) Set the number of threads, and inquire how many threads are active
2) Query a thread's unique identifier (thread ID), a thread's ancestor's identifier, and the thread team size
3) Query whether execution is inside a parallel region, and at what nesting level
• Environment variables, among other things, are used to:
1) Set the number of threads
2) Specify how loop iterations are divided
3) Bind threads to processors
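A minimal Fortran sketch of these queries (the program itself is illustrative; the OMP_* routines and the OMP_NUM_THREADS variable are the standard OpenMP ones):

      PROGRAM QUERY_DEMO
      USE OMP_LIB
      IMPLICIT NONE
      INTEGER :: TID, NTHREADS
!     Runtime routine: request a team size (overrides OMP_NUM_THREADS)
      CALL OMP_SET_NUM_THREADS(4)
!$OMP PARALLEL PRIVATE(TID, NTHREADS)
      TID      = OMP_GET_THREAD_NUM()    ! this thread's unique ID
      NTHREADS = OMP_GET_NUM_THREADS()   ! size of the current team
      IF (OMP_IN_PARALLEL()) THEN
         PRINT *, 'Thread', TID, 'of', NTHREADS, 'at nesting level', OMP_GET_LEVEL()
      END IF
!$OMP END PARALLEL
      END PROGRAM QUERY_DEMO

The same team size could instead be requested from the shell with the environment variable OMP_NUM_THREADS (e.g. export OMP_NUM_THREADS=4), and thread binding controlled with OMP_PROC_BIND / OMP_PLACES.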
OpenMP code structure and data scoping

      PROGRAM HELLO
      INTEGER VAR1, VAR2, VAR3
!     Serial code . . .
!     Beginning of parallel region: fork a team of threads and specify variable scoping
!$OMP PARALLEL PRIVATE(VAR1, VAR2) SHARED(VAR3)
!     Parallel region executed by all threads
!     Other OpenMP directives
!     Run-time library calls
!     All threads join the master thread and disband
!$OMP END PARALLEL
!     Resume serial code . . .
      END

Data scoping:
• Because OpenMP is a shared memory programming model, most data within a parallel region is shared by default.
• All threads in a parallel region can access this shared data simultaneously.
• OpenMP provides a way for the programmer to explicitly specify how data is "scoped" if the default shared scoping is not desired.
General rules for OpenMP directives
• Comments cannot appear on the same line as a directive
• Only one directive-name may be specified per directive
• Fortran compilers which are OpenMP enabled generally include a
command line option which instructs the compiler to activate and
interpret all OpenMP directives.
• Several Fortran OpenMP directives come in pairs and have the form
shown below. The "end" directive is optional but advised for readability.
!$OMP directive

[ structured block of code ]

!$OMP end directive


Features of OpenMP parallelism

      PROGRAM TEST
      ...
!$OMP PARALLEL
      ...
!$OMP DO
      DO I = ...
         ...
         CALL SUB1
         ...
      ENDDO
!$OMP END DO
      ...
      CALL SUB2
      ...
!$OMP END PARALLEL
      END

      SUBROUTINE SUB1
      ...
!$OMP CRITICAL
      ...
!$OMP END CRITICAL
      END

      SUBROUTINE SUB2
      ...
!$OMP SECTIONS
      ...
!$OMP END SECTIONS
      ...
      END

STATIC EXTENT: The DO directive occurs within an enclosing parallel region.
ORPHANED DIRECTIVES: The CRITICAL and SECTIONS directives occur outside an enclosing parallel region.
DYNAMIC EXTENT: The CRITICAL and SECTIONS directives occur within the dynamic extent of the DO and PARALLEL directives.
Example 2: Work sharing constructs
• DO / for - shares iterations of a loop across the team. Represents a type of "data parallelism".
• SECTIONS - breaks work into separate, discrete sections. Each section is executed by a thread. Can be used to implement a type of "functional parallelism".
• SINGLE - serializes a section of code.
Scheduling in OpenMP
• SCHEDULE: Describes how iterations of the loop are divided among the
threads in the team. The default schedule is implementation dependent.
The goal is to keep all the threads equally busy throughout the loop.
• STATIC: Loop iterations are divided into pieces of size chunk and then
statically assigned to threads. If chunk is not specified, the iterations are
evenly (if possible) divided contiguously among the threads.
• A static schedule is good if the work you have can be divided up pretty
evenly (e.g., so all the processors you are using are busy for about the
same amount of time). Most of the overhead work of static is done at
compile time.
Scheduling in OpenMP
• DYNAMIC: Loop iterations are divided into pieces of size chunk, and
dynamically scheduled among the threads; when a thread finishes one
chunk, it is dynamically assigned another. The default chunk size is 1.
• Dynamic is good if your workload is uneven, since one or more
processors might be busy for a long time. It allows the processors with
small workloads to go after other chunks of work and hopefully balance
the work out between processors. It has a larger overhead at runtime,
since work has to be taken off of a queue.
Scheduling in OpenMP
GUIDED: Iterations are dynamically assigned to threads in blocks as
threads request them until no blocks remain to be assigned. Similar to
DYNAMIC except that the block size decreases each time a parcel of work
is given to a thread.
The size of the initial block is proportional to: number_of_iterations / number_of_threads
Subsequent blocks are proportional to number_of_iterations_remaining / number_of_threads
The chunk parameter defines the minimum block size. The default chunk
size is 1.
Note: compilers differ in how GUIDED is implemented, as the "Guided A" and "Guided B" examples in the original slides show.
Like DYNAMIC, GUIDED is good when job sizes are uneven and the exact loads are not known in advance. It also incurs overhead at runtime.
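How the three schedules are requested in Fortran, sketched on simple loops (the loop bodies and chunk sizes are only placeholders):

      PROGRAM SCHEDULE_DEMO
      IMPLICIT NONE
      INTEGER, PARAMETER :: N = 100
      REAL :: WORK(N)
      INTEGER :: I

!     STATIC, chunk 10: fixed assignment decided up front, lowest overhead
!$OMP PARALLEL DO SCHEDULE(STATIC, 10)
      DO I = 1, N
         WORK(I) = REAL(I)
      END DO
!$OMP END PARALLEL DO

!     DYNAMIC, chunk 4: threads take the next chunk from a queue as they finish
!$OMP PARALLEL DO SCHEDULE(DYNAMIC, 4)
      DO I = 1, N
         WORK(I) = WORK(I)**2          ! stands in for uneven work per iteration
      END DO
!$OMP END PARALLEL DO

!     GUIDED, minimum chunk 2: chunk sizes start large and shrink as the loop progresses
!$OMP PARALLEL DO SCHEDULE(GUIDED, 2)
      DO I = 1, N
         WORK(I) = SQRT(WORK(I))
      END DO
!$OMP END PARALLEL DO

      PRINT *, 'Checksum:', SUM(WORK)
      END PROGRAM SCHEDULE_DEMO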
Problems with OpenMP parallelizations
• In addition to the usual, "serial" bugs, parallel programs can have "parallel-only" bugs, such as:
• Race conditions - when results depend on a specific ordering of commands which is not enforced.
• Deadlocks - when tasks wait perpetually for a message or signal that never arrives.
OpenMP parallelization may lead to race
conditions
• Sometimes, order of instructions across threads creates problems
with shared variables.
• Imagine the sum loop:
!$OMP PARALLEL DO
      DO I = 1, N
        SUM = SUM + (A(I) * B(I))
      ENDDO
!$OMP END PARALLEL DO
Note that SUM is a shared variable. Multiple threads might want to
update its value at the same time. What then?
This is called a race condition.
OpenMP REDUCTION clause: one solution for race conditions
• Each thread works on a private copy of the reduction variable; the private copies are automatically combined at the end of the loop.

• Structure of reduced loops:

!$omp parallel shared(n) private(i,j,prime)
!$omp do reduction (+:sum)
…Code body…
!$omp end do
!$omp end parallel
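Applied to the dot-product loop from the race-condition slide, a minimal complete sketch might look like this (array sizes and values are illustrative):

      PROGRAM DOT_PRODUCT_REDUCTION
      IMPLICIT NONE
      INTEGER, PARAMETER :: N = 100000
      REAL(KIND=8) :: A(N), B(N), SUM
      INTEGER :: I

      A = 1.0D0
      B = 2.0D0
      SUM = 0.0D0

!     Each thread accumulates into a private copy of SUM;
!     the private copies are combined with "+" at the end of the loop.
!$OMP PARALLEL DO REDUCTION(+:SUM)
      DO I = 1, N
         SUM = SUM + A(I) * B(I)
      END DO
!$OMP END PARALLEL DO

      PRINT *, 'Dot product =', SUM
      END PROGRAM DOT_PRODUCT_REDUCTION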
Other solutions: Synchronization
constructs
• CRITICAL directive:
1) The CRITICAL directive specifies a region of code that must
be executed by only one thread at a time.
2) If a thread is currently executing inside a CRITICAL region
and another thread reaches that CRITICAL region and
attempts to execute it, it will block until the first thread exits
that CRITICAL region.
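A minimal sketch of the same accumulation protected with CRITICAL instead of REDUCTION (illustrative; each thread first builds a private partial sum so that only one update per thread hits the critical region):

      PROGRAM CRITICAL_SUM
      IMPLICIT NONE
      INTEGER, PARAMETER :: N = 100000
      REAL(KIND=8) :: A(N), B(N), TOTAL, LOCAL_SUM
      INTEGER :: I

      A = 1.0D0
      B = 2.0D0
      TOTAL = 0.0D0

!$OMP PARALLEL PRIVATE(I, LOCAL_SUM) SHARED(A, B, TOTAL)
      LOCAL_SUM = 0.0D0
!$OMP DO
      DO I = 1, N
         LOCAL_SUM = LOCAL_SUM + A(I) * B(I)   ! no race: LOCAL_SUM is private
      END DO
!$OMP END DO
!     Only one thread at a time may update the shared TOTAL
!$OMP CRITICAL
      TOTAL = TOTAL + LOCAL_SUM
!$OMP END CRITICAL
!$OMP END PARALLEL

      PRINT *, 'Dot product =', TOTAL
      END PROGRAM CRITICAL_SUM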
SINGLE, MASTER, BARRIER: examples
Some practical examples
• A matrix multiplication code: naïve parallelization example.
• A more generic approach for solutions on shared and
distributed memory systems – domain decomposition:

Domain decomposition has many fancy definitions. To me, it is simply block operations on matrices – many practical problems in numerical computation can be reduced to this simple goal.
Pseudo-code for block matrix
multiplication
• The pseudo-code is sketched below (the code is not specific to any language; it just explains the logic).
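The slide's own pseudo-code is not reproduced in this extraction; the following Fortran sketch shows the intended logic under assumed names and sizes (square matrices of order N, block size BS dividing N), with the outer block loops parallelized:

      PROGRAM BLOCK_MATMUL
      IMPLICIT NONE
      INTEGER, PARAMETER :: N = 512, BS = 64      ! assumed sizes; BS divides N
      REAL(KIND=8) :: A(N,N), B(N,N), C(N,N)
      INTEGER :: II, JJ, KK, I, J, K

      A = 1.0D0
      B = 1.0D0
      C = 0.0D0

!     Loop over blocks of C; each (II,JJ) block is owned by one thread,
!     so the updates to C are independent and there is no race.
!$OMP PARALLEL DO COLLAPSE(2) PRIVATE(II, JJ, KK, I, J, K)
      DO JJ = 1, N, BS
         DO II = 1, N, BS
            DO KK = 1, N, BS
!              Multiply one pair of blocks: C(II.., JJ..) += A(II.., KK..) * B(KK.., JJ..)
               DO J = JJ, JJ + BS - 1
                  DO K = KK, KK + BS - 1
                     DO I = II, II + BS - 1
                        C(I, J) = C(I, J) + A(I, K) * B(K, J)
                     END DO
                  END DO
               END DO
            END DO
         END DO
      END DO
!$OMP END PARALLEL DO

      PRINT *, 'C(1,1) =', C(1,1)     ! equals N (here 512) for all-ones inputs
      END PROGRAM BLOCK_MATMUL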
Strengths?
1) Easily vectorizable (SIMD upscaling).
2) A single, non-data-dependent loop over blocks.
3) The only bottleneck is data transmission.
Domain-decomposition for parallel linear
solvers
• The class of methods that we have studied so far is called direct solvers – they are guaranteed to succeed in a fixed number of steps.
• However, special matrices which are either sparse or sparse-like (for example, extremely diagonally dominant) can be solved more efficiently by generic methods which solve the problem iteratively: we build a sequence of approximations x^(k) such that we ultimately have x^(k) → x with A x = b.
Two common iterative solvers
• Central idea: split A = P − N. Then iterate P x^(k+1) = N x^(k) + b, i.e. x^(k+1) = P^(-1) (N x^(k) + b), and use the appropriate P and N for Jacobi or Gauss-Seidel.

What are P and N?
• Jacobi: P = D, the diagonal of A, and N = D − A (minus the off-diagonal part).
• Gauss-Seidel: P = D + L, the diagonal plus the strictly lower triangular part of A, and N = −U, minus the strictly upper triangular part.
Convergence of iterative methods
• The iterative methods we will study are guaranteed to converge if the underlying matrix is diagonally dominant.
• Let's denote the iteration matrix by M = P^(-1) N and the error in the k-th step by e^(k) = x^(k) − x. Then, the iteration on the error is:
e^(k+1) = M e^(k), so e^(k) = M^k e^(0).
This is convergent only if the spectral radius of M is < 1.
In Jacobi, M = D^(-1) (D − A) = I − D^(-1) A. Therefore convergence is guaranteed for diagonally dominant matrices! Example of diagonally dominant matrices – our FD matrices; they are usually sparse and diagonally dominant!
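As a small illustrative check (not from the slides): for A = [4 1; 1 3], which is strictly diagonally dominant, D = diag(4, 3) and M = I − D^(-1) A = [0 −1/4; −1/3 0]. Its eigenvalues satisfy λ² = 1/12, so the spectral radius is about 0.29 < 1 and Jacobi converges, shrinking the error asymptotically by a factor of about 0.29 per iteration.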
Let’s see a naïve parallelization.
Domain decomposed Jacobi
• Jacobi on sparse blocks.
• PDE on blocks, then block Jacobi on each block! Nested parallelization, ideal for hybrid MPI-OpenMP codes.
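A naïve OpenMP parallelization of a single Jacobi sweep for a dense system A x = b might look like the sketch below (names and the dense storage are illustrative; a real domain-decomposed solver would apply this block by block):

      SUBROUTINE JACOBI_SWEEP(N, A, B, XOLD, XNEW)
      IMPLICIT NONE
      INTEGER, INTENT(IN)       :: N
      REAL(KIND=8), INTENT(IN)  :: A(N,N), B(N), XOLD(N)
      REAL(KIND=8), INTENT(OUT) :: XNEW(N)
      INTEGER :: I, J
      REAL(KIND=8) :: S

!     Each row update reads only XOLD and writes only XNEW(I),
!     so the rows are independent and the loop parallelizes safely.
!$OMP PARALLEL DO PRIVATE(I, J, S)
      DO I = 1, N
         S = 0.0D0
         DO J = 1, N
            IF (J /= I) S = S + A(I, J) * XOLD(J)
         END DO
         XNEW(I) = (B(I) - S) / A(I, I)
      END DO
!$OMP END PARALLEL DO
      END SUBROUTINE JACOBI_SWEEP

The caller swaps XOLD and XNEW between sweeps and stops when the change falls below a tolerance; in the hybrid picture above, each MPI rank would run such OpenMP-parallel sweeps on its own block.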
