Parallel Computing Platforms
Chieh-Sen (Jason) Huang
Department of Applied Mathematics
National Sun Yat-sen University
Thanks to Ananth Grama, Anshul Gupta, George Karypis, and Vipin
Kumar for providing slides.
Topic Overview
• Dichotomy of Parallel Computing Platforms
• Communication Costs in Parallel Machines
• A first example using MPI: the Message Passing Interface
Explicitly Parallel Platforms
Dichotomy of Parallel Computing Platforms
• An explicitly parallel program must specify concurrency and
interaction between concurrent subtasks.
• The former is sometimes also referred to as the control structure
and the latter as the communication model.
Control Structure of Parallel Programs
• Parallelism can be expressed at various levels of granularity –
from instruction level to processes.
• Between these extremes exist a range of models, along with
corresponding architectural support.
Control Structure of Parallel Programs
• Processing units in parallel computers either operate under
the centralized control of a single control unit or work
independently.
• If there is a single control unit that dispatches the same
instruction to various processors (that work on different data),
the model is referred to as single instruction stream, multiple
data stream (SIMD).
• If each processor has its own control unit, each
processor can execute different instructions on different data
items. This model is called multiple instruction stream, multiple
data stream (MIMD).
SIMD and MIMD Processors
A typical SIMD architecture (a) and a typical MIMD architecture (b).
(PE: Processing Element.)
SIMD Processors
• Some of the earliest parallel computers such as the Illiac IV,
MPP, DAP, CM-2, and MasPar MP-1 belonged to this class of
machines.
• SIMD relies on the regular structure of computations (such as
those in image processing).
• CUDA programs compile to the PTX instruction set. That
instruction set does not contain SIMD instructions. So,
CUDA programs cannot make explicit use of SIMD. Individual
threads are part of groups called warps, within which every
thread executes exactly the same sequence of instructions.
Nvidia calls this Single Instruction, Multiple Thread (SIMT), but it’s
essentially SIMD.
Parallel Thread Execution (PTX, or NVPTX) is a pseudo-assembly language used
in Nvidia’s CUDA programming environment. The nvcc compiler translates
code written in CUDA, a C++-like language, into PTX, and the graphics driver
contains a compiler that translates the PTX into binary code that can be
run on the processing cores.
MIMD Processors
• In contrast to SIMD processors, MIMD processors can execute
different programs on different processors.
• A variant of this, called single program, multiple data stream
(SPMD), executes the same program on different processors.
• It is easy to see that SPMD and MIMD are closely related in
terms of programming flexibility and underlying architectural
support.
• Examples of such platforms include current generation Sun
Ultra Servers, SGI Origin Servers, multiprocessor PCs, workstation
clusters, and the IBM SP.
SIMD-MIMD Comparison
• SIMD computers require less hardware than MIMD computers
(single control unit).
• However, since SIMD processors are specially designed, they
tend to be expensive and have long design cycles.
• Not all applications are naturally suited to SIMD processors.
• In contrast, platforms supporting the SPMD paradigm can be
built from inexpensive off-the-shelf components with relatively
little effort in a short amount of time.
• MPI is primarily for SPMD/MIMD. HPF is an example of a SIMD
interface.
Communication Model of Parallel Platforms
• There are two primary forms of data exchange between
parallel tasks – accessing a shared data space and
exchanging messages.
• Platforms that provide a shared data space are called shared-
address-space machines or multiprocessors.
1. Consistency problems
2. Global view of memory
• Platforms that support messaging are also called message
passing platforms or multicomputers.
1. No consistency problems
2. Lack of global view of memory
3. Communication is costly
Shared-Address-Space Platforms
• Part (or all) of the memory is accessible to all processors.
• Processors interact by modifying data objects stored in this
shared-address-space.
• If the time taken by a processor to access any memory
word in the system (global or local) is identical, the platform
is classified as a uniform memory access (UMA) machine; otherwise,
it is a non-uniform memory access (NUMA) machine.
NUMA and UMA Shared-Address-Space Platforms
Typical shared-address-space architectures: (a)
Uniform-memory-access shared-address-space computer; (b)
Uniform-memory-access shared-address-space computer with
caches and memories; (c) Non-uniform-memory-access
shared-address-space computer with local memory only.
(P: processor, M: memory, C: cache.)
NUMA and UMA Shared-Address-Space Platforms
• The distinction between NUMA and UMA platforms is important
from the point of view of algorithm design. NUMA machines
require locality from underlying algorithms for performance.
• Programming these platforms is easier since reads and writes
are implicitly visible to other processors.
• However, read-write accesses to shared data must be coordinated
(this will be discussed in greater detail when we talk about
threads programming).
• Caches in such machines require coordinated access to
multiple copies. This leads to the cache coherence problem.
• A weaker model of these machines provides an address map,
but not coordinated access. These models are called
non-cache-coherent shared-address-space machines.
Shared-Address-Space vs. Shared Memory Machines
• It is important to note the difference between the terms shared
address space and shared memory.
• We refer to the former as a programming abstraction and to
the latter as a physical machine attribute.
• It is possible to provide a shared address space using a
physically distributed memory.
Message-Passing Platforms
• These platforms comprise a set of processing nodes, each with its
own (exclusive) memory.
• Instances of such a view come naturally from clustered
workstations and non-shared-address-space multicomputers.
• These platforms are programmed using (variants of) send and
receive primitives.
• Libraries such as MPI and PVM provide such primitives.
Message Passing vs. Shared Address Space Platforms
• Message passing requires little hardware support, other than a
network.
• Shared address space platforms can easily emulate message
passing. The reverse is more difficult to do (in an efficient
manner).
Communication Costs in Parallel Machines
• Along with idling and contention, communication is a major
overhead in parallel programs.
• The cost of communication is dependent on a variety of
features including the programming model semantics, the
network topology, data handling and routing, and associated
software protocols.
Message Passing Costs in Parallel Computers
The total time to transfer a message over a network comprises
the following:
• Startup time (ts): Time spent at sending and receiving nodes
(executing the routing algorithm, programming routers, etc.).
• Per-hop time (th): This time is a function of the number of hops and
includes factors such as switch latencies, network delays, etc.
• Per-word transfer time (tw ): This time includes all overheads that
are determined by the length of the message. This includes
bandwidth of links, error checking and correction, etc.
Store-and-Forward Routing
• A message traversing multiple hops is completely received at
an intermediate hop before being forwarded to the next hop.
• The total communication cost for a message of size m words to
traverse l communication links is
tcomm = ts + (mtw + th)l.
• In most platforms th is small; assuming l = 1, the above
expression can be approximated by
tcomm = ts + mtw .
• See homework. (A small numerical sketch of this model follows below.)
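As a quick sanity check of the cost model, the sketch below evaluates both
expressions for a 1000-word message; all parameter values are assumed,
purely illustrative numbers, not measurements of any real machine.

#include <stdio.h>

int main(void)
{
    /* Assumed, illustrative parameters in microseconds (not measured). */
    double ts = 100.0;   /* startup time */
    double th = 1.0;     /* per-hop time */
    double tw = 0.1;     /* per-word transfer time */
    int    m  = 1000;    /* message size in words */
    int    l  = 1;       /* number of links traversed */

    double exact  = ts + (m * tw + th) * l; /* tcomm = ts + (m tw + th) l = 201 us */
    double approx = ts + m * tw;            /* tcomm ~ ts + m tw         = 200 us */

    printf("store-and-forward: %.1f us, simplified: %.1f us\n", exact, approx);
    return 0;
}

With these numbers the per-hop term contributes only 1 us, which is why the
simplified expression is usually an adequate approximation.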
MPI: the Message Passing Interface
• MPI defines a standard library for message-passing that can
be used to develop portable message-passing programs using
either C or Fortran.
• The MPI standard defines both the syntax as well as the
semantics of a core set of library routines.
• Vendor implementations of MPI are available on almost all
commercial parallel computers.
• It is possible to write fully functional message-passing programs
using only the six routines listed on the next slide.
• Syntax is the grammar. It describes the way to construct a correct sentence.
For example, “this water is triangular” is syntactically correct.
• Semantics relates to the meaning. “This water is triangular” does not mean
anything, though the grammar is correct.
MPI: the Message Passing Interface
The minimal set of MPI routines.
MPI_Init Initializes MPI.
MPI_Finalize Terminates MPI.
MPI_Comm_size Determines the number of processes.
MPI_Comm_rank Determines the label of the calling process.
MPI_Send Sends a message.
MPI_Recv Receives a message.
Starting and Terminating the MPI Library
• MPI_Init is called prior to any calls to other MPI routines. Its
purpose is to initialize the MPI environment.
• MPI_Finalize is called at the end of the computation,
and it performs various clean-up tasks to terminate the MPI
environment.
• The prototypes of these two functions are:
int MPI_Init(int *argc, char ***argv)
int MPI_Finalize()
• MPI_Init also strips off any MPI related command-line
arguments.
• All MPI routines, data-types, and constants are prefixed
by “MPI_”. The return code for successful completion is
MPI_SUCCESS.
Communicators
• A communicator defines a communication domain – a set of
processes that are allowed to communicate with each other.
• Information about communication domains is stored in
variables of type MPI_Comm.
• Communicators are used as arguments to all message transfer
MPI routines.
• A process can belong to many different (possibly overlapping)
communication domains.
• MPI defines a default communicator called MPI_COMM_WORLD
which includes all the processes.
Querying Information
• The MPI_Comm_size and MPI_Comm_rank functions are used to
determine the number of processes and the label of the calling
process, respectively.
• The calling sequences of these routines are as follows:
int MPI_Comm_size(MPI_Comm comm, int *size)
int MPI_Comm_rank(MPI_Comm comm, int *rank)
• The rank of a process is an integer that ranges from zero up to
the size of the communicator minus one.
Our First MPI Program
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
  int npes, myrank;

  MPI_Init(&argc, &argv);                 /* initialize the MPI environment */
  MPI_Comm_size(MPI_COMM_WORLD, &npes);   /* number of processes */
  MPI_Comm_rank(MPI_COMM_WORLD, &myrank); /* rank of the calling process */
  printf("From process %d out of %d, Hello World!\n",
         myrank, npes);
  MPI_Finalize();                         /* clean up the MPI environment */
  return 0;
}
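Assuming an MPI implementation such as MPICH or Open MPI is installed, the
program can typically be compiled and launched as follows (the file name
hello.c and the process count are illustrative; some installations use
mpiexec instead of mpirun):

mpicc hello.c -o hello
mpirun -np 4 ./hello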
Sending and Receiving Messages
• The basic functions for sending and receiving messages in MPI
are MPI_Send and MPI_Recv, respectively.
• The calling sequences of these routines are as follows:
int MPI_Send(void *buf, int count, MPI_Datatype datatype,
int dest, int tag, MPI_Comm comm)
int MPI_Recv(void *buf, int count, MPI_Datatype datatype,
int source, int tag, MPI_Comm comm, MPI_Status *status)
• MPI provides equivalent datatypes for all C datatypes. This is
done for portability reasons.
• The datatype MPI_BYTE corresponds to a byte (8 bits) and
MPI_PACKED corresponds to a collection of data items that has
been created by packing non-contiguous data.
• The message tag can take values ranging from zero up to the
MPI-defined constant MPI_TAG_UB. (A short sketch of how a receiver
can inspect the status argument follows below.)
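The status argument tells the receiver what actually arrived, which is
especially useful with the wildcards MPI_ANY_SOURCE and MPI_ANY_TAG. A
minimal, hypothetical fragment of a receiving side (the buffer and count
shown here are illustrative, not part of the program on the next slide):

int buf[10], count;
MPI_Status status;

MPI_Recv(buf, 10, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG,
         MPI_COMM_WORLD, &status);
printf("received from rank %d with tag %d\n",
       status.MPI_SOURCE, status.MPI_TAG);  /* who sent it, and with which tag */
MPI_Get_count(&status, MPI_INT, &count);    /* how many MPI_INTs arrived */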
A sending-receiving MPI Program
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
  int npes, myrank, a[10], b[10];
  MPI_Status status;

  MPI_Init(&argc, &argv);
  MPI_Comm_size(MPI_COMM_WORLD, &npes);
  MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
  if (myrank == 0) {   /* process 0 sends one integer (tag 1) to process 1 */
    a[0] = 0; b[0] = 1;
    MPI_Send(a, 1, MPI_INT, 1, 1, MPI_COMM_WORLD);
  }
  else {               /* run with exactly two processes: process 1 receives */
    a[0] = 2; b[0] = 3;
    MPI_Recv(a, 1, MPI_INT, 0, 1, MPI_COMM_WORLD, &status);
  }
  printf("From processor: %d out of %d a=%d b=%d\n",
         myrank, npes, a[0], b[0]);
  MPI_Finalize();
  return 0;
}
• Homework: Compute the startup time (ts) and the per-word
transfer time (tw); note that these are the so-called uncongested
times. (One common measurement approach is sketched below.)
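A common way to estimate ts and tw is a ping-pong test between two
processes, timed with MPI_Wtime. The sketch below is one possible setup; the
message size, repetition count, and buffer contents are arbitrary choices,
and the one-way time it reports would be measured for several message sizes
m and then fitted to t = ts + m tw (intercept ts, slope tw).

#include <mpi.h>
#include <stdio.h>

#define NWORDS 100000   /* message size in words (ints); arbitrary choice */
#define NREPS  100      /* number of round trips to average over */

int main(int argc, char *argv[])
{
  static int buf[NWORDS];
  int myrank, i;
  double t0, t1;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
  for (i = 0; i < NWORDS; i++) buf[i] = i;   /* fill the buffer */

  t0 = MPI_Wtime();
  for (i = 0; i < NREPS; i++) {
    if (myrank == 0) {        /* rank 0 sends, then waits for the echo */
      MPI_Send(buf, NWORDS, MPI_INT, 1, 0, MPI_COMM_WORLD);
      MPI_Recv(buf, NWORDS, MPI_INT, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    } else if (myrank == 1) { /* rank 1 echoes the message back */
      MPI_Recv(buf, NWORDS, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
      MPI_Send(buf, NWORDS, MPI_INT, 0, 0, MPI_COMM_WORLD);
    }
  }
  t1 = MPI_Wtime();

  if (myrank == 0)            /* average one-way time per message */
    printf("m = %d words, one-way time = %e seconds\n",
           NWORDS, (t1 - t0) / (2.0 * NREPS));

  MPI_Finalize();
  return 0;
}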