Unit 1 - Part 2

High Performance Computing

Dr. Amit Barve
Associate Professor and Head of the Department,
CSE Department, PIET, PU
Motivating Parallelism
•The role of parallelism in accelerating computing speeds has been recognized for
several decades. Developing parallel hardware and software has traditionally been
time and effort intensive.
•If one is to view this in the context of rapidly improving uniprocessor speeds, one
is tempted to question the need for parallel computing.
•There are some unmistakable trends in hardware design, which indicate that
uniprocessor (or implicitly parallel) architectures may not be able to sustain the
rate of realizable performance increments in the future.
•This is the result of a number of fundamental physical and computational
limitations.
•The emergence of standardized parallel programming environments, libraries,
and hardware has significantly reduced the time to (parallel) solution.
Example of Memory Bandwidth
•Consider the following code fragment:

  for (i = 0; i < 1000; i++) {
    column_sum[i] = 0.0;
    for (j = 0; j < 1000; j++)
      column_sum[i] += b[j][i];
  }

•The code fragment sums columns of the matrix b into a vector column_sum.
•The vector column_sum is small and easily fits into the cache.
•The matrix b is accessed in column order.
•The strided access results in very poor performance.

Multiplying a matrix with a vector: (a) multiplying column-by-column, keeping a running sum; (b) computing each element of the result as a dot product of a row of the matrix with the vector.
Impact of Memory Bandwidth: Example

•We can fix the above code as follows:

  for (i = 0; i < 1000; i++)
    column_sum[i] = 0.0;
  for (j = 0; j < 1000; j++)
    for (i = 0; i < 1000; i++)
      column_sum[i] += b[j][i];

•In this case, the matrix is traversed in row order, and performance can be expected to be significantly better (a small timing sketch follows).
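•A minimal timing sketch (not from the original slides; the matrix size, the use of clock(), and the printed units are arbitrary choices) contrasting the strided and row-order versions of the loop above:

  /* Times the strided (column-order) and the row-order column-sum loops.
   * Compile without aggressive optimization so that neither loop is elided. */
  #include <stdio.h>
  #include <time.h>

  #define N 1000

  static double b[N][N], column_sum[N];

  int main(void) {
      clock_t t0 = clock();
      /* Strided version: b is walked down a column, one element per row. */
      for (int i = 0; i < N; i++) {
          column_sum[i] = 0.0;
          for (int j = 0; j < N; j++)
              column_sum[i] += b[j][i];
      }
      clock_t t1 = clock();

      /* Row-order version: consecutive accesses fall in the same cache line. */
      for (int i = 0; i < N; i++)
          column_sum[i] = 0.0;
      for (int j = 0; j < N; j++)
          for (int i = 0; i < N; i++)
              column_sum[i] += b[j][i];
      clock_t t2 = clock();

      printf("strided: %.3f ms, row-order: %.3f ms\n",
             1000.0 * (double)(t1 - t0) / CLOCKS_PER_SEC,
             1000.0 * (double)(t2 - t1) / CLOCKS_PER_SEC);
      return 0;
  }

•On a typical cache-based machine the row-order version can be expected to run noticeably faster, which is the point made above about strided access.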
Memory System Performance: Summary
Dichotomy of Parallel Computing Platforms
Control Structure of Parallel Programs
•Processing units in parallel computers either operate under
the centralized control of a single control unit or work
independently.
•If there is a single control unit that dispatches the same
instruction to various processors (that work on different data),
the model is referred to as single instruction stream, multiple
data stream (SIMD).
•If each processor has its own control unit, each
processor can execute different instructions on different data
items. This model is called multiple instruction stream,
multiple data stream (MIMD).
SIMD and MIMD Processors

A typical SIMD architecture (a) and a typical MIMD architecture (b).


SIMD Processors
•Some of the earliest parallel computers such as the Illiac IV, MPP,
DAP, CM-2, and MasPar MP-1 belonged to this class of machines.
•Variants of this concept have found use in co-processing units such as
the MMX units in Intel processors and DSP chips such as the Sharc.
•SIMD relies on the regular structure of computations (such as those in
image processing).
•It is often necessary to selectively turn off operations on certain data items. For this reason, most SIMD programming paradigms allow for an "activity mask", which determines whether a processor participates in a computation or not (a small sketch follows).
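•A small sequential sketch (illustrative only, not from the slides) of the activity-mask idea: an array of flags stands in for the per-processor mask, and a two-way conditional is executed in two masked steps. The data values and the conditional itself (if (b == 0) c = a; else c = a / b;) are invented for illustration.

  /* Emulates an activity mask over P "processing elements" using plain C. */
  #include <stdio.h>

  #define P 4  /* four processing elements, as in the following figure */

  int main(void) {
      int a[P] = {5, 4, 7, 9};   /* invented example data */
      int b[P] = {0, 2, 0, 3};
      int c[P];
      int active[P];             /* the activity mask */

      /* Step 1: elements with b[i] == 0 are active and execute the "then" branch. */
      for (int i = 0; i < P; i++) active[i] = (b[i] == 0);
      for (int i = 0; i < P; i++)
          if (active[i]) c[i] = a[i];

      /* Step 2: the mask is inverted; the remaining elements execute the "else" branch. */
      for (int i = 0; i < P; i++)
          if (!active[i]) c[i] = a[i] / b[i];

      for (int i = 0; i < P; i++) printf("c[%d] = %d\n", i, c[i]);
      return 0;
  }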
Conditional Execution in SIMD Processors

Executing a conditional statement on an SIMD computer with four processors: (a) the conditional statement; (b) the execution of the statement in two steps.
MIMD Processors
In contrast to SIMD processors, MIMD processors can
execute different programs on different processors.
A variant of this, called single program multiple data streams
(SPMD), executes the same program on different processors (a
minimal sketch follows this slide).
It is easy to see that SPMD and MIMD are closely related in
terms of programming flexibility and underlying architectural
support.
Examples of such platforms include current generation Sun
Ultra Servers, SGI Origin Servers, multiprocessor PCs,
workstation clusters, and the IBM SP.
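A minimal SPMD sketch (assuming an MPI installation; the rank-0 "coordinator" role is an invented example): every process runs the same program, and the rank returned by MPI_Comm_rank selects the work each process performs.

  /* SPMD: one program, several processes; behaviour differs only by rank. */
  #include <stdio.h>
  #include <mpi.h>

  int main(int argc, char **argv) {
      int rank, size;
      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      MPI_Comm_size(MPI_COMM_WORLD, &size);

      if (rank == 0)
          printf("Process 0 of %d: coordinating\n", size);     /* e.g., hand out work */
      else
          printf("Process %d of %d: computing\n", rank, size); /* e.g., local work */

      MPI_Finalize();
      return 0;
  }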
SIMD-MIMD Comparison
•SIMD computers require less hardware than MIMD
computers (single control unit).
•However, since SIMD processors are specially designed,
they tend to be expensive and to have long design cycles.
•Not all applications are naturally suited to SIMD processors.
•In contrast, platforms supporting the SPMD paradigm can be
built from inexpensive off-the-shelf components with relatively
little effort in a short amount of time.
Communication Model of Parallel Platforms
•There are two primary forms of data exchange
between parallel tasks - accessing a shared data
space and exchanging messages.
•Platforms that provide a shared data space are
called shared-address-space machines or
multiprocessors.
•Platforms that support messaging are also called
message passing platforms or multicomputers.
Shared-Address-Space Platforms
•Part (or all) of the memory is accessible to all
processors.
•Processors interact by modifying data objects
stored in this shared-address-space.
•If the time taken by a processor to access any memory word in the system (global or local) is identical, the platform is classified as a uniform memory access (UMA) machine; otherwise, it is a non-uniform memory access (NUMA) machine.
NUMA and UMA Shared-Address-Space Platforms

Typical shared-address-space architectures: (a) Uniform-memory-access shared-address-space computer; (b) Uniform-memory-access shared-address-space computer with caches and memories; (c) Non-uniform-memory-access shared-address-space computer with local memory only.
NUMA and UMA Shared-Address-Space Platforms

•The distinction between NUMA and UMA platforms is important from the
point of view of algorithm design. NUMA machines require locality from
underlying algorithms for performance.
•Programming these platforms is easier since reads and writes are
implicitly visible to other processors.
•However, read and write accesses to shared data must be coordinated (this will be discussed in greater detail when we talk about threads programming); a small sketch follows this slide.
•Caches in such machines require coordinated access to multiple copies.
This leads to the cache coherence problem.
•A weaker model of these machines provides an address map, but not
coordinated access. These models are called non-cache-coherent
shared-address-space machines.
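•A small sketch (illustrative, using POSIX threads; the thread and iteration counts are arbitrary) of coordinating read-write access to shared data with a mutex:

  /* Four threads increment a shared counter; the mutex serialises each
   * read-modify-write so that no update is lost. */
  #include <stdio.h>
  #include <pthread.h>

  static long shared_counter = 0;
  static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

  static void *worker(void *arg) {
      (void)arg;
      for (int i = 0; i < 100000; i++) {
          pthread_mutex_lock(&lock);    /* coordinate access to shared data */
          shared_counter++;
          pthread_mutex_unlock(&lock);
      }
      return NULL;
  }

  int main(void) {
      pthread_t t[4];
      for (int i = 0; i < 4; i++) pthread_create(&t[i], NULL, worker, NULL);
      for (int i = 0; i < 4; i++) pthread_join(t[i], NULL);
      printf("shared_counter = %ld\n", shared_counter);  /* 400000 when coordinated */
      return 0;
  }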
Shared-Address-Space vs. Shared Memory machines

•It is important to note the difference between the
terms shared address space and shared memory.
•We refer to the former as a programming
abstraction and to the latter as a physical machine
attribute.
•It is possible to provide a shared address space
using a physically distributed memory.
Message-Passing Platforms
•These platforms comprise a set of processors, each
with its own (exclusive) memory.
•Instances of such a view come naturally from
clustered workstations and non-shared-address-
space multicomputers.
•These platforms are programmed using (variants
of) send and receive primitives.
•Libraries such as MPI and PVM provide such
primitives; a minimal MPI sketch follows.
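•A minimal send/receive sketch using MPI (one of the libraries named above); the payload, tag, and two-process layout are arbitrary illustrative choices:

  /* Process 0 sends one integer to process 1; run with at least two processes. */
  #include <stdio.h>
  #include <mpi.h>

  int main(int argc, char **argv) {
      int rank, value;
      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);

      if (rank == 0) {
          value = 42;  /* arbitrary payload */
          MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
      } else if (rank == 1) {
          MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
          printf("Process 1 received %d\n", value);
      }

      MPI_Finalize();
      return 0;
  }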
Message Passing vs. Shared Address Space Platforms

•Message passing requires little hardware support,
other than a network.
•Shared address space platforms can easily emulate
message passing. The reverse is more difficult to do
(in an efficient manner).
Physical Organization of Parallel Platforms
Architecture of an Ideal Parallel Computer
•A natural extension of the Random Access Machine
(RAM) serial architecture is the Parallel Random
Access Machine, or PRAM.
•PRAMs consist of p processors and a global
memory of unbounded size that is uniformly
accessible to all processors.
•Processors share a common clock but may execute
different instructions in each cycle.
Architecture of an Ideal Parallel Computer
•Depending on how simultaneous memory accesses are
handled, PRAMs can be divided into four subclasses.
–Exclusive-read, exclusive-write (EREW) PRAM.
–Concurrent-read, exclusive-write (CREW) PRAM.
–Exclusive-read, concurrent-write (ERCW) PRAM.
–Concurrent-read, concurrent-write (CRCW) PRAM.
Architecture of an Ideal Parallel Computer
•What does concurrent write mean, anyway? (The rules below are illustrated in a small sketch after this list.)
–Common: write only if all values are identical.
–Arbitrary: write the data from a randomly selected processor.
–Priority: follow a predetermined priority order.
–Sum: write the sum of all data items.
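•A small sequential sketch (illustrative only; the proposed values are invented) that resolves a set of simultaneous writes to one memory cell under each of the four rules above:

  /* Simulates how a CRCW PRAM could resolve P simultaneous writes to the
   * same memory cell under the common, arbitrary, priority, and sum rules. */
  #include <stdio.h>
  #include <stdlib.h>

  #define P 4

  int main(void) {
      int proposed[P] = {3, 3, 7, 1};  /* value each processor tries to write */

      /* Common: the write succeeds only if all proposed values are identical. */
      int all_same = 1;
      for (int i = 1; i < P; i++)
          if (proposed[i] != proposed[0]) all_same = 0;
      if (all_same) printf("common:    %d\n", proposed[0]);
      else          printf("common:    no write (values differ)\n");

      /* Arbitrary: keep the value of one arbitrarily selected processor. */
      printf("arbitrary: %d\n", proposed[rand() % P]);

      /* Priority: keep the value of the highest-priority (here, lowest-index) processor. */
      printf("priority:  %d\n", proposed[0]);

      /* Sum: write the sum of all proposed values. */
      int sum = 0;
      for (int i = 0; i < P; i++) sum += proposed[i];
      printf("sum:       %d\n", sum);
      return 0;
  }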
Physical Complexity of an Ideal Parallel Computer
•Processors and memories are connected via switches.
•Since these switches must operate in O(1) time at
the level of words, for a system of p processors and m
words, the switch complexity is O(mp).
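•As an illustration (with arbitrarily chosen sizes), even p = 32 processors and m = 2^30 words would call for on the order of 32 × 2^30 = 2^35 switches.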
•Clearly, for meaningful values of p and m, a true
PRAM is not realizable.
Interconnection Networks for Parallel Computers
•Interconnection networks carry data between processors
and to memory.
•Interconnects are made of switches and links (wires, fiber).
•Interconnects are classified as static or dynamic.
•Static networks consist of point-to-point communication links
among processing nodes and are also referred to as direct
networks.
•Dynamic networks are built using switches and
communication links. Dynamic networks are also referred to
as indirect networks.
www.paruluniversity.ac.in
