
LECTURE 2: Parallel Computing Platforms (Part 1)

Ananth Grama, Anshul Gupta, George Karypis, and Vipin Kumar

To accompany the text ``Introduction to Parallel Computing'',
Addison Wesley, 2003.


Topic Overview
 Implicit Parallelism: Trends in Microprocessor
Architectures
 Limitations of Memory System Performance
 Dichotomy of Parallel Computing Platforms
 Communication Model of Parallel Platforms



Scope of Parallelism
 Conventional architectures broadly comprise a processor, a
memory system, and a datapath.
 Each of these components presents significant performance
bottlenecks.
 Parallelism addresses each of these components in significant ways.
 Different applications utilize different aspects of parallelism, e.g.:
 data-intensive applications utilize high aggregate throughput
 server applications utilize high aggregate network bandwidth
 scientific applications typically utilize high processing and
memory system performance.
 It is important to understand each of these performance
bottlenecks.



Implicit Parallelism: Trends in Microprocessor
Architectures
 Microprocessor clock speeds have posted impressive
gains over the past two decades.

 With current technologies, higher levels of device
integration provide a very large number of transistors
for the design of the microprocessor.

 Current processors therefore use these resources
(transistors, etc.) to execute multiple instructions in the
same cycle.



Machine Cycle
• The steps performed by the processor for each machine
language instruction it receives. The machine cycle is a
four-step cycle that includes reading and interpreting the
machine-language instruction, executing it, and then
storing the result.
• Four steps:
• Fetch - Retrieve an instruction from memory.
• Decode - Translate the retrieved instruction into a series of
computer commands.
• Execute - Carry out the computer commands.
• Store - Send and write the results back to memory.
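Illustration (not part of the original slide): a minimal C sketch of the fetch-decode-execute-store loop for a toy, hypothetical 16-bit instruction format; the opcodes and encoding are invented purely for illustration.

    #include <stdint.h>
    #include <stdio.h>

    /* Toy machine: each 16-bit instruction = 4-bit opcode, two 4-bit source
       registers, and a 4-bit destination register. Opcodes are hypothetical. */
    enum { OP_ADD = 0, OP_SUB = 1, OP_HALT = 15 };

    int main(void) {
        uint16_t mem[4] = { 0x0123, 0x1312, 0xF000 };  /* tiny program         */
        int32_t reg[16] = { 0, 5, 7 };                 /* register file        */
        unsigned pc = 0;

        for (;;) {
            uint16_t instr = mem[pc++];                /* Fetch                */
            unsigned op = instr >> 12;                 /* Decode               */
            unsigned rs = (instr >> 8) & 0xF;
            unsigned rt = (instr >> 4) & 0xF;
            unsigned rd = instr & 0xF;
            if (op == OP_HALT) break;
            int32_t result = (op == OP_ADD) ? reg[rs] + reg[rt]   /* Execute  */
                                            : reg[rs] - reg[rt];
            reg[rd] = result;                          /* Store (write back)   */
        }
        printf("r3 = %d, r2 = %d\n", reg[3], reg[2]);  /* prints r3 = 12, r2 = 7 */
        return 0;
    }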



Pipelining and Superscalar Execution
• Processors have relied on pipelining to improve execution
rate.
• An instruction pipeline is a technique used in the design
of computers to increase their instruction throughput (the
number of instructions that can be executed in a unit of time).
• The basic instruction cycle is broken up into a series of
stages called a pipeline.
• Rather than processing each instruction sequentially
(completing one before starting the next), each instruction is
split into a sequence of steps, so that steps from different
instructions can be executed concurrently, in parallel.
• Pipelining increases instruction throughput by performing
multiple operations at the same time.
• Analogy - a laundry task: 1) washing the clothes, 2) drying,
3) ironing. While one load is being ironed, the next can be
drying and a third can be washing, so all stages are busy
simultaneously.

Pipelining and Superscalar Execution
 Pipelining overlaps various stages of instruction execution to
achieve performance.
 In pipelining, an instruction can be executed while the next
one is being decoded and the next one is being fetched.
 An assembly line for the manufacture of cars is an analogy for
pipelining.
◦ A 100-time-unit task is broken into 10 pipelined stages of 10
units each.
◦ This enables faster execution and increases throughput.
 Limitations of pipelining (see the sketch below):
 The speed of a pipeline is eventually limited by its slowest
stage.
 A slow stage congests the pipeline and becomes a bottleneck.
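A back-of-the-envelope C sketch (assuming an ideal pipeline with no stalls) of the 100-time-unit example above, showing both the near-10x gain and how a single slow stage throttles the whole pipeline. The numbers are the slide's illustrative values, not measurements.

    #include <stdio.h>

    /* Ideal pipeline model: n instructions, k stages, slowest stage takes t_max.
       Unpipelined time = n * total_work
       Pipelined time   = (k + n - 1) * t_max  (fill the pipe, then one result
                                                 per t_max)                     */
    int main(void) {
        double total_work = 100.0;   /* time units per instruction, unpipelined */
        int    k = 10;               /* pipeline stages                          */
        double t_max = 10.0;         /* slowest stage (balanced here)            */
        int    n = 1000;             /* instructions                             */

        double serial    = n * total_work;
        double pipelined = (k + n - 1) * t_max;
        printf("speedup = %.2f\n", serial / pipelined);            /* ~9.9x */

        /* If one stage is unbalanced (say 20 units), it throttles the pipe. */
        t_max = 20.0;
        pipelined = (k + n - 1) * t_max;
        printf("speedup with slow stage = %.2f\n", serial / pipelined); /* ~5x */
        return 0;
    }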



Pipelining and Superscalar Execution

Basic five-stage pipeline in a RISC machine:


 IF = Instruction Fetch
 ID = Instruction Decode
 EX = Execute
 MEM = Memory access
 WB = Register write back
(RISC = Reduced Instruction Set Computing)
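Below is a plain-text reconstruction of the classic five-stage pipeline timing diagram (the original slide shows this as a colored figure):

    Cycle:     1    2    3    4    5    6    7    8    9
    Instr 1:   IF   ID   EX   MEM  WB
    Instr 2:        IF   ID   EX   MEM  WB
    Instr 3:             IF   ID   EX   MEM  WB
    Instr 4:                  IF   ID   EX   MEM  WB
    Instr 5:                       IF   ID   EX   MEM  WB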

In the fourth clock cycle, the earliest instruction is in the MEM
stage, and the latest instruction has not yet entered the pipeline.



Pipelining and Superscalar Execution
 One simple way of alleviating these bottlenecks is to
use multiple pipelines.
 Multiple pipelines:
◦ Improve instruction execution rate.
◦ Multiple instructions are piped into the processor in
parallel.
◦ Multiple instructions are executed on multiple
functional units.
 Superscalar execution: The ability of a processor to
issue multiple instructions in the same cycle.



Superscalar Execution: An Example

Figure: example of a two-way superscalar execution of instructions. By
fetching and dispatching two instructions at a time, a maximum of two
instructions per cycle can be completed.



Superscalar Execution: An Example
 In the above example, there is some
wastage of resources due to data
dependencies.
 The example also illustrates that different
instruction mixes with identical semantics
can take significantly different execution
time.



Superscalar Execution
 Scheduling of instructions is determined by a number of factors:

◦ True Data Dependency: The result of one operation is an
input to the next, e.g., ADD R1 R2 R3 followed by
SUB R6 R1 R4, where the SUB uses the result of the ADD.
◦ Resource Dependency: Two operations require the same
resource.
◦ Branch Dependency: Scheduling instructions across
conditional branch statements cannot be done deterministically
a priori.
◦ The scheduler, a piece of hardware, looks at a large number of
instructions in an instruction queue and selects an appropriate
number of instructions to execute concurrently based on these
factors.
◦ The complexity of this hardware is an important constraint on
superscalar processors.



Superscalar Execution: Issue Mechanisms
 In the simpler model, instructions can be issued only in
the order in which they are encountered. That is, if the
second instruction cannot be issued because it has a
data dependency with the first, only one instruction is
issued in the cycle. This is called in-order issue.
 In a more aggressive model, instructions can be issued
out of order. In this case, if the second instruction has
data dependencies with the first, but the third
instruction does not, the first and third instructions can
be co-scheduled. This is also called dynamic issue.
 Performance of in-order issue is generally limited.



Superscalar Execution: Efficiency Considerations
 Not all functional units can be kept busy at all times.
 Vertical waste: If during a cycle, no functional units
are utilized.
 Horizontal waste: If during a cycle, only some of the
functional units are utilized.
 Due to limited parallelism in typical instruction traces,
dependencies, or the inability of the scheduler to
extract parallelism, the performance of superscalar
processors is eventually limited.
 Conventional microprocessors typically support four-
way superscalar execution.



Very Long Instruction Word (VLIW) Processors
 The hardware cost and complexity of
the superscalar scheduler is a major
consideration in processor design.
 To address this issue, VLIW
processors rely on compile-time
analysis to identify and bundle
together instructions that can be
executed concurrently.
 These instructions are packed and
dispatched together, and thus the
name very long instruction word.
 This concept was used with some
commercial success in the Multiflow
Trace machine (circa 1984).
 Variants of this concept are employed
in the Intel IA64 processors.



Limitations of
Memory System Performance
 Memory system, and not processor speed, is often the
bottleneck for many applications.
 Memory system performance is largely captured by two
parameters, latency and bandwidth.
 Latency is the time from the issue of a memory
request to the time the data is available at the
processor.
 Bandwidth is the rate at which data can be pumped to
the processor by the memory system.



Memory System Performance: Bandwidth
and Latency
 It is very important to understand the difference
between latency and bandwidth.
 Consider the example of a fire hose (an example of latency): if
the water comes out of the hose two seconds after the hydrant is
turned on, the latency of the system is two seconds.
 Once the water starts flowing (an example of bandwidth), if the
hydrant delivers water at the rate of 5 gallons/second, the
bandwidth of the system is 5 gallons/second.
 If you want an immediate response from the hydrant, it is
important to reduce latency.
 If you want to fight big fires, you want high bandwidth.
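The same distinction in a short C sketch (the latency and bandwidth numbers below are made-up placeholders, not measurements of any real machine): the time to move a block of data is roughly the latency plus the size divided by the bandwidth.

    #include <stdio.h>

    /* Simple model: transfer_time = latency + bytes / bandwidth. */
    int main(void) {
        double latency_s   = 100e-9;              /* 100 ns to first byte  */
        double bandwidth   = 10e9;                /* 10 GB/s sustained     */
        double small_block = 64.0;                /* one cache line        */
        double large_block = 64.0 * 1024 * 1024;  /* 64 MB                 */

        printf("64 B  : %.3g s (latency dominates)\n",
               latency_s + small_block / bandwidth);
        printf("64 MB : %.3g s (bandwidth dominates)\n",
               latency_s + large_block / bandwidth);
        return 0;
    }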



Improving Effective Memory
Latency Using Caches
 Caches are small and fast memory elements between
the processor and DRAM.
 This memory acts as a low-latency high-bandwidth
storage.
 If a piece of data is repeatedly used, the effective latency
of this memory system can be reduced by the cache.
 The fraction of data references satisfied by the cache is
called the cache hit ratio of the computation on the
system.
 The cache hit ratio achieved by a code on a memory system
often determines its performance.
Note: the processor is much faster than the other components of the
system, in particular main memory; caches help bridge this speed gap.
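A minimal C sketch (with illustrative, made-up latencies) of how the cache hit ratio determines the effective memory latency seen by the processor:

    #include <stdio.h>

    /* Effective access time under a simple cache model:
         t_eff = h * t_cache + (1 - h) * t_dram
       The latencies below are illustrative placeholders. */
    int main(void) {
        double t_cache = 1.0;    /* ns */
        double t_dram  = 100.0;  /* ns */
        for (double h = 0.50; h <= 1.0001; h += 0.25) {
            double t_eff = h * t_cache + (1.0 - h) * t_dram;
            printf("hit ratio %.2f -> effective latency %.1f ns\n", h, t_eff);
        }
        return 0;
    }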
Impact of Memory Bandwidth
 Memory bandwidth is determined by the
bandwidth of the memory bus as well as the
memory units.

 Memory bandwidth can be improved by increasing the size of
memory blocks.
 The underlying system takes l time units (where l is the
latency of the system) to deliver b units of data (where b is
the block size), so larger blocks amortize the latency over
more data.
 Analogy: how fast a single car travels is like latency; how
many lanes the road has is like bandwidth.
Alternate Approaches for
Hiding Memory Latency
 Consider the problem of browsing the web on a very slow
network connection. We deal with the problem in one of three
possible ways:

◦ Prefetching: We anticipate which pages we are going to browse
ahead of time and issue requests for them in advance;

◦ Multithreading: We open multiple browsers and access
different pages in each browser, so while we are waiting for
one page to load, we can be reading others; or

◦ Spatial locality: We access a whole bunch of pages in one go,
reducing the latency across the various accesses.
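As a concrete, compiler-specific illustration of the prefetching idea (not from the lecture): GCC and Clang provide a __builtin_prefetch hint, and the C sketch below requests data a few iterations ahead while the current element is being processed. Whether this helps in practice depends on the hardware prefetcher and the access pattern.

    #include <stdio.h>
    #include <stddef.h>

    #define PREFETCH_DISTANCE 16   /* how many iterations ahead to request data */

    /* Sum an array while hinting the memory system to start fetching data
       PREFETCH_DISTANCE iterations ahead (GCC/Clang builtin; omitted elsewhere). */
    static double sum_with_prefetch(const double *a, size_t n) {
        double s = 0.0;
        for (size_t i = 0; i < n; ++i) {
    #if defined(__GNUC__) || defined(__clang__)
            if (i + PREFETCH_DISTANCE < n)
                __builtin_prefetch(&a[i + PREFETCH_DISTANCE], 0 /*read*/, 1 /*locality*/);
    #endif
            s += a[i];
        }
        return s;
    }

    int main(void) {
        static double a[1 << 20];                       /* 8 MB of doubles */
        for (size_t i = 0; i < sizeof a / sizeof a[0]; ++i) a[i] = 1.0;
        printf("sum = %.0f\n", sum_with_prefetch(a, sizeof a / sizeof a[0]));
        return 0;
    }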



Tradeoffs of Multithreading and Prefetching
 Multithreading and prefetching are critically impacted by
the memory bandwidth.
 Multithreaded systems become bandwidth bound
instead of latency bound.
 Multithreading and prefetching only address the latency
problem and may often worsen the bandwidth problem.
 Multithreading and prefetching also require significantly
more hardware resources in the form of storage.



Dichotomy of Parallel Computing Platforms
 Parallel platforms can be viewed in terms of their logical and
physical organization.

◦ Logical organization: the programmer's view of the
platform.

◦ Physical organization: the actual hardware organization
of the platform.



Logical Organization

 An explicitly parallel program must specify:

◦ parallel/concurrent tasks (control structure).

◦ interactions between the concurrent subtasks
(communication model).



Control Structure of Parallel Programs
 Parallelism can be expressed at various
levels of granularity - from instruction level
to processes.

 Between these extremes exists a range of
models, along with corresponding
architectural support.



Control Structure of Parallel Programs
 Processing units in parallel computers either operate under the
centralized control of a single control unit or work independently.

 If there is a single control unit that dispatches the same instruction
to various processors (that work on different data), the model is
referred to as single instruction stream, multiple data
stream (SIMD).

 If each processor has its own control unit, each processor can
execute different instructions on different data items. This model is
called multiple instruction stream, multiple data stream
(MIMD).



SIMD and MIMD Processors

A typical SIMD architecture (a) and a typical MIMD architecture (b).



SIMD Processors
 Some of the earliest parallel computers such as the Illiac IV,
MPP, DAP, CM-2, and MasPar MP-1 belonged to this class of
machines.

 Variants of this concept have found use in co-processing
units such as the MMX units in Intel processors and DSP
chips such as the Sharc.

 SIMD relies on the regular structure of computations (such
as those in image processing).



MIMD Processors
 In contrast to SIMD processors, MIMD processors can
execute different programs on different processors.
 A variant of this, called single program, multiple data
streams (SPMD), executes the same program on
different processors.
 It is easy to see that SPMD and MIMD are closely
related in terms of programming flexibility and
underlying architectural support.
 Examples of such platforms include current generation
Sun Ultra Servers, SGI Origin Servers, multiprocessor
PCs, workstation clusters, and the IBM SP.
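A minimal SPMD sketch in C using MPI (the message-passing library mentioned later in this lecture): every process runs the same program, and the rank returned by MPI_Comm_rank selects the work each process does. This is an illustrative sketch, not part of the original slides.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* Same program everywhere; the rank decides which work to do. */
        if (rank == 0)
            printf("I am the coordinator among %d processes\n", size);
        else
            printf("Worker %d processing its own partition of the data\n", rank);

        MPI_Finalize();
        return 0;
    }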



IMPORTANT

 Flynn's Classical Taxonomy

• There are different ways to classify parallel computers.

• One of the more widely used classifications, in use since
1966, is called Flynn's Taxonomy.

• Flynn's taxonomy distinguishes multi-processor
computer architectures according to how they can be
classified along the two independent dimensions
of Instruction Stream and Data Stream. Each of these
dimensions can have only one of two possible
states: Single or Multiple.



The matrix below defines the 4 possible classifications
according to Flynn:

                          Single Data    Multiple Data
    Single Instruction    SISD           SIMD
    Multiple Instruction  MISD           MIMD



a) Single Instruction, Single Data (SISD)

 A serial (non-parallel) computer


 Single Instruction: Only one instruction stream is being acted on by
the CPU during any one clock cycle
 Single Data: Only one data stream is being used as input during any
one clock cycle
 Deterministic execution
 This is the oldest type of computer
 Examples: older generation mainframes, minicomputers,
workstations and single processor/core PCs.



b) Single Instruction, Multiple Data (SIMD)
 A type of parallel computer
 Single Instruction: All processing units execute the same instruction
at any given clock cycle
 Multiple Data: Each processing unit can operate on a different data
element
 Best suited for specialized problems characterized by a high degree
of regularity, such as graphics/image processing.
 Synchronous (lockstep) and deterministic execution
 Two varieties: Processor Arrays and Vector Pipelines
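To make the lockstep idea concrete, here is a small C sketch using x86 SSE intrinsics (short-vector SIMD units in the same spirit as the MMX units mentioned earlier): a single instruction adds four floats at once. It assumes an x86 processor with SSE and is not part of the original slides.

    #include <xmmintrin.h>   /* SSE intrinsics (x86) */
    #include <stdio.h>

    int main(void) {
        float a[4] = {1, 2, 3, 4};
        float b[4] = {10, 20, 30, 40};
        float c[4];

        __m128 va = _mm_loadu_ps(a);     /* load 4 floats               */
        __m128 vb = _mm_loadu_ps(b);
        __m128 vc = _mm_add_ps(va, vb);  /* ONE instruction, four adds  */
        _mm_storeu_ps(c, vc);

        printf("%g %g %g %g\n", c[0], c[1], c[2], c[3]);  /* 11 22 33 44 */
        return 0;
    }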



c) Multiple Instructions, Single Data (MISD)
 A type of parallel computer
 Multiple Instructions: Each processing unit operates on the data
independently via separate instruction streams.
 Single Data: A single data stream is fed into multiple processing
units.
 Few (if any) actual examples of this class of parallel computer have
ever existed.
 Some conceivable uses might be:
o multiple frequency filters operating on a single signal stream
o multiple cryptography algorithms attempting to crack a single coded
message.



d) Multiple Instructions, Multiple Data (MIMD)
 A type of parallel computer
 Multiple Instruction: Every processor may be executing a different
instruction stream
 Multiple Data: Every processor may be working with a different data
stream
 Execution can be synchronous or asynchronous, deterministic or
non-deterministic
 Currently, the most common type of parallel computer - most modern
supercomputers fall into this category.
 Examples: most current supercomputers, networked parallel computer
clusters and "grids", multi-processor SMP computers, multi-core PCs.
 Note: many MIMD architectures also include SIMD execution sub-
components



SIMD-MIMD Comparison
 SIMD computers require less hardware than MIMD
computers (single control unit).
 However, since SIMD processors are specially designed,
they tend to be expensive and to have long design cycles.
 Not all applications are naturally suited to SIMD
processors.
 In contrast, platforms supporting the SPMD paradigm
can be built from inexpensive off-the-shelf components
with relatively little effort in a short amount of time.



Communication Model of Parallel Platforms
 There are two primary forms of data exchange between
parallel tasks:
◦ accessing a shared data space
◦ exchanging messages

 Platforms that provide a shared data space are called
shared-address-space machines or multiprocessors.

 Platforms that support messaging are also called message-
passing platforms or multicomputers.





Shared-Address-Space Platforms
 Part (or all) of the memory is accessible to all
processors.
 Processors interact by modifying data objects stored in
this shared-address-space.
 The platform is classified as:
◦ Uniform Memory Access (UMA)
 Processors have equal access time
to memory in the system.
 Identical processors
◦ Non-Uniform Memory Access (NUMA) machine.
 Not all processors have equal access time to all memories.
 Memory access across link is slower



NUMA and UMA Shared-Address-Space Platforms

Figure: typical shared-address-space architectures: (a) uniform-
memory-access shared-address-space computer; (b) non-
uniform-memory-access shared-address-space computer
with local memory only.



NUMA and UMA Shared-Address-Space Platforms
 The distinction between NUMA and UMA platforms is
important from the point of view of algorithm design.
NUMA machines require locality from underlying algorithms
for performance.
 Programming these platforms is easier since reads and
writes are implicitly visible to other processors.
 However, read-write accesses to shared data must be
coordinated (this will be discussed in greater detail when we
talk about threads programming).
 Caches in such machines require coordinated access to
multiple copies. This leads to the cache coherence problem.
 A weaker model of these machines provides an address map,
but not coordinated access. These models are called non
cache coherent shared address space machines.
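The need to coordinate read-write access to shared data can be illustrated with a small pthreads sketch in C (a preview of the threads-programming material referenced above); without the mutex, concurrent increments of the shared counter could be lost.

    #include <pthread.h>
    #include <stdio.h>

    /* Shared-address-space programming in miniature: the counter lives in
       memory visible to all threads, so updates must be coordinated. */
    static long counter = 0;
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

    static void *worker(void *arg) {
        (void)arg;
        for (int i = 0; i < 100000; ++i) {
            pthread_mutex_lock(&lock);
            counter++;                    /* read-modify-write on shared data */
            pthread_mutex_unlock(&lock);
        }
        return NULL;
    }

    int main(void) {
        pthread_t t[4];
        for (int i = 0; i < 4; ++i) pthread_create(&t[i], NULL, worker, NULL);
        for (int i = 0; i < 4; ++i) pthread_join(t[i], NULL);
        printf("counter = %ld\n", counter);  /* always 400000 with the mutex */
        return 0;
    }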



Message-Passing Platforms
 These platforms comprise a set of processors, each with
its own (exclusive) memory.

 Instances of such a view come naturally from clustered
workstations and non-shared-address-space
multicomputers.



Message-Passing Platforms
 These platforms are programmed using (variants of) send and
receive primitives.
 Libraries such as MPI and PVM provide such primitives.
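A minimal C illustration of such send and receive primitives using MPI (a sketch only, assuming the program is launched with at least two processes): process 0 sends one integer and process 1 receives it.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        int rank, value;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            value = 42;
            MPI_Send(&value, 1, MPI_INT, 1 /*dest*/, 0 /*tag*/, MPI_COMM_WORLD);
        } else if (rank == 1) {
            MPI_Recv(&value, 1, MPI_INT, 0 /*source*/, 0 /*tag*/, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            printf("rank 1 received %d\n", value);   /* data is copied, not shared */
        }

        MPI_Finalize();
        return 0;
    }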

Beowulf is a multi-computer architecture which can
be used for parallel computations. It is a system
which usually consists of one server node and one
or more client nodes connected via Ethernet or
some other network.



Message Passing
vs.
Shared Address Space Platforms

 Message passing:
◦ requires little hardware support, other than a network.
◦ processors must explicitly communicate with each
other through messages.
◦ data exchanged among processors cannot be shared, it
is copied (using send/receive messages).

 Shared address space platforms:
◦ require more hardware support.
◦ processors access memory through the shared bus.
◦ data sharing between tasks is fast and uniform.



Next Week:

Lecture 3:
Parallel Platforms
(Part 2)

