HPC Unit 1 Solution

SRES’s

SHREE RAMCHANDRA COLLEGE OF ENGINEERING


Department of Computer Engineering
Lonikand, Pune – 412216

Ref. №: SRCOE/COMP/2023-24    Date: 28/01/24
Class: BE    Subject: High Performance Computing
Academic Year: 2023-2024    Semester: II

UNIT 1
Introduction to Parallel Computing
Q.1 What are the applications of Parallel Computing? [April 2023, 6 Marks]

Traditionally, software has been written for serial computation: it is run on a single computer having a single Central Processing Unit (CPU). A problem is broken into a discrete series of instructions, the instructions are executed one after another, and only one instruction may execute at any moment in time. In the simplest sense, parallel computing is the simultaneous use of multiple compute resources to solve a computational problem. The computation is run using multiple CPUs: the problem is broken into discrete parts that can be solved concurrently, each part is further broken down into a series of instructions, and instructions from each part execute simultaneously on different CPUs.

Motivating Parallelism:
• Development of parallel software has traditionally been thought of as time- and effort-intensive.
• This can be largely attributed to the inherent complexity of specifying and coordinating concurrent tasks, and to the lack of portable algorithms, standardized environments, and software development toolkits.

1. The Computational Power Argument – from Transistors to FLOPS

In 1965, Gordon Moore made the following simple observation: "The complexity for minimum component costs has increased at a rate of roughly a factor of two per year. Certainly over the short term this rate can be expected to continue, if not to increase. Over the longer term, the rate of increase is a bit more uncertain, although there is no reason to believe it will not remain nearly constant for at least 10 years. That means by 1975, the number of components per integrated circuit for minimum cost will be 65,000."
2. The Memory/Disk Speed Argument

The overall speed of computation is determined not just by the speed of the processor, but
also by the ability of the memory system to feed data to it. While clock rates of high-end
processors have increased at roughly 40% per year over the past decade, DRAM access
times have only improved at the rate of roughly 10% per year over this interval.
• The overall performance of the memory system is determined by the fraction of the total memory requests that can be satisfied from the cache.

3. The Data Communication Argument

In many applications there are constraints on the location of data and/or resources across
the Internet. An example of such an application is mining of large commercial datasets
distributed over a relatively low bandwidth network. In such applications, even if the
computing power is available to accomplish the required task without resorting to parallel
computing, it is infeasible to collect the data at a central location. In these cases, the
motivation for parallelism comes not just from the need for computing resources but also
from the infeasibility or undesirability of alternate (centralized) approaches.
Q.2 Explain with suitable diagram SIMD architecture. [April 2023, 4 Marks]

SIMD stands for 'Single Instruction and Multiple Data Stream'. It represents an organization that includes many processing units under the supervision of a common control unit. All processors receive the same instruction from the control unit but operate on different items of data. The shared memory unit must contain multiple modules so that it can communicate with all the processors simultaneously.

SIMD is short for Single Instruction/Multiple Data, while the term SIMD operations refers to a computing method that enables processing of multiple data items with a single instruction. In contrast, the conventional sequential approach of using one instruction to process each individual data item is called scalar operation.

The current era of SIMD processors grew out of the desktop-computer market rather than the supercomputer market. As desktop processors became powerful enough to support real-time gaming and audio/video processing during the 1990s, demand grew for this particular type of computing power, and microprocessor vendors turned to SIMD to meet the demand.

Some of the earliest parallel computers, such as the Illiac IV, MPP, DAP, CM-2, and MasPar MP-1, belonged to this class of machines. Variants of this concept have found use in co-processing units such as the MMX units in Intel processors and DSP chips such as the Sharc. SIMD relies on the regular structure of computations (such as those in image processing). It is often necessary to selectively turn off operations on certain data items. For this reason, most SIMD programming paradigms allow for an "activity mask", which determines whether a processor should participate in a computation or not.
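As a brief illustration of the scalar-versus-SIMD distinction described above, the following C sketch adds two float arrays first with ordinary scalar code and then with x86 SSE intrinsics, where a single _mm_add_ps instruction operates on four data elements at once. This example is an addition to the original answer; it assumes an x86 processor with SSE support, and the function names and array size are invented for illustration.

```c
#include <stdio.h>
#include <xmmintrin.h>   /* SSE intrinsics: __m128, _mm_add_ps, ... */

#define N 8              /* small example size, a multiple of 4 */

/* Scalar operation: one instruction processes one data element per iteration. */
static void add_scalar(const float *a, const float *b, float *c, int n) {
    for (int i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}

/* SIMD operation: one _mm_add_ps instruction adds four packed floats at once. */
static void add_simd(const float *a, const float *b, float *c, int n) {
    for (int i = 0; i < n; i += 4) {
        __m128 va = _mm_loadu_ps(&a[i]);
        __m128 vb = _mm_loadu_ps(&b[i]);
        _mm_storeu_ps(&c[i], _mm_add_ps(va, vb));
    }
}

int main(void) {
    float a[N], b[N], c[N];
    for (int i = 0; i < N; i++) { a[i] = (float)i; b[i] = 2.0f * i; }

    add_scalar(a, b, c, N);
    add_simd(a, b, c, N);    /* same result, fewer instructions executed */

    for (int i = 0; i < N; i++)
        printf("%.1f ", c[i]);
    printf("\n");
    return 0;
}
```

Compiled with a standard C compiler on x86-64, both versions produce the same result; the SIMD version simply issues one instruction per four elements instead of one per element.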

Q.3 Explain with suitable diagram MIMD architecture. [April 2023, 4 Marks]

MIMD stands for 'Multiple Instruction and Multiple Data Stream'.

In computing, multiple instruction, multiple data (MIMD) is a technique employed to achieve parallelism, and it is the most fundamental and well-known type of parallel processor. Machines using MIMD have a number of processors. In this organization, all processors in a parallel computer can execute different instructions and operate on different data at the same time: each processor has its own program, and an instruction stream is generated from each program. Both the shared memory programming paradigm and the distributed memory (message passing) programming model are used on MIMD architectures, and each model has its own set of benefits and drawbacks.

MIMD parallel architectures are made of multiple processors and multiple memory modules linked via some interconnection network. They fall into two broad types: shared memory and message passing. A shared memory system generally accomplishes interprocessor coordination through a global memory shared by all processors. These are frequently server systems that communicate through a bus and cache memory controller. The bus/cache architecture alleviates the need for expensive multi-ported memories and interface circuitry, as well as the need to adopt a message-passing paradigm when developing application software. Because access to shared memory is balanced, these systems are also called SMP (symmetric multiprocessor) systems: each processor has an equal opportunity to read/write to memory, including equal access speed.

• SIMD-MIMD Comparison
• SIMD computers require less hardware than MIMD computers (a single control unit).
• However, since SIMD processors are specially designed, they tend to be expensive and have long design cycles.
• Not all applications are naturally suited to SIMD processors.
• In contrast, platforms supporting the SPMD paradigm can be built from inexpensive off-the-shelf components with relatively little effort in a short amount of time.
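As a small added sketch (not part of the original answer), the following C program uses POSIX threads to illustrate the MIMD idea: two threads execute two different functions, i.e., two different instruction streams, on different data at the same time. The task names and data values are invented for the example.

```c
#include <stdio.h>
#include <pthread.h>

/* First instruction stream: sums an integer array. */
static void *sum_task(void *arg) {
    int *data = (int *)arg;
    long sum = 0;
    for (int i = 0; i < 4; i++)
        sum += data[i];
    printf("sum task: %ld\n", sum);
    return NULL;
}

/* Second instruction stream: scales a float array. */
static void *scale_task(void *arg) {
    float *data = (float *)arg;
    for (int i = 0; i < 4; i++)
        data[i] *= 2.0f;
    printf("scale task: %.1f %.1f %.1f %.1f\n",
           data[0], data[1], data[2], data[3]);
    return NULL;
}

int main(void) {
    int ints[4] = {1, 2, 3, 4};
    float floats[4] = {0.5f, 1.5f, 2.5f, 3.5f};
    pthread_t t1, t2;

    /* MIMD: each thread runs a different program on different data. */
    pthread_create(&t1, NULL, sum_task, ints);
    pthread_create(&t2, NULL, scale_task, floats);

    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}
```

Compile with, e.g., cc mimd.c -lpthread. An SPMD program, by contrast, would run the same program text on every processor and branch on the process or thread id.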

Q.4 Explain the impact of Memory Latency and Memory Bandwidth on system performance. [April 2023, 5 Marks]
It is very important to understand the difference between latency and bandwidth. Consider the example of a fire hose. If the water comes out of the hose two seconds after the hydrant is turned on, the latency of the system is two seconds. Once the water starts flowing, if the hydrant delivers water at the rate of 5 gallons/second, the bandwidth of the system is 5 gallons/second.

• If you want immediate response from the hydrant, it is important to reduce latency. If you want to fight big fires, you want high bandwidth.

• Memory Latency: An Example

Consider a processor operating at 1 GHz (1 ns clock) connected to a DRAM with a latency of 100 ns (no caches). Assume that the processor has two multiply-add units and is capable of executing four instructions in each cycle of 1 ns. The following observations follow: the peak processor rating is 4 GFLOPS. Since the memory latency is equal to 100 cycles and the block size is one word, every time a memory request is made, the processor must wait 100 cycles before it can process the data.

On the above architecture, consider the problem of computing a dot-product of two vectors. A dot-product computation performs one multiply-add on a single pair of vector elements, i.e., each floating point operation requires one data fetch. It follows that the peak speed of this computation is limited to one floating point operation every 100 ns, or a speed of 10 MFLOPS, a very small fraction of the peak processor rating.

• Improving Effective Memory Latency Using Caches

Caches are small and fast memory elements between the processor and DRAM. This memory acts as low-latency, high-bandwidth storage. If a piece of data is repeatedly used, the effective latency of the memory system can be reduced by the cache. The fraction of data references satisfied by the cache is called the cache hit ratio of the computation on the system. The cache hit ratio achieved by a code on a memory system often determines its performance.
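The role of the hit ratio can be made concrete with a small back-of-the-envelope calculation (an added sketch, not part of the original answer). Assuming a cache latency of 1 ns and the 100 ns DRAM latency from the example above, the average access time for hit ratio h is h × 1 ns + (1 − h) × 100 ns:

```c
#include <stdio.h>

/* Average memory access time for a simple two-level model:
 * hit_ratio of references are served by the cache, the rest by DRAM. */
static double avg_access_ns(double hit_ratio,
                            double cache_ns, double dram_ns) {
    return hit_ratio * cache_ns + (1.0 - hit_ratio) * dram_ns;
}

int main(void) {
    double cache_ns = 1.0;    /* assumed cache latency */
    double dram_ns  = 100.0;  /* DRAM latency from the example above */

    for (double h = 0.0; h <= 1.0; h += 0.25)
        printf("hit ratio %.2f -> average latency %6.2f ns\n",
               h, avg_access_ns(h, cache_ns, dram_ns));
    return 0;
}
```

A hit ratio of 0.75, for instance, already cuts the average latency from 100 ns to about 25.75 ns.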

• Impact of Memory Bandwidth

Memory bandwidth is determined by the bandwidth of the memory bus as well as the memory units. Memory bandwidth can be improved by increasing the size of memory blocks. The underlying system takes l time units (where l is the latency of the system) to deliver b units of data (where b is the block size).
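To make the effect of block size concrete (an added sketch, not part of the original answer), suppose each memory request costs one latency period l and returns a block of b words. Fetching n words then takes roughly ceil(n/b) × l time units, so larger blocks amortize the latency over more data, provided the program actually uses all b words of each block:

```c
#include <stdio.h>

/* Time to fetch n words when each request of block size b
 * costs one latency period l (in ns) and delivers b words. */
static double fetch_time_ns(long n, long b, double l) {
    long requests = (n + b - 1) / b;   /* ceiling of n / b */
    return requests * l;
}

int main(void) {
    long n = 1000;        /* words to fetch (example value) */
    double l = 100.0;     /* latency from the example above: 100 ns */

    for (long b = 1; b <= 8; b *= 2)
        printf("block size %ld word(s): %.0f ns, effective bandwidth "
               "%.3f words/ns\n", b, fetch_time_ns(n, b, l),
               n / fetch_time_ns(n, b, l));
    return 0;
}
```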

Q.5 Explain Message Passing Costs in parallel computers. [April 2023, 6 Marks]

Message-passing platforms comprise a set of processors, each with its own (exclusive) memory. Instances of such a view come naturally from clustered workstations and non-shared-address-space multicomputers. These platforms are programmed using (variants of) send and receive primitives. Shared-address-space platforms can easily emulate message passing; the reverse is more difficult to do (in an efficient manner).

The time taken to communicate a message between two nodes in a network is the
sum of the time to prepare a message for transmission and the time taken by the
message to traverse the network to its destination.

1. Startup time (ts): The startup time is the time required to handle a message
at the sending and receiving nodes. This includes the time to prepare the
message (adding header, trailer, and error correction information), the time
to execute the routing algorithm, and the time to establish an interface
between the local node and the router. This delay is incurred only once for a
single message transfer.
2. Per-hop time (th): After a message leaves a node, it takes a finite amount of
time to reach the next node in its path. The time taken by the header of a
message to travel between two directly-connected nodes in the network is
called the per-hop time. It is also known as node latency. The per-hop time
is directly related to the latency within the routing switch for determining
which output buffer or channel the message should be forwarded to.
3. Per-word transfer time (tw): If the channel bandwidth is r words per second,
then each word takes time tw = 1/r to traverse the link. This time is called the
per-word transfer time. This time includes network as well as buffering
overheads.

Store-and-Forward Routing

In store-and-forward routing, when a message is traversing a path with multiple links, each intermediate node on the path forwards the message to the next node only after it has received and stored the entire message.

Packet Routing

Store-and-forward routing makes poor use of communication resources: a message is sent from one node to the next only after the entire message has been received. In packet routing, the original message is broken into parts (for example, two equal-sized parts) before it is sent. In this case, an intermediate node waits for only half of the original message to arrive before passing it on, which increases the utilization of communication resources and reduces communication time. Breaking the message into four (or more) parts takes this idea a step further.

Cut-Through Routing

In interconnection networks for parallel computers, additional restrictions can be imposed on message transfers to further reduce the overheads associated with packet switching. By forcing all packets to take the same path, we can eliminate the overhead of transmitting routing information with each packet. By forcing in-sequence delivery, sequencing information can be eliminated. By associating error information at the message level rather than the packet level, the overhead associated with error detection and correction can be reduced. Finally, since error rates in interconnection networks for parallel machines are extremely low, lean error detection mechanisms can be used instead of expensive error correction schemes.
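The parameters ts, th, and tw defined earlier combine into the usual textbook cost models, which are not written out explicitly in the answer above: store-and-forward routing of an m-word message over l links costs approximately t_comm = ts + l (m tw + th), while cut-through routing costs t_comm = ts + l th + m tw, because only the header pays the per-hop cost. A small C sketch comparing the two under arbitrarily chosen parameter values:

```c
#include <stdio.h>

/* Store-and-forward: the whole m-word message is stored and forwarded
 * at each of the l hops, so the per-hop cost (th + m*tw) is paid l times. */
static double t_store_forward(double ts, double th, double tw,
                              double m, double l) {
    return ts + l * (m * tw + th);
}

/* Cut-through: only the header pays the per-hop cost; the message body
 * is pipelined through the network, so m*tw is paid only once. */
static double t_cut_through(double ts, double th, double tw,
                            double m, double l) {
    return ts + l * th + m * tw;
}

int main(void) {
    /* Example parameter values (arbitrary; times in microseconds). */
    double ts = 10.0, th = 1.0, tw = 0.5;
    double m = 1000.0;   /* message size in words */
    double l = 8.0;      /* number of links on the path */

    printf("store-and-forward: %.1f us\n", t_store_forward(ts, th, tw, m, l));
    printf("cut-through      : %.1f us\n", t_cut_through(ts, th, tw, m, l));
    return 0;
}
```

For long messages the store-and-forward cost grows like l·m·tw, whereas the cut-through cost grows like m·tw, which is why cut-through (and packet) routing makes much better use of the network.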

Q.6 Describe Uniform-Memory-Access and Non-Uniform-Memory-Access with diagrammatic representation. [April 2023, 6 Marks]

Typical shared-address-space architectures: (a) uniform-memory-access shared-address-space computer; (b) uniform-memory-access shared-address-space computer with caches and memories; (c) non-uniform-memory-access shared-address-space computer with local memory only.

Part of the memory is accessible to all processors. Processors interact by modifying data objects stored in this shared address space. If the time taken by a processor to access any memory word in the system (global or local) is identical, the platform is classified as a uniform memory access (UMA) machine; otherwise it is a non-uniform memory access (NUMA) machine. This distinction between NUMA and UMA platforms is important from the point of view of algorithm design: NUMA machines require locality from the underlying algorithms for performance. Programming these platforms is easier since reads and writes are implicitly visible to other processors; however, read/write access to shared data must be coordinated (this is discussed in greater detail in the context of threads programming).
• Caches in such machines require coordinated access to multiple copies. This leads to the cache coherence problem. A weaker model of these machines provides an address map, but not coordinated access. These models are called non-cache-coherent shared-address-space machines.
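As a small added sketch (not from the original answer) of the statement that read/write access to shared data must be coordinated, the following C/OpenMP fragment increments a counter that lives in the shared address space from all threads; the atomic directive provides the coordination, and removing it would let the concurrent read-modify-write operations produce incorrect results:

```c
#include <stdio.h>
#include <omp.h>

int main(void) {
    long counter = 0;               /* shared data object in the address space */
    const long per_thread = 100000;

    /* Every thread reads and writes the same shared variable, so the
     * update must be coordinated; 'atomic' serializes each increment. */
    #pragma omp parallel
    {
        for (long i = 0; i < per_thread; i++) {
            #pragma omp atomic
            counter++;
        }
    }

    printf("threads: %d, counter: %ld\n", omp_get_max_threads(), counter);
    return 0;
}
```

Compile with OpenMP enabled (e.g., gcc -fopenmp); the final value should equal the number of threads times 100000.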
