Unit 1

This document provides an overview of parallel computing concepts. It discusses different types of parallelism like pipelining and superscalar execution. It also covers memory hierarchies and caches to reduce effective memory latency. Different parallel platforms are described based on their control structure like SIMD and MIMD, and communication models like shared memory and message passing. Network topologies for connecting processors like buses, crossbars, meshes and trees are outlined. Finally, the document discusses communication costs in parallel machines.

Uploaded by

rishisharma4201


Introduction
• The traditional logical view of a sequential computer consists of a
memory connected to a processor via a datapath
• This unit gives an overview of the architectural concepts relevant to
parallel processing
Implicit parallelism: Trends in Microprocessor Architectures

Pipelining and Superscalar Execution

• Processors have long relied on pipelines for improving execution rates
• The assembly-line analogy works well for understanding pipelining
• Example: water bottle packing
• One way to improve the instruction execution rate further is to use multiple pipelines (superscalar execution)
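
The pipelining bullets above can be made concrete with a small cycle-count sketch. It assumes an idealized machine: every stage takes one cycle, and there are no stalls, hazards, or dependencies between instructions. The function names and the `width` parameter are illustrative and do not come from the slides.

```python
def sequential_cycles(num_instructions: int, num_stages: int) -> int:
    """Cycles when each instruction finishes every stage before the next one starts."""
    return num_instructions * num_stages


def pipelined_cycles(num_instructions: int, num_stages: int) -> int:
    """Cycles for one ideal pipeline: fill it once, then retire one instruction per cycle."""
    if num_instructions == 0:
        return 0
    return num_stages + num_instructions - 1


def superscalar_cycles(num_instructions: int, num_stages: int, width: int) -> int:
    """Cycles with `width` identical ideal pipelines issuing independent instructions."""
    if num_instructions == 0:
        return 0
    groups = -(-num_instructions // width)  # ceiling division: issue groups needed
    return num_stages + groups - 1


if __name__ == "__main__":
    print(sequential_cycles(100, 5))      # no overlap between instructions
    print(pipelined_cycles(100, 5))       # single pipeline
    print(superscalar_cycles(100, 5, 2))  # two pipelines
```

Real processors fall short of these counts because of data and control dependencies, which is the limitation the instruction look-ahead bullet under VLIW processors refers to.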

Very Long Instruction Word Processors

• The parallelism extracted by superscalar processors is often limited by the instruction
look-ahead
• An alternate concept for exploiting instruction-level parallelism used in very long
instruction word (VLIW) processors relies on the compiler to resolve dependencies and
resource availability at compile time
[Figure: Example of superscalar execution]
Limitations of Memory System Performance
• The effective performance of a program on a computer relies not just on the speed of the processor
but also on the ability of the memory system to feed data to the processor
• A memory system, possibly consisting of multiple levels of caches, takes in a request for a memory
word and returns a block of data of size b containing the requested word after l nanoseconds.

Improving Effective Memory Latency Using Caches

• One innovation addresses the speed mismatch by placing a smaller and faster memory between the
processor and the DRAM
• The fraction of data references satisfied by the cache is called the cache hit ratio of the computation
on the system
• The effective computation rate of many applications is bounded not by the processing rate of the
CPU, but by the rate at which data can be pumped into the CPU. Such computations are referred to
as being memory bound
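
A rough sketch of how the cache hit ratio changes the effective memory latency is given below. It assumes a single cache level and a simple model in which a hit costs the cache latency and a miss costs the DRAM latency; the function name and the example latencies are illustrative assumptions, not figures from the slides.

```python
def effective_latency_ns(hit_ratio: float, cache_ns: float, dram_ns: float) -> float:
    """Average access time under a one-level cache model (assumed, simplified).

    hit_ratio: fraction of references satisfied by the cache (0.0 to 1.0)
    cache_ns:  latency of a cache hit in nanoseconds
    dram_ns:   latency of an access that goes to DRAM in nanoseconds
    """
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return hit_ratio * cache_ns + (1.0 - hit_ratio) * dram_ns


if __name__ == "__main__":
    # Hypothetical latencies: 1 ns cache, 100 ns DRAM.
    for h in (0.0, 0.5, 0.9, 0.99):
        print(h, effective_latency_ns(h, cache_ns=1.0, dram_ns=100.0))
```

Even a high hit ratio leaves the average latency dominated by the misses when DRAM is much slower than the cache, which is why many computations end up memory bound.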
Impact of Memory Bandwidth
• Memory bandwidth refers to the rate at which data can be moved between the processor
and memory.
• One commonly used technique to improve memory bandwidth is to increase the size of
the memory blocks

Alternate Approaches for Hiding Memory Latency

• Prefetching
• Multithreading
Dichotomy of Parallel Computing Platforms
• An explicitly parallel program must specify concurrency and interaction between
concurrent subtasks
• The two critical components of parallel computing from a programmer's perspective are
ways of expressing parallel tasks and mechanisms for specifying interaction between
these tasks
• The former is sometimes also referred to as the control structure and the latter as the
communication model
Control Structure of Parallel Program
• Parallel tasks can be specified at various levels of granularity
• At one extreme, each program in a set of programs can be viewed as one parallel task.
• At the other extreme, individual instructions within a program can be viewed as parallel
tasks
• Processing units in parallel computers either operate under the centralized control of a
single control unit or work independently
• If there is a single control unit that dispatches the same instruction to various processors
(that work on different data), the model is referred to as single instruction stream, multiple
data stream (SIMD)
• Computers in which each processing element is capable of executing a different program
independent of the other processing elements are called multiple instruction stream,
multiple data stream (MIMD) computers.
Communication Model of Parallel Platforms
• There are two primary forms of data exchange between parallel tasks.
• Accessing a shared data space
• Exchanging messages
• Platforms that provide a shared data space are called shared-address-space machines or
multiprocessors.
• Platforms that support messaging are called message-passing platforms or
multicomputers

Shared-Address-Space Platforms

• Part (or all) of the memory is accessible to all processors

• Processors interact by modifying data objects stored in this shared-address-space
• If the time taken by a processor to access any memory word in the system (global or local) is
identical, the platform is classified as a uniform memory access (UMA) machine; otherwise, it is
a non-uniform memory access (NUMA) machine
[Figure: NUMA and UMA shared-address-space platforms]

Message-Passing Platforms
• These platforms comprise a set of processors, each with its own memory
• Instances of such a view come naturally from clustered workstations and
non-shared-address-space multicomputers
• These platforms are programmed using (variants of) send and receive primitives
• Libraries such as MPI and PVM provide such primitives
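
The send and receive style of programming can be illustrated with a small sketch. This is not the MPI or PVM API: it uses Python threads and queues as a stand-in, and the `Channel` class, `worker`, and `run_demo` are hypothetical names. Threads do share memory in reality; the sketch only follows the message-passing discipline by exchanging data exclusively through channels.

```python
import queue
import threading


class Channel:
    """One-directional mailbox, standing in for a link between two processes."""

    def __init__(self) -> None:
        self._q: queue.Queue = queue.Queue()

    def send(self, message) -> None:
        self._q.put(message)

    def receive(self):
        return self._q.get()  # blocks until a message arrives


def worker(inbox: Channel, outbox: Channel) -> None:
    """Receives a list of numbers, computes on its own copy, and sends back the sum."""
    numbers = inbox.receive()
    outbox.send(sum(numbers))


def run_demo() -> int:
    to_worker, from_worker = Channel(), Channel()
    t = threading.Thread(target=worker, args=(to_worker, from_worker))
    t.start()
    to_worker.send([1, 2, 3, 4])    # explicit send: the only way data reaches the worker
    result = from_worker.receive()  # explicit receive: blocks until the reply is ready
    t.join()
    return result


if __name__ == "__main__":
    print(run_demo())
```

All interaction is visible as explicit send and receive calls, in contrast to shared-address-space platforms where tasks interact by modifying shared data objects.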
Physical Organization of Parallel Platforms
Architecture of an ideal Parallel Computer
• A natural extension of the serial random access machine (RAM) model is the parallel
random access machine, or PRAM.
• A PRAM consists of p processors and a global memory of unbounded size that is
uniformly accessible to all processors
• Processors share a common clock but may execute different instructions in each cycle.
• Depending on how simultaneous memory accesses are handled, PRAMs can be divided
into four subclasses.
• Exclusive-read, exclusive-write (EREW) PRAM
• Concurrent-read, exclusive-write (CREW) PRAM
• Exclusive-read, concurrent-write (ERCW) PRAM
• Concurrent-read, concurrent-write (CRCW) PRAM
Interconnection Networks for Parallel Computers
• Interconnection networks provide mechanisms for data transfer between processing
nodes or between processors and memory modules
• Interconnection networks can be classified as static or dynamic
• Static networks consist of point-to-point communication links among processing nodes
and are also referred to as direct networks.
• Dynamic networks, on the other hand, are built using switches and communication links
and are also referred to as indirect networks
• Switches map a fixed number of inputs to outputs
Network Topologies
• A variety of network topologies have been proposed and implemented
• These topologies trade off performance for cost
• Commercial machines often implement hybrids of multiple topologies for reasons of
packaging, cost, and available components

Bus-Based Networks
• Some of the simplest and earliest parallel machines used buses
• All processors access a common bus for exchanging data
• The distance between any two nodes in a bus is O(1). The bus also provides a convenient
broadcast medium
• However, the bandwidth of the shared bus is a major bottleneck
• Typical bus-based machines are limited to dozens of nodes. Sun Enterprise servers and
Intel Pentium-based shared-bus multiprocessors are examples of such architectures
Crossbar Networks

Multistage Networks

• Crossbars have excellent performance scalability but poor cost scalability

• Buses have excellent cost scalability, but poor performance scalability
• Multistage interconnects strike a compromise between these extremes
Completely connected network

Star Connected Networks

• Every node is connected only to a common node at the centre
• Distance between any pair of nodes is O(1). However, the central node becomes a
bottleneck
• In this sense, star connected networks are static counterparts of buses
[Figure: Completely connected network and star-connected network]
Linear Arrays, Meshes, and k-d Meshes
• In a linear array, each node has two neighbours, one to its left and one to its right. If the
nodes at either end are connected, we refer to it as a 1-D torus or a ring
• A generalization to 2D has nodes with 4 neighbours, to the north, south, east, and west.
• A further generalization to d dimensions has nodes with 2d neighbours
• A special case of a d-dimensional mesh is a hypercube, which has two nodes along each
dimension. Here d = log p (base 2), where p is the total number of nodes
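
The neighbour structure of meshes and hypercubes described above can be sketched in a few lines. The sketch assumes the usual convention of labelling hypercube nodes with d-bit integers, so that neighbours differ in exactly one bit; the function names and the `wraparound` flag are illustrative and not taken from the slides.

```python
def mesh_neighbours(coord: tuple, shape: tuple, wraparound: bool = False) -> list:
    """Neighbours of `coord` in a d-dimensional mesh with the given side lengths.

    With wraparound=True the mesh becomes a torus (a ring in one dimension).
    """
    result = []
    for axis, size in enumerate(shape):
        for step in (-1, 1):
            pos = coord[axis] + step
            if wraparound:
                pos %= size
            elif not 0 <= pos < size:
                continue  # no neighbour beyond the edge of a mesh without wraparound
            result.append(coord[:axis] + (pos,) + coord[axis + 1:])
    return result


def hypercube_neighbours(node: int, d: int) -> list:
    """Neighbours of `node` in a d-dimensional hypercube: flip one bit at a time."""
    return [node ^ (1 << bit) for bit in range(d)]


def hypercube_distance(a: int, b: int) -> int:
    """Number of links on a shortest path: the count of differing label bits."""
    return bin(a ^ b).count("1")


if __name__ == "__main__":
    print(mesh_neighbours((1, 1), (3, 3)))        # interior node of a 2-D mesh
    print(mesh_neighbours((0,), (5,), True))      # node 0 of a 5-node ring
    print(hypercube_neighbours(0b000, 3))         # 3-D hypercube, p = 8 nodes
    print(hypercube_distance(0b000, 0b111))
```

An interior node of a d-dimensional mesh gets 2d neighbours, while every hypercube node gets exactly d = log p neighbours, which matches the bullets above.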
Tree-Based Networks
• The distance between any two nodes is no more than 2 log p
• Links higher up the tree potentially carry more traffic than those at the lower levels
• For this reason, a variant called a fat tree fattens the links as we go up the tree
• Trees can be laid out in 2D with no wire crossings. This is an attractive property of trees
Communication Costs in Parallel Machines
• Along with idling and contention, communication is a major overhead in parallel
programs
• The cost of communication is dependent on a variety of features including the
programming model semantics, the network topology, data handling and routing, and
associated software protocols

Message Passing Costs in Parallel Machines

The total time to transfer a message over a network comprises the following:
• Startup time (ts): The startup time is the time required to handle a message at the
sending and receiving nodes
• Per-hop time (th): After a message leaves a node, it takes a finite amount of time to
reach the next node in its path
• Per-word transfer time (tw): If the channel bandwidth is r words per second, then each
word takes time tw = 1/r to traverse the link
Store-and-forward routing: When a message traverses a path with multiple links, each
intermediate node on the path forwards the message to the next node only after it has
received and stored the entire message

Packet routing: Packet routing breaks messages into packets and pipelines them through
the network. Since packets may take different paths, each packet must carry routing
information, error checking, sequencing, and other related header information

Cut-through routing: Takes the concept of packet routing to an extreme by further
dividing messages into basic units called flits. Since flits are typically small, the header
information must be minimized. This is done by forcing all flits to take the same path, in
sequence. A tracer message first programs all intermediate routers. All flits then take the
same route. Error checks are performed on the entire message, as opposed to individual flits.
No sequence numbers are needed
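
The difference between store-and-forward and cut-through routing can be sketched with commonly used simplified cost expressions built from ts, th, and tw. The symbols m (message size in words) and l (number of links traversed) are assumptions introduced here, as are the function names; the models ignore contention and per-packet header overhead.

```python
def store_and_forward_time(ts: float, th: float, tw: float, m: int, l: int) -> float:
    """Simplified cost: the whole m-word message is received and resent on each of l links.

    t = ts + (m * tw + th) * l
    """
    return ts + (m * tw + th) * l


def cut_through_time(ts: float, th: float, tw: float, m: int, l: int) -> float:
    """Simplified cost: flits are pipelined, so the message body is paid for only once.

    t = ts + l * th + m * tw
    """
    return ts + l * th + m * tw


if __name__ == "__main__":
    # Hypothetical parameters in arbitrary time units.
    params = dict(ts=10.0, th=1.0, tw=0.5, m=100, l=4)
    print(store_and_forward_time(**params))
    print(cut_through_time(**params))
```

Under these models the store-and-forward cost grows with the product of message size and path length, while the cut-through cost grows with their sum, so cut-through routing is favoured for long messages over many hops.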
Impact of Process-Processor Mapping and Mapping Techniques
When mapping a graph G(V,E) into G’(V’,E’), the following metrics are important:
• The maximum number of edges mapped onto any edge in E’ is called the congestion of
the mapping.
• The maximum number of links in E’ that any edge in E is mapped onto is called the dilation
of the mapping.
• The ratio of the number of nodes in the set V’ to that in set V is called the expansion of
the mapping.
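
The three metrics can be computed for a small example. The sketch below assumes that each edge of G is routed along a shortest path in G’ found by breadth-first search; a different routing choice can give different congestion and dilation. The function names, the `placement` dictionary, and the ring-onto-linear-array example are illustrative assumptions.

```python
from collections import Counter, deque


def shortest_path(adjacency: dict, source, target) -> list:
    """Breadth-first search path (list of nodes) in an unweighted graph."""
    parent = {source: None}
    frontier = deque([source])
    while frontier:
        node = frontier.popleft()
        if node == target:
            break
        for nxt in adjacency[node]:
            if nxt not in parent:
                parent[nxt] = node
                frontier.append(nxt)
    if target not in parent:
        raise ValueError("target is not reachable from source")
    path, node = [], target
    while node is not None:
        path.append(node)
        node = parent[node]
    return path[::-1]


def mapping_metrics(guest_edges, host_edges, placement: dict) -> tuple:
    """Return (congestion, dilation, expansion) for mapping G onto G'.

    placement maps each node of G to a node of G'. Host nodes are taken from
    host_edges, so isolated host nodes are not counted (an assumption).
    """
    adjacency = {}
    for u, v in host_edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    load = Counter()  # number of guest edges routed over each host link
    dilation = 0
    for u, v in guest_edges:
        path = shortest_path(adjacency, placement[u], placement[v])
        dilation = max(dilation, len(path) - 1)
        for a, b in zip(path, path[1:]):
            load[frozenset((a, b))] += 1
    congestion = max(load.values(), default=0)
    expansion = len(adjacency) / len(placement)
    return congestion, dilation, expansion


if __name__ == "__main__":
    ring = [(0, 1), (1, 2), (2, 3), (3, 0)]   # G: 4-node ring
    line = [(0, 1), (1, 2), (2, 3)]           # G': 4-node linear array
    print(mapping_metrics(ring, line, {i: i for i in range(4)}))
```

In this example the wraparound edge of the ring has to cross the whole linear array, so it stretches over three links and shares every link with one other ring edge.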