
CSE 455

Higher Performance
Computing
LECTURE 2: FLYNN’S TAXONOMY +
INTERCONNECTION NETWORKS
Agenda
• Motivation for parallel computing
• Flynn’s Taxonomy
• Interconnection networks
• Dynamic Networks
• Static Networks
Motivation for Parallel Computing
• A multiprocessor is expected to reach a faster speed
than the fastest single-processor system.
• A multiprocessor is more cost-effective than a high-
performance single processor.
• If a processor fails, the remaining processors should be
able to provide continued service, albeit with degraded
performance.
Four Decades of Computing
Motivation for Parallel Computing

• One of the clear trends in computing is the substitution of expensive and specialized parallel machines by the more cost-effective clusters of workstations.
• A cluster is a collection of stand-alone computers connected using some interconnection network.
• Additionally, the pervasiveness of the Internet created interest in network computing and, more recently, in grid computing. Grids are geographically distributed platforms.
Agenda
• Motivation for parallel computing
• Flynn’s Taxonomy
• Interconnection networks
• Dynamic Networks
• Static Networks
Flynn’s Taxonomy
• The most popular taxonomy of computer architecture was defined by Flynn. It is based on two types of information flow into a processor: instructions and data.
• The instruction stream is defined as the sequence of
instructions performed by the processing unit.
• The data stream is defined as the data traffic
exchanged between the memory and the processing
unit.
Flynn’s Taxonomy

▪ According to Flynn’s classification, either of the


instruction or data streams can be single or multiple.
▪ Computer architecture can be classified into the
following four distinct categories:
▪ Single-Instruction Single-Data (SISD);
▪ Single-Instruction Multiple-Data (SIMD);
▪ Multiple-Instruction Single-Data (MISD);
▪ Multiple-Instruction Multiple-Data (MIMD).
▪ Conventional single-processor von Neumann
computers are classified as SISD systems
Flynn’s Taxonomy

• SISD: single instruction stream, single data stream
• SIMD: single instruction stream, multiple data streams
• MISD: multiple instruction streams, single data stream
• MIMD: multiple instruction streams, multiple data streams



Single-Instruction Single-Data (SISD)
• Conventional single-processor von Neumann
computers are classified as SISD systems.
Single Instruction Multiple Data (SIMD)
• Consists of two parts:
  • a front-end von Neumann computer;
  • a processor array connected to the memory bus of the front end.
• Applies the same instruction to multiple data items.
SIMD example

(Figure: a single control unit drives n ALUs; data item x[i] is assigned to ALU i, and all ALUs execute the same instruction in lock step.)

for (i = 0; i < n; i++)
    x[i] += y[i];


SIMD ARCHITECTURE
• SIMD Scheme 1
• Each processor has its own local memory.
• Ex: The ILLIAC IV
SIMD ARCHITECTURE
• SIMD Scheme 2
• Processors and memory modules communicate with each other via an interconnection network.
• Ex: The BSP (Burroughs’ Scientific Processor)
SIMD drawbacks
◼ All ALUs are required to execute the same
instruction, or remain idle.
◼ In classic design, they must also operate
synchronously.
◼ The ALUs have no instruction storage.
◼ Efficient for large data parallel problems,
but not other types of more complex
parallel problems.



Graphics Processing Units (GPU)
◼ Real-time graphics application programming interfaces (APIs) use points, lines, and triangles to internally represent the surface of an object.



GPUs
◼ A graphics processing pipeline converts
the internal representation into an array of
pixels that can be sent to a computer
screen.

◼ Several stages of this pipeline


(called shader functions) are
programmable.
◼ Typically just a few lines of C code.



GPUs
◼ Shader functions are also implicitly
parallel, since they can be applied to
multiple elements in the graphics stream.

◼ GPUs can often optimize performance by using SIMD parallelism.
◼ The current generation of GPUs uses SIMD parallelism, although they are not pure SIMD systems.



Multiple Instruction Multiple Data (MIMD)
• Made of multiple independent processors (each with its own control unit and ALU) and multiple memory modules connected via some interconnection network.
• 2 broad categories:
• Shared memory
• Message passing
MIMD Architecture

(Figure: shared memory architecture vs. message passing architecture.)


MIMD Architecture – Shared Memory
• Shared Memory Organization
• Inter-processor coordination is accomplished by reading and
writing in a global memory shared by all processors.
• Typically consists of servers that communicate through a bus
and cache memory controller.
• It requires access control, synchronization, protection, and security.
Shared Memory Organization - Access Control

 Access control determines which process accesses are


possible to which resources.
 Access control models make the required check for
every access request issued by the processors to the
shared memory, against the contents of the access
control table.
 All disallowed access attempts and illegal processes
are blocked.
Shared Memory Organization - Synchronization

• Synchronization constraints limit the time of accesses


from sharing processes to shared resources.
• Appropriate synchronization ensures that the
information flows properly and ensures system
functionality.
Shared Memory Organization - Protection
• Sharing and protection are incompatible: sharing allows access, whereas protection restricts it.
• The simplest shared memory system consists of one
memory module that can be accessed from two
processors.
• Depending on the interconnection network, a shared
memory system leads to systems that can be classified
as:
• uniform memory access (UMA),
• nonuniform memory access (NUMA), and
• cache-only memory architecture (COMA).
Shared Memory Organization - UMA
• In the UMA system, a shared memory is accessible by
all processors through an interconnection network in
the same way a single processor accesses its memory.
• Therefore, all processors have equal access time to
any memory location
• The interconnection network used in the UMA can be a
single bus, multiple buses, a crossbar, or a multiport
memory.
UMA multicore system

(Figure 2.5: a UMA multicore system. The time to access all memory locations is the same for all the cores.)
Shared Memory Organization - NUMA, and COMA

• In the NUMA system, each processor has part of the


shared memory attached.
• The memory has a single address space. Therefore,
any processor could access any memory location
directly using its real address.
• In COMA, the shared memory consists of cache
memory, and data are required to migrate to the
processor requesting it.
NUMA multicore system

(Figure 2.6: a NUMA multicore system. A memory location a core is directly connected to can be accessed faster than a memory location that must be accessed through another chip.)
MIMD Architecture – Message Passing
• Message Passing Organization
• Each processor has access to its own local memory. No shared
memory
• Communications are performed via send-and-receive
operations. (data copy – consistency issues)
• Message-passing multiprocessors employ a variety of static
networks in local communications.
MIMD Architecture

• Programming in the shared memory model was easier but


designing in the message passing model provided scalability.
• The distributed-shared memory (DSM) architecture began to
appear in systems like the SGI Origin2000, and others.
• In DSM, memory is physically distributed
• The architecture behaves like a shared memory machine, but a
message passing architecture lives underneath the software.
• Thus, the DSM machine is a hybrid that takes advantage of both
design schools.
Distributed Memory System

(Figure 2.4: a distributed memory system.)


Agenda
• Motivation for parallel computing
• Flynn’s Taxonomy
• Interconnection networks
• Dynamic Networks
• Static Networks
1.5 Interconnection Networks (INs)

• Mode of Operation
– Synchronous:
• a single global clock is used by all components in the system
(lock-step manner).

– Asynchronous:
• No global clock required
• Hand shaking signals are used to coordinate the operation of
asynchronous systems.

1.5 Interconnection Networks (INs)

• Control Strategy
– Centralized: one central control unit is used to control
the operations of the components of the system.

– Decentralized: the control function is distributed


among different components in the system.

1.5 Interconnection Networks (INs)

• Switching Techniques
– Circuit switching: a complete path has to be
established prior to the start of communication
between a source and a destination.

– Packet switching: communication between a source and a destination takes place via messages divided into smaller entities, called packets.

1.5 Interconnection Networks (INs)

• Topology
– Describes how to connect processors and memories
to other processors and memories.
– Static: direct fixed links are established among nodes
to form a fixed network.
– Dynamic: connections are established when needed.

2.1 Interconnection Networks Taxonomy

Interconnection Network
– Static: 1-D, 2-D, hypercube (HC)
– Dynamic:
  • Bus-based: single bus, multiple bus
  • Switch-based: single-stage (SS), multistage (MS), crossbar

Agenda
• Motivation for parallel computing
• Flynn’s Taxonomy
• Interconnection networks
• Dynamic Networks
• Static Networks
2.2 Bus-Based Dynamic Interconnection
Networks

• Single Bus Systems


– Simplest way to connect multiprocessor systems.
– The use of local caches reduces the processor–memory traffic.
– The size of such a system typically varies between 2 and 50 processors.
– Single bus multiprocessors are inherently limited by:
  • the bandwidth of the bus;
  • only one processor can access the bus at a time;
  • only one memory access can take place at any given time.

2.2 Bus-Based Dynamic Interconnection
Networks

• Single Bus Systems

(Figure: processors p1 … pN, shared memory, and I/O connected to a single shared bus.)

2.2 Bus-Based Dynamic
Interconnection Networks
• Multiple Bus Systems
– Several parallel buses to interconnect multiple
processors and multiple memory modules.
– Many connection schemes are possible.
– Examples:
• Multiple bus with full bus–memory connection (MBFBMC).
• Multiple bus with single bus–memory connection (MBSBMC).
• Multiple bus with partial bus–memory connection (MBPBMC).
• Multiple bus with class-based memory connection (MBCBMC).

2.2 Bus-Based Dynamic Interconnection
Networks
• Multiple Bus Systems:
– Multiple Bus with Full Bus – Memory
Connection (MBFBMC).

(Figure: six processors and four memory modules, with every memory module connected to every bus.)

2.2 Bus-Based Dynamic
Interconnection Networks
• Multiple Bus Systems:
– Multiple Bus with Single Bus – Memory
Connection (MBSBMC).

(Figure: six processors and four memory modules, with each memory module connected to a single bus.)

2.2 Bus-Based Dynamic
Interconnection Networks
• Multiple Bus Systems:
– Multiple Bus with Partial Bus – Memory
Connection (MBPBMC).

(Figure: six processors and four memory modules, with each memory module connected to a subset of the buses.)

2.2 Bus-Based Dynamic
Interconnection Networks
• Multiple Bus Systems:
– Multiple Bus with Class-based Memory
Connection (MBCBMC).

(Figure: six processors and six memory modules grouped into three classes, with each class connected to a different subset of the buses.)


2.2 Bus-Based Dynamic
Interconnection Networks
• Bus Synchronization
– A bus can be synchronous:
• Time for any transaction is known in advance.
– A bus can be asynchronous:
• Depends on the availability of data and readiness of devices
to initiate bus transactions.
– Bus arbitration logic is required to resolve bus contention when more than one processor competes to access the bus in a single bus multiprocessor.
  • The process of passing bus mastership from one processor to another is called handshaking.
    – It requires a bus request and a bus grant.

2.2 Bus-Based Dynamic
Interconnection Networks
• Bus Synchronization
– Bus arbitration logic uses a predefined priority scheme:
• Random
• Simple rotating
• Equal priority
• Least Recently Used (LRU)

2.3 Switch-Based Interconnection
Networks
– Crossbar Networks
• Provide simultaneous connections among all its
inputs and all its outputs.
• A Switching Element (SE) is at the intersection of
any 2 lines extended horizontally or vertically
inside the switch.
• It is a non-blocking network, allowing multiple input–output connection patterns to be achieved simultaneously (see the sketch below).
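To make the non-blocking property concrete, here is a minimal C sketch (illustrative, not from the slides) that models an N×N crossbar as a matrix of switching elements; requests only conflict when they target the same memory module, so any one-to-one processor-to-memory pattern can be set up at once. Names such as connect_pm are hypothetical.

#include <stdio.h>
#include <stdbool.h>

#define N 4                          /* 4 processors, 4 memory modules */

static bool closed_sw[N][N];         /* closed_sw[i][j]: SE at row i, column j is closed */
static bool mem_busy[N];             /* memory module j already owned by some processor  */

/* Try to connect processor p to memory module m through SE (p, m). */
static bool connect_pm(int p, int m) {
    if (mem_busy[m])                 /* the only conflict: the module is already in use */
        return false;
    closed_sw[p][m] = true;
    mem_busy[m] = true;
    return true;
}

int main(void) {
    int perm[N] = {2, 0, 3, 1};      /* a permutation: every processor targets a distinct module */
    for (int p = 0; p < N; p++)
        printf("P%d -> M%d : %s\n", p + 1, perm[p] + 1,
               connect_pm(p, perm[p]) ? "connected" : "blocked");
    return 0;
}

Because the requested pattern is a permutation, every call succeeds, which is exactly the simultaneous-connection behavior described above.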

2.3 Switch-Based Interconnection
Networks
– Crossbar Networks
(Figure: an 8×8 crossbar connecting processors P1–P8 to memory modules M1–M8; each switching element can be set to a straight or a diagonal setting.)

Figure 2.7: (a) a crossbar switch connecting 4 processors (Pi) and 4 memory modules (Mj); (b) configuration of internal switches in a crossbar; (c) simultaneous memory accesses by the processors.


2.3 Switch-Based Interconnection
Networks
• Single-Stage Networks
– A single stage of SE exists between the inputs
and outputs of the network.
– The possible settings of a 2×2 SE are: straight, exchange, upper-broadcast, and lower-broadcast.

2.3 Switch-Based Interconnection
Networks
• Single-Stage Networks
– A well-known connection pattern is the Shuffle–Exchange.
– The shuffle and exchange functions can be defined on the m-bit address of an input, p_{m-1} p_{m-2} … p_1 p_0, as follows:
  • Shuffle(p_{m-1} p_{m-2} … p_1 p_0) = p_{m-2} p_{m-3} … p_1 p_0 p_{m-1} (a cyclic left shift of the address bits);
  • Exchange(p_{m-1} p_{m-2} … p_1 p_0) = p_{m-1} p_{m-2} … p_1 (NOT p_0) (the least significant bit complemented).
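As a quick check of these definitions, here is a small C sketch (illustrative, not from the slides) that applies shuffle (cyclic left shift) and exchange (complement of p_0) to a 3-bit address and reproduces the example used on the following slides, 101 → 011 → 110:

#include <stdio.h>

/* Shuffle: cyclic left shift of the m-bit address p_{m-1} ... p_1 p_0. */
static unsigned shuffle(unsigned p, int m) {
    unsigned msb = (p >> (m - 1)) & 1u;          /* bit p_{m-1} */
    return ((p << 1) | msb) & ((1u << m) - 1u);  /* rotate left by one bit */
}

/* Exchange: complement the least significant bit p_0. */
static unsigned exchange(unsigned p) {
    return p ^ 1u;
}

int main(void) {
    unsigned p = 5;                                        /* binary 101 */
    printf("shuffle(101)  = %u (binary 011)\n", shuffle(p, 3));   /* 3 */
    printf("shuffle(011)  = %u (binary 110)\n", shuffle(3, 3));   /* 6 */
    printf("exchange(101) = %u (binary 100)\n", exchange(p));     /* 4 */
    return 0;
}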

2.3 Switch-Based Interconnection
Networks
• Multistage Interconnection Networks
(MINs)
– A MIN consists of a number of stages each
consisting of a set of 2x2 SEs.
– Stages are connected to each other using an Inter-Stage Connection (ISC) pattern.
– MINs provide a number of simultaneous
paths between the processors and the
memory modules.

2.3 Switch-Based Interconnection
Networks
• Multistage Interconnection Networks
(MINs)
• In MINs, the routing of a message from a given source to a given destination is based on the destination address (self-routing).
• The destination address bits are scanned from left to right and the stages are traversed accordingly.
• If the bit in the destination address is 0, the message is routed to the upper output of the switch; if the bit is 1, the message is routed to the lower output of the switch (see the sketch below).
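The following minimal C sketch (illustrative, not from the slides) prints the switch output taken at each stage of a MIN with log2 N stages when the destination-address bits are scanned from left to right, following the rule above:

#include <stdio.h>

/* Destination-tag (self-routing): stage k examines destination bit (stages-1-k),
 * i.e., the address is scanned from the most significant bit to the least. */
static void route_min(unsigned dest, int stages) {
    for (int k = stages - 1; k >= 0; k--) {
        int bit = (dest >> k) & 1;
        printf("stage %d: destination bit %d -> %s output\n",
               stages - k, bit, bit ? "lower" : "upper");
    }
}

int main(void) {
    route_min(3 /* binary 011 */, 3);   /* e.g., routing to output 011 in an 8x8 MIN */
    return 0;
}

For destination 011 this yields upper, lower, lower at stages 1, 2, and 3, matching the left-to-right scan described above.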

2.3 Switch-Based Interconnection
Networks
• Multistage Networks (MINs)

(Figure: a multistage network with x stages of switches connected by inter-stage connection patterns ISC 1 … ISC x−1.)

2.3 Switch-Based Interconnection
Networks
• Shuffle-Exchange Network
– Construction: the inputs follow the shuffle function, e.g., P5 (101) → 3 (011) → 6 (110).
– Self-routing: scan the destination address from left to right, e.g., from 101 to 011.

2.3 Switch-Based Interconnection
Networks
• Banyan Network
– Self-routing: scan the destination address from right to left, e.g., from 101 to 011.

2.3 Switch-Based Interconnection
Networks
• Omega Network
– The inputs to each stage follow the shuffle interconnection pattern.
– Construction: shuffle function, e.g., P5 (101) → 3 (011) → 6 (110) → 5 (101).
– Routing: use one destination bit per stage, e.g., from 101 to 011.

2.3 Switch-Based Interconnection
Networks
• Blockage in Multistage Interconnection
Networks
– Blocking networks:
  • When an interconnection between a pair of input/output ports is currently established, a request for a new interconnection between two arbitrary unused input and output ports may or may not be possible to satisfy.

2.3 Switch-Based Interconnection
Networks
• Shuffle-Exchange Network (blocking example)
  – When the connection 101 → 011 is established:
    • 001 → 010 is not possible
    • 100 → 110 is possible

2.3 Switch-Based Interconnection
Networks
• Blockage in Multistage Interconnection
Networks
– Rearrangeable networks:
  • It is always possible to rearrange already-established connections to allow other connections to be established simultaneously.

2.3 Switch-Based Interconnection
Networks
• Blockage in Multistage Interconnection
Networks
– Rearrangeable networks
(Figure: an 8×8 rearrangeable network with inputs and outputs 000–111, shown in two configurations.)

2.3 Switch-Based Interconnection
Networks
• Blockage in Multistage Interconnection
Networks
– Non-blocking networks:
• In presence of a currently established connection
between any pair of input/output, it is always
possible to establish a connection between any
arbitrary unused pair of input/output.

Hesham El-Rewini & ADVANCED COMPUTER ARCHITECTURE AND


Mostafa Abd-El-Barr PARALLEL PROCESSING
2.3 Switch-Based Interconnection
Networks
(Figure: an 8×8 non-blocking switch-based network with inputs and outputs 000–111.)
2.5 Analysis and Performance Metrics

• Dynamic Networks
• The network cost is measured as the number of switching points.
• The delay (latency) is measured in terms of the input-to-output delay.
• A non-blocking network allows multiple output connection patterns (permutations) to be achieved simultaneously.
• A fault-tolerant system can be simply defined as a system that can still function even in the presence of faulty components inside the system.

2.5 Analysis and Performance
Metrics
• Dynamic Networks
Network        Delay       Cost          Blocking   Degree of FT
Bus            O(N)        O(1)          Yes        0
Multiple-bus   O(mN)       O(m)          Yes        m − 1
MIN            O(log N)    O(N log N)    Yes        0
Crossbar       O(1)        O(N²)         No         0

Agenda
• Motivation for parallel computing
• Flynn’s Taxonomy
• Interconnection networks
• Dynamic Networks
• Static Networks
2.4 Static Interconnection Networks
• Have fixed paths, unidirectional or bidirectional, between processors.
• Types:
  – Completely connected networks (CCNs): number of links O(N²), delay complexity O(1).
(Figure: a six-node completely connected network.)

2.4 Static Interconnection Networks

– Limited Connection Networks:


• Linear arrays
• Ring (Loop) networks
• Two-dimensional arrays
• Tree networks
• Cube network

Hesham El-Rewini & ADVANCED COMPUTER ARCHITECTURE AND


Mostafa Abd-El-Barr PARALLEL PROCESSING
2.4 Static Interconnection Networks

(Figure: examples of limited connection networks: a linear array, a ring (loop), a tree, a two-dimensional array, and a cube.)

2.4 Static Interconnection Networks
– Cube Connected Networks:
• Patterned after the n-cube structure
• In an n-cube, every processor is connected to n others
• Two vertices are connected if and only if the binary representations of their addresses differ in one and only one bit.
Ex: a 4-cube (figure).

2.4 Static Interconnection Networks
– Cube Connected Networks:
• The route from node i to node j can be found by XOR-ing the binary address representations of i and j.
• If the XOR operation results in a 1 in a given bit position, then the message has to be sent along the link that spans the corresponding dimension.
• E.g., S = 0101 → D = 1011: XOR = 1110, so the message can first be sent to 0111, 0001, or 1101 (one candidate per differing dimension); a sketch follows.
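As a sketch of this XOR-based routing (illustrative, not from the slides), the code below walks from the source toward the destination one differing dimension at a time, crossing the lowest differing dimension first; for S = 0101 and D = 1011 it visits 0111, 0011, and 1011. Helper names such as route_cube are hypothetical.

#include <stdio.h>

/* Print an n-bit address, most significant bit first. */
static void print_bits(unsigned v, int n) {
    for (int i = n - 1; i >= 0; i--)
        putchar(((v >> i) & 1u) ? '1' : '0');
}

/* Route from src to dst in an n-cube: cross every dimension where the
 * XOR of the two addresses has a 1 bit (lowest dimension first here). */
static void route_cube(unsigned src, unsigned dst, int n) {
    unsigned cur = src, diff = src ^ dst;
    print_bits(cur, n);
    for (int d = 0; d < n; d++) {
        if (diff & (1u << d)) {
            cur ^= (1u << d);        /* hop across dimension d */
            printf(" -> ");
            print_bits(cur, n);
        }
    }
    putchar('\n');
}

int main(void) {
    route_cube(0x5 /* 0101 */, 0xB /* 1011 */, 4);
    return 0;
}

This fixed low-to-high dimension order is just one of the valid routes; any order of the differing dimensions reaches the destination in the same number of hops.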

2.4 Static Interconnection Networks

– Mesh Connected Networks:
  (Figure: an example 3×3×2 mesh network showing routes from a source S to a destination D.)

2.5 Analysis and Performance Metrics

• Static Networks
• The degree of a node, d, is defined as the number of channels incident on the node.
• The diameter, D, of a network having N nodes is defined as the maximum, over all pairs of nodes, of the length of the shortest path between them (see the formula below).
• A network is said to be symmetric if it is isomorphic to itself with any node labeled as the origin.
• Cost means the total number of links in the network.
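Stated compactly in standard notation (a restatement, not taken from the slides), with \delta(u, v) the length of a shortest path between nodes u and v:

D = \max_{u \neq v} \delta(u, v)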

2.5 Analysis and Performance
Metrics
• Static Networks

Network        Degree (d)   Diameter (D)      Cost (no. of links)   Symmetry   Worst delay
CCNs           N − 1        1                 N(N − 1)/2            Yes        1
Linear array   2            N − 1             N − 1                 No         N
Binary tree    3            2(log2 N − 1)     N − 1                 No         log2 N
n-cube         log2 N       log2 N            nN/2                  Yes        log2 N
2D-mesh        4            2(n − 1)          2(N − n)              No         N
k-ary n-cube   2n           n⌊k/2⌋            n × N                 Yes        k × log2 N

2.6 Summary

• Different topologies used for interconnecting multiprocessors were discussed.
• A taxonomy for interconnection networks based on their topology was introduced.
• Dynamic and static interconnection schemes were studied.
• A number of basic performance aspects related to both dynamic and static interconnection networks were introduced.

References
• Hesham El-Rewini and Mostafa Abd-El-Barr, “Advanced Computer Architecture and Parallel Processing”, chapters 1 and 2.
• Peter S. Pacheco and Matthew Malensek, “An Introduction to Parallel Programming”, 2nd ed., chapter 2.
