
4 - Interconnection Networks

The document discusses various types of interconnection networks used in multiprocessors and multicomputers, including bus-based, switched, and multistage networks. It highlights the challenges of cache coherency, performance scalability, and the architecture of different network topologies such as crossbar, omega, and hypercube. Additionally, it evaluates the properties of these networks, including diameter, bisection bandwidth, and cost, providing examples to illustrate their performance characteristics.


Communication / Interconnection Networks
Bus based Multiprocessors

• Data is transmitted in the form of packets.
• Cache coherency is difficult to implement.
• A simple configuration is to have a high-speed backplane or motherboard into which CPU or memory cards can be inserted. A typical bus has 32 or 64 address lines, 32 or 64 data lines, and perhaps 32 or more control lines, all of which operate in parallel.
• The problem with this scheme is that with as few as 4 or 5 CPUs, the bus will usually be overloaded and performance will drop drastically. The solution is to add a high-speed cache memory between each CPU and the bus, as shown in the figure.
Bus based Multiprocessors

• Cache coherency is difficult to implement. One solution is to combine a write-through cache with a snoopy cache.
• In a write-through cache, the cache is designed so that whenever a word is written to the cache, it is written through to memory as well.
• In addition, all caches constantly monitor the bus. Whenever a cache sees a write occurring to a memory address present in its cache, it either removes that entry from its cache or updates the entry with the new value. Such a cache is called a snoopy cache, because it is always snooping on the bus.
• Using this scheme, it is possible to put about 32, or possibly 64, CPUs on a single bus.
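The write-through-plus-snooping behaviour described above can be sketched in a few lines of Python. This is an illustrative model only; the Bus and SnoopyCache names are hypothetical, not a real hardware API.

```python
# Minimal sketch of write-through caching with bus snooping.
# The class names here are illustrative, not from any real API.

class Bus:
    """Shared bus: broadcasts every write to memory and to all caches."""
    def __init__(self):
        self.memory = {}
        self.caches = []

    def write(self, source, addr, value):
        self.memory[addr] = value          # write-through: memory always updated
        for cache in self.caches:
            if cache is not source:
                cache.snoop(addr, value)   # every other cache watches the bus

class SnoopyCache:
    def __init__(self, bus):
        self.lines = {}
        self.bus = bus
        bus.caches.append(self)

    def read(self, addr):
        if addr not in self.lines:                 # miss: fetch from memory
            self.lines[addr] = self.bus.memory.get(addr, 0)
        return self.lines[addr]

    def write(self, addr, value):
        self.lines[addr] = value
        self.bus.write(self, addr, value)          # write through to memory

    def snoop(self, addr, value):
        if addr in self.lines:                     # update the entry when a
            self.lines[addr] = value               # write is seen on the bus

bus = Bus()
c1, c2 = SnoopyCache(bus), SnoopyCache(bus)
c1.write(0x10, 7)        # c1 writes; memory is updated and c2 snoops the bus
c2.read(0x10)            # c2 reads the fresh value from memory
```

A real snoopy cache may invalidate rather than update the entry; this sketch shows the update variant.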
Switched Multiprocessors
• To build a multiprocessor with more than 64 processors, a different method is needed to connect the CPUs with the memory.
• Two switching techniques are employed for this:
a. Crossbar switch
b. Omega network
Switched Multiprocessors
• Memory is divided into modules, which are connected to the CPUs by a crossbar switch.
• Each CPU and each memory module has a connection coming out of it, as shown.
• At every intersection is a tiny electronic crosspoint switch that can be opened and closed in hardware.
• When a CPU wants to access a particular memory module, the crosspoint switch connecting them is closed, to allow the access to take place.
• If two CPUs try to access the same memory module simultaneously, one of them will have to wait.
• The downside of the crossbar switch is that with n CPUs and n memories, n² crosspoint switches are needed. For large n this number can be prohibitive.
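The crosspoint behaviour can be sketched as a simple grant/deny arbiter. The Crossbar class below is hypothetical, intended only to mirror the description above.

```python
# Sketch of crossbar arbitration: an n x n grid of crosspoints, with at
# most one CPU connected to each memory module at a time.

class Crossbar:
    def __init__(self, n):
        self.n = n
        self.owner = [None] * n          # owner[m] = CPU currently using memory m
        self.crosspoints = n * n         # hardware cost grows as n^2

    def request(self, cpu, mem):
        """Close the crosspoint (cpu, mem) if the module is free."""
        if self.owner[mem] is None:
            self.owner[mem] = cpu
            return True                  # access granted
        return False                     # module busy: this CPU must wait

    def release(self, cpu, mem):
        if self.owner[mem] == cpu:
            self.owner[mem] = None

xbar = Crossbar(4)
xbar.request(0, 2)       # CPU 0 closes the crosspoint to memory module 2
xbar.request(1, 2)       # CPU 1 must wait: request returns False
```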
Switched Multiprocessors
• The omega network contains 2×2 switches, each having two inputs and two outputs.
• Each switch can route either input to either output. With the proper setting of the switches, every CPU can access every memory module.
• In the general case, with n CPUs and n memories, the omega network requires log2 n switching stages, each containing n/2 switches, for a total of (n log2 n)/2 switches.
Bus based Multicomputers

Fig. A multicomputer consisting of workstations on a LAN.

• CPUs are connected to each other via a LAN (10 or 100 Mbps).
• Limitation: communication becomes slow as the number of computers increases.
Switched Multicomputers

• Switched multicomputers do not have a single bus over which all traffic goes. Instead, they have a collection of point-to-point connections.
• A grid is easy to understand and easy to lay out on a printed circuit board or chip. This architecture is best suited to problems that are two-dimensional in nature (graph theory, vision, etc.).
• Another design is a hypercube, which is an n-dimensional cube. One can imagine a 4-dimensional hypercube as a pair of ordinary cubes with the corresponding vertices connected, as shown in fig. (b).
• Similarly, a 5-dimensional hypercube can be represented as two copies of fig. (b) with the corresponding vertices connected, and so on.
Example 1
• A multicomputer with 256 CPUs is organized as a 16 × 16 grid. What is the worst-case delay (in hops) that a message might have to take?
• A: Assuming that routing is optimal, the longest optimal route is from one corner of the grid to the opposite corner. The length of this route is 15 + 15 = 30 hops. If the end processors of each row and column are connected to each other (forming rings), each dimension contributes at most 16/2 = 8 hops, so the worst case drops to 16.
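The grid arithmetic can be checked with a short sketch; note that with wraparound, each ring limits a dimension to at most n/2 hops.

```python
# Worst-case distance (in hops) on an n x n grid, with and without
# wraparound links joining the ends of each row and column.

def mesh_diameter(n, wraparound=False):
    per_dim = n // 2 if wraparound else n - 1   # farthest distance along one axis
    return 2 * per_dim                          # two independent dimensions

print(mesh_diameter(16))                  # plain 16 x 16 grid: 30 hops
print(mesh_diameter(16, wraparound=True)) # with row/column rings: 16 hops
```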
Example 2
• Now consider a 256-CPU hypercube. What is the worst-case delay here, again in hops?
• A: In a 256-CPU hypercube, each node has an 8-bit binary address, from 00000000 to 11111111. A hop from one machine to another always changes a single bit of the address. Thus from 00000000 to 00000001 is one hop; from there to 00000011 is another hop. In the worst case all eight bits differ, so eight hops are needed.
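The hop count between two hypercube nodes is simply the Hamming distance between their addresses, which can be computed directly:

```python
# In a hypercube, the number of hops between two nodes equals the number
# of address bits in which they differ (the Hamming distance).

def hypercube_hops(a, b):
    return bin(a ^ b).count("1")   # XOR marks the differing bits; count them

print(hypercube_hops(0b00000000, 0b00000001))  # 1 hop
print(hypercube_hops(0b00000000, 0b11111111))  # worst case for 256 CPUs: 8 hops
```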
Interconnection Networks for Parallel Computers
• Interconnection networks carry data between processors and to memory.
• Interconnects are made of switches and links (wires, fiber).
• Interconnects are classified as static or dynamic.
• Static networks consist of point-to-point communication links among processing nodes and are also referred to as direct networks.
• Dynamic networks are built using switches and communication links. Dynamic networks are also referred to as indirect networks.
Static and Dynamic Interconnection Networks

Classification of interconnection networks: (a) a static network; and (b) a dynamic network.
Static and Dynamic Interconnection Networks
Properties of a Topology/Network
Bisection Bandwidth
- Often used to describe network performance.
- Cut the network in half and sum the bandwidth of the links severed:
  (minimum number of channels spanning the two halves) × (bandwidth of each channel).
- Meaningful only for recursive topologies.
- Can be misleading, because it does not account for switch and routing efficiency.
Many Topology Examples
• Bus
• Crossbar
• Ring
• Tree
• Omega
• Hypercube
• Mesh
• Torus
• Butterfly
• …

Buses

Bus-based interconnects: (a) with no local caches; (b) with local memory/caches.

Since much of the data accessed by processors is local to the processor, a local memory can improve the performance of bus-based machines.
Crossbars
A crossbar network uses a p×m grid of switches to connect p inputs to m outputs in a non-blocking manner.

A completely non-blocking crossbar network connecting p processors to b memory banks.
Multistage Networks
• Crossbars have excellent performance
scalability but poor cost scalability.
• Buses have excellent cost scalability, but poor
performance scalability.
• Multistage interconnects strike a compromise
between these extremes.

Multistage Networks

The schematic of a typical multistage interconnection network.

Multistage Networks
• One of the most commonly used multistage interconnects is the Omega network.
• This network consists of log p stages, where p is the number of inputs/outputs.
• At each stage, input i is connected to output j if:
  j = 2i,           for 0 ≤ i ≤ p/2 − 1
  j = 2i + 1 − p,   for p/2 ≤ i ≤ p − 1
Multistage Omega Networks
Each stage of the Omega network implements a perfect shuffle as follows:

A perfect shuffle interconnection for eight inputs and outputs.
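The perfect shuffle can be sketched as a one-bit left rotation of the d-bit input label, which reproduces the stage connection rule above:

```python
# The perfect shuffle maps input i to the output obtained by rotating the
# d-bit binary representation of i one position to the left.

def perfect_shuffle(i, d):
    mask = (1 << d) - 1
    return ((i << 1) | (i >> (d - 1))) & mask

# For p = 8 inputs (d = 3): j = 2i for the first half, j = 2i + 1 - p after.
print([perfect_shuffle(i, 3) for i in range(8)])
# [0, 2, 4, 6, 1, 3, 5, 7]
```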
Multistage Omega Network
• The perfect shuffle patterns are connected using 2×2 switches.
• The switches operate in two modes: cross-over or pass-through.

Two switching configurations of the 2 × 2 switch: (a) Pass-through; (b) Cross-over.
Multistage Omega Network
A complete Omega network with the perfect shuffle interconnects and switches can now be illustrated:

A complete omega network connecting eight inputs and eight outputs.

An omega network has (p/2) × log p switching nodes, and the cost of such a network grows as Θ(p log p).
Multistage Omega Network – Routing
• Let s be the binary representation of the source and d be that of the destination processor.
• The data traverses the link to the first switching node. If the most significant bits of s and d are the same, the data is routed in pass-through mode by the switch; otherwise, the switch is set to cross-over.
• This process is repeated, one bit at a time, for each of the log p switching stages.
• Note that this is not a non-blocking network.
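The stage-by-stage rule above can be sketched as follows. This is a simplified model that only records the switch setting chosen at each stage, per the comparison rule stated above; it does not model the full network state.

```python
# Routing sketch for an omega network with p = 2**stages inputs: at stage
# k the switch compares the k-th most significant bits of source and
# destination and chooses pass-through or cross-over accordingly.

def omega_route(src, dst, stages):
    settings = []
    for k in range(stages - 1, -1, -1):        # most significant bit first
        s_bit = (src >> k) & 1
        d_bit = (dst >> k) & 1
        settings.append("pass-through" if s_bit == d_bit else "cross-over")
    return settings

print(omega_route(0b010, 0b111, 3))
# ['cross-over', 'pass-through', 'cross-over']
```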
Multistage Omega Network – Routing

An example of blocking in the omega network: one of the messages (010 to 111 or 110 to 100) is blocked at link AB.

Every switch may need a buffer; otherwise, the data for one of the transmissions will be lost.
Example 3
• A multiprocessor has 1024 100-MIPS CPUs connected to memory by an omega network. How fast do the switches have to be to allow a request to go to memory and back in one instruction time?
• Solution: At 100 MIPS, one instruction takes 10 ns. The network has log2 1024 = 10 stages, so a round trip traverses 20 switches; each switch must therefore open or close in 10/20 = 0.5 ns. The network contains (1024 × 10)/2 = 5120 switches in total.
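The solution's arithmetic can be verified directly:

```python
import math

cpus, mips = 1024, 100
instruction_time_ns = 1e3 / mips            # 100 MIPS -> 10 ns per instruction
stages = int(math.log2(cpus))               # log2(1024) = 10 switching stages
traversals = 2 * stages                     # to memory and back: 20 switch delays
switch_time_ns = instruction_time_ns / traversals
total_switches = cpus * stages // 2         # (p * log2 p) / 2

print(switch_time_ns)    # 0.5 ns per switch
print(total_switches)    # 5120 switches
```

The same reasoning applies to Example 4 with p = 4096 and 50-MIPS CPUs.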
Example 4
• A multiprocessor has 4096 50-MIPS CPUs connected to memory by an omega network. How fast do the switches have to be to allow a request to go to memory and back in one instruction time?
Butterfly Network
• Indirect topology.
• n = 2^d processor nodes connected by n(log n + 1) switching nodes.
• Ranks (rows) are labeled 0 … log n; columns are labeled 0 … n − 1.
• Each switching node has four connections to other nodes (except nodes in the top and bottom ranks).
• Node P(r, j), i.e. node number j in rank r, is connected to P(r − 1, j) and P(r − 1, m), where m is obtained by inverting the r-th most significant bit in the binary representation of j.

Fig. An 8-input butterfly: ranks 0–3, with nodes labeled (rank, column) from (0,0) to (3,7).
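The connection rule can be sketched as follows, assuming ranks are numbered 1 … d going down and bit positions are counted from the most significant bit of the d-bit column label:

```python
# Connections of node P(r, j) in a butterfly with n = 2**d columns:
# P(r, j) links straight up to P(r-1, j) and diagonally to P(r-1, m),
# where m inverts the r-th most significant bit of the d-bit label j.

def butterfly_up_neighbors(r, j, d):
    m = j ^ (1 << (d - r))        # flip the r-th most significant bit of j
    return (r - 1, j), (r - 1, m)

print(butterfly_up_neighbors(1, 0, 3))   # ((0, 0), (0, 4)): MSB of j flipped
print(butterfly_up_neighbors(3, 0, 3))   # ((2, 0), (2, 1)): LSB of j flipped
```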
Butterfly Network Routing

Fig. Routing in a butterfly network.
Evaluating Butterfly Network
• Diameter: log n
• Bisection width: n / 2
• Edges per node: 4
• Constant edge length? No
Completely Connected Network
• Each processor is connected to every other processor.
• The number of links in the network scales as O(p²).
• While the performance scales very well, the hardware complexity is not realizable for large values of p.
• In this sense, these networks are static counterparts of crossbars.
Star Connected Network
• Every node is connected only to a common node at the center.
• The distance between any pair of nodes is O(1). However, the central node becomes a bottleneck.
• In this sense, star connected networks are static counterparts of buses.
Completely Connected and Star Connected Networks

(a) A completely-connected network of eight nodes; (b) a star connected network of nine nodes.
Linear Arrays, Meshes, and k-d Meshes
• In a linear array, each node has two neighbors, one to its left and one to its right. If the nodes at either end are connected, we refer to it as a 1-D torus or a ring.
• A generalization to 2 dimensions has nodes with 4 neighbors, to the north, south, east, and west.
• A further generalization to d dimensions has nodes with 2d neighbors.
• A special case of a d-dimensional mesh is a hypercube. Here, d = log p, where p is the total number of nodes. (A 4-dimensional hypercube has 16 nodes.)
Linear Arrays

Linear arrays: (a) with no wraparound links; (b) with wraparound link.
Two- and Three-Dimensional Meshes

Two- and three-dimensional meshes: (a) 2-D mesh with no wraparound; (b) 2-D mesh with wraparound links (2-D torus); and (c) a 3-D mesh with no wraparound.
Hypercubes and their Construction

Construction of hypercubes from hypercubes of lower dimension.

Properties of Hypercubes
• The distance between any two nodes is at most log p (p is the total number of nodes).
• Each node has log p neighbors.
• The distance between two nodes is given by the number of bit positions at which the two nodes differ.
• The radix of a network router is the number of I/O ports that the router provides to connect to adjacent routers. High-radix routers (>5 I/O ports) enable low-diameter topologies, allowing all processing nodes to be reached in just a few hops.
Tree-Based Networks

Complete binary tree networks: (a) a static tree network; and (b) a
dynamic tree network.

Tree Properties
• The distance between any two nodes is no more than 2 log p (p is the total number of nodes).
• Links higher up the tree potentially carry more traffic than those at the lower levels.
• For this reason, a variant called a fat tree fattens the links as we go up the tree.
• Trees can be laid out in 2D with no wire crossings. This is an attractive property of trees.
Fat Trees

A fat tree network of 16 processing nodes.

Evaluating Static Interconnection Networks
• Diameter: The distance between the farthest two nodes in the network. The diameter of a linear array is p − 1, that of a mesh is 2(√p − 1), that of a tree and a hypercube is log p, and that of a completely connected network is O(1).
• Bisection Width: The minimum number of wires you must cut to divide the network into two equal parts. The bisection width of a linear array and a tree is 1, that of a mesh is √p, that of a hypercube is p/2, and that of a completely connected network is p²/4.
• Cost: The number of links or switches (whichever is asymptotically higher) is a meaningful measure of the cost. However, a number of other factors, such as the ability to lay out the network, the length of wires, etc., also factor into the cost.
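As a sketch, the diameter and bisection-width formulas above can be evaluated for a small machine, say p = 16 (the function name and dictionary layout are illustrative):

```python
import math

# Evaluating the diameter and bisection-width formulas for p nodes.
def metrics(p):
    side = int(math.sqrt(p))             # side of a square 2-D mesh
    return {
        "linear array":         {"diameter": p - 1,          "bisection": 1},
        "2-D mesh":             {"diameter": 2 * (side - 1), "bisection": side},
        "hypercube":            {"diameter": int(math.log2(p)), "bisection": p // 2},
        "completely connected": {"diameter": 1,              "bisection": p * p // 4},
    }

for name, m in metrics(16).items():
    print(name, m)
```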
Evaluating Static Interconnection Networks
• The number of bits that can be communicated simultaneously over a link connecting two nodes is called the channel width. Channel width is equal to the number of physical wires in each communication link.
• The peak rate at which a single physical link can deliver bits is called the channel rate.
• The peak rate at which data can be communicated between the ends of a communication link is called the channel bandwidth. Channel bandwidth is the product of channel rate and channel width.
• The bisection bandwidth of a network is defined as the minimum volume of communication allowed between any two halves of the network. It is the product of the bisection width and the channel bandwidth. Bisection bandwidth is also sometimes referred to as cross-section bandwidth.
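A small worked example of the two products above. The channel parameters here (32 wires per link, 1 Gbit/s per wire) are assumed for illustration, not taken from the text.

```python
# Channel bandwidth = channel rate x channel width;
# bisection bandwidth = bisection width x channel bandwidth.

channel_width_bits = 32          # physical wires per link (assumed)
channel_rate_gbps = 1.0          # peak rate per wire, Gbit/s (assumed)
channel_bandwidth = channel_rate_gbps * channel_width_bits   # Gbit/s per link

p = 64                           # hypercube with 64 nodes (assumed)
bisection_width = p // 2         # hypercube: p/2 links cross the bisection
bisection_bandwidth = bisection_width * channel_bandwidth

print(channel_bandwidth)     # 32.0 Gbit/s per link
print(bisection_bandwidth)   # 1024.0 Gbit/s across the bisection
```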
Evaluating Static Interconnection Networks

Network                    Diameter           Bisection Width   Arc Connectivity   Cost (No. of links)
Completely-connected       1                  p²/4              p − 1              p(p − 1)/2
Star                       2                  1                 1                  p − 1
Complete binary tree       2 log((p + 1)/2)   1                 1                  p − 1
Linear array               p − 1              1                 1                  p − 1
2-D mesh, no wraparound    2(√p − 1)          √p                2                  2(p − √p)
2-D wraparound mesh        2⌊√p/2⌋            2√p               4                  2p
Hypercube                  log p              p/2               log p              (p log p)/2
Wraparound k-ary d-cube    d⌊k/2⌋             2k^(d−1)          2d                 dp
Evaluating Dynamic Interconnection Networks

Network          Diameter   Bisection Width   Arc Connectivity   Cost (No. of links)
Crossbar         1          p                 1                  p²
Omega Network    log p      p/2               2                  p/2
Dynamic Tree     2 log p    1                 2                  p − 1
