
Multiprocessors and the Interconnect

1
Scope

 Taxonomy
 Metrics
 Topologies
 Characteristics
 cost
 performance

2
Interconnection
 Carries data between processors and between processors and memory
 Interconnect components
 switches
 links (wires, fiber)
 Interconnection network types
 static networks: point-to-point communication links
 also called direct networks
 dynamic networks: switches and communication links
 also called indirect networks

3
Static vs. Dynamic

4
Dynamic Networks
 Switch: maps a fixed number of inputs to outputs
 Number of ports on a switch = degree of the
switch.
 Switch cost
 grows as the square of switch degree
 peripheral hardware grows linearly with switch degree
 packaging cost grows linearly with the number of pins
 Key property: blocking vs. non-blocking
 blocking
 path from p to q may conflict with path from r to s
 for independent p, q, r, s
 non-blocking
 disjoint paths between each pair of independent sources
and sinks
5
Network Interface
 Processor node’s link to the interconnect
 Network interface responsibilities
 packetizing communication data
 computing routing information
 buffering incoming/outgoing data
 Network interface connection
 I/O bus: PCI (Peripheral Component Interconnect) or PCI-X on many modern systems
 memory bus: e.g. Intel QuickPath
 higher bandwidth and tighter coupling than I/O bus
 Network performance
 depends on relative speeds of I/O and memory buses

6
Topologies

 Many network topologies


 Tradeoff: performance vs. cost
 Machines often implement hybrids of multiple topologies, driven by
 packaging
 cost
 available components

7
Metrics
 Degree
 number of links per node
 Diameter
 the longest shortest path between any two nodes in the network
 Bisection Width
 minimum number of wire cuts needed to divide the network into two halves
 Cost
 number of links or switches
(a small sketch after this slide computes the diameter for example topologies)

8
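To make these metrics concrete, here is a small Python sketch (not part of the original slides) that builds two of the topologies discussed later, a ring and a hypercube, and computes their diameter by breadth-first search. The degree and bisection-width figures in the comments are the standard textbook values, stated here as assumptions rather than computed.

from collections import deque

def ring(p):
    """Adjacency list of a p-node ring (1-D torus): degree 2, bisection width 2 (textbook value)."""
    return {i: [(i - 1) % p, (i + 1) % p] for i in range(p)}

def hypercube(d):
    """Adjacency list of a d-dimensional hypercube (p = 2**d nodes): degree d, bisection width p/2 (textbook value)."""
    p = 1 << d
    return {i: [i ^ (1 << b) for b in range(d)] for i in range(p)}

def diameter(adj):
    """Longest shortest path between any two nodes, found by BFS from every node."""
    worst = 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        worst = max(worst, max(dist.values()))
    return worst

if __name__ == "__main__":
    print(diameter(ring(8)))       # 4: p/2 for a ring
    print(diameter(hypercube(3)))  # 3: log2(p) for a hypercube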
Topologies: Bus
 All processors access a common bus for
exchanging data
 Used in the simplest and earliest parallel machines
 Advantages
 distance between any two nodes is O(1)
 provides a convenient broadcast medium
 Disadvantages
 bus bandwidth is a performance bottleneck

9
Bus Systems
 A bus system is a hierarchy of buses connecting various system and subsystem components.
 has a complement of control, signal, and power lines.
 a variety of buses in a system:
 Local bus – (usually integral to a system board) connects
various major system components (chips)
 Memory bus – used within a memory board to connect the
interface, the controller, and the memory cells
 Data bus – might be used on an I/O board or VLSI chip to
connect various components
 Backplane – like a local bus, but with connectors to which other
boards can be attached

10
Bridges
 The term bridge denotes a device used to connect two (or possibly more) buses.
 The interconnected buses may use the same standards, or they may be different (e.g. PCI in a modern PC).
 Bridge functions include
 Communication protocol conversion
 Interrupt handling
 Serving as cache and memory agents
11
Bus

 Since much of the data accessed by processors is local to the processor, cache is critical for the performance of bus-based machines
12
Bus Replacement: Direct Connect

 Intel QuickPath Interconnect (QPI)

13
Direct Connect: 4-Node Configurations

 4N SQ: XFIRE BW 14.9 GB/s, diameter 2, average distance 1
 4N FC: XFIRE BW 29.9 GB/s, diameter 1, average distance 0.75
Figure credit: The Opteron CMP NorthBridge Architecture, Now and in the Future, Pat Conway, Bill Hughes, AMD, HOT CHIPS 2006
14
Direct Connect: 8-Node Configurations

15
Crossbar Network

 A crossbar network uses a p×m grid of switches to connect p inputs to m outputs in a non-blocking manner
 A non-blocking crossbar network connecting p processors to b memory banks
 Cost of a crossbar: O(p^2)
 Generally difficult to scale for large values of p
 Earth Simulator: custom 640-way single-stage crossbar
16
Assessing Network Alternatives

 Buses
 excellent cost scalability
 poor performance scalability
 Crossbars
 excellent performance scalability
 poor cost scalability
 Multistage interconnects
 compromise between these extremes

17
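As a rough, illustrative comparison (my own sketch, built from the cost figures quoted elsewhere in these slides: O(p^2) switches for a crossbar and p/2 × log p switches for an omega network, plus the assumption of one bus tap per processor):

import math

def network_costs(p):
    """Approximate component counts for p processors (textbook-style figures)."""
    return {
        "bus taps": p,                                        # assumed: one tap per processor
        "crossbar switches": p * p,                           # O(p^2), from the crossbar slide
        "omega 2x2 switches": (p // 2) * int(math.log2(p)),   # p/2 * log p, from the omega slides
    }

if __name__ == "__main__":
    for p in (8, 64, 1024):
        print(p, network_costs(p))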
 Multistage interconnection networks (MINs) are a class of high-speed
computer networks usually composed of processing elements (PEs) on
one end of the network and memory elements (MEs) on the other end,
connected by switching elements (SEs).
 The switching elements themselves are usually connected to each other in
stages, hence the name.
 MINs are typically used in high-performance or parallel computing as a
low-latency interconnection (as opposed to traditional packet switching
networks), though they could be implemented on top of a packet
switching network.
 Though the network is typically used for routing purposes, it could also be used as a co-processor to the actual processors for such uses as sorting; cyclic shifting, as in a perfect shuffle network; and bitonic sorting.

18
Multistage Network

19
Multistage Omega Network

 The Omega network is a type of interconnection network commonly used in parallel computing systems.
 It is designed to provide efficient
communication between processing
elements (PEs) or nodes in a parallel
computer system.

20
Multistage Omega Network
 The Omega network is a variant of a multistage
interconnection network (MIN).
 It consists of multiple stages of switches arranged in a grid-like structure. Each switch in the network connects to a subset of PEs or nodes.
 The switches are organized in rows and columns, forming a 2D grid: for p PEs there are log p columns (stages), each containing p/2 two-input, two-output switches.

21
Multistage Omega Network

 Organization
 log p stages
 p inputs/outputs
 At each stage, input i is connected to output j if:
 j = 2i for 0 ≤ i ≤ p/2 − 1
 j = 2i + 1 − p for p/2 ≤ i ≤ p − 1
 (i.e., j is a left circular shift of the log p-bit binary representation of i)

22
Omega Network Stage

 Each Omega stage is connected in a perfect shuffle


23
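A small Python sketch of this stage wiring (illustrative, not from the slides): perfect_shuffle applies the j = 2i / j = 2i + 1 − p rule from the previous slide, and shuffle_by_rotation shows that this is the same as a left circular shift of the log p address bits.

def perfect_shuffle(i, p):
    """Map stage input i to stage output j in a p-port omega stage.

    j = 2i          for 0 <= i <= p/2 - 1
    j = 2i + 1 - p  for p/2 <= i <= p - 1
    """
    if i < p // 2:
        return 2 * i
    return 2 * i + 1 - p

def shuffle_by_rotation(i, p):
    """Same mapping, written as an explicit left rotation of the bits of i (p a power of two)."""
    n = p.bit_length() - 1          # number of address bits
    msb = (i >> (n - 1)) & 1        # bit that wraps around
    return ((i << 1) & (p - 1)) | msb

if __name__ == "__main__":
    p = 8
    for i in range(p):
        assert perfect_shuffle(i, p) == shuffle_by_rotation(i, p)
    print([perfect_shuffle(i, p) for i in range(p)])  # [0, 2, 4, 6, 1, 3, 5, 7]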
Omega Network Switches

 2×2 switches connect perfect shuffles
 Each switch operates in two modes: pass-through and crossover

24
Multistage Omega Network

 Cost: p/2 × log p switching nodes → O(p log p)

25
Omega Network Routing
 Let
 s = binary representation of the source processor
 d = binary representation of the destination processor or memory
 The data traverses the link to the first switching node
 if the most significant bits of s and d are the same
 route data in pass-through mode by the switch
 else
 use the crossover path
 Strip off the leftmost bit of s and d
 Repeat for each of the log p switching stages (see the sketch below)

26
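A minimal sketch of this routing rule, assuming p is a power of two and that bits are compared from the most significant end:

def omega_route(s, d, p):
    """Switch settings ('pass-through' or 'crossover') for a message from s to d
    in a p-input omega network, comparing one bit of s and d per stage,
    most significant bit first."""
    stages = p.bit_length() - 1        # log2(p) switching stages
    settings = []
    for k in range(stages - 1, -1, -1):
        s_bit = (s >> k) & 1
        d_bit = (d >> k) & 1
        settings.append("pass-through" if s_bit == d_bit else "crossover")
    return settings

if __name__ == "__main__":
    # Source 6 (110) to destination 4 (100) in an 8-input network:
    print(omega_route(6, 4, 8))   # ['pass-through', 'crossover', 'pass-through']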
Omega Network Routing

27
Blocking in an Omega Network

28
Clos Network (non-blocking)

29
Static Networks

30
Star Connected Network

 Static counterparts of buses


 Every node connected only to a
common node at the center
 Distance between any pair of nodes
is O(1)

31
Completely Connected Network

 Each processor is connected to every other processor
 static counterparts of crossbars
 number of links in the network scales as
O(p^2)

32
Linear Array

 Each node has two neighbors: left and right

 If the two end nodes are also connected: 1D torus (ring)

33
Meshes and k-d Meshes
 Mesh: generalization of linear array to 2D
 nodes have 4 neighbors: north, south, east,
and west.
 k-d mesh:
 d-dimensional mesh with k nodes along each dimension
 nodes have 2d neighbors

34
Hypercubes

 Special d-dimensional mesh: p nodes, d = log p

35
4D

36
Hypercube Properties

 Distance between any two nodes is at most log p
 Each node has log p neighbors
 Distance between two nodes = number of bit positions in which their labels differ (see the sketch below)

37
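The distance rule is simply the Hamming distance between node labels; a short sketch (illustrative):

def hypercube_distance(a, b):
    """Hops between hypercube nodes a and b = number of differing bit positions."""
    return bin(a ^ b).count("1")

if __name__ == "__main__":
    print(hypercube_distance(0b0110, 0b1010))  # 2 hops in a 4-D hypercube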
Trees

38
Tree Properties
 Distance between any two nodes is no
more than 2 log p
 Trees can be laid out in 2D with no wire
crossings
 Problem
 links closer to the root carry more traffic than those at lower levels
 Solution: fat tree
 widen links as depth gets shallower
 copes with higher traffic on links near root

39
Fat Tree Network

 Fat tree network for 16 processing nodes


 Can judiciously choose “fatness” of links
 take full advantage of technology and packaging constraints
40
Metrics for Interconnection Networks

41
Metrics for Dynamic Interconnection Networks

42
Self Routing
 Omega network has self-routing property
 The path for a cell to take to reach its
destination can be determined directly from its
routing tag (i.e., destination port id)
 Stage k of the MIN looks at bit k of the tag
 If bit k is 0, then send cell out upper port
 If bit k is 1, then send cell out lower port
 Works for every possible input port (really!)
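A small sketch of this self-routing rule (assuming, as in the example that follows, that "bit k" is counted from the most significant end of the tag):

def self_route(dest, ports):
    """Per-stage port choices for a cell with destination tag `dest` in a
    `ports`-port MIN: bit k of the tag (MSB first) picks the upper (0) or
    lower (1) output of the stage-k switch."""
    stages = ports.bit_length() - 1
    choices = []
    for k in range(stages - 1, -1, -1):
        bit = (dest >> k) & 1
        choices.append("lower" if bit else "upper")
    return choices

if __name__ == "__main__":
    # Cell destined for output port 4 (= 100 in binary) in an 8-port fabric:
    print(self_route(4, 8))   # ['lower', 'upper', 'upper']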
Example of Self Routing
 Cell destined for output port 4 (= 100 in binary)
 [Figure sequence: the cell is traced frame by frame through an 8-port fabric (ports 0-7), one stage at a time, until it emerges on output port 4.]
Path Contention
 The omega network has the same problems as the delta network: output port contention and path contention
 Again, the result in a bufferless switch fabric is cell loss (one cell wins, one loses)
 Path contention and output port contention can seriously degrade the achievable throughput of the switch (see the sketch below)
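A sketch of how such a conflict can be detected by tracing routes (illustrative; it assumes the usual shuffle-then-switch omega structure and destination-tag routing, and the function names are my own):

def trace_links(src, dest, ports):
    """Addresses of the switch output links a cell occupies after each stage of a
    `ports`-input omega network (perfect shuffle, then the 2x2 switch sets the
    low address bit from the destination tag, MSB first)."""
    stages = ports.bit_length() - 1
    addr, links = src, []
    for k in range(stages - 1, -1, -1):
        # perfect shuffle = left circular shift of the address bits
        addr = ((addr << 1) & (ports - 1)) | (addr >> (stages - 1))
        # the switch forces the low bit to the current destination bit
        addr = (addr & ~1) | ((dest >> k) & 1)
        links.append(addr)
    return links

def first_conflict(route_a, route_b):
    """First stage (0-based) at which two routes need the same output link, or None."""
    for stage, (a, b) in enumerate(zip(route_a, route_b)):
        if a == b:
            return stage
    return None

if __name__ == "__main__":
    # Cells entering ports 1 and 3, destined for outputs 4 and 5, in an 8x8 fabric:
    r1 = trace_links(1, 4, 8)          # [3, 6, 4]
    r2 = trace_links(3, 5, 8)          # [7, 6, 5]
    print(first_conflict(r1, r2))      # 1: both routes need output link 6 after stage 1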
Path Contention
 [Figure sequence: two cells, entering on ports 1 and 3 and destined for output ports 4 and 5, are traced through an 8-port fabric; their routes require the same switch output at an intermediate stage, so one cell is dropped and only the other reaches its output.]
Performance Degradation
 [Figure sequence: six cells, entering on ports 1, 3, 4, 5, 6, and 7 with destinations 4, 6, 1, 7, 0, and 3, are offered to an 8-port fabric in the same cycle; repeated path contention drops several of them, and only the cells bound for outputs 0, 3, and 6 are delivered, illustrating how contention degrades throughput.]
A Solution: Batcher Sorter
 One solution to the contention problem is to sort the cells into monotonically increasing order based on the desired destination port
 Done using a bitonic sorter called a Batcher (a sorting sketch follows below)
 Places the M cells into a gap-free increasing sequence on the first M input ports
 Eliminates duplicate destinations
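A compact Python sketch of Batcher's bitonic sorting network applied to destination tags (illustrative only; a real Batcher stage is built from hardware compare-exchange elements, and the duplicate-elimination step mentioned above is not shown):

def batcher_bitonic_sort(tags):
    """Sort a list of destination tags (length a power of two) in ascending order
    using Batcher's bitonic sorting network; each inner loop is one column of
    compare-exchange elements."""
    n = len(tags)
    k = 2
    while k <= n:                       # size of the bitonic sequences being merged
        j = k // 2
        while j >= 1:                   # compare-exchange distance within a merge
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    if (tags[i] > tags[partner]) == ascending:
                        tags[i], tags[partner] = tags[partner], tags[i]
            j //= 2
        k *= 2
    return tags

if __name__ == "__main__":
    print(batcher_bitonic_sort([6, 1, 7, 3, 0, 4, 2, 5]))  # [0, 1, 2, 3, 4, 5, 6, 7]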
Batcher-Banyan Example
 [Figure sequence: cells with destination tags 0, 1, 3, 4, 6, and 7 first pass through the Batcher sorting stage, which places them in increasing order on consecutive top input ports of the banyan stage; the banyan stage then delivers every cell to its destination with no contention.]
