INTERCONNECTION NETWORKS
Prof. Dr. Aman Ullah Khan
 A dominant factor limiting the performance of parallel computers is how the processors exchange data.
 Processors need to access shared memories.
 Processors need to communicate with each other.

12/1/2020 Advanced Computers Architecture
INTERCONNECTION NETWORKS …
 A dominant factor limiting the performance of parallel computers is inter-processor communication.
 Processors need to access distributed memories.
 Processors need to communicate with each other.

System Configuration     ODE     Transport   I/O     Total    Cost   Performance/Cost Ratio
                         (sec)   (sec)       (sec)   (sec)    ($M)   (Mflops/sec per $M)
Cray C90                 7       4           16      27       20     44
Intel Paragon            12      24          10      46       10     78
NOW (Ethernet)           4       23,340      4,030   27,347   4      0.32
NOW + ATM                4       192         2,015   2,211    5      3.3
NOW + ATM + PIO          4       192         10      205      5      35
NOW + ATM + PIO + AM     4       8           10      21       5      342
+ Cray C90 PVP with 16 processors
++ Intel Paragon has 256 nodes
+++ NOW with 256 RS6000 Workstations
PIO: Programmed I/O   ATM: Asynchronous Transfer Mode

PERMUTATIONS

 Let S = {1, 2, 3, …, n}, and let S_n be the set of all one-to-one mappings (permutations) of S onto itself.

 S_n forms a group under composition of mappings.

▪ A set G is said to be a group under an operation ‘o’ if:
1. Closure holds, i.e. ∀X, Y ∈ G, X o Y ∈ G.
2. ‘o’ is associative, i.e. X o (Y o Z) = (X o Y) o Z, ∀X, Y, Z ∈ G.
3. An identity element e exists, i.e. e o X = X o e = X, ∀X ∈ G.
4. Inverse elements exist, i.e. ∀X ∈ G there is an X^-1 ∈ G with X o X^-1 = X^-1 o X = e.

PERMUTATIONS
 A permutation P is written as follows:
P = | 1 2 3 4 5 |
    | 2 3 5 4 1 |
 The permutation P maps the elements as follows:
 1 → 2, 2 → 3, 3 → 5, 4 → 4 and 5 → 1
 In a shorter notation, we skip the domain and write only the range elements of P, i.e. P = [2 3 5 4 1].
 The electrical (wiring) notation of P is shown below.
[Figure: wiring diagram connecting inputs 1 to 5 on the left to outputs 1 to 5 on the right according to P]

PERMUTATIONS - EXAMPLE
 Let S = {1, 2, 3}. Then S_3 = {e, t1, t2, t3, s1, s2}, where e is the identity element, forms a group under composition of permutations:
e  = [1 2 3]
t1 = [1 3 2]
t2 = [3 2 1]
t3 = [2 1 3]
s1 = [2 3 1]
s2 = [3 1 2]
 Two permutations applied in sequence form the composition of the two permutations. For example, applying t1 and then t2:
t1.t2 = [1 3 2].[3 2 1] = [3 1 2]
[Figure: wiring diagrams of t1, t2 and t1.t2 on elements 1 to 3]
Prove: t1.t1 = e, t2.t2 = e, t3.t3 = e, t1.s1 = t3, t1.s2 = t2, t2.s1 = t1, s1^3 = e, s2^3 = e.

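The identities in the example can be checked mechanically. The following is a minimal sketch, assuming the convention used above (in t1.t2 the left permutation is applied first) and the range-list notation; the helper name `compose` is illustrative and not part of the slides.

```python
def compose(p, q):
    """Apply p first, then q. Permutations are 1-based range lists, e.g. [1, 3, 2]."""
    return [q[p[i] - 1] for i in range(len(p))]

e, t1, t2, t3 = [1, 2, 3], [1, 3, 2], [3, 2, 1], [2, 1, 3]
s1, s2 = [2, 3, 1], [3, 1, 2]

print(compose(t1, t2))                   # [3, 1, 2]
print(compose(compose(s1, s1), s1))      # [1, 2, 3]
```

The same helper can be used to work through the remaining "Prove" items one by one.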
DECOMPOSITION OF PERMUTATIONS INTO CYCLES
 Note that the permutations s1 and s2 are cycles.
 A cycle with two elements is called a transposition or exchange permutation.
 A cycle with n elements is written as (1 2 3 … n).
 If a permutation of n elements contains a cycle of m elements (m <= n), the notation lists just the m elements of the cycle. For example, the permutation P = [2 3 4 1 5 6] is written as P = (1 2 3 4).
 On six elements, the cycles (1 2) and (4 5 6) stand for [2 1 3 4 5 6] and [1 2 3 5 6 4], respectively.
 A permutation can be decomposed into a number of cycles such that their product generates the original permutation.

DECOMPOSITION OF PERMUTATIONS INTO CYCLES …
 Let P be a permutation. A cycle containing an element x may be generated by applying P repeatedly to x.
 The elements generated in the process are {x, P^1(x), P^2(x), …, P^(k-1)(x)} such that P^k(x) = x.
 An orbit of an element x under the permutation P is the cycle {x, P^1(x), P^2(x), …, P^(k-1)(x)} such that P^k(x) = x.
 The orbits under P = [2 1 3 5 6 4] for the various elements are listed in the table:
x   Orbit under P   Cycle
1   [1 2]           (1 2)
2   [2 1]           (1 2)
3   [3]             (3)
4   [4 5 6]         (4 5 6)
5   [5 6 4]         (4 5 6)
6   [6 4 5]         (4 5 6)
 The cycles of P are thus (1 2), (3), (4 5 6), and
 P = (1 2) (3) (4 5 6)
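The orbit construction above translates directly into a short routine. This is a sketch only, assuming 1-based range-list notation; the function name `cycles` is illustrative.

```python
def cycles(perm):
    """Decompose a permutation (1-based range list) into its cycles by following orbits."""
    seen, result = set(), []
    for start in range(1, len(perm) + 1):
        if start in seen:
            continue
        cycle, x = [], start
        while x not in seen:          # apply P repeatedly until the orbit closes
            seen.add(x)
            cycle.append(x)
            x = perm[x - 1]
        result.append(tuple(cycle))
    return result

print(cycles([2, 1, 3, 5, 6, 4]))     # [(1, 2), (3,), (4, 5, 6)]
```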
EXCHANGE (TRANSPOSITION) PERMUTATION
 Let a = (a_{n-1} a_{n-2} … a_1 a_0) be the binary notation for the number a.
 The exchange permutation E is defined as E(a) = (a_{n-1} a_{n-2} … a_1 a_0'), where ' denotes the Boolean complement.
 The class of exchange permutations E_k is defined as E_k(a) = (a_{n-1} … a_k' … a_0).
 For example, for N = 8, the exchange permutations E0, E1 and E2 are listed below:

E0(000) = 001    E1(000) = 010    E2(000) = 100
E0(001) = 000    E1(001) = 011    E2(001) = 101
E0(010) = 011    E1(010) = 000    E2(010) = 110
E0(011) = 010    E1(011) = 001    E2(011) = 111
E0(100) = 101    E1(100) = 110    E2(100) = 000
E0(101) = 100    E1(101) = 111    E2(101) = 001
E0(110) = 111    E1(110) = 100    E2(110) = 010
E0(111) = 110    E1(111) = 101    E2(111) = 011
E0 = (0 1) (2 3) (4 5) (6 7)    E1 = (0 2) (1 3) (4 6) (5 7)    E2 = (0 4) (1 5) (2 6) (3 7)
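Complementing bit k is an XOR with 2^k, so the table can be regenerated in a few lines. A minimal sketch; the name `exchange` is illustrative.

```python
def exchange(a, k):
    """E_k: complement bit k of the address a."""
    return a ^ (1 << k)

for k in range(3):
    row = [format(exchange(a, k), "03b") for a in range(8)]
    print(f"E{k}:", row)
```

Since XOR with the same mask twice restores the address, every E_k is its own inverse, which is why each one decomposes into 2-cycles only.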
EXCHANGE (TRANSPOSITION) PERMUTATION …
[Figure: wiring diagrams of E0, E1 and E2 on nodes 0 to 7, and the exchange links among processors P0 to P7]

PERFECT SHUFFLE PERMUTATION
 Let a = (a_{n-1} a_{n-2} … a_1 a_0) be the binary notation for the number a.
 The Perfect Shuffle permutation S is defined as:
S(a_{n-1} a_{n-2} … a_1 a_0) = (a_{n-2} a_{n-3} … a_1 a_0 a_{n-1}), i.e. rotate the binary address left by one position.
 For example, the table below lists the shuffle permutation for N = 8.
S(000) = 000
S(001) = 010
S(010) = 100
S(011) = 110
S(100) = 001
S(101) = 011
S(110) = 101
S(111) = 111
S = (0) (1 2 4) (3 6 5) (7)
[Figure: wiring diagram of S on nodes 0 to 7]
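The rotate-left definition can be written with shifts and a mask. This is a sketch under the assumption that addresses are n-bit unsigned integers; `shuffle` is an illustrative name.

```python
def shuffle(a, n):
    """Perfect shuffle: rotate the n-bit address a left by one position."""
    msb = (a >> (n - 1)) & 1                    # bit that wraps around
    return ((a << 1) & ((1 << n) - 1)) | msb

print([shuffle(a, 3) for a in range(8)])        # [0, 2, 4, 6, 1, 3, 5, 7]
```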
PERFECT SHUFFLE PERMUTATION
 Classes of permutations called subshuffles and supershuffles are defined on parts of the address a_{n-1} a_{n-2} … a_k a_{k-1} … a_1 a_0:
▪ Rotating the low-order part gives the subshuffle S_k.
▪ Rotating the high-order part gives the supershuffle S^k.

 For 3-bit addresses, the permutations S_1 and S^1 are listed below:

S_1(000) = 000    S^1(000) = 000
S_1(001) = 010    S^1(001) = 001
S_1(010) = 001    S^1(010) = 100
S_1(011) = 011    S^1(011) = 101
S_1(100) = 100    S^1(100) = 010
S_1(101) = 110    S^1(101) = 011
S_1(110) = 101    S^1(110) = 110
S_1(111) = 111    S^1(111) = 111
S_1 = (0) (1 2) (3) (4) (5 6) (7)    S^1 = (0) (1) (2 4) (3 5) (6) (7)

BUTTERFLY PERMUTATIONS
 The butterfly permutation is obtained by exchanging the MSB with the LSB:
B(a_{n-1} a_{n-2} … a_1 a_0) = (a_0 a_{n-2} … a_1 a_{n-1})
 The subbutterfly permutation B_k is defined as:
B_k(a_{n-1} … a_{k+1} a_k a_{k-1} … a_1 a_0) = (a_{n-1} … a_{k+1} a_0 a_{k-1} … a_1 a_k)
 The superbutterfly permutation B^k is defined as:
B^k(a_{n-1} a_{n-2} … a_{k+1} a_k a_{k-1} … a_1 a_0) = (a_k a_{n-2} … a_{k+1} a_{n-1} a_{k-1} … a_1 a_0)
B(000) = 000    B_1(000) = 000    B^1(000) = 000
B(001) = 100    B_1(001) = 010    B^1(001) = 001
B(010) = 010    B_1(010) = 001    B^1(010) = 100
B(011) = 110    B_1(011) = 011    B^1(011) = 101
B(100) = 001    B_1(100) = 100    B^1(100) = 010
B(101) = 101    B_1(101) = 110    B^1(101) = 011
B(110) = 011    B_1(110) = 101    B^1(110) = 110
B(111) = 111    B_1(111) = 111    B^1(111) = 111
B = (0) (1 4) (2) (3 6) (5) (7)    B_1 = (0) (1 2) (3) (4) (5 6) (7)    B^1 = (0) (1) (2 4) (3 5) (6) (7)
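All three butterfly variants are a swap of two address bits, so one helper covers them. A minimal sketch; the function names are illustrative and not taken from the slides.

```python
def swap_bits(a, i, j):
    """Swap bits i and j of the address a."""
    if ((a >> i) & 1) != ((a >> j) & 1):
        a ^= (1 << i) | (1 << j)      # flipping both bits swaps them when they differ
    return a

def butterfly(a, n):                  # B: swap MSB (bit n-1) and LSB (bit 0)
    return swap_bits(a, n - 1, 0)

def sub_butterfly(a, k):              # B_k: swap bit k and bit 0
    return swap_bits(a, k, 0)

def super_butterfly(a, n, k):         # B^k: swap bit n-1 and bit k
    return swap_bits(a, n - 1, k)

print([butterfly(a, 3) for a in range(8)])   # [0, 4, 2, 6, 1, 5, 3, 7]
```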
BUTTERFLY PERMUTATIONS …
 The figure below depicts the butterfly permutations B, B_1 and B^1.
[Figure: wiring diagrams of B, B_1 and B^1 on nodes 0 to 7]

BIT REVERSAL PERMUTATIONS
 The bit reversal permutation R is defined as:
R(a_{n-1} a_{n-2} … a_1 a_0) = (a_0 a_1 … a_{n-2} a_{n-1})
 The sub bit reversal permutation R_k is defined as:
R_k(a_{n-1} a_{n-2} … a_{k+1} a_k a_{k-1} … a_1 a_0) = (a_{n-1} a_{n-2} … a_{k+1} a_0 a_1 … a_{k-1} a_k)
 The super bit reversal permutation R^k is defined as:
R^k(a_{n-1} a_{n-2} … a_{k+1} a_k a_{k-1} … a_1 a_0) = (a_k a_{k+1} … a_{n-2} a_{n-1} a_{k-1} … a_1 a_0)
[Figure: wiring diagrams of R, R_1 and R^1 on nodes 0 to 7]
R = (0)(1 4)(2)(3 6)(5)(7)    R_1 = (0)(1 2)(3)(4)(5 6)(7)    R^1 = (0)(1)(2 4)(3 5)(6)(7)

SHIFT PERMUTATIONS
 The shift permutation L is defined as:
L(a) = (a + 1) mod N, where N = 2^n.
 The sub shift permutation L_k is defined as:
L_k(a) = (a + 1) mod 2^k, applied to the k low-order bits of a with the remaining bits unchanged.
 The super shift permutation L^k is defined as:
L^k(a) = (a + 2^(n-k)) mod N
[Figure: wiring diagrams of the shift, sub shift and super shift permutations on nodes 0 to 7]
L = (1 2 3 4 5 6 7 0)    L_2 = (1 2 3 0)(5 6 7 4)    L^2 = (2 4 6 0)(3 5 7 1)
NETWORK CLASSIFICATIONS

 Commonly used criteria to classify interconnection networks:
▪ Connectivity
▪ Topology
▪ Switching method: circuit switching or packet switching
▪ Others
[Figure: classification tree of interconnection networks. Labels in the figure: by topology, by connectivity, static, dynamic, regular, irregular, bus-based (single, multiple), switch-based (single stage, multi stage, crossbar), single sided, double sided, blocking, nonblocking, rearrangeable, circuit switching, packet switching]

NETWORK CLASSIFICATIONS ..
 Single sided networks have only one-sided connections to the nodes, whereas double sided networks have two-sided connections.
[Figure: a single sided network (nodes 1 … N on one side of the connection network) and a double sided network (nodes 1 … N on both sides of the connection network)]
 A non-blocking network of N nodes allows any node to talk to any other node such that N nodes can talk simultaneously.
 Programmable networks allow the path to be set up dynamically through control.
NETWORK TERMINOLOGY
 A network not following any pattern or order in topology is
called irregular.
 A network employing more than one stage to implement the
permutation is called multistage network.
 A circuit switching network provides the electrical path
between a source and a destination, thus providing the
dedicated links for transfer.
 A packet switched network breaks messages down into packets, and the packets travel a dynamically chosen path through intermediate network nodes. An electrical path exists only between the source and a neighboring node through a link. A particular link may have a number of packets waiting to be transferred on it.
CONNECTIVITY BASED CLASSIFICATION
 Static Interconnection Networks
▪ Processors are connected directly to other processors via point-to-point communication links
 Dynamic Interconnection Networks
▪ Processors are connected dynamically via switches
 Various settings for a 2x2 SE:
▪ Straight
▪ Exchange
▪ Upper Broadcast
▪ Lower Broadcast

EXAMPLE STATIC INTERCONNECTION NETWORKS

[Figure: example networks. Panels: Mesh Network, Mesh with wraparound, Three-dimensional Mesh, Tree Network, Dynamic Tree, Dynamic Fat Tree, Completely-Connected network, Star Network, Linear Array and Ring]
4-CUBE, 16 NODES HYPERCUBE

K-ARY N-CUBE NETWORKS

8-ary 1-cube (8 nodes ring) network

8-ary 2-cube (eight 8-node rings) network.


EVALUATING STATIC INTERCONNECTION NETWORKS
 Distance - Length of the shortest path between two processors
 Diameter - Maximum distance between any two processors
 Connectivity - Measure of the multiplicity of paths between two processors
 Arc Connectivity - Minimum number of communication links that must be removed to break the network into two disconnected networks
 Bisection Width - Minimum number of communication links that must be removed to break the network into two equal-sized disconnected networks
 Channel Width - Number of bits that can be sent simultaneously over a communication link, i.e. the number of wires in the communication link
 Channel Rate - Peak rate at which a single wire can deliver bits
 Channel Bandwidth - Peak rate at which a communication link can deliver bits (Channel Rate * Channel Width)
 Bisection Bandwidth - Bisection Width * Channel Bandwidth
 Cost - Total number of communication links
 Degree of a Processor - Number of communication links connected to a processor
 Expandability of network - How much work is needed to add processors to an existing network?

EXAMPLE STATIC INTERCONNECTION NETWORKS
Characteristics of Static Networks with p Processors
Network                Diameter          Bisection Width   Arc Connectivity   Number of Links
Completely-connected   1                 p^2/4             p-1                p(p-1)/2
Star                   2                 1                 1                  p-1
Complete binary tree   2 lg((p+1)/2)     1                 1                  p-1
Linear array           p-1               1                 1                  p-1
Ring                   floor(p/2)        2                 2                  p
2-D mesh no wrap       2(√p - 1)         √p                2                  2(p - √p)
2-D mesh with wrap     2 floor(√p/2)     2√p               4                  2p
Hypercube              log p             p/2               log p              (p log p)/2

Networks used by Various Commercial Computers
Network               Machine
Linear Array & Ring   CDC Cyberplus
2-D Mesh              DAP, Paragon
3-D Mesh              Cray T3D, J-Machine
Tree                  CM-5 (Fat tree)
Hypercube             NCube

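Individual entries of the characteristics table can be spot-checked by building a small instance and measuring it. The sketch below is illustrative only: it uses breadth-first search for the diameter and covers just the ring and the hypercube; the graph builders `ring` and `hypercube` are hypothetical helper names.

```python
from collections import deque

def diameter(adj):
    """Largest BFS distance over all source nodes of a connected graph."""
    best = 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        best = max(best, max(dist.values()))
    return best

def ring(p):
    return {i: [(i - 1) % p, (i + 1) % p] for i in range(p)}

def hypercube(d):
    return {i: [i ^ (1 << b) for b in range(d)] for i in range(1 << d)}

def links(adj):
    """Each undirected link is counted once."""
    return sum(len(nbrs) for nbrs in adj.values()) // 2

print(diameter(ring(8)), links(ring(8)))              # 4 8
print(diameter(hypercube(4)), links(hypercube(4)))    # 4 32
```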
NETWORK TERMINOLOGY …
 In parallel computers, a number of processors and memory modules need to exchange data in parallel.
 This calls for the ability to connect any node from one set of nodes (say, processors) to any node in another set (say, memory modules), and thus requires the connection network to realize an arbitrary permutation without blocking.
 A non-blocking network is capable of providing N parallel paths between pairs of nodes forming any arbitrary permutation.
 Some commonly used parameters to compare networks are:
▪ Diameter, which is the maximum distance between any two nodes,
▪ Routing control,
▪ Cost in terms of the number of links or switches used in the network,
▪ Connection methodology: packet switching or circuit switching,
▪ Resilience towards failures.

THE CROSS BAR
 A cross bar provides full connectivity, i.e. any permutation can be implemented. An N x M cross bar, where N is the number of input nodes and M is the number of output nodes, is shown below:
[Figure: N x M cross bar. Panels: Block Diagram; Switch Connections (inputs 1 … N as rows, outputs 1 … M as columns); Various Switch Settings (Straight, Diagonal)]

THE CROSS BAR
 The permutation itself gives the addresses of the switches to be closed.
 For example, for the permutation below we need to close the switches given by the (row, column) numbers (1 3), (2 4), (3 2) and (4 1).
P = | 1 2 3 4 |   (row select)
    | 3 4 2 1 |   (column select)
[Figure: 4 x 4 switch matrix with inputs as rows and outputs as columns, showing open and closed switches]

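Reading the switch addresses off the permutation can be sketched as follows. The function names are illustrative; inputs and outputs are numbered from 1 as on the slide.

```python
def crossbar_settings(perm):
    """Return the (row, column) switches to close for a 1-based permutation."""
    return [(row + 1, col) for row, col in enumerate(perm)]

def closed_matrix(perm, m):
    """N x M 0/1 matrix with a 1 at every closed switch."""
    return [[1 if perm[r] == c + 1 else 0 for c in range(m)]
            for r in range(len(perm))]

print(crossbar_settings([3, 4, 2, 1]))   # [(1, 3), (2, 4), (3, 2), (4, 1)]
```

For a valid permutation every row and every column of the matrix contains exactly one closed switch.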
CROSS BAR CHARACTERISTICS
 An N x M cross bar has the following characteristics:
▪ Hardware cost: c*N*M switches, where c is a constant.
▪ Connectivity: Full, non-blocking. Any input can be connected to any
output. Any permutation can be setup.
▪ Delay: One unit (data passes through only one stage)
▪ Routing Control: Simple

CROSS BAR CHARACTERISTICS

An 8x8 crossbar network with a single-point failure.


CLOS NETWORK
 Three-stage network composed of smaller cross bars
 Reduced switch cost in contrast to cNM for an NxM cross bar
 An NxM Clos network is shown below.
[Figure: N x M Clos network. Input stage: r1 cross bars of size n1 x m; middle stage: m cross bars of size r1 x r2; output stage: r2 cross bars of size m x n2]

CLOS NETWORK ARCHITECTURE
 The architecture of the Clos Network is described as:
▪ Assume that N and M are composite and can be factored as
N = n1 x r1 and M =n2 x r2.
▪ Now, N input lines are grouped into r1 groups, such that each group
has n1 input lines.
▪ The first stage of the Clos network has r1 cross bars each of size
n1xm. Here m is an integer greater than or equal to n1 & n2.
▪ Similarly, the output lines are grouped into r2 groups and each
group has n2 lines.
▪ The output stage has r2 cross bars each of size m x n2.
▪ The middle stage has m cross bars each of size r1 x r2.

CLOS NETWORK ARCHITECTURE
 The connections can be expressed as follows:
▪ Let CB1_k, CB2_k and CB3_k be the kth cross bars in stage 1, stage 2 and stage 3, respectively. Further let
➢ CB1O_k(j) = jth output line of the kth cross bar at stage 1,
➢ CB2O_k(j) = jth output line of the kth cross bar at stage 2,
➢ CB3O_k(j) = jth output line of the kth cross bar at stage 3, and
➢ CB1I_k(i) = ith input line of the kth cross bar at stage 1,
➢ CB2I_k(i) = ith input line of the kth cross bar at stage 2, and
➢ CB3I_k(i) = ith input line of the kth cross bar at stage 3.
▪ The Clos network is defined by the following connections:
➢ Join CB1O_k(j) to CB2I_j(k) for k = 1, 2, …, r1 and j = 1, 2, …, m
➢ Join CB2O_k(j) to CB3I_j(k) for k = 1, 2, …, m and j = 1, 2, …, r2
 Thus, any cross bar can be decomposed into a Clos network which is non-blocking and can generate any permutation.

HARDWARE COMPLEXITY OF CLOS NETWORK
 Total number of switches
▪ = sw1 + sw2 + sw3, where sw_i is the number of switches in stage i
▪ = r1 x (n1 x m) + m x (r1 x r2) + r2 x (m x n2)
▪ = m x (r1 x n1 + r1 x r2 + r2 x n2)
▪ Assuming that M = N with n1 = n2 = r1 = r2 = sqrt(N) = n,
▪ the number of switches required to implement the Clos network is:
▪ = m (3 n^2)
▪ = 3 n^3 (taking m = n)
▪ = 3 N^1.5

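The stage-by-stage count above can be evaluated directly. A minimal sketch; the parameter names follow the slide's n1, r1, n2, r2 and m, and the function name is illustrative.

```python
def clos_switches(n1, r1, n2, r2, m):
    """Total switches of a three-stage Clos network, one stage at a time."""
    stage1 = r1 * (n1 * m)    # r1 cross bars of size n1 x m
    stage2 = m * (r1 * r2)    # m cross bars of size r1 x r2
    stage3 = r2 * (m * n2)    # r2 cross bars of size m x n2
    return stage1 + stage2 + stage3

print(clos_switches(4, 4, 4, 4, 4))   # 192, i.e. 3 * 16**1.5 for N = 16
print(clos_switches(3, 3, 3, 3, 4))   # 108, a 9x9 network with m = 4
```

The second call shows that for small N, or when m exceeds n, the Clos network can cost more than the 81 switches of a plain 9x9 cross bar.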
A THREE-STAGE CLOS NETWORK

CLOS NETWORK CONTROL
 In 1953, Clos proposed the decomposition of a cross bar into a network of smaller cross bars. However, a method to set up the network for a given permutation was lacking for a long time.
 Anderson proposed an algorithm for a class of networks whose sub-cross bars have sizes that are powers of 2.
 The general algorithm is as under:
 Assume that N and M can be decomposed as N = n1 x r1 and M = r2 x n2. There are m cross bars of size (r1 x r2) in the middle such that (m ≥ n1).
 Map the permutation onto the permutation matrix P such that
P(i,j) = x if input i is required to be connected to the output j, blank otherwise.
Example:
P = | 0 1 2 3 4 5 6 7 8 |
    | 6 5 1 7 8 3 2 0 4 |
I/O  0  1  2  3  4  5  6  7  8
0    .  .  .  .  .  .  x  .  .
1    .  .  .  .  .  x  .  .  .
2    .  x  .  .  .  .  .  .  .
3    .  .  .  .  .  .  .  x  .
4    .  .  .  .  .  .  .  .  x
5    .  .  .  x  .  .  .  .  .
6    .  .  x  .  .  .  .  .  .
7    x  .  .  .  .  .  .  .  .
8    .  .  .  .  x  .  .  .  .

CLOS NETWORK CONTROL …
 Partition the permutation matrix P into a group matrix whose groups have, in general, n1 rows and n2 columns.
 Place the index of a middle cross bar on each x such that no row or column of the group matrix has a repeated index.
 Following the paths through the indexed middle cross bars implements the given permutation.

CLOS NETWORK EXAMPLE
P = | 0 1 2 3 4 5 6 7 8 |
    | 6 5 1 7 8 3 2 0 4 |

Input   Output   Middle cross bar index
0       6        x2
1       5        x1
2       1        x0
3       7        x1
4       8        x0
5       3        x2
6       2        x1
7       0        x2
8       4        x0
[Figure: 9x9 Clos network with three 3x4 input cross bars, four middle cross bars (0 to 3) and three 4x3 output cross bars, with the routed paths drawn]

Cost:
1. Cross bar: 9x9 = 81 switches
2. Clos Network: 3x4x3 + 3x3x4 + 4x3x3 = 36 + 36 + 36 = 108 switches

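Whether a proposed indexing satisfies the rule "no group row or group column repeats an index" can be checked mechanically. The sketch below only validates an assignment; it does not construct one, and the function name and argument layout are assumptions made for illustration.

```python
def valid_assignment(perm, middle, n1, n2):
    """True if no input group and no output group uses a middle cross bar twice.

    perm[i]   = output connected to input i (0-based)
    middle[i] = index of the middle cross bar chosen for that connection
    """
    used_in, used_out = set(), set()
    for i, (out, mid) in enumerate(zip(perm, middle)):
        key_in = (i // n1, mid)       # input cross bar and middle index
        key_out = (out // n2, mid)    # output cross bar and middle index
        if key_in in used_in or key_out in used_out:
            return False
        used_in.add(key_in)
        used_out.add(key_out)
    return True

perm = [6, 5, 1, 7, 8, 3, 2, 0, 4]
print(valid_assignment(perm, [2, 1, 0, 1, 0, 2, 1, 2, 0], 3, 3))   # True
```

An assignment that sends two inputs of the same group through the same middle cross bar, for example all zeros, is rejected.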
CLOS NETWORK SETUP EXAMPLE

P = | 0 1 2 3 4 5 6 7 8 |
    | 6 5 1 7 8 3 2 0 4 |

Input   Output   Middle cross bar index
0       6        x2
1       5        x1
2       1        x3
3       7        x3
4       8        x0
5       3        x2
6       2        x1
7       0        x2
8       4        x0
[Figure: the same 9x9 Clos network set up with this alternative indexing, using all four middle cross bars 0 to 3]

12X12 = (4X3)(3X4) CLOS NETWORK

[Figure: 12x12 Clos network with inputs and outputs 0 to 11]

12X12 = (4X3)(3X4) CLOS NETWORK …

P = | 0 1 2 3 4 5 6 7 8 9 10 11 |
    | 1 2 3 4 5 6 7 8 9 10 11 0 |

Input   Output   Middle cross bar index
0       1        x0
1       2        x1
2       3        x2
3       4        x3
4       5        x0
5       6        x1
6       7        x2
7       8        x3
8       9        x0
9       10       x1
10      11       x2
11      0        x3

BENES NETWORK
 Multistage network with N = M = 2^r, composed of 2x2 cross bars only.
 Reduced switch cost in contrast to cN^2 for an NxN cross bar.
 The first and last stages have N/2 cross bars each of size 2x2, while the middle stage has two cross bars each of size N/2 x N/2.
 Further decomposition of the middle-stage N/2 x N/2 cross bars using N/4 x N/4 cross bars can be done, and in general the idea is applied recursively.
 The recursion terminates when trivial cross bars of size 2x2 are reached.
 The 2x2 cross bar could be implemented using 4 switches and permits two permutations, viz., the identity and exchange permutations.
 A 2x2 cell can be implemented with the identity and exchange permutations using a 1-bit control.
[Figure: Benes Network Principle (16x16 network with 2x2 cross bars in the first and last stages around two 8x8 cross bars); 2x2 Cell with two routing functions; 4x4 Benes Network]
BENES NETWORK CONTROL
 The Benes network is a special case of the Clos network; therefore, the Clos network control is applicable to the Benes network.
 Just as for the Clos network, the permutation matrix is divided into groups of 2x2 sub-matrices.
 Apply the algorithm as it was applied to the Clos network.
[Figure: 8 x 8 Benes Network built from five stages of 2x2 cells, inputs and outputs 0 to 7]

BENES NETWORK ROUTING
P = | 0 1 2 3 4 5 6 7 |
    | 5 2 4 6 1 7 0 3 |
The permutation matrix is grouped into 2x2 sub-matrices (group rows and group columns), and each connection is indexed with a subnetwork number:
Input   Output   Index
0       5        x0
1       2        x1
2       4        x1
3       6        x0
4       1        x0
5       7        x1
6       0        x1
7       3        x0
Note:
1. Connections with index 0 are to be routed to the upper subnetwork 0.
2. Connections with index 1 are to be routed to the lower subnetwork 1.
[Figure: 8x8 Benes network with 4x4 subnetwork 0 (upper) and 4x4 subnetwork 1 (lower), before and after setting the first and last stages]

[Figure: second-level routing. The index-0 connections (inputs 0, 3, 4, 7) are routed within 4x4 subnetwork 0 and the index-1 connections (inputs 1, 2, 5, 6) within 4x4 subnetwork 1; the permutation matrix of each subnetwork is grouped into 2x2 sub-matrices and indexed with 0/1 again in the same way]

HARDWARE COMPLEXITY OF BENES NETWORK
 The Benes network provides an attractive way to implement non-blocking networks using a considerably smaller number of switches. Assume that N = 2^n:
▪ # of network stages = 2n – 1 = 2 log2(N) - 1
▪ # of 2 x 2 cells in each stage = N/2
▪ Total # of cells = (2 log2(N) - 1) N/2 = N log2(N) - N/2
▪ Total # of switches = 4 (N log2(N) - N/2) = 4N log2(N) - 2N
Number of switches used by Various Networks
Ser. #   N     Cross Bar   Clos Net   Benes Net
1        2     4           9          4
2        4     16          24         24
3        8     64          69         80
4        16    256         192        224
5        32    1024        543        576
6        64    4096        1536       1408
7        128   16384       4344       3328
8        256   65536       12288      7680
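The three columns of the comparison table follow from N^2, 3N^1.5 and 4N log2(N) - 2N. The sketch below recomputes them; rounding N^1.5 to the nearest integer before multiplying by 3 is an assumption chosen because it reproduces the Clos entries for values of N that are not perfect squares.

```python
def crossbar_switches(n):
    return n * n

def clos_switches(n):
    # Assumption: N**1.5 is rounded to the nearest integer, then multiplied by 3.
    return 3 * round(n ** 1.5)

def benes_switches(n):
    stages = 2 * (n.bit_length() - 1) - 1    # 2*log2(N) - 1 for N a power of two
    return 4 * stages * (n // 2)             # 4 switches per 2x2 cell, N/2 cells per stage

for n in (2, 4, 8, 16, 32, 64, 128, 256):
    print(n, crossbar_switches(n), clos_switches(n), benes_switches(n))
```

The Benes count drops below the Clos count from N = 64 onward and below the cross bar count from N = 16 onward.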
REARRANGEABLE NETWORKS
 Rearrangeable networks are characterized by the property that it is always possible to rearrange already established connections in order to allow other connections to be established simultaneously. The Benes network is a well-known example of a rearrangeable network. The figure below shows an example 8x8 Benes network.
 Two simultaneous connections, 110 → 100 and 010 → 110, are depicted in the network in Fig. A. In the presence of the connection 110 → 100, it is not possible to establish the connection 101 → 001 unless the connection 110 → 100 is rearranged as shown in Fig. B.
[Figures: Fig. A and Fig. B, the 8x8 Benes network before and after the rearrangement]

COMMONLY USED INTERCONNECTION NETWORKS

SHUFFLE-EXCHANGE NETWORK (SEN)
 Implements both the Shuffle and Exchange permutations, i.e. nodes have both Shuffle and Exchange connectivity.
▪ S = (0) (1 2 4) (3 6 5) (7)
▪ E = (0 1) (2 3) (4 5) (6 7)
[Figure: nodes P0 to P7 shown with the Exchange Permutation links, the Shuffle Permutation links, and the combined Shuffle-Exchange Direct Connection Network]

SHUFFLE-EXCHANGE NETWORK (SEN) …
 The r-dimensional shuffle-exchange graph has:
▪ 2^r nodes
▪ 3 x 2^(r-1) edges
▪ Nodes labeled from 0 to 2^r - 1 in binary
▪ Two nodes U and V are connected directly by an edge if either:
➢ U and V differ only in the last bit (exchange edge)
➢ U is a left or right cyclic shift of V (shuffle edge)
[Figure: 2-dimensional shuffle-exchange and 3-dimensional shuffle-exchange graphs]

SHUFFLE-EXCHANGE NETWORK (SEN) ROUTING
 Any sequence of left-rotate (Shuffle) and complement (Exchange) steps that transforms a source address (in binary) into a given destination address forms a route in the recirculating SEN. A sequence with the smallest number of steps is a shortest-path route.
Worked Example
Step   Nodes on the link         Binary Addresses   Bit Operation
1      1 → 2 by S (Shuffle)      001 → 010          Rotate Left
2      2 → 3 by E (Exchange)     010 → 011          Complement LSB
3      3 → 6 by S (Shuffle)      011 → 110          Rotate Left
4      6 → 7 by E (Exchange)     110 → 111          Complement LSB

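One way to find a shortest sequence of S and E steps is a breadth-first search over addresses. This is an illustrative sketch rather than the routing hardware's method; `sen_route` is a hypothetical name, and only the left rotate is used as the shuffle step, as in the worked example.

```python
from collections import deque

def sen_route(src, dst, n):
    """Shortest string of 'S' (rotate left) and 'E' (complement LSB) steps from src to dst."""
    mask = (1 << n) - 1
    moves = {
        "S": lambda a: ((a << 1) & mask) | (a >> (n - 1)),
        "E": lambda a: a ^ 1,
    }
    parent = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            break
        for name, step in moves.items():
            nxt = step(node)
            if nxt not in parent:
                parent[nxt] = (node, name)
                queue.append(nxt)
    path, node = [], dst
    while parent[node] is not None:      # walk back from dst to src
        node, name = parent[node]
        path.append(name)
    return "".join(reversed(path))

print(sen_route(1, 7, 3))   # SESE
```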
SEN BROADCASTING
 Broadcast is a very important operation, and INs need to provide an efficient mechanism for it.
 To implement broadcast, the SEN nodes support three kinds of operations: exchange, NOP and shuffle.
[Figure: Broadcasting Illustration on an 8-node SEN]

SHUFFLE-EXCHANGE INTERCONNECTION
 This interconnection scheme permits
dynamic reconfigurations
 Each processing node contains a switch
▪ The switches have four modes of operation
that are controlled by switch control bits.
▪ This makes the routing quite simple as the
address bits are used as control bits.
 Examples of Shuffle-Exchange INs:
▪ Banyan
▪ Benes
▪ Omega
▪ Theta
 Examples of Multiprocessors using
Shuffle-Exchange INs:
▪ BBN Butterfly

EXCHANGE NETWORK
 Routing Algorithm:
1. route = src XOR dst;
2. for each i = 0, 1, …, k-1 do
   if ith bit of route is 1
   then follow Ei from current node
   else ignore;
 Example: Node P0 ==> P3
Route = 000 XOR 011 = 011
Route[0] == 1, follow E0 to reach P1;
Route[1] == 1, follow E1 to reach P3;
Route[2] == 0, ignore;
[Figure: exchange links among nodes P0 to P7]

CUBE CONNECTED NETWORK
• C_i(a_{n-1} a_{n-2} … a_i … a_1 a_0) = (a_{n-1} a_{n-2} … a_i' … a_1 a_0)
• Routing:
1. C = S XOR D
2. If C_i = 1, move in the i-direction; else no movement.
[Figure: Cube Connected / Hypercube Network. The C0, C1 and C2 links among nodes P0 to P7, and the combined network]

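The XOR-based routing used for both the exchange network and the cube connected network can be sketched as below. The bit order (bit 0 first) follows the exchange-network example; the function name `cube_route` is illustrative.

```python
def cube_route(src, dst, n):
    """Nodes visited when the differing address bits are corrected from bit 0 upward."""
    route = src ^ dst                 # a 1 bit marks a dimension that must be crossed
    path, node = [src], src
    for i in range(n):
        if (route >> i) & 1:
            node ^= 1 << i            # follow E_i / C_i
            path.append(node)
    return path

print(cube_route(0, 3, 3))            # [0, 1, 3]
```

The number of hops equals the number of 1 bits in src XOR dst, so no route is longer than n links.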
CUBE CONNECTED MS NETWORK
 Routing – same Algorithm

[Figure: 8x8 multistage cube connected network built from three stages of 2x2 switches, inputs and outputs 0 to 7]

THE PLUS-MINUS 2I (PM2I) NETWORK /DATA MANIPULATOR
 The PM2I network, also known as the Data Manipulator, consists of 2k interconnection functions defined as follows:
▪ PM2+i(P) = (P + 2^i) mod N (0 ≤ i < k)
▪ PM2-i(P) = (P - 2^i) mod N (0 ≤ i < k)

PM2I for N = 8
P   PM2+0            PM2+1            PM2+2            PM2-1            PM2-2
    (P + 2^0) mod 8  (P + 2^1) mod 8  (P + 2^2) mod 8  (P - 2^1) mod 8  (P - 2^2) mod 8
0   1                2                4                6                4
1   2                3                5                7                5
2   3                4                6                0                6
3   4                5                7                1                7
4   5                6                0                2                0
5   6                7                1                3                1
6   7                0                2                4                2
7   0                1                3                5                3

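Both families of PM2I functions fit in one expression. A minimal sketch; the `sign` argument (+1 or -1) is an illustrative way to select PM2+i or PM2-i.

```python
def pm2i(p, i, sign, n_nodes):
    """PM2+i for sign=+1 and PM2-i for sign=-1, on n_nodes nodes."""
    return (p + sign * (1 << i)) % n_nodes

print([pm2i(p, 1, -1, 8) for p in range(8)])   # [6, 7, 0, 1, 2, 3, 4, 5]
```

For N = 8, PM2+2 and PM2-2 coincide because adding and subtracting 4 are the same operation modulo 8, which matches the identical columns in the table.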
PM2I NETWORK FOR N = 8
[Figure: the PM2+0, PM2+1, PM2+2, PM2-0, PM2-1 and PM2-2 links among nodes P0 to P7]

PM2I (DATA MANIPULATOR) SS NETWORK FOR N = 8

[Figure: PM2I (data manipulator) SS network connecting nodes P0 to P7]

MULTISTAGE DATA MANIPULATOR
 The multistage PM2I network is based on cells under a 2-bit control; each cell connects its inputs to one of its three outputs.
• Routing
1. Let C(Cs, Ci) = D – S, where Cs is the sign and Ci are the bits of the magnitude of the difference.
2. Use the table below to follow the link at the ith stage.
Cs   Ci   Meaning of Control
0    0    Use straight link
0    1    Use +2^i link
1    0    Use straight link
1    1    Use -2^i link
[Figure: a cell with inputs in1, in2, in3, a 2-bit control, and outputs -2^i, straight and +2^i; and an 8-node multistage data manipulator with stages i = 2, 1, 0]

MULTISTAGE DATA MANIPULATOR ROUTING EXAMPLE
 Let S = 101 and D = 010
▪ C(Cs,Ci) = D - S = 010 – 101 = 1,011
▪ Since Cs=1 and C2=0, use the straight link, i.e. move data from node 5 to (5 + 0 = 5) itself.
▪ Cs=1 and C1=1: use the -2^i link to move data from 5 to (5 - 2^1 =) 3
▪ Cs=1 and C0=1: use the -2^i link to move data from 3 to (3 - 2^0 =) 2

 Let S = 100 and D = 111
▪ C(Cs,Ci) = D - S = 111 – 100 = 0,011
▪ Since Cs=0 and C2=0, use the straight link, i.e. move data from node 4 to (4 + 0 = 4) itself.
▪ Cs=0 and C1=1: use the +2^i link to move data from 4 to (4 + 2^1 =) 6
▪ Cs=0 and C0=1: use the +2^i link to move data from 6 to (6 + 2^0 =) 7

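The sign-and-magnitude control can be traced in code. The sketch below assumes stages are visited from i = n-1 down to 0, as in the examples, and records the node reached after every stage; `dm_route` is an illustrative name.

```python
def dm_route(src, dst, n):
    """Nodes reached after each stage (i = n-1 … 0) of the multistage data manipulator."""
    diff = dst - src
    sign = -1 if diff < 0 else 1      # Cs
    mag = abs(diff)                   # bits C_{n-1} … C_0
    path, node = [src], src
    for i in range(n - 1, -1, -1):
        if (mag >> i) & 1:            # C_i = 1: take the +2^i or -2^i link
            node = (node + sign * (1 << i)) % (1 << n)
        path.append(node)             # C_i = 0: straight link, node unchanged
    return path

print(dm_route(5, 2, 3))   # [5, 5, 3, 2]
print(dm_route(4, 7, 3))   # [4, 4, 6, 7]
```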
BUTTERFLY NETWORK
 The interconnection used in the Butterfly network is the butterfly permutation:
▪ B(a_{n-1} a_{n-2} … a_1 a_0) = (a_0 a_{n-2} … a_1 a_{n-1})

MULTISTAGE INTERCONNECTION NETWORKS (MINS)
 A general MIN consists of a number of stages, each consisting of 2x2 SE’s. Stages are connected to each other using an Inter-stage Connection (ISC) pattern as shown below:
[Fig.: Multistage Interconnection Network. Columns of switches joined by ISC 1, ISC 2, …, ISC n-1]

SHUFFLE-EXCHANGE NETWORKS
 A general shuffle-exchange network consists of a sequence of log2(N) exchange permutations interspersed with shuffle or butterfly permutations.

 Assume that S is the label of an object entering the network and D is the label
of its destination. We associate a temporary label L with the object, initially
set to S. If a sequence of permutations can transform L into D, the object will
arrive at its destination.
 The E_1 permutation gives the choice of inverting the least significant bit of
the input label or leaving it intact, so it can be used to make the least
significant bit in L equal to the least significant bit in D. This is the basic
step in converting L to D, and the choice of the ε1 (exchange) or I (identity)
permutation determines the switch-node setting in the general exchange box of
Figure 1.
 The next step is to expose the next bit in L to the E_1 permutation, which is
done most simply by cyclically shifting L by one bit. This is directly
equivalent to a perfect-shuffle permutation on all labels L in the range 0 to
N-1, as shown for N = 8 in Figure 2.
 After n = log2(N) applications of the shuffle and exchange permutations, every
bit in L will have been exposed to E_1 and set as required, so L will be equal
to D. As a direct consequence, the object that entered with label S will have
been routed to the output port identified by D, and the network will have
performed its function.
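The label-rewriting idea above can be sketched in a few lines. This is an illustration under stated assumptions, not the slides' own algorithm: the shuffle is applied before each exchange, destination bits are consumed most significant bit first, and the function names are hypothetical.

```python
def perfect_shuffle(label: int, n: int) -> int:
    """Cyclic left shift of an n-bit label (the perfect-shuffle permutation)."""
    mask = (1 << n) - 1
    return ((label << 1) | (label >> (n - 1))) & mask


def exchange(label: int, bit: int) -> int:
    """General exchange: force the least significant bit of the label to `bit`."""
    return (label & ~1) | bit


def route(source: int, dest: int, n: int) -> list:
    """Return the temporary label L after each of the n shuffle-exchange steps."""
    label = source
    trace = [label]
    for stage in range(n):
        label = perfect_shuffle(label, n)
        # Assumption: destination bits are used from MSB down to LSB.
        label = exchange(label, (dest >> (n - 1 - stage)) & 1)
        trace.append(label)
    return trace


if __name__ == "__main__":
    print([f"{x:03b}" for x in route(0b110, 0b011, 3)])
```

For source 110 and destination 011 the label passes through 110, 100, 001 and finally 011, so after n = 3 steps L equals D.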

MULTISTAGE SHUFFLE-EXCHANGE NETWORK
 # of nodes: N = 2^n
 # of stages: log2(N) = n
 # of SE's per stage: N / 2
 # of SE's: O(N log2 N)
 Diameter: n
 Network type: Blocking

o Routing: at the ith stage
o if the ith bit of the routing tag (source label XOR destination label) is:
o 0 use the straight connection
o 1 use the exchange connection
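A straight/exchange decision depends on where the message currently sits as well as where it is going, so the sketch below takes the decision bits from the XOR of the source and destination labels. It assumes a shuffle precedes every stage and that tag bits are read most significant bit first; both conventions, and the function names, are assumptions for illustration.

```python
def rotl(label: int, n: int) -> int:
    """Perfect shuffle: cyclic left shift of an n-bit label."""
    return ((label << 1) | (label >> (n - 1))) & ((1 << n) - 1)


def switch_settings(source: int, dest: int, n: int) -> list:
    """Per-stage switch setting derived from the XOR routing tag."""
    tag = source ^ dest
    return [
        "exchange" if (tag >> (n - 1 - stage)) & 1 else "straight"
        for stage in range(n)
    ]


def follow(source: int, settings: list, n: int) -> int:
    """Follow a message through the stages using the given settings."""
    label = source
    for setting in settings:
        label = rotl(label, n)
        if setting == "exchange":
            label ^= 1  # cross to the other output of the 2x2 switch
    return label


if __name__ == "__main__":
    s = switch_settings(0b110, 0b011, 3)
    print(s, f"{follow(0b110, s, 3):03b}")
```

With source 110 and destination 011 the tag is 101, giving exchange, straight, exchange, and the message leaves at output 011.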

[Figure: 8 × 8 multistage shuffle-exchange network; inputs and outputs 000 to 111, Stage 0 to Stage 2]
BANYAN NETWORK
 The banyan network of Goke and Lipovski [1], denoted by the
composite routing function Y_n, can be defined as a sequence
of general exchange and butterfly permutations:
Y_n = E_1 β_2 E_1 β_3 … β_n E_1.
 In this network there are n = log2(N) stages, each consisting of
N/2 active E_1 nodes, with successive stages connected by passive
β_i permutations. The figure below depicts a three-stage
(8-input, 8-output) banyan network.
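The composite function can be traced in code under one loudly stated assumption: β_k is taken here to swap bit k-1 with bit 0 of the label (a butterfly on the low k bits). That definition is not given in this section, so treat `beta` and `banyan_route` as hypothetical helpers. Under it, the exchange at stage k (for k < n) fixes destination bit k, and the last exchange fixes bit 0.

```python
def beta(label: int, k: int) -> int:
    """Assumed sub-butterfly: swap bit k-1 and bit 0 of the label."""
    if ((label >> (k - 1)) & 1) != (label & 1):
        label ^= (1 << (k - 1)) | 1
    return label


def banyan_route(source: int, dest: int, n: int) -> list:
    """Labels after each exchange of E_1 beta_2 E_1 ... beta_n E_1."""
    label = source
    after_exchange = []
    for stage in range(1, n + 1):
        # Destination bit that this exchange must place in the LSB.
        want = (dest >> stage) & 1 if stage < n else dest & 1
        label = (label & ~1) | want       # general exchange E_1
        after_exchange.append(label)
        if stage < n:
            label = beta(label, stage + 1)  # passive inter-stage wiring
    return after_exchange


if __name__ == "__main__":
    print([f"{x:03b}" for x in banyan_route(0b000, 0b101, 3)])
```

The last entry of the returned list is the output port reached, which equals the destination for every source and destination pair.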

BANYAN NETWORK
 A multi-stage IN using 2 × 2 switch boxes and a perfect shuffle interconnect pattern between
the stages.
 In the Banyan MIN there is one unique path from each input to each output.
 No redundant paths → no fault tolerance and the possibility of blocking.
 The Inter Stage Connections (ISC) are created by shifting the bit representation of an
input right by 1 bit to obtain the output number.
[Figure: 16 × 16 Banyan network; inputs and outputs 0000 to 1111, switches 1 to 32 in Stage 0 to Stage 3, inter-stage connections ISC-0 to ISC-3]
BANYAN NETWORK
 # of nodes: N = 2^n
 # of stages: log2(N) = n
 # of SE's per stage: N / 2
 # of SE's: O(N log2 N)
 Diameter: n
 Network type: Blocking

• Routing: at the ith stage, if the ith bit of the destination is 0 use the upper output, else use the lower output

Example Routes:
1. 101 → 101 2. 000 → 101 3. 010 → 111

[Figure: 8 × 8 Banyan network; inputs and outputs 000 to 111, switches 1 to 12 in Stage 0 to Stage 2]
OMEGA NETWORK
 The n-stage Omega network of Lawrie [2], denoted by the
composite routing function Ω_n, is defined as a sequence of
shuffles and general exchange permutations: Ω_n = (σ_n E_1)^n.
 Lawrie's Ω-network uses switch-nodes with upper and lower
broadcast capability, and all stages in the network are identical.
 However, it can be seen from Figure 2 that the Ω-network cannot
establish connections from node 4 to node 4 and from node 6 to
node 5 simultaneously. For this reason the Ω-network is a
blocking network.
 In principle, all multi-stage networks with log2(N) stages are
blocking networks, although techniques for overcoming blockages
vary between implementations.
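The blocking example above can be checked with a small sketch. It assumes nodes are numbered from 0, that each stage applies a shuffle followed by an exchange, and that destination bits are used most significant bit first; `omega_links` and `conflicts` are illustrative names. Two connections block each other when they need the same link after the same stage.

```python
def rotl(label: int, n: int) -> int:
    """Perfect shuffle: cyclic left shift of an n-bit label."""
    return ((label << 1) | (label >> (n - 1))) & ((1 << n) - 1)


def omega_links(source: int, dest: int, n: int) -> list:
    """Link label occupied at the output of each stage on the unique path."""
    label = source
    links = []
    for stage in range(n):
        label = rotl(label, n)
        label = (label & ~1) | ((dest >> (n - 1 - stage)) & 1)
        links.append(label)
    return links


def conflicts(conn_a: tuple, conn_b: tuple, n: int) -> list:
    """Stages at which two (source, dest) connections need the same link."""
    path_a = omega_links(conn_a[0], conn_a[1], n)
    path_b = omega_links(conn_b[0], conn_b[1], n)
    return [stage for stage, (a, b) in enumerate(zip(path_a, path_b)) if a == b]


if __name__ == "__main__":
    print(conflicts((4, 4), (6, 5), 3))
```

For the pair 4 to 4 and 6 to 5 with N = 8, both paths need link 010 at the output of the middle stage, so the two connections cannot be set up at the same time.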

8X8 OMEGA NETWORK

OMEGA NETWORK
 A multi-stage IN using 2 × 2 switch boxes and a perfect shuffle interconnect
pattern between the stages
 In the Omega MIN there is one unique path from each input to each output.
 No redundant paths → no fault tolerance and the possibility of blocking

[Figure: 8 × 8 Omega network; inputs and outputs 000 to 111, switches 1 to 12 in Stage 0 to Stage 2, inter-stage connections ISC-0 to ISC-2]
OMEGA NETWORK CHARACTERISTICS
 # of nodes: N = 2^n
 Diameter: n
 # of stages: log2(N) = n
 # of SE's per stage: N / 2
 Total # of SE's: S = (N/2) log2(N) = O(N log2 N)
 # of realizable permutations = 2^S
 Network type: Blocking
 Routing: at the ith stage
▪ if the ith bit of the destination is:
o 0 use the upper output
o 1 use the lower output.
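The permutation count can be confirmed by brute force for small N. The sketch below assumes each of the S switches is set independently to straight (0) or exchange (1) and that a shuffle precedes every stage; the function names are hypothetical. It enumerates all settings and counts the distinct permutations they produce.

```python
from itertools import product
from math import factorial


def rotl(label: int, n: int) -> int:
    """Perfect shuffle: cyclic left shift of an n-bit label."""
    return ((label << 1) | (label >> (n - 1))) & ((1 << n) - 1)


def omega_permutation(settings, n: int) -> tuple:
    """Permutation realized when settings[stage][switch] is 0 or 1."""
    perm = []
    for src in range(1 << n):
        pos = src
        for stage in range(n):
            pos = rotl(pos, n)
            pos ^= settings[stage][pos >> 1]  # exchange flips the low bit
        perm.append(pos)
    return tuple(perm)


def count_realizable(n: int) -> int:
    """Number of distinct permutations over all 2^S switch settings."""
    per_stage = (1 << n) // 2
    seen = set()
    for bits in product((0, 1), repeat=n * per_stage):
        settings = [bits[s * per_stage:(s + 1) * per_stage] for s in range(n)]
        seen.add(omega_permutation(settings, n))
    return len(seen)


if __name__ == "__main__":
    for n in (2, 3):
        print(1 << n, count_realizable(n), factorial(1 << n))
```

For N = 8 there are S = 12 switches, so 2^12 = 4096 permutations are realizable out of 8! = 40320, which is why the network is blocking.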

OMEGA NETWORK EXAMPLE
 Example:
• Connect input 101 to output 001
• Use the bits of the destination address, 001, for dynamically selecting a
path
• Routing:
▪ 0 means use the upper output
▪ 1 means use the lower output
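The example above can be traced hop by hop. This sketch assumes a shuffle precedes each stage, destination bits are read most significant bit first, and switches within a stage are numbered from 0 at the top (the figure may number them differently); `omega_trace` is an illustrative name.

```python
def rotl(label: int, n: int) -> int:
    """Perfect shuffle: cyclic left shift of an n-bit label."""
    return ((label << 1) | (label >> (n - 1))) & ((1 << n) - 1)


def omega_trace(source: int, dest: int, n: int):
    """Return the output reached and (stage, switch, output port) per hop."""
    label = source
    hops = []
    for stage in range(n):
        label = rotl(label, n)
        switch = label >> 1                    # two lines share one 2x2 switch
        bit = (dest >> (n - 1 - stage)) & 1    # destination-tag bit for this stage
        label = (label & ~1) | bit
        hops.append((stage, switch, "lower" if bit else "upper"))
    return label, hops


if __name__ == "__main__":
    reached, hops = omega_trace(0b101, 0b001, 3)
    print(f"{reached:03b}", hops)
```

For input 101 and destination 001 the destination bits 0, 0, 1 select upper, upper, lower, and the message arrives at output 001.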

THE INDIRECT BINARY N-CUBE NETWORK
 The indirect binary n-cube suggested by Pease [3], denoted
here as R_n, can be defined formally as
R_n = E_1 β_2 E_1 β_3 … β_n E_1 σ_n^{-1}.
 The indirect binary n-cube, sometimes known simply as the
multistage cube, is very similar to the Ω-network, although the
pairs of connections that it cannot establish simultaneously
differ from those of the Ω-network. The indirect binary n-cube
is illustrated in Figure 3.
 Although the shuffle-exchange class of networks is blocking,
these networks still have a rich interconnection structure,
capable of supporting a large number of simultaneous
connections at a relatively low cost.
 Many high-performance computers that incorporate a
multi-stage network use some form of shuffle-exchange
switch, for example the Bolt Beranek & Newman Butterfly
machine described in the section on Shared Memory
Multiprocessor.
THE INDIRECT BINARY N-CUBE NETWORK

BASELINE NETWORKS
 The network can be generated recursively.
 The first stage is an N × N block, the second consists of (N/2) × (N/2) sub-blocks, and so on.
 The Inter Stage Connections (ISC) are created recursively using the method of cyclic bit shifting.
▪ The bit representation of an input is cyclically shifted right by 1 bit to obtain the output number.
▪ Between stages, the next stage is always cut into 2 separate sections, so the second stage has 2 sections, the third stage has 4 sections, and so on.
 The switch rules are similar to those of Omega networks.
 Two networks are topologically equivalent if one can be obtained from the other simply by rearranging the nodes at each stage.
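The recursive ISC rule can be sketched as follows. It assumes stages are numbered from 0 and that the connection leaving stage i rotates only the low n - i bits right by one position, while the top i bits (which identify the section) stay fixed. That reading of the sectioning rule, and the name `baseline_isc`, are assumptions.

```python
def baseline_isc(label: int, n: int, stage: int) -> int:
    """Cyclic right shift of the low (n - stage) bits of an n-bit label."""
    width = n - stage                # bits that still take part in the shift
    mask = (1 << width) - 1
    low = label & mask
    rotated = (low >> 1) | ((low & 1) << (width - 1))
    return (label & ~mask) | rotated  # section bits above `width` are kept


if __name__ == "__main__":
    n = 3
    for stage in range(n - 1):
        print(stage, [f"{baseline_isc(x, n, stage):03b}" for x in range(1 << n)])
```

With n = 3, the first connection rotates all three bits (001 goes to 100), while the second rotates only the low two bits, so a label never leaves its half of the network.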
COMPARISON OF SINGLE/MULTI STAGE NETWORKS
Network            Links/PE   Diameter   Cost        Routing Function
Ring               2          N/2        2N          Shift to neighbor
Mesh               4          2√N        4N          Shift in two dimensions
Cube               log2(N)    log2(N)    N log2(N)   Hamming neighbor shift
Cross Bar          N-1        1          N^2         Direct addressing
Shuffle-Exchange   2          log2(N)    2N          (1) Shuffle (2) Exchange
PM2I               3          log2(N)    3N          Plus-minus 2^i

Network            # of Switches      # of Stages     Switch/Cell based   Blocking
Cross Bar          N^2                1               Switch based        No
Clos               3N^1.5             3               Switch based        No
Benes              4N log2(N) - 2N    2 log2(N) - 1   2 x 2 cell based    No
Shuffle-Exchange   N log2(N)/2        N/2             2 x 2 cell based    Yes
Hypercube          N log2(N)/2        log2(N)         2 x 2 cell based    Yes
PM2I               N log2(N)/2        log2(N)         Cell based          Yes

BUS-BASED DYNAMIC INTERCONNECTION NETWORKS

 Single Bus Systems


 Multiple Bus Systems
▪ Multiple Bus with Full Bus–Memory Connection (MBFBMC),
▪ Multiple Bus with Single Bus-Memory Connection (MBSBMC),
▪ Multiple Bus with Partial Bus–Memory connection (MBPBMC), and
▪ Multiple Bus with Class-Based Memory connection (MBCBMC).

SINGLE BUS SYSTEMS
 A single bus system consists of N processors, each having its own local
cache (to reduce processor-memory traffic), connected by a shared bus.
 All processors communicate with a single shared memory.
 The typical size of such a system varies between 2 and 50 processors. The actual
size is determined by the traffic per processor and the bus bandwidth
(defined as the maximum rate at which the bus can propagate data once
transmission has started).
 The complexity of a single bus network, measured in terms of the number of
buses used, is O(1), while the time complexity, measured in terms of the
input-to-output delay, is O(N).
Characteristics of Some Commercially Available Single Bus Systems
Machine Name          Max. # Processors   Processor      Clock Rate   Max. Memory   Bandwidth
HP 9000 K640          4                   PA-8000        180 MHz      4,096 MB      960 MB/s
IBM RS/6000 R40       8                   PowerPC 604    112 MHz      2,048 MB      1,800 MB/s
Sun Enterprise 6000   30                  UltraSPARC 1   167 MHz      30,720 MB     2,600 MB/s

MULTIPLE BUS SYSTEMS
 A multiple bus multiprocessor system uses several parallel buses to
interconnect multiple processors and multiple memory modules.
 A number of connection schemes are possible in this case. Among
the possibilities are:
▪ Multiple Bus with Full Bus–Memory Connection (MBFBMC),
▪ Multiple Bus with Single Bus-Memory Connection (MBSBMC),
▪ Multiple Bus with Partial Bus–Memory connection (MBPBMC), and
▪ Multiple Bus with Class-Based Memory connection (MBCBMC).
 In general, multiple bus multiprocessor organization offers a
number of desirable features such as high reliability and ease of
incremental growth.
 A single bus failure will leave (B-1) distinct fault-free paths between
the processors and the memory modules. On the other hand, when
the number of buses is less than the number of memory modules
(or the number of processors), bus contention is expected to
increase.

AN ILLUSTRATION OF MULTI BUS CONNECTION SCHEMES
WHEN N=6 PROCESSORS, M=4 MEMORY MODULES, AND B=4 BUSES

Figure panels:
▪ Multiple bus with full bus-memory connection (MBFBMC)
▪ Multiple bus with single bus-memory connection (MBSBMC)
▪ Multiple bus with partial bus-memory connection (MBPBMC)
▪ Multiple bus with class-based memory connection (MBCBMC)
CHARACTERISTICS OF MULTIPLE BUS ARCHITECTURES

BUS SYNCHRONIZATION
 A bus can be classified as synchronous or asynchronous.
 The time for any transaction over a synchronous bus is known
in advance. In accepting and/or generating information over
the bus, devices take the transaction time into account.
 An asynchronous bus, on the other hand, depends on the
availability of data and the readiness of devices to initiate bus
transactions.

BUS SYNCHRONIZATION
 In a single bus multiprocessor system, bus arbitration is required in
order to resolve the bus contention that takes place when more
than one processor competes to access the bus.
 In this case, processors that want to use the bus submit their
requests to bus arbitration logic. The latter decides, using a certain
priority scheme, which processor will be granted access to the bus
during a certain time interval (bus master).
 The process of passing bus mastership from one processor to
another is called handshaking and requires the use of two control
signals:
▪ Bus request – (indicates that a given processor is requesting mastership),
and
▪ Bus grant – (indicates that bus mastership is granted).
 A third signal, called bus busy, is usually used to indicate whether or
not the bus is currently being used.
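The request, grant and busy signals described above can be modelled with a small sketch. The class name `BusArbiter`, its methods, and the fixed priority rule (lowest processor index wins) are assumptions made for illustration; the slides do not prescribe a particular priority rule at this point.

```python
class BusArbiter:
    """Toy model of bus request / bus grant / bus busy handshaking."""

    def __init__(self, num_processors: int):
        self.requests = [False] * num_processors  # one bus-request line each
        self.busy = False                         # bus-busy signal
        self.master = None                        # current bus master, if any

    def request(self, pid: int) -> None:
        """Processor pid raises its bus-request line."""
        self.requests[pid] = True

    def arbitrate(self):
        """Grant the bus to one requester, or return None if the bus is busy."""
        if self.busy:
            return None
        for pid, wants_bus in enumerate(self.requests):
            if wants_bus:              # assumed fixed priority: lowest index first
                self.requests[pid] = False
                self.busy = True
                self.master = pid      # bus grant
                return pid
        return None

    def release(self, pid: int) -> None:
        """The current master drops the bus-busy signal."""
        if self.master == pid:
            self.busy = False
            self.master = None


if __name__ == "__main__":
    arbiter = BusArbiter(4)
    arbiter.request(2)
    arbiter.request(1)
    print(arbiter.arbitrate(), arbiter.arbitrate())
```

While the bus is busy no further grant is issued; the pending request is served only after the current master releases the bus.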

BUS HANDSHAKING MECHANISM

BUS SYNCHRONIZATION ARBITRATION LOGIC SCHEMES
 In deciding which processor gains control of the bus, the bus
arbitration logic uses a predefined priority scheme. Among
the priority schemes used are:
▪ Random priority,
▪ Simple Rotating priority,
▪ Equal priority, and
▪ Least Recently Used (LRU) priority.
 After each arbitration cycle, in simple rotating priority, all
priority levels are reduced one place, with the lowest priority
processor taking the highest priority.
 In equal priority, when two or more requests are made, there
is equal chance of any one request being processed.
 In the LRU algorithm, the highest priority is given to the
processor that has not used the bus for the longest time.
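Three of the priority schemes described above can be sketched as small selection functions. The function names and the data representations (a priority list ordered highest first, and a map from processor to the time it last used the bus) are assumptions for illustration.

```python
import random


def rotate_priorities(order: list) -> list:
    """Simple rotating priority: every processor drops one place and the
    lowest-priority processor (last in the list) becomes the highest."""
    return [order[-1]] + order[:-1]


def grant_by_order(order: list, requesters: set):
    """Grant the requester that appears first in the priority order."""
    for pid in order:
        if pid in requesters:
            return pid
    return None


def grant_equal(requesters: set, rng=random):
    """Equal priority: every pending request is equally likely to win."""
    return rng.choice(sorted(requesters)) if requesters else None


def grant_lru(last_used: dict, requesters: set):
    """LRU priority: grant the requester whose last bus use is oldest."""
    if not requesters:
        return None
    return min(sorted(requesters), key=lambda pid: last_used[pid])


if __name__ == "__main__":
    order = rotate_priorities([0, 1, 2, 3])
    print(order, grant_by_order(order, {1, 2}))
    print(grant_lru({0: 5, 1: 2, 2: 9}, {0, 2}))
```

After one arbitration cycle the order [0, 1, 2, 3] becomes [3, 0, 1, 2], so processor 3 moves from lowest to highest priority.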
PERFORMANCE COMPARISON
Performance Comparison of Dynamic Networks

Performance Comparison of Static Networks
