Unit-3.3 Dynamic Interconnection Network

Dynamic Interconnection Networks

• Dynamic interconnection networks are connections between processing nodes and memory nodes that are usually made through switching element nodes.
• They are scalable because the connections can be reconfigured before or even during the execution of a parallel program.
• Instead of fixed connections, switches or arbiters are used.
• Dynamic networks are normally used in shared memory (SM) multiprocessors.
• These networks use configurable paths and do not have a processor associated with each node.
• Switches or arbiters must be placed along the connecting paths to provide dynamic connectivity instead of fixed connections.
• The price of a dynamic interconnection network is attributed to the cost of the wires, switches, arbiters and connectors required.

Figure 18.2: Classification of dynamic interconnection networks: shared path networks (single shared bus, multiple shared buses) and switching networks (crossbar, multistage networks).

Shared Path Networks: Shared Bus

The shared bus organization is simply an extension of the buses employed in uniprocessors. It contains the same bus lines (address, data, control, interrupt) and some additional ones to resolve contention on the bus when several processors simultaneously want to use the shared bus. These lines are called arbitration lines and play a crucial role in the implementation of shared buses.

Secondly, the shared bus is a very cost-effective interconnection scheme: raising the number of processors does not increase the price of the shared bus. However, contention on the shared bus places a strong limit on the number of processors that can usefully be attached.

Obviously, as the number of processors on the bus increases, the probability of contention also increases proportionally, reaching a point where the whole bandwidth of the bus is exhausted by the processors; hence, adding a new processor will not cause any potential speed-up in the multiprocessor. One of the main design issues in shared bus multiprocessors is therefore increasing the number of applicable processors by different methods.

The three most important techniques are as follows:

• Introducing private memory.
• Introducing coherent cache memory.
• Introducing multiple buses.

Without these improvements, the applicable number of processors is in the range of 3-5. By introducing private memory and coherent cache memory, the number of processors can be increased by an order of magnitude, up to about 30 processors. Bus hierarchies open the way to constructing scalable shared memory systems based on bus interconnection.

According to the state of the bus request lines and the applied bus allocation policy, the arbiter grants the bus to one of the requesters via the grant lines.

Figure: Structure of a shared-bus multiprocessor. The bus arbiter and control logic is connected to the address, data, control and interrupt lines, and to the bus exchange (request/grant) lines used for arbitration.
Although uniprocessor and multiprocessor buses are very similar, there is an important difference in their mode of operation. Uniprocessor and first-generation multiprocessor systems use locked buses (e.g. Multibus, VMEbus), while second-generation multiprocessors use pended buses.

A memory write access needs two phases, as follows:

Phase 1: The address and data are transferred via the bus to the memory controller.

Phase 2: The memory write operation, including parity check, error correction, and so on, is executed by the memory controller.

The exploitation of a fast bus can be further improved by optimizing memory read access. A memory read access has three phases:
• The address is transferred via the bus to the memory controller.
• The memory read operation is executed by the memory controller.
• The data is transferred via the bus to the requesting processor.

Figure: Comparison of read bandwidths of various bus types: (a) memory read on a locked bus; (b) memory read on a pended bus; (c) memory read on a split-transaction bus.

On a locked bus the three phases must be executed sequentially, but in a multiprocessor we can reduce them to two bus transactions by combining phase 1 and phase 3, since phase 1 does not require the data bus and phase 3 does not require the address bus. Hence, the first phase and third phase of two memory reads, executed on different memory units, can be combined on the bus.
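A rough, illustrative cycle-count model (the per-phase costs are assumptions, not from the text) contrasting how many bus cycles n independent reads occupy on a locked, pended and split-transaction bus:

```python
def locked_bus_cycles(n_reads, mem_latency=1):
    # Locked bus: each read holds the bus for the address transfer,
    # the memory access itself, and the data transfer.
    return n_reads * (2 + mem_latency)

def pended_bus_cycles(n_reads):
    # Pended bus: the memory access phase is off the bus, but address
    # and data transfers still occupy separate bus cycles.
    return n_reads * 2

def split_transaction_cycles(n_reads):
    # Split-transaction bus: the address phase of one read is combined
    # with the data phase of another read to a different memory unit,
    # so in the steady state roughly one read completes per bus cycle.
    return n_reads + 1

for n in (1, 4, 16):
    print(n, locked_bus_cycles(n), pended_bus_cycles(n), split_transaction_cycles(n))
```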

Arbiter Logics
Bus arbitration refers to the process by which the current bus master relinquishes control of the bus and passes it to another bus-requesting processor unit. The controller that has access to the bus at a given instant is known as the bus master.
A conflict may arise if several DMA controllers, other controllers or processors try to access the common bus at the same time, but access can be given to only one of them. Only one processor or controller can be bus master at any point in time. To resolve these conflicts, a bus arbitration procedure is implemented to coordinate the activities of all devices requesting memory transfers. The selection of the bus master must take into account the needs of the various devices by establishing a priority system for gaining access to the bus. The bus arbiter decides who becomes the current bus master.
There are two approaches to bus arbitration:

1. Centralized bus arbitration - a single bus arbiter performs the required arbitration.
2. Distributed bus arbitration - all devices participate in the selection of the next bus master.

Methods of Centralized Bus Arbitration


There are three bus arbitration methods:
(i) Daisy Chaining method -
It is a simple and cheap method in which all the bus masters use the same line for making bus requests. The bus grant signal serially propagates through each master until it encounters the first one that is requesting access to the bus. This master blocks the propagation of the bus grant signal; therefore any other requesting module will not receive the grant signal and hence cannot access the bus.
During any bus cycle, the bus master may be any device connected to the bus - the processor or any DMA controller unit.
Advantages -
• Simplicity and scalability.
• The user can add more devices anywhere along the chain, up to a certain maximum value.

Disadvantages -
• The priority assigned to a device depends on its position along the chain.
• Bus propagation delay arises in this method.
• If one device fails, the entire system will stop working.

Figure: Daisy chained bus arbitration (BRQ, BGT and SACK lines chained through Device 1 ... Device m).
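A minimal sketch (illustrative) of how the daisy-chained grant propagates: the grant enters the chain at device 0 and is captured by the first device that is requesting the bus, which fixes priority by position.

```python
def daisy_chain_grant(requests):
    """Return the index of the device that captures the bus grant.

    requests[i] is True if device i is asserting BRQ.  The grant signal
    propagates from device 0 down the chain and is blocked by the first
    requesting device, which becomes bus master.
    """
    for position, requesting in enumerate(requests):
        if requesting:
            return position          # this device blocks further propagation
    return None                      # no device requested the bus

# Devices 2 and 4 request the bus; device 2 wins because it is closer
# to the arbiter in the chain (priority is fixed by position).
print(daisy_chain_grant([False, False, True, False, True]))  # -> 2
```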


(ii) Polling or Rotating Priority method -
In this method a controller is used to generate the addresses of the masters (each master has a unique address); the number of address lines required depends on the number of masters connected in the system. The controller generates a sequence of master addresses. When a requesting master recognizes its address, it activates the busy line and begins to use the bus.

Figure: Rotating priority bus arbitration (the controller polls Device 1 ... Device m over the address lines).

Advantages -
• This method does not favour any particular device or processor.
• The method is also quite simple.
• If one device fails, the entire system will not stop working.

Disadvantages -
• Adding bus masters is difficult, as it increases the number of address lines of the circuit.
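A minimal sketch (illustrative) of the polling scheme: the controller generates master addresses in rotating order starting after the previous winner, so no device is permanently favoured.

```python
def rotating_poll(requests, last_winner):
    """Poll master addresses starting after the previous winner.

    requests[i] is True if master i is requesting the bus.  Returns the
    address of the next bus master, or None if nobody requests.
    """
    n = len(requests)
    for offset in range(1, n + 1):
        candidate = (last_winner + offset) % n
        if requests[candidate]:
            return candidate
    return None

# Masters 0 and 3 request the bus; master 1 won last time, so the poll
# starts at master 2 and master 3 is granted the bus.
print(rotating_poll([True, False, False, True], last_winner=1))  # -> 3
```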

(iii) Fixed Priority or Independent Request method -

In this method each master has a separate pair of bus request and bus grant lines, and each pair has a priority assigned to it. The built-in priority decoder within the controller selects the highest-priority request and asserts the corresponding bus grant signal.

Advantages -
• This method generates a fast response.

Disadvantages -
• Hardware cost is high, as a large number of control lines is required.

Figure: Fixed priority bus arbitration (the bus arbiter has separate BRQ/BGT line pairs for Device 1 ... Device m).
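A minimal sketch (illustrative; here index 0 is assumed to carry the highest priority) of the priority decoder inside the arbiter:

```python
def fixed_priority_grant(requests):
    """Assert the grant line of the highest-priority requester.

    requests[i] is True if BRQi is asserted; index 0 is assumed to be
    the highest priority.  Returns a list of BGT outputs with at most
    one line asserted.
    """
    grants = [False] * len(requests)
    for i, requesting in enumerate(requests):
        if requesting:
            grants[i] = True         # highest-priority requester wins
            break
    return grants

print(fixed_priority_grant([False, True, False, True]))  # -> [False, True, False, False]
```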


Distributed Bus Arbitration:
In this approach, all devices participate in the selection of the next bus master. Each device on the bus is assigned a 4-bit identification number. The priority of the device is determined by the generated ID.
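A minimal sketch of one common way to compare the IDs (the wired-OR self-selection scheme is an assumption, not spelled out in the text): each competing device drives its 4-bit ID onto shared arbitration lines and withdraws when it sees a higher ID, so the highest ID wins.

```python
def distributed_arbitration(competing_ids):
    """Each device drives its 4-bit ID onto wired-OR arbitration lines
    and backs off bit by bit if a higher ID is present; the device with
    the highest ID becomes the next bus master."""
    winners = set(competing_ids)
    for bit in (3, 2, 1, 0):                                   # MSB first
        line = any(dev_id & (1 << bit) for dev_id in winners)  # wired-OR of this bit
        if line:
            winners = {d for d in winners if d & (1 << bit)}   # losers withdraw
    return max(winners)

print(distributed_arbitration([0b0101, 0b1001, 0b0110]))  # -> 9 (0b1001)
```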

Multiple Shared Bus


The limited bandwidth of the single shared bus represents a major limitation in building scalable multiprocessors. There are several ways to increase the bandwidth of the interconnection network. A natural idea is to multiply the number of buses, like the processors and memory units. Four different ways have been proposed for connecting buses to the processors, memory units and other buses, as follows:

1-dimensional multiple bus system - The simplest generalization of the single bus system towards a multiple bus system is the 1-dimensional multiple bus system shown in the figure. This approach leads to a typical uniform memory access (UMA) machine where any processor can access any memory unit through any of the buses. The employment of 1-of-N arbiters alone is not sufficient in such systems; arbitration is a two-stage process in 1-dimensional multiple bus systems. First, the 1-of-n arbiters (one per memory unit) resolve the conflict when multiple processors require exclusive access to the same shared memory unit. After the first stage, m (out of n) processors can get access to one of the memory units. However, when the number of buses (b) is less than the number of memory units (m), a second stage of arbitration is required, where an additional b-of-m arbiter is employed to assign buses to those processors that have been granted access to a memory unit.

Figure: Structure of a 1-dimensional multiple bus multiprocessor (buses B1 ... Bb connecting processors P1 ... Pn and memory units M1 ... Mm).
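A minimal sketch (data layout and tie-breaking rules are illustrative assumptions) of the two-stage arbitration described above: a 1-of-n arbiter per memory unit first picks one requesting processor, then a b-of-m arbiter assigns the b buses to the winning memory units.

```python
def two_stage_arbitration(requests, n_buses):
    """requests: dict mapping memory unit -> list of requesting processors.

    Stage 1: one 1-of-n arbiter per memory unit picks a single processor
             (here simply the lowest-numbered requester).
    Stage 2: a b-of-m arbiter assigns the available buses to the winning
             memory units (here in increasing memory-unit order).
    Returns a dict: memory unit -> (processor, bus) for granted accesses.
    """
    stage1 = {mem: min(procs) for mem, procs in requests.items() if procs}
    granted = {}
    for bus, mem in enumerate(sorted(stage1)):
        if bus >= n_buses:
            break                     # more winners than buses: the rest must wait
        granted[mem] = (stage1[mem], bus)
    return granted

# Three memory units are requested but only two buses exist, so one
# winning processor has to wait for the next arbitration round.
print(two_stage_arbitration({0: [2, 5], 1: [3], 2: [1, 4]}, n_buses=2))
```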
2- or 3-dimensional multiple bus systems - A further generalization of the 1-dimensional multiple bus is the introduction of a second and third dimension. In these systems, multiple buses compose a grid interconnection network. Each processor node is linked to a row bus and a column bus. Processors along a row or column constitute a conventional single bus multiprocessor. The memory can be distributed in several ways; the most traditional approach is to attach memory units to each bus. The main problem of these architectures is the maintenance of cache coherency.

Cluster bus system - The third alternative for introducing several buses into the multiprocessor is the cluster architecture, which represents a NUMA machine concept. The main idea of cluster architectures is that single bus multiprocessors, called clusters, are connected by a higher-level bus. Each cluster has its local memory. The access time of a local cluster memory is much less than the access time of a remote cluster memory. Keeping the code and stacks in the cluster memory can significantly reduce the need to access remote cluster memory. However, it turned out that without cache support this structure cannot avoid traffic jams on higher-level buses.

Hierarchical bus system - Another natural generalization of the single bus system is the hierarchical bus system, where single bus 'supernodes' are connected to a higher-level bus via a higher-level cache or 'supercache'.

Crossbar Networks

Crossbar networks allow any processor in the system to connect to any other processor or memory unit so that many processors can communicate simultaneously without contention. A new connection can be established at any time as long as the requested input and output ports are free. Crossbar networks are used in the design of high-performance small-scale multiprocessors, in the design of routers for direct networks, and as basic components in the design of large-scale indirect networks. A crossbar can be defined as a switching network with N inputs and M outputs, which allows up to min{N, M} one-to-one interconnections without contention. Figure 1.9 shows an N x M crossbar network. Usually M = N, except for crossbars connecting processors and memory modules.

The cost of such a network is O(NM), which is prohibitively high for large N and M. Crossbar networks have traditionally been used in small-scale shared-memory multiprocessors, where all processors are allowed to access memories simultaneously as long as each processor reads from, or writes to, a different memory. When two or more processors contend for the same memory module, arbitration lets one processor proceed while the others wait. The arbiter in a crossbar is distributed among all the switch points connected to the same output. However, the arbitration scheme can be less complex than the one for a bus, because conflicts in a crossbar are the exception rather than the rule, and therefore easier to resolve.
For a crossbar network with distributed control, each switch point may have four states, as shown in Figure 1.10. In Figure 1.10(a), the input from the row containing the switch point has been granted access to the corresponding output, while inputs from upper rows requesting the same output are blocked. In Figure 1.10(b), an input from an upper row has been granted access to the output; the input from the row containing the switch point does not request that output and can be propagated to other switches. In Figure 1.10(c), an input from an upper row has also been granted access to the output, but the input from the row containing the switch point also requests that output and is blocked. The configuration in Figure 1.10(d) is only required if the crossbar has to support multicasting (one-to-many communication).

The advent of VLSI permitted the integration of hardware for thousands of switches into a single chip. However, the number of pins on a VLSI chip cannot exceed a few hundred, which restricts the size of the largest crossbar that can be integrated into a single VLSI chip. Large crossbars can be realized by partitioning them into smaller crossbars, each one implemented using a single chip. Thus, a full crossbar of size N x N can be implemented with (N/n)(N/n) crossbars of size n x n.
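A minimal sketch (function and data layout are illustrative, not from the text) of the distributed per-output arbitration described above: each output column has its own arbiter, so only requests aimed at the same output conflict.

```python
def crossbar_arbitrate(requests):
    """requests: list of (input, output) connection requests.

    Each output has its own distributed arbiter; the first request seen
    for an output wins, later requests for the same output are blocked.
    Requests for different outputs are granted in parallel.
    """
    granted, blocked = {}, []
    for inp, out in requests:
        if out not in granted:
            granted[out] = inp        # this switch point closes
        else:
            blocked.append((inp, out))
    return granted, blocked

# Inputs 0 and 2 contend for output 1; input 0 wins, input 2 waits.
print(crossbar_arbitrate([(0, 1), (1, 3), (2, 1)]))
```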

The functional design of a crossbar switch connected to one memory module is shown in the figure. The circuit contains multiplexers that select the data, address and control lines from one CPU for communication with the memory module. Arbitration logic establishes priority levels to select one CPU when two or more CPUs attempt to access the same memory. The multiplexers are controlled by the binary code produced by a priority encoder within the arbitration logic.

Figure - Crossbar switch: data, address and control lines from CPU 1 to CPU 4 feed the multiplexers and arbitration logic in front of the memory module, which receives read/write and memory enable signals.

Advantages of Crossbar Networks:

1. It is a non-blocking network that allows multiple I/O connections to be established simultaneously.
2. It provides full connectivity, i.e., any permutation can be implemented using a crossbar.
3. It is highly useful in a multiprocessor system, as all processors can send memory requests independently.
4. It gives maximum utilization of bandwidth compared to other networks such as bus systems, multistage networks, etc.

Multistage Switching Network

The 2x2 crossbar switch is the building block of a multistage network. It has 2 inputs (A and B) and 2 outputs (0 and 1). To establish a connection between the input and output terminals, the control inputs CA and CB are used: an input is connected to output 0 if its control input is 0, and to output 1 if its control input is 1. The switch can arbitrate between conflicting requests: if both A and B require the same output terminal, only one will be connected and the other will be blocked or rejected.

We can construct a multistage network using 2x2 switches in order to control the communication between a number of sources and destinations. Creating a binary tree of such crossbar switches provides the connections needed to route an input to one of 8 possible destinations.

In the figure, PA and PB are two processors connected through switches to 8 memory modules, numbered in binary from 000 (0) to 111 (7). To choose the output at each level, one bit of the destination number is assigned to each of the 3 levels: the 1st bit determines the output of the switch in the 1st level, the 2nd bit in the 2nd level, and the 3rd bit in the 3rd level.

Example: if the source is PB and the destination is memory module 011 (as in the figure), a path is formed from PB through output 0 in the 1st level, output 1 in the 2nd level and output 1 in the 3rd level.

Usually, in a tightly coupled system the processor acts as the source and the memory unit acts as the destination; the destination is a memory module. In a loosely coupled system, processing units act as both sources and destinations.

Many patterns can be made using 2x2 switches, such as Omega networks, Butterfly networks, etc.

The interconnection structure can decide the overall system performance in a multiprocessor environment. Multistage switching networks were introduced to overcome the disadvantage of the common bus system, i.e., the availability of only one path, and to reduce the complexity of other interconnection structures (a crossbar has complexity O(n^2)). They use smaller switches, i.e., 2x2 switches, to reduce the complexity; routing algorithms are used to set the switches. The complexity and cost of a multistage network are less than those of the crossbar interconnection network.
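A minimal sketch (illustrative; the arbitration preference for input A is an assumption) of the 2x2 switch element described above: each input's control bit selects output 0 or 1, and when both inputs request the same output only one is connected.

```python
def switch_2x2(a, b, ca, cb):
    """Route inputs A and B of a 2x2 switch according to control bits.

    ca/cb select output 0 or 1 for inputs a/b.  If both inputs request
    the same output, A is (arbitrarily) granted and B is blocked.
    Returns a dict mapping output number -> the input routed to it.
    """
    outputs = {ca: a}                 # A gets the output it asked for
    if cb not in outputs:
        outputs[cb] = b               # B only connects if its output is free
    return outputs

print(switch_2x2("A", "B", ca=0, cb=1))   # both satisfied: {0: 'A', 1: 'B'}
print(switch_2x2("A", "B", ca=1, cb=1))   # conflict on output 1: B is blocked
```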

Butterfly network

A butterfly network is a technique to link multiple computers into a high-speed network. This form of multistage interconnection network topology can be used to connect different nodes in a multiprocessor system. The interconnection network for a shared memory multiprocessor system must have low latency and high bandwidth, unlike other network systems such as local area networks (LANs) or the Internet, for three reasons:
• Messages are relatively short, as most messages are coherence protocol requests and responses without data.
• Messages are generated frequently, because each read-miss or write-miss generates messages to every node in the system to ensure coherence. Read/write misses occur when the requested data is not in the processor's cache and must be fetched either from memory or from another processor's cache.
• Because messages are generated frequently, it is difficult for the processors to hide the communication delay.
Butterfly network building
For a butterfly network with p processo!:_nodes, there need to be p(log2 p + 1) switching nodes. Figure 1 shows a
network with 8 processor nodes, which implies 32 switching nodes. It represents each node as N(rank, colump
number). For example, the node at column 6 in rank 1 is represented as (1,6) and node at column 2 m rank O1s
r~presented as (0,2).P1
For any 'i' greater than zero, a switching node N(i,j) gets connected to ~(i-1,j) and N~i-1, m), where, mis .
inverted bit on jth location ofj. For example, consider the node N(l,6): 1 equals 1 and J equals 6, therefore m 1s
th
obtained by inverting the i bitof6.
As a result, the
' · _Binary Decimal nodes connected
I Variable representation Representation to N(l,6) are :
I N(ij) N(i-lj) N(i-1,m)

I
I.
iJ
I
i
I
i
,~ 6

(1,6) (0,6)
--•·l

(0,2)
·---
Thus, N(0,6),
N(l,6), N(0,2),
N(l,2) form a
butterfly pattern.
Several butterfly
Im
I
I 010
l
2 patterns exist in
the figure and
Network.
therefore, this network is called a Butterfly
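A minimal sketch (illustrative) that enumerates, for each switching node N(i, j), its two rank-(i-1) neighbours using the bit-inversion rule described above:

```python
import math

def butterfly_links(p):
    """Return {(rank, col): [(rank-1, col), (rank-1, m)]} for a butterfly
    network with p processor nodes (p a power of two).

    m is obtained by inverting the i-th most significant bit of col,
    where i is the rank of the node.
    """
    ranks = int(math.log2(p))                 # ranks 0 .. log2(p)
    links = {}
    for i in range(1, ranks + 1):
        for j in range(p):
            m = j ^ (1 << (ranks - i))        # flip the i-th MSB of j
            links[(i, j)] = [(i - 1, j), (i - 1, m)]
    return links

links = butterfly_links(8)
print(links[(1, 6)])   # -> [(0, 6), (0, 2)], the example from the text
```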
Butterfly network routing

In a wrapped butterfly network (which means rank 0 gets merged with rank 3), suppose a message is sent from processor 5 to processor 2.[3] In figure 2, this is shown by replicating the processor nodes below rank 3. The packet transmitted over the link follows the form:

Header | Payload | Trailer

The header contains the destination of the message, which is processor 2 (010 in binary). The payload is the message M, and the trailer contains the checksum. Therefore, the actual message transmitted from processor 5 is:

010 | M | checksum

Upon reaching a switching node, one of the two output links is selected based on the most significant bit of the destination address: if that bit is zero, the left link is selected; if that bit is one, the right link is selected. Subsequently, this bit is removed from the destination address in the packet transmitted through the selected link. This is shown in the figure:
• The packet reaches N(0,5). From the header it removes the leftmost bit to decide the direction. Since it is a zero, the left link of N(0,5) (which connects to N(1,1)) gets selected. The new header is '10'.
• The new packet reaches N(1,1). From the header it removes the leftmost bit to decide the direction. Since it is a one, the right link of N(1,1) (which connects to N(2,3)) gets selected. The new header is '0'.
• The new packet reaches N(2,3). From the header it removes the leftmost bit to decide the direction. Since it is a zero, the left link of N(2,3) (which connects to N(3,2)) gets selected. The header field is now empty.

Processor 2 receives the packet, which now contains only the payload M and the checksum.
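A minimal sketch of this routing expressed in terms of node numbers (an equivalent formulation under the figure's numbering: the header bit consumed at rank i replaces the i-th most significant bit of the current column), which reproduces the path in the example above:

```python
def butterfly_route(source, destination, ranks=3):
    """Trace the switching nodes visited by a packet in a wrapped
    butterfly network, consuming one destination header bit per rank."""
    col = source
    path = [(0, col)]
    for i in range(1, ranks + 1):
        bit_pos = ranks - i                       # position of the i-th MSB
        dest_bit = (destination >> bit_pos) & 1   # header bit consumed here
        col = (col & ~(1 << bit_pos)) | (dest_bit << bit_pos)
        path.append((i, col))
    return path

# Message from processor 5 to processor 2 (header '010'), as in the text.
print(butterfly_route(5, 2))   # -> [(0, 5), (1, 1), (2, 3), (3, 2)]
```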

Butterfly network parameters

Several parameters help evaluate a network topology. The prominent ones relevant to designing large-scale multiprocessor systems are summarized below, with an explanation of how they are calculated for a butterfly network with 8 processor nodes.
Bisection Bandwidth: The maximum bandwidth required to sustain communication between all nodes in the network. It can be interpreted as the minimum number of links that need to be severed to split the system into two equal portions. For example, the 8 node butterfly network can be split into two by cutting 4 links that crisscross across the middle; thus the bisection bandwidth of this particular system is 4. It is a representative measure of the bandwidth bottleneck which restricts overall communication.

Diameter: The worst-case latency (between two nodes) possible in the system. It can be calculated in terms of network hops, i.e., the number of links a message must travel to reach the destination node. In the 8 node butterfly network, it appears that N(0,0) and N(3,7) are farthest apart, but upon inspection it is apparent that, due to the symmetric nature of the network, traversing from any rank 0 node to any rank 3 node requires only 3 hops. Therefore, the diameter of this system is 3.
Links: The total number of links required to construct the entire network structure. This is an indicator of the overall cost and complexity of implementation. The example network shown in figure 1 requires a total of 48 links (16 links each between ranks 0 and 1, ranks 1 and 2, and ranks 2 and 3).

Degree: The complexity of each router in the network. This is equal to the number of in/out links connected to each switching node. The butterfly network switching nodes have 2 input links and 2 output links, hence it is a 4-degree network.
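A minimal sketch computing these four parameters for a butterfly with p processor nodes, assuming the formulas quoted in the comparison table that follows:

```python
import math

def butterfly_parameters(p):
    """Diameter, bisection bandwidth, link count and degree of a
    butterfly network with p processor nodes (p a power of two)."""
    k = int(math.log2(p))
    return {
        "diameter": k,               # any rank-0 node reaches rank k in k hops
        "bisection_bandwidth": p // 2,
        "links": k * 2 * p,          # 2p links between each pair of adjacent ranks
        "degree": 4,                 # 2 input + 2 output links per switching node
    }

print(butterfly_parameters(8))       # -> diameter 3, bisection 4, 48 links, degree 4
```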

Comparison with other network topologies

This section compares the butterfly network with linear array, ring, 2-D mesh and hypercube networks.[7] Note that a linear array can be considered a 1-D mesh topology. The relevant parameters are compiled in the table[8] ('p' represents the number of processor nodes).

Network parameters

Topology     | Diameter       | Bisection bandwidth | Links            | Degree
Linear array | p - 1          | 1                   | p - 1            | 2
Ring         | p/2            | 2                   | p                | 2
2-D mesh     | 2(sqrt(p) - 1) | sqrt(p)             | 2(p - sqrt(p))   | 4
Hypercube    | log2(p)        | p/2                 | log2(p) x (p/2)  | log2(p)
Butterfly    | log2(p)        | p/2                 | log2(p) x 2p     | 4

Advantages

Butterfly networks have a lower diameter than other topologies like the linear array, ring and 2-D mesh. This implies that in a butterfly network a message sent from any processor reaches its destination in a lower number of network hops.

Butterfly networks have a higher bisection bandwidth than other topologies. This implies that in a butterfly network a higher number of links needs to be broken in order to prevent global communication.

They can connect a larger range of computers.


Disadvantages

Butterfly networks are more complex and costlier than other topologies due to the higher number of links required to sustain the network.

The difference between the hypercube and the butterfly lies in their implementation. A butterfly network has a symmetric structure where all processor nodes between two ranks are equidistant from each other, whereas a hypercube is more suitable for a multiprocessor system which demands unequal distances between its nodes. Looking at the number of links required, it may appear that the hypercube is cheaper and simpler than a butterfly network, but as the number of processor nodes grows beyond 16, the router cost and complexity (represented by the degree) of the butterfly network becomes lower than that of the hypercube, because its degree is independent of the number of nodes.

In conclusion, no single network topology is best for all scenarios. The decision is made based on factors like the number of processor nodes in the system, bandwidth-latency requirements, cost and scalability.


Omega Network

An 8x8 Omega network is a multistage interconnection network, meaning that processing elements (PEs) are connected using multiple stages of switches. Inputs and outputs are given addresses as shown in the figure. The outputs from each stage are connected to the inputs of the next stage using a perfect shuffle connection system. This means that the connections at each stage represent the movement of a deck of cards divided into 2 equal decks and then shuffled together, with each card from one deck alternating with the corresponding card from the other deck. In terms of the binary representation of the PEs, each stage of the perfect shuffle can be thought of as a cyclic logical left shift: each bit in the address is shifted once to the left, with the most significant bit moving to the least significant bit position. The Omega network is highly blocking, though one path can always be made from any input to any output in a free network.

Some sets of requests, however, cannot be satisfied together. For example, certain pairs of sources cannot be connected simultaneously to destinations 000 and 001.

• The interconnections between the stages in an Omega network are defined by the "rotate left" of the bits used in the port IDs.

Example:

An 8x8 Omega network is interconnected as follows:

000 ---> 000 ---> 000 ---> 000
001 ---> 010 ---> 100 ---> 001
010 ---> 100 ---> 001 ---> 010
011 ---> 110 ---> 101 ---> 011
100 ---> 001 ---> 010 ---> 100
101 ---> 011 ---> 110 ---> 101
110 ---> 101 ---> 011 ---> 110
111 ---> 111 ---> 111 ---> 111

Figure: 8x8 Omega network.
• How to read the figure:
  o Pick a number at the left (e.g., 4 = 100)
  o Rotate left: 100 ---> 001 (= 1)
  o Connect 4 to 1
You have to do this at every stage.
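A minimal sketch (illustrative) of the rotate-left (perfect shuffle) rule used between stages:

```python
def perfect_shuffle(port, bits=3):
    """Cyclic left shift of a port ID: the MSB moves to the LSB position.

    This is the wiring pattern between consecutive stages of an
    N = 2**bits Omega network.
    """
    msb = (port >> (bits - 1)) & 1
    return ((port << 1) & ((1 << bits) - 1)) | msb

for p in range(8):
    print(f"{p:03b} ---> {perfect_shuffle(p):03b}")
# e.g. 100 (4) ---> 001 (1), matching the example above.
```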

At each stage, adjacent pairs of inputs are connected to a simple exchange element, which can be set either straight (pass inputs directly through to outputs) or crossed (send the top input to the bottom output, and vice versa). For N processing elements, an Omega network contains N/2 switches at each stage and log2(N) stages. The manner in which these switches are set determines the connection paths available in the network at any given time. Two such methods are destination-tag routing and XOR-tag routing, discussed in detail below.

Figure: 8x8 Omega network built from three stages of 2x2 switches (A1-A4, B1-B4, C1-C4) connecting inputs 000-111 to outputs 000-111 through perfect shuffle interconnections.

Destination-tag routing

In destination-tag routing, switch settings are determined solely by the message destination. The most significant bit of the destination address is used to select the output of the switch in the first stage; if the most significant bit is 0, the upper output is selected, and if it is 1, the lower output is selected. The next-most significant bit of the destination address is used to select the output of the switch in the next stage, and so on until the final output has been selected.

For example, if a message's destination is PE 001, the switch settings are: upper, upper, lower. If a message's destination is PE 101, the switch settings are: lower, upper, lower. These switch settings hold regardless of the PE sending the message.
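A minimal sketch (illustrative) that turns a destination address into the per-stage switch settings described above:

```python
def destination_tag_settings(destination, stages=3):
    """Return the switch setting ('upper' or 'lower') used at each stage
    for a given destination PE, independent of the source PE."""
    bits = format(destination, f"0{stages}b")
    return ["upper" if b == "0" else "lower" for b in bits]

print(destination_tag_settings(0b001))   # -> ['upper', 'upper', 'lower']
print(destination_tag_settings(0b101))   # -> ['lower', 'upper', 'lower']
```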

XOR-tag routing

In XOR-tag routing, switch settings are based on (source PE) XOR (destination PE). This XOR-tag contains 1s in the bit positions that must be swapped and 0s in the bit positions that both source and destination have in common. The most significant bit of the XOR-tag is used to select the setting of the switch in the first stage; if the most significant bit is 0, the switch is set to pass-through (straight), and if it is 1, the switch is crossed. The next-most significant bit of the tag is used to set the switch in the next stage, and so on until the final output has been selected.

For example, if PE 001 wishes to send a message to PE 010, the XOR-tag will be 011 and the appropriate switch settings are: A2 straight, B3 crossed, C2 crossed.
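A minimal sketch (illustrative) computing the XOR-tag and the resulting straight/crossed settings:

```python
def xor_tag_settings(source, destination, stages=3):
    """Switch settings for XOR-tag routing: a 1 bit means the switch at
    that stage is crossed, a 0 bit means it is set straight."""
    tag = source ^ destination
    bits = format(tag, f"0{stages}b")
    return ["straight" if b == "0" else "crossed" for b in bits]

# PE 001 sending to PE 010: XOR-tag 011 -> straight, crossed, crossed.
print(xor_tag_settings(0b001, 0b010))
```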

Applications

In multiprocessing, Omega networks may be used as connectors between the CPUs and their shared memory, in order to decrease the probability that the CPU-to-memory connection becomes a bottleneck. This class of networks has been built into the Illinois Cedar multiprocessor, the IBM RP3, and the NYU Ultracomputer.
