By
Magnus Jonsson

Department of Computer Engineering
Chalmers University of Technology
SE-412 96 Göteborg
Sweden

Centre for Computer Systems Architecture
Halmstad University
Box 823
SE-301 18 Halmstad
Sweden
High Performance Fiber-Optic Interconnection Networks
for Real-Time Computing Systems
Magnus Jonsson
Göteborg 1999
ISBN 91-7197-852-6
Doctoral Dissertations at
Chalmers University of Technology
New Series No. 1548
ISSN 0346-718X
Preface and Acknowledgement
My work concerning fiber-optic network architectures and protocols for such networks was initiated already during my master's thesis work at the Centre for Computer Systems Architecture (CCA), Halmstad University, finished in September 1994 and supervised by Kenneth Nilsson. Shortly after that, I was registered as a doctoral candidate at the Department of Computer Engineering (CE), Chalmers University of Technology, and the work towards a Ph.D. thesis could start. The whole work was performed at Halmstad University, but with support in various ways from Chalmers.
The work reported in this thesis has been part of two projects: (i) the
REMAP project, financed by NUTEK, the Swedish National Board for
Industrial and Technical Development, and (ii) the PARAD project, first
financed by the Swedish Ministry of Education in co-operation with Ericsson
Microwave Systems AB (EMW), later by the KK Foundation in co-operation
with EMW. Both CCA and CE have taken part in the two projects.
I also express my gratitude to those (not mentioned above) whose master or bachelor thesis work I have supervised within the scope of the projects:
Henrik Arleving, Ali Jamshid Far, Bassam Mahmoud, and Andreas
Sundgren. OPTOBUS and ECL components for the building of a prototype
have been sponsored by Motorola and National Semiconductor, respectively.
Last, I want to encourage curious people who are thinking of pursuing a Ph.D. It is a lot of work, but it gives experiences well worth the effort.
Abstract
Parallel and distributed computing systems are becoming more and more powerful and hence place increasingly high demands on the networks that interconnect their processors or processing nodes. Many of the applications running on such systems, especially embedded systems applications, have real-time requirements and, as application demands grow, high-performance networks become the heart of these systems. Fiber-optic networks are good candidates for use in such systems in the future.
The WDM star architecture is attractive, but its future success depends on the components becoming more commercially mature. Fiber-ribbon links, which instead offer an aggregated bandwidth of several Gbit/s, have already reached the market with a promising price/performance ratio. This has motivated the development and investigation of two new ring networks based on fiber-ribbon links. The networks take advantage of spatial bandwidth reuse, which can greatly enhance performance in applications with a significant amount of nearest downstream neighbor communication. One of the ring networks is control channel based and not only supports real-time services like the WDM star network does, but also offers low-level support for, e.g., group communication.
The approach has been to develop network protocols with support for dynamic real-time services, starting out from time-deterministic static TDMA systems.
The focus has been on functionality more than pure performance figures,
mostly on real-time features but also on other types of functionality for
parallel and distributed systems. Worst-case analyses, some simulations,
and case studies are reported for the networks. The focus has been on
embedded supercomputer applications, where each node itself can be a
parallel computer, and it is shown that the networks are well suited for use
in the radar signal processing systems studied. Other application examples
in which these kinds of networks are valuable are distributed multimedia
systems, satellite imaging and other image processing applications.
Contents
Preface and Acknowledgement......................................................................... 3
Abstract ............................................................................................................. 5
Contents............................................................................................................. 7
List of Figures ................................................................................................. 11
List of Tables ................................................................................................... 15
2.6.5 Interleaved TDMA ..................................................................... 54
2.6.6 FatMAC ...................................................................................... 55
2.7 Conclusions ....................................................................................... 55
3. Interconnections in Parallel Computers................................................ 57
3.1 Design and Performance Parameters.............................................. 57
3.2 Static Networks ................................................................................ 61
3.3 Dynamic Networks ........................................................................... 63
3.4 Shared Medium Networks ............................................................... 65
3.5 Hybrid Networks .............................................................................. 66
3.6 Group Communication ..................................................................... 66
3.7 Routing.............................................................................................. 67
3.8 High Performance Networks for Coarse Grained Parallel
Computers......................................................................................... 68
7.1.1 Circuit Switched Networks...................................................... 106
7.1.2 Packet Switched Real-Time WDM Networks ......................... 106
7.1.3 A New Protocol with Real-Time Support ................................ 107
7.1.4 Protocol and Real-Time Features............................................ 109
7.2 Protocol Description ....................................................................... 110
7.2.1 Transmitter and Receiver Cycles ............................................ 112
7.2.2 Distributed Slot Allocation Algorithm .................................... 113
7.3 Real-Time Services......................................................................... 116
7.3.1 Arriving Messages.................................................................... 116
7.3.2 Control Slots............................................................................. 117
7.3.3 Data Slots ................................................................................. 118
7.3.4 Slot Reserving .......................................................................... 118
7.4 Implementation Aspects ................................................................ 119
7.4.1 Clock Synchronization ............................................................. 119
7.4.2 Clock Recovery ......................................................................... 119
7.4.3 Computational Complexity...................................................... 120
7.4.4 Electronic Stars........................................................................ 120
7.5 Deterministic Performance ............................................................ 121
7.6 Case Study ...................................................................................... 123
7.7 Simulation Results ......................................................................... 124
7.8 Summary ........................................................................................ 127
8. WDM Star-of-Stars Network................................................................ 129
8.1 Network Architecture and Protocol ............................................... 129
8.2 Deterministic Performance ............................................................ 131
8.3 Clock Synchronization Aspects ...................................................... 135
8.4 Summary ........................................................................................ 139
9. Control Channel Based Fiber-Ribbon Pipeline Ring Network ........... 141
9.1 Network Overview.......................................................................... 142
9.2 Related Networks ........................................................................... 143
9.3 The CC-FPR Protocol ..................................................................... 144
9.4 User Services .................................................................................. 148
9.4.1 Real-Time Virtual Channels.................................................... 149
9.4.2 Guarantee Seeking Messages.................................................. 150
9.4.3 Best Effort Messages................................................................ 151
9.4.4 Barrier Synchronization .......................................................... 151
9.4.5 Global Reduction ...................................................................... 152
9.4.6 Low Level Support for Reliable Transmission........................ 153
9.5 Implementation Aspects ................................................................ 153
9.6 Case Study ...................................................................................... 155
9.7 Performance Analysis .................................................................... 157
9.8 Summary ........................................................................................ 163
10. Packet and Circuit Switched Fiber-Ribbon Pipeline Ring Network .. 165
10.1 Circuit Switched Traffic ................................................................. 165
10.2 Packet Switched Traffic ................................................................. 167
10.3 Circuit Establishment .................................................................... 167
10.4 Case Study ...................................................................................... 168
10.5 Mode Changes ................................................................................ 168
10.6 Summary ........................................................................................ 169
11. Conclusions and Future Work.............................................................. 171
11.1 Conclusions ..................................................................................... 171
11.2 Ongoing and Future work.............................................................. 171
References...................................................................................................... 175
Abbreviations ................................................................................................ 221
List of Figures
Figure 1: Three passive optical network architectures: (a) ring, (b) (dual)
bus, and (c) star........................................................................................ 44
Figure 2: Folded bus. ...................................................................................... 45
Figure 3: Some static topologies: (a) linear array, (b) ring, (c) 2-dimensional
mesh, (d) 2-dimensional torus, (e) 3-dimensional binary hypercube, and
(f) 4-dimensional binary hypercube. ....................................................... 61
Figure 4: Omega network for an eight-node system. One path through the
network is highlighted. ............................................................................ 64
Figure 5: Possible states of a 2 × 2 switch. .................................................... 64
Figure 6: A rearrangeable Benes network. ...................................................... 65
Figure 7: Fat-tree of switches where nodes are leaves in the tree. .............. 65
Figure 8: Computation module with eight 8 × 8 meshes of PEs. .................. 74
Figure 9: The MIMD system enables a functional decomposition and
allocation of the program to the modules. Each function or block of
functions is then mapped onto the processors in the computation
module (CM). ............................................................................................ 75
Figure 10: Description of one mode of operation in the 64-channel ground
based radar system. ................................................................................. 76
Figure 11: Data flow between the modules in the signal processing chain.. 77
Figure 12: A pure pipeline chain where one or more stages in the chain are
mapped on each module........................................................................... 79
Figure 13: If the SPMD model is used, all PEs work together on one stage in
the signal processing chain at a time...................................................... 80
Figure 14 : Several parallel groups of PEs, each running a pipeline chain
program. ................................................................................................... 81
Figure 15: Several parallel groups of PEs, each running an SPMD program.
.................................................................................................................. 82
Figure 16: Several PEs in parallel, where each data cube is processed by a
single PE................................................................................................... 83
Figure 17: Example of spatial bandwidth reuse. Node M sends to Node 1 at
the same time as Node 1 sends to Node 2 and Node 2 sends a multicast
packet to Nodes 3, 4, and 5...................................................................... 85
Figure 18: Control channel based network built up with fiber-ribbon point-
to-point links. ........................................................................................... 86
Figure 19: WDM star network........................................................................ 87
Figure 20: WDM star multi-hop network. ..................................................... 89
Figure 21: Multi-hop topology. ....................................................................... 90
Figure 22: A foil of fibers connects four computational nodes. In addition, a
clock node distributes clock signals to the computational nodes........... 91
Figure 23: Array of passive optical stars connects a number of nodes via
fiber-ribbon cables.................................................................................... 92
Figure 24: Fully connected topology where each node has an array of N − 1
laser diodes and an array of N − 1 photo diodes, where N is the number
of nodes. .................................................................................................... 93
Figure 25: Row of smart-pixel arrays............................................................. 95
Figure 26: Example of a planar free space system. The direction of the beam
is steered by the optical element on the way between two chips........... 97
Figure 27: Optical backplane configurations: (a) with planar free space
optics, (b) with smart pixel arrays, and (c) with a mirror. ..................... 98
Figure 28: Multiple passive optical stars topology. ..................................... 108
Figure 29: Passive optical star network where the nodes are equipped with
fixed transmitters and tunable receivers.............................................. 109
Figure 30: The transmitter cycle, lower table, is filled by taking all owned
slots in the receiver cycles, upper table, in all other nodes. Note that a
multicast is possible in Slots 1, 9, and 10. ............................................ 112
Figure 31: A receiver cycle is partitioned into data slots and control slots.
................................................................................................................ 113
Figure 32: Allocation scheme for the receiver cycles in a four-node system.
................................................................................................................ 114
Figure 33: Electronic star. ............................................................................ 121
Figure 34: Latency calculation. .................................................................... 122
Figure 35: Real-time performance plotted as fraction of messages that miss
their deadlines versus traffic intensity. ................................................ 125
Figure 36: Latency for guarantee seeking messages plotted versus traffic
intensity.................................................................................................. 126
Figure 37: Mean latency for best-effort messages in a network with 10 %
guarantee seeking and 90 % best-effort messages. .............................. 126
Figure 38: Mean latency for best-effort messages plotted versus bandwidth
utilization. The traffic consisted of 10 % guarantee seeking and 90 %
best-effort messages............................................................................... 127
Figure 39: Gateway node with higher channel bandwidth on the backbone
side (right side)....................................................................................... 130
Figure 40: Each backbone slot is divided into 5 sub-slots with the same pair
of gateway nodes as source and destination. ........................................ 131
Figure 41: Worst-case latency when full slot reservation by other nodes is
assumed. The x-axis represents the number of ordinary nodes and the
slot length is assumed to be γ = 1.0 µs. ................................................. 133
Figure 42: Worst-case latency when no slot reservation is assumed. The x-
axis represents the number of ordinary nodes and the slot length is
assumed to be γ = 1.0 µs......................................................................... 134
Figure 43: Node bandwidth, when full slot reservation by other nodes is
assumed, in number of high priority slots per total number of slots in a
cycle. The x-axis represents the number of ordinary nodes. ................ 136
Figure 44: Node bandwidth, when no slot reservation is assumed, in number
of high priority slots per total number of slots in a cycle. The x-axis
represents the number of ordinary nodes. ............................................ 137
Figure 45: With the synchronization scheme used, incoming traffic to a
gateway node can be forwarded immediately except for the case of
internal delay in the gateway node....................................................... 138
Figure 46: Worst-case latency comparison between usage and no usage of
the proposed synchronization scheme. No slot reservation is assumed.
The x-axis represents the number of ordinary nodes and the slot length
is assumed to be γ = 1.0 µs..................................................................... 139
Figure 47: (a) Bi-directional fiber-ribbon link. (b) Unidirectional ring
network built up of M/2 bi-directional links. ........................................ 141
Figure 48: The role of being slot initiator is cyclically repeated. Each of the
M nodes is the slot initiator in one slot per cycle. ................................ 144
Figure 49: The node succeeding the slot initiator initiates the control packet
transmission........................................................................................... 145
Figure 50: In each slot, a node passes/transmits one control packet and one
data packet, where the control packet is used for the arbitration of the
next slot. ................................................................................................. 146
Figure 51: A control packet contains a start bit, a link reservation field, and
a destination field. ................................................................................. 146
Figure 52: Conceptual view of the control channel part of the transceiver.
................................................................................................................ 147
Figure 53: Conceptual view of the data channel part of the transceiver.... 147
Figure 54: A control packet travels around a network with five nodes. Node
1 is the slot initiator. ............................................................................. 148
Figure 55: An example: Node 1 sends a single destination packet to Node 3,
while Node 4 sends a multicast packet to Node 5 and Node 1. ........... 149
Figure 56: The bandwidth utilization depends on the ratio of the total
propagation delay around the ring to the cycle length. The boxes with
bold text show the link through which each slot first propagates....... 150
Figure 57: Detailed description of the control packet contents................... 152
Figure 58: Data flow between the modules in the radar signal processing
chain. ...................................................................................................... 156
Figure 59: One cycle of 45 slots where Slots 16 through 45 are reserved.
Each number in the table indicates, for Slots 16 through 45, the owner
of the corresponding link. Neighboring segments in each slot have
different background shading. The slot initiators are indicated in Slots
1 through 15. .......................................................................................... 157
Figure 60: Slot distribution according to Cases A and B. In the example, Nr
= 2, No = 1, and M = 3. ........................................................................... 158
Figure 61: Worst-case latency when the only slots a node gets are those
ordinary slots for which it is the slot initiator. Case A is assumed. .... 159
Figure 62: Worst-case latency when the only slots a node gets are those
ordinary slots for which it is the slot initiator. Case B is assumed. .... 160
Figure 63: Comparison of worst-case latency for Cases A and B................ 161
Figure 64: Maximum aggregated throughput. Case A is assumed............. 162
Figure 65: Maximum aggregated throughput. Case B is assumed............. 163
Figure 66: Example of an allocation scheme for the links in a five-node
system. The slot initiators are in bold type, and different segments have
different background shading................................................................ 166
Figure 67: Network demonstrator with two nodes. The cable bundle in the
upper part of the figure contains two fiber-ribbons ending in each
node's OPTOBUS module. .................................................................... 172
List of Tables
Table 1: Experimental single-hop WDMA star networks. M is the number of
nodes. ........................................................................................................ 50
Table 2: Non-control channel based protocols for single-hop broadcast-and-
select WDMA networks. M is the number of nodes, and N is a positive
integer greater than zero. ........................................................................ 50
Table 3: Control channel based protocols for single-hop broadcast-and-select
WDMA networks. M is the number of nodes, and N is a positive integer
greater than zero...................................................................................... 51
Table 4: Other proposed single-hop WDM star networks. M is the number of
nodes, and N is a positive integer greater than zero.............................. 52
Table 5 : Classification of system sizes and communication distances. ....... 84
Table 6 : A feasible allocation scheme of the wavelength channels in a WDM
star network for the sample radar system.............................................. 88
Table 7: Performance summary of some of the networks discussed with
respect to pipeline and SPMD mapping.................................................. 99
Table 8 : Performance summary of some of the networks discussed with
respect to pipeline and SPMD mapping when several concurrent modes
run in parallel, each on dedicated hardware. ....................................... 100
Table 9: Suitability in different system sizes. Empty cells in a column mean
that the technology/network is not suitable for the corresponding
system sizes............................................................................................ 101
Table 10: Notations for descriptions of the network architecture and
protocol. .................................................................................................. 111
Table 11: Additional notation when describing the star-of-stars network
architecture. ........................................................................................... 129
Table 12: Node bandwidth in number of high priority slots per total number
of slots in a cycle. ................................................................................... 135
Part I: Introduction and Tutorial Surveys

1. Introduction
Fiber-optic communication systems with multiple channels obtained
through the use of WDM (Wavelength Division Multiplexing) have reached
the commercial stage in the area of telecommunications (see Chapter 2 for a
survey of fiber-optic communication technology). In the future, the WDM
technique is expected also to be applicable in local multiple-access networks.
At the same time, bandwidth demands in parallel and distributed
computing systems are increasing because of new, evolving applications.
Fiber-optic multiple-channel communication systems can give new
perspectives in these systems.
Another technology that can be very useful for the development of new
parallel and distributed computing systems is the use of fiber-ribbon links.
Commercially available products have appeared very quickly, and a good
price/performance ratio is expected.
The contributions of the present work are summarized in Section 1.1, while
Section 1.2 states which papers the thesis is based on. Introductions to
parallel computers, real-time communication, and optical interconnections
in parallel computers are given in Sections 1.3, 1.4, and 1.5, respectively.
The disposition of the thesis is given in Section 1.6.
1.1 Contributions
One of the motivating factors for this work was the small amount of
research done on high-performance fiber-optic networks for hard real-time
computer systems (we do not focus on telecommunications and do not treat,
e.g., FDDI, Fibre Distributed Data Interface, networks and extensions of
FDDI networks for high-performance networks in the framework of this
thesis). There is a need for this kind of network in, e.g., embedded
supercomputers. The radar signal processing system described in Chapter 4
is a typical example of an application and computer system which we place
in the category of embedded supercomputing.
The approach has been to develop network protocols with support for dynamic real-time services, starting out from traditional time-deterministic static
TDMA systems (see, e.g., [Kopetz and Grünsteidl 1994] [Kopetz et al. 1989]
[Nilsson et al. 1993] [Nilsson et al. 1993B] [Nilsson 1994] [Wiberg 1993] for
time-deterministic communication systems for hard real-time systems),
while considering emerging, promising fiber-optic technologies. The focus
has been on functionality more than pure performance figures, mostly on
real-time features but, later in the work, on other types of functionality for
parallel and distributed systems as well. The focus of the analyses made has
been on predictable performance, including case studies with feasibility
tests.
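As a minimal illustration of what such a feasibility test can look like (a generic sketch with made-up names and numbers, not the actual algorithms analyzed later in the thesis), consider a TDMA-style network in which each real-time channel needs a fixed number of slots per cycle; a new channel is admitted only if the total demand still fits within the cycle:

    # Minimal sketch of a slot-based admission (feasibility) test for a
    # TDMA-style real-time network. Hypothetical model: a cycle contains
    # slots_per_cycle data slots, and each logical channel needs a fixed
    # number of slots in every cycle to meet its deadline.

    def admit(demands, new_demand, slots_per_cycle):
        """Return True if the new channel can be added without
        overloading the cycle (worst-case guarantees preserved)."""
        return sum(demands) + new_demand <= slots_per_cycle

    # Example: a 100-slot cycle already carrying channels that need
    # 30 + 25 + 20 slots; a request for 20 more is admitted, one for 30 is not.
    existing = [30, 25, 20]
    print(admit(existing, 20, 100))   # True
    print(admit(existing, 30, 100))   # False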
The WDM star network proposed in Chapter 7 (with the star-of-stars variant in Chapter 8) was developed to fill the gap in high-performance real-time networking and supports real-time traffic for dynamic hard real-time systems in which each node can have a sustained output dataflow of several
Gbit/s. This is offered in combination with other desired, or even required,
properties such as broadcast capabilities, scalability in terms of number of
nodes, and scalability in terms of aggregated system bandwidth.
networks, one for circuit-switched traffic and one for packet-switched traffic
(see Chapter 10).
1.2 Published Papers and Research Reports
The papers on which the thesis is based can be divided into four groups: (i) papers concerning the proposed WDM star and star-of-stars networks, (ii) papers concerning the two proposed fiber-ribbon ring networks, (iii) surveys of optical networks (mainly fiber-optic) and network components, including evaluations of network architectures for signal processing systems, and (iv)
papers reporting research on radar signal processing systems. The papers in
the first three groups have been written mainly by me, while my work
represents a minor part of the papers in the fourth group. The papers are
listed by group below.
Group (i)
Initial ideas for how to achieve a WDM star network with support for
both hard real-time traffic and non real-time traffic by dividing a cycle
of time slots into different parts with different roles.
Group (ii)
Algorithms and Networks (ISPAN '97), Taipei, Taiwan, Dec. 18-20, 1997, pp. 138-143.
[H] M. Jonsson, “Two fiber-ribbon ring networks for parallel and distributed
computing systems,” Optical Engineering, vol. 37, no. 12, pp. 3196-3204,
Dec. 1998.
Services in the CC-FPR network for, e.g., real-time traffic and group
communication are briefly described (work to be continued beyond the
scope of this thesis). A worst-case analysis is also provided for the
network.
Group (iii)
Group (iv)
Distributed Processing (IPPS/SPDP '98), Orlando, FL, USA, Mar. 30 - Apr. 3, 1998, pp. 226-232.
1.3 Parallel Computers
Two categories, according to Flynn’s classification [Flynn 1966] [Flynn
1972], into which parallel computers may be classified are MIMD (Multiple
Instruction streams Multiple Data streams) [Hord 1993] and SIMD (Single
Instruction stream Multiple Data streams) [Hord 1990]. MIMD computers
are typically coarser grained than SIMD computers, which instead have
more tightly coupled processing elements that are all controlled by the same
instruction flow. MIMD computers may be further divided into distributed-
memory multicomputers and shared-memory multiprocessors. Shared-
memory multiprocessors of today normally employ mechanisms to retain
cache coherency among the processing elements [Stenström 1990]
[Tomašević and Milutinović 1994] [Tomašević and Milutinović 1994B].
Parallelism can exist on a lower level as well (i.e., on a more fine grained
level), e.g., instruction level parallelism (ILP).
Most, but not all, of the interconnection networks mentioned in this thesis are more suitable for coarse-grained systems. In this thesis, a node refers to a computer system with one interface to the network. A node
can actually be a parallel computer itself. For example, it can be a SIMD
computer, i.e., the system is an MIMD computer on the high level,
connecting multiple autonomous SIMD computers [Taveniku et al. 1998].
This configuration is related to the MSIMD (Multiple-SIMD) architecture of
PASM [Siegel et al. 1981]. Optical interconnection technology can actually
be employed on several levels in a system, e.g., fiber-optics connecting nodes
on a coarse level, where each node contains free-space optics to connect
processing elements internally in the node.
1.4 Real-Time Communication
In a real-time computer system, the correct functioning of the system
depends on the time at which a result is produced as well as the correctness
of the result [Stankovic 1988]. In many real-time systems, timing must be
guaranteed in order to avoid life-threatening situations. An example is
control systems for nuclear power plants. Other real-time systems include
flight control systems, radar systems, robotics, and industrial control
systems. In distributed real-time systems, the interconnection network
plays a very important role in fulfilling the system functioning
requirements. In these systems, guaranteeing real-time services is often much more important than performance measures such as throughput or average latency. Some classifications and concepts important for the area of real-
time communications are briefly described below.
Real-time systems are often classified into soft or hard real-time systems depending on how critical the (timing) correctness of their behavior is. In terms of deadlines for processes or messages, we can introduce a third term, firm [Kopetz 1997]. If, e.g., the result of a calculation or an arriving message still has utility after its deadline, the deadline is classified as soft; otherwise, it is classified as firm. If missing a deadline may end in a catastrophe, the deadline is classified as hard. Services for real-time traffic with hard and soft deadlines are often classified as guarantee-seeking and best-effort, respectively. This work has focused on networks with support for both guarantee-seeking and best-effort traffic.
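One simple way to picture the difference between the three deadline classes (my own illustration, not a formalism used in this thesis) is as the utility of a result as a function of its completion time: a soft deadline leaves some utility after the deadline, a firm deadline leaves none, and a hard deadline miss is treated as a failure.

    # Illustrative utility of a result delivered at time t, for the three
    # deadline classes described above (hypothetical model, not from the thesis).

    def utility(t, deadline, kind):
        if t <= deadline:
            return 1.0
        if kind == "soft":
            return max(0.0, 1.0 - 0.1 * (t - deadline))  # value decays after the deadline
        if kind == "firm":
            return 0.0                                    # a late result is useless
        raise RuntimeError("hard deadline missed")        # a late result is a failure

    print(utility(8, 10, "soft"), utility(12, 10, "soft"), utility(12, 10, "firm"))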
(Packet-) switched real-time networks [Zhang 1995] are here assumed to be constructed of switches interlinked with point-to-point interconnections, i.e., so-called mesh networks. Most reported switched real-
time networks are wide area networks and are therefore not in focus in this
thesis (which is focused on short distance systems). Real-time message
streams, e.g., multimedia streams, are normally supported in the form of
virtual circuits or the like. The handling of these message streams in the
switches during the lifetime of a connection and at setup can be done in a
deterministic or a statistical way. A deterministic service guarantees a
certain performance even in the worst-case, while a statistical service does
not have a hundred percent guarantee for this. Instead, a statistical service
might have a specified probability of packet loss and deadline or jitter
violation.
1.5 Optical Interconnections in Parallel Computers
Optics has been used for a while in communication systems in which each optical fiber acts as a single point-to-point channel. However, the broad field of optical interconnections does not only include traditional single-channel fiber-optics. Novel technologies allow for features such as multiple high-speed channels in a single fiber and two-dimensional arrays of optical free-space channels. Some work on comparative technology studies and classifications is reviewed below.
• light beams do not suffer from signal frequency-dependent
attenuation
• parallel global high speed interconnects are offered
• the possibility of non-planar interconnections is offered
• multiplexing can be done in multiple domains, e.g., the time and
wavelength domains
• high-density interconnects where light beams can cross each other are
possible
• large electrical backplane contacts can be avoided, thereby reducing
the chassis size
• optical interconnects offer low energy per bit and high speed-power
product
• low skew is possible
Of course, there are also arguments for not using optics [Bohr 1998]. For
example, it is argued that the performance of electrical interconnections will
continue to scale [Horowitz et al. 1998], and it has been demonstrated that a
bit rate of 4 Gbit/s over 100 m of twisted-pair copper cable is possible [Dally
et al. 1998]. Comparative studies of optical and electrical interconnections
have been reported, e.g., comparing energy consumption and system speed
[Feldman et al. 1987] [Tooley 1996] [Yayla et al. 1998]. The best thing,
however, is not to choose either electronics or optics but to use both
technologies in the respective areas to which they are best suited [Caulfield
1998].
as clusters of workstations. Many references to work related to free-space
systems are found in [Ozaktas 1997] [Ozaktas 1997B]. One possible
classification of free-space optical interconnection systems is [Yatagai et al.
1996]:
• stacked optics
• planar optics
• stacked and planar optics
1.6 Disposition of the Thesis
The dissertation is divided into three parts as described below:
• Part III: presents the main contributions of the thesis work. Four chapters present the WDM star network, the star-of-stars network, the control channel based fiber-ribbon ring network, and the simplified fiber-ribbon ring network, respectively. The part then ends with the concluding chapter of the thesis.
Because of the disposition chosen, and the fact that Parts II and III contain enough overlapping background information, one may begin reading in Part II or Part III, especially if the contents of the earlier chapters are well known to the reader.
2. High Performance Fiber-Optic Networks

2.1 Introduction
Optical fibers for communication systems offer a bandwidth of more than 30
THz per fiber [Brackett 1990], giving them great potential for future
computer communication networks for data intensive applications. This
survey describes representative examples of high-performance fiber-optic
networks as well as MAC (Medium Access Control) protocols [Rom and Sidi
1990] for those networks. WDMA (Wavelength Division Multiple Access,
described below) networks and passive optical networks (PONs) are given
special attention. Previous reviews of high-capacity fiber-optic networks for
data communication are found in [Acampora and Karol 1989] [Chatterjee
and Pawlowski 1999] [DeCusatis 1998] [Green 1991] [Modiano 1999]
[Ramaswami 1993] [Senior et al. 1998]. The review given here has a focus
on optical LANs [Kazovsky et al. 1994] and similar networks that can be
used for flexible communication in parallel and distributed computing
systems. Many proposed fiber-optic networks, although developed with a
specific application in mind, can be used in a wide range of applications;
thus we do not necessarily rule those networks out. Pure telecommunication networks, however, are not treated here.
2.2 Multiple Access Methods
Accessing and sharing the huge amount of optical bandwidth among the
multiple nodes in a fiber-optic network is a challenging problem. Four
multiple access methods are described below: WDMA, TDMA (Time Division
Multiple Access), SCMA (SubCarrier Division Multiple Access), and CDMA
(Code Division Multiple Access). The most popular one for achieving high
aggregated bandwidth seems to be WDMA, especially in end-user systems
owing to their simple transceiver designs. Additional information on
multiple-access methods is found in [Gagliardi and Karp 1995].
2.2.1 Wavelength Division Multiple Access
When using WDMA, multiplexing is done in the spectral domain of the light
[Hill 1989] [Kaminow 1989] [Green and Ramaswami 1990] (WDMA systems
use the WDM technology, and the two terms are therefore often used inter-
changeably here). In this way, several optical carriers, or channels, are
31
implemented in the network. For systems with denser wavelength spacing
than 1 nm, the WDMA technique is also referred to as optical FDM
(Frequency Division Multiplexing) [Agrawal 1992]. One way to use the
wavelength channels is to allow several nodes to transmit simultaneously
on different channels. Data are then sent bit-serially between nodes tuned
to the same channel. The channel to which each transmitter and receiver
are tuned at any time is controlled by the MAC protocol. Another way is to
synchronously transmit on several channels in parallel, i.e., bit-parallel
byte-serial transmission [Loeb and Stilwell 1988] [Loeb and Stilwell 1990].
However, compensation for bit-skew caused by group delay dispersion
(different wavelength channels travel at different speeds in the fiber) may
be needed in these systems [Jeong and Goodman 1996].
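As a rough, back-of-the-envelope illustration of the bit-skew problem (assumed example values, not figures from the cited work), the skew between the outermost wavelength channels can be estimated as the dispersion parameter times the fiber length times the spectral separation:

    # Rough estimate of bit-skew caused by group delay dispersion in
    # bit-parallel WDM transmission. All numbers are assumed example values.

    D = 17.0            # dispersion parameter [ps/(nm km)], typical standard fiber at 1550 nm
    length_km = 0.5     # link length [km]
    delta_lambda = 8.0  # spectral separation between outermost channels [nm]

    skew_ps = D * length_km * delta_lambda
    print(f"worst-case channel skew: {skew_ps:.1f} ps")
    # At 1 Gbit/s per channel (bit period 1000 ps) this skew is small,
    # but at 10 Gbit/s per channel (100 ps) it becomes significant.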
• A system with 100 channels, each carrying data at a bit rate of 622
Mbit/s [Toba et al. 1990].
2.2.2 Time Division Multiple Access
TDMA networks can be divided into two groups: those using packet interleaving and those using bit interleaving. Networks using packet interleaving, for example [Barry et al. 1996], divide the access to the medium into time slots. The length of each slot normally equals the transmission duration of a packet. Simple MAC protocols can be used, but each node must work at the speed of the aggregated bit rate in the network. With bit interleaving, each node sends only one bit at a time at regular intervals. If N nodes transmit simultaneously, the bit stream from one node will have a bit-to-bit interval of N bit slots. The width of the optical pulse is, however, normally several times shorter than the duration of a bit slot.
In a bit-interleaved TDMA network, clock synchronization must be
maintained with a precision in the order of less than a bit period, which is
difficult when a bit rate of multiple Gbit/s is used. Experiments with a 250
Gbit/s network were reported in [Prucnal et al. 1994]. The optical pulse
width was 1 ps, the bit slot was 4 ps and the bit-to-bit interval from one
node was 10 ns. This resulted in the capacity to support a 100 Mbit/s bit-
stream to each of 2 500 nodes. Bit-interleaved TDMA networks and
technologies to implement such networks are reviewed in [Spirit et al. 1994]
[Seo et al. 1996].
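As a check, the figures quoted above fit together as follows (simple arithmetic on the numbers given in the text):

    # Sanity check of the bit-interleaved TDMA figures quoted from
    # [Prucnal et al. 1994], using only the numbers given in the text.

    bit_slot = 4e-12            # bit slot duration [s]
    node_interval = 10e-9       # bit-to-bit interval for one node [s]

    aggregate_rate = 1 / bit_slot          # 250 Gbit/s on the shared medium
    per_node_rate = 1 / node_interval      # 100 Mbit/s per node
    num_nodes = node_interval / bit_slot   # 2500 interleaved nodes

    print(aggregate_rate, per_node_rate, num_nodes)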
2.2.3 Subcarrier Division Multiple Access
The data from each node in a SCMA network are used to modulate node-
specific microwave subcarriers, i.e., multiplexing is done in the microwave
frequency domain [Darcie 1987]. The subcarrier then modulates an optical
carrier. All subcarriers are detected at the receiver photo-diode, but only the
desired one is demodulated using conventional microwave techniques.
SCMA can be contrasted with WDMA, where multiplexing is also done in the spectral domain but on the lightwave carrier, at about 10^14 Hz, while the subcarrier microwave frequencies are in the range of 10^8 to 10^10 Hz [Mestdagh 1995].
2.2.4 Code Division Multiple Access
Several CDMA methods for fiber-optic communication have been proposed. The majority of work is being done on the method having a set of orthogonal code sequences of chips, where each chip has a value of either “1” or “0” [Salehi 1989] [Salehi and Brackett 1989] [Marhic 1993]. Each code sequence then corresponds to the destination address of a specific node. During transmission, each “1” bit is encoded with the destination node’s code sequence of chips, while a “0” bit is encoded with a sequence of only “0” chips. The drawback of this scheme is that each transceiver must work at a speed N times higher than the bit rate, where N is the length of the code, i.e., the number of chips. In wireless communication systems, this method is referred to as Direct Sequence Spread Spectrum (DS-SS) [Bantz and Bauchot 1994].
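A minimal sketch of this on-off chip encoding is given below (illustrative only; the chip sequence is made up and no optical effects are modeled). Each 1 bit is replaced by the destination's chip sequence and each 0 bit by all-zero chips, so the chip stream runs N times faster than the data stream:

    # Illustrative on-off optical CDMA encoding: a "1" bit is replaced by
    # the destination's chip sequence, a "0" bit by all-zero chips.
    # The code sequence below is made up for the example.

    def encode(bits, code):
        chips = []
        for b in bits:
            chips.extend(code if b == 1 else [0] * len(code))
        return chips

    code = [1, 0, 1, 1, 0, 0, 1, 0]   # hypothetical 8-chip address code (N = 8)
    data = [1, 0, 1]
    print(encode(data, code))
    # The chip stream is N = 8 times longer than the data stream, which is
    # why the transceiver must run N times faster than the bit rate.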
Instead of having the code sequence spread in time, spectral encoding can be
used. In [Zaccarin and Kavehrad 1993], the light from a LED (Light
Emitting Diode) is spatially split into a discrete number of spectral
components by a diffraction grating. The light is then passed through a
spatial amplitude mask with the code sequence and combined in the fiber.
In this way, each of the spectral components is encoded with one of the
chips, and the LED need only be modulated at the bit rate. A similar
technique is used in the receiver. In [Weiner et al. 1988], the code sequence
is also applied in the spectral domain but with a phase mask instead of an
amplitude mask.
2.2.5 Discussion
The different properties of the multiple-access methods make them suitable
for different applications/systems. In the kind of systems in focus in this
thesis, a limited number of users share the network cost (the opposite of
telecommunication backbone networks). CDMA and bit-interleaved TDMA
are therefore considered too complex. When a bit rate of several Gbit/s is
needed from each node in the network simultaneously, packet-interleaved
TDMA will also be too complex, because every node must work at the
aggregated bit rate of the network. SCMA is also excluded because of the
limited network capacity [Mestdagh 1995]. What is left is WDMA, which
gives easy access to the channels when tunable components reach
reasonable costs, and where each node need only work at the speed of its
own bit rate.
2.3 Components for WDMA Networks
A fiber-optic communication system typically consists of a transmitter, a
receiver, and some form of transmission medium based on optical fibers.
Common optoelectronic components used in the transmitter are LEDs and
laser diodes [Ettenberg 1998], where laser diodes can be divided into
multimode and single-mode laser diodes. Single-mode laser diodes have
narrower spectral widths than multimode laser diodes by the incorporation
of a grating filter inside or outside the cavity. The conventional multimode
laser diode is called the Fabry-Perot laser, while two common single-mode laser diodes are the DBR (Distributed Bragg Reflector) and the DFB (Distributed FeedBack) laser diodes [Carroll et al. 1993]. In the receiver, the commonly used components are PIN (P-Intrinsic-N) diodes and APDs (Avalanche Photo Diodes). The three classic types of optical fibers are
multimode step-index fiber, multimode graded-index fiber, and single-mode
step-index fiber, each with its own core/cladding design and dispersion
characteristics. Further reading on fiber-optic communication systems and
their components is found in [Keiser 1991], while an overview of components
specially needed in WDMA networks is presented here.
parallel and distributed computing systems, the longest distance between
two end-nodes is rather short. Topics related to long distance
communication are therefore not described here, e.g.:
2.3.1 Tunable Receivers
A receiver in a WDMA network must tune in to one of all the wavelengths on the incoming fiber. Since a photo diode itself detects a broad band of wavelengths, a wavelength filter must be used. Tunable optical filters include interferometer filters, filters based on mode coupling, filters based on resonant amplification, and grating based filters. Fast tuning and a broad tuning range are desired, while the pass-bandwidth should be adapted to the incoming channels, i.e., the filter should pass the whole energy of one channel while preventing energy from other channels from being detected.
The Mach-Zehnder interferometer filter splits the incoming light into two
beams, delays one of the beams slightly more than the other and then lets
the two beams interact with each other [Hecht 1987]. Tuning is obtained
when the delay is changed. By cascading several Mach-Zehnder
interferometer filters, a greater wavelength selectivity is achieved. In
[Wooten et al. 1996], a three-stage electro-optically tunable Mach-Zehnder
interferometer with a 50-ns tuning latency over a tuning range of eight
channels is demonstrated. Experiments with a 16-channel (four-stage) interferometer were reported in [Oda et al. 1989], while a 128-channel (seven-stage) interferometer was used in the 100-channel transmission experiment reported in [Toba et al. 1990].
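Note that the channel counts quoted above follow a simple pattern: each added Mach-Zehnder stage ideally halves the set of passed channels, so a cascade of k stages can resolve 2^k equally spaced channels. The short sketch below (my own illustration) makes this explicit:

    import math

    # Each Mach-Zehnder stage ideally rejects every second remaining channel,
    # so k cascaded stages resolve 2**k equally spaced channels.
    for stages in (3, 4, 7):
        print(stages, "stages ->", 2 ** stages, "channels")

    # Conversely, the number of stages needed for a given channel count:
    channels = 100
    print("at least", math.ceil(math.log2(channels)), "stages for", channels, "channels")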
the mirrors. The distance must be short to limit the number of resonating
wavelengths to one in the working range of the communication system.
Because of the need of mechanically moving one of the lenses, the tuning
latency is large. Piezoelectric transducers are commonly used to change the
distance [Miller and Janniello 1990], while another method is to rotate
(relative to the incoming and outgoing beams) a Fabry-Perot filter with a
fixed distance [Frenkel and Lin 1988]. Experiments with cascaded Fabry-
Perot filters to enhance performance have also been reported [Kaminow et
al. 1989]. The two-stage filter described in [Miller and Miller 1992] supports 1000 channels in a 40-nm range. A tunable Fabry-Perot filter
without moving parts is described in [Patel et al. 1990]. The cavity was
filled with liquid crystals, and wavelength tuning was obtained by applying
an electric field to change the refractive index of the liquid crystals.
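The requirement of a short mirror spacing can be made concrete with the standard Fabry-Perot relation (textbook knowledge, not taken from the thesis): the wavelength spacing between neighboring resonances, the free spectral range, is approximately λ²/(2nL), so only a short cavity keeps all but one resonance outside the working range. The values below are assumed examples:

    # Free spectral range of a Fabry-Perot cavity: FSR = wavelength**2 / (2 * n * L).
    # Example values are assumed, not taken from the thesis.

    wavelength_nm = 1550.0
    n = 1.0                 # refractive index of the (air) cavity
    for length_um in (10.0, 100.0, 1000.0):
        fsr_nm = wavelength_nm ** 2 / (2 * n * length_um * 1e3)  # cavity length in nm
        print(f"cavity {length_um:7.1f} um -> FSR {fsr_nm:6.1f} nm")
    # A 10-um cavity keeps resonances about 120 nm apart, while a 1-mm cavity
    # has resonances only about 1.2 nm apart, i.e., many within a WDM band.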
In a grating based filter, the incoming light is spatially split into its
wavelength components. If a photo diode array is used to detect all of the
wavelength components (one wavelength per photo diode), as proposed in
[Kirkby 1990], a multi-channel receiver is achieved in which the electronic
switching time sets the tuning latency, i.e., very fast. A grating integrated
monolithically with a photo diode array detecting 42 wavelength channels
spaced by 4 nm was reported in [Cremer et al. 1992]. In [Parker and Mears
1996], one digitally tunable wavelength is steered to the output fiber by
using an SLM (Spatial Light Modulator) together with a fixed grating.
Tuning to discrete wavelengths spaced by 1.3 nm over a tuning range of
38.5 nm was demonstrated. The demultiplexing of 120 channels by the use
of a concave grating was reported in [Sun et al. 1998].
2.3.2 Tunable Transmitters
With wavelength-tunable transmitters, a certain transmission wavelength
can be chosen, either from a continuous range or from a number of discrete
wavelengths. From a system engineer’s point of view, an ideal tunable laser
diode has a tuning range of about 100 nm and a tuning latency in the order
of nanoseconds [Mestdagh 1995]. Many components aiming to meet one of
the two wishes have been proposed, and some meet both rather well. The
linewidth (spectral width; normally measured as the FWHM, Full Width
Half Maximum) of the emitted light gives a hint of the possible number of
channels, but other parameters such as receiver selectivity must be
considered when designing a WDM communication system. Wavelength-
tunable laser diodes are reviewed in [Lee and Zah 1989].
range between the two rows but in which each laser was capable of 5 Gbit/s
operation was reported in [Maeda et al. 1991]. Other WDM VCSEL arrays
include a 2 × 2 densely-packed array [Huffaker and Deppe 1996], and other
WDM array components include a DFB laser array with 20 lasers emitting
at ten 2-nm spaced wavelengths (two diodes per wavelength) [Lee et al.
1996].
Several external cavity laser diodes with external filters for wavelength
tuning have been reported. By antireflection coating one of the output facets
on a laser diode and having an external mirror, the cavity length is
extended. If a diffraction grating is used as the external mirror, tuning is
obtained by moving the grating (fine-tuning by changing the cavity length)
or rotating the grating (a different longitudinal mode is selected from the
grating) [Mellis et al. 1988]. Tuning ranges exceeding 240 nm have been
demonstrated with external-cavity grating-based laser diodes [Bagley et al.
1990] [Tabuchi and Ishikawa 1990]. The drawback of these grating-based
lasers is the large tuning latency caused by the mechanical movement of the
grating. However, devices with shorter tuning latencies have been
demonstrated. One way to avoid the mechanical moving mechanism is to
exchange the diffraction grating with an acousto-optic filter that selects the
lasing wavelength and allows for tuning latencies in the range of
microseconds [Coquin and Cheung 1988]. Another fully electronically
tunable laser diode is the MAGIC (Multistripe Array Grating Integrated
Cavity) laser diode [Soole et al. 1992]. The MAGIC laser chooses one
wavelength for lasing by activating one of several waveguide stripes. A fixed
diffraction grating couples a specific wavelength into each stripe.
In [Larson and Harris 1996], a 15-nm tuning range and a 0.14 nm linewidth
are demonstrated for a single VCSEL laser. A movable mirror was placed on
top of the laser with an air gap between. By moving the mirror, the cavity
length and, hence, the wavelength of the emitted light are changed. A
similar tunable VCSEL but with a tuning range of 31.6 nm was recently
reported [Sun et al. 1998]. An LED with the same tuning mechanism, a
tuning range of 39 nm, and a linewidth of 1.9 nm is presented in [Larson
and Harris 1995].
2.3.3 Other WDM Components
Essential system components when building fiber-optic networks like those
described in the next section include splitters, combiners, stars, wavelength
converters [Nesset et al. 1998], WDM demultiplexers, and WDM
multiplexers. Several of the components mentioned are commercially
available with limited features, e.g., a small number of input/output ports.
A 1 × N fiber-optic splitter splits the light from one input fiber to N output fibers. Both symmetric and asymmetric splitters are available, where an asymmetric splitter splits an unequal amount of light to the different output ports. A combiner works in the opposite way, combining input signals from several fibers into one output fiber. An N × N fiber-optic star (often referred to as a passive optical star) can be viewed as an N × 1 combiner followed by a 1 × N splitter. The conventional way to build a star is to use a number of 2 × 1
combiners and 1 × 2 splitters. However, it is difficult to build large stars
using this technique, and other techniques have been proposed to overcome
the problem [Okamoto et al. 1992] [Yun and Kavehrad 1992]. A 144 × 144
passive optical star is described in [Okamoto et al. 1992B] and [Kato et al.
1993].
The ADM (Add-Drop Multiplexer) is another component in which one or
several wavelengths may be added/dropped to/from a bypassing fiber. The
component can be used in, e.g., WDM ring networks (described in the next
section). An ADM based on two fixed wavelength, multi-layer Fabry-Perot
filters is described in [Hamel et al. 1995]. A component in which the
wavelength to be added/dropped can be tuned over a discrete number of
wavelengths is described in [Glance 1996].
2.4 Interconnection Architectures
The simplest fiber-optic transmission system is a point-to-point link between two nodes. A network can be built by using several point-to-point links. These so-called point-to-point linked networks have been commercially available for a while and are described in Section 2.4.1. However, all-optical multi-access networks, described in Section 2.4.2, are expected to become popular in the future.
2.4.1 Point-to-Point Linked Networks
In a fiber-optic point-to-point linked network, a number of optically-isolated
links connect the nodes in the network. If, for example, wavelength division multiplexing is to be used to increase the bandwidth, the technology must be implemented separately in each point-to-point link. The main types
of point-to-point linked networks that fall into the scope of this report are
ring networks [Davies and Ghani 1983] and switched networks.
In the FDDI network [Ross 1989] [Jain 1993], the nodes are connected in a
unidirectional primary ring where each node receives from one link and
transmits on another. A node acts like a repeater for all incoming messages
except those addressed to the node and those sent by the node and should be
removed from the ring after one round. In addition to the primary ring,
some or all nodes are connected to a secondary ring which, for example, can
be used as a backup ring. A transmission rate of 125 Mbaud is used, which
gives a data rate of 100 Mbit/s because of the 4b/5b encoding that is used. A
token protocol is used for medium access control. Guarantees can be given
for both latency and bandwidth in the FDDI network and its successor,
FDDI-II [Ross 1989] [Jain 1993].
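The relation between the FDDI line rate and data rate follows directly from the 4b/5b code, in which every four data bits are carried by a five-bit symbol (simple arithmetic on the figures quoted above):

    # 4b/5b line coding: 4 data bits are carried in every 5 transmitted bits.
    baud_rate = 125e6                  # FDDI line rate [baud]
    data_rate = baud_rate * 4 / 5      # useful data rate [bit/s]
    print(data_rate / 1e6, "Mbit/s")   # -> 100.0 Mbit/s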
• Constant Bit Rate (CBR) with minimal and constant delay between
the end-nodes. Can be used for, e.g., voice over ISDN.
• Variable Bit Rate (VBR) with minimal and constant delay between
the end-nodes. Designed to handle video and audio where, e.g., the
compression ratio can vary.
• Unspecified Bit Rate (UBR), e.g., for data transmissions without real-
time requirements.
As indicated above, real-time services are supported, but guarantees are not made for individual cells (the data units in ATM, each consisting of 48 bytes of data and a 5-byte header) without a setup phase, since ATM is a connection-oriented network.
converting the optical signal first into electronic form and then back to
optical again, but at another wavelength.
A system component that has reached the market recently [Bursky 1994]
[Fibre Systems 1998] is the fiber-ribbon link [Buckman et al. 1998]
[Engebretsen et al. 1996] [Hahn 1995] [Hahn 1995B] [Hahn et al. 1996]
[Hartman et al. 1990] [Jiang et al. 1998] [Karstensen et al. 1995]
[Karstensen et al. 1998] [Kuchta et al. 1998] [Nagarajan et al. 1998]
[Nishimura et al. 1997] [Nishimura et al. 1998] [Schwartz et al. 1996] [Siala
et al. 1994] [Wickman et al. 1999] [Wong et al. 1995]. Several links can be
used to build high bandwidth point-to-point linked networks [Hahn et al.
1995]. With ten parallel fibers, each carrying data at a bit rate of 400
Mbit/s, an aggregated bandwidth of 4 Gbit/s is achieved [Schwartz et al.
1996]. Bi-directional links, with some fibers in the fiber-ribbon cables
dedicated for each direction, are also possible [Jiang et al. 1995]. Further
discussions on fiber-ribbon aspects are presented in Section 9.5, page 153,
while more references to reports on fiber-ribbon links are found in [Tooley
1996].
Modules that support multiple high-speed channels but are not specifically
optimized for fiber-ribbons have been reported, e.g., receiver and
transmitter modules with five channels, each channel with a bit rate of 2.8
Gbit/s [Nishikido et al. 1995].
All-Optical Multi-Access Networks
In an all-optical network, the data stream remains in the optical form all
the way from the transmitter to the receiver. Three basic architectures for
all-optical multi-access networks are the ring, the bus, and the star (see
Figure 1). These network architectures will be discussed below.
In the WDMA ring network described in [Irshid and Kavehrad 1992], each
node is assigned a node-unique wavelength on which to transmit.
Figure 1. Three passive optical network architectures: (a) ring, (b) dual bus, and (c) star.
Figure 2. Folded bus.
In an optical bus, the light travels only in one direction, making it necessary
to have two buses (upper and lower), one for each direction (higher or lower
node index of destination nodes). This kind of bus architecture is called dual
bus. The disadvantage of the dual bus is that two transceivers are needed in
each node. This is avoided in the folded bus, where the two buses are
connected with a wrap-around connection at one end of the buses (see
Figure 2) [Tseng and Chen 1982]. In the folded bus, transmitters are
connected to the upper bus while receivers are connected to the lower bus.
Several bus architectures and hybrids in which the bus is part of the
architectures are discussed in [Nassehi et al. 1985]. A WDMA dual bus
network is described in [Cheung 1992], and a WDMA folded bus network is
described in [Chlamtac and Ganz 1988]. WDMA dual bus networks in which
messages do not always remain unchanged until the end of the bus, but in
which the protocols support several transmissions on the same wavelength
channel simultaneously (wavelength reuse), are presented in [Huang and
Sheu 1996] [Huang and Sheu 1997].
In a star network, the incoming light waves from all nodes are combined
and uniformly distributed back to the nodes. In other words, the optical
power contained in the middle of the star is equally divided between all
nodes. WDMA star networks are described in numerous papers and are
specially treated in Section 2.6. In addition to LANs and similar networks,
WDMA star networks have also been proposed for internal use in packet
switches [Eng 1988] [Brackett 1991].
the tree-of-stars network (called LIGHTNING) that has wavelength routing
elements between each level [Dowd et al. 1996], the star-of-stars network
that has an electronic gateway node between each cluster and the backbone
star [Jonsson and Svensson 1997] and the multiple star network where each
node is directly connected to both a local star and a remote star [Ganz and
Gao 1992B].
Each of the three basic network architectures has its own advantages. The
ring requires the fewest fibers, a bus network’s medium access protocol
can utilize the linear ordering of the nodes [Nassehi et al. 1985], and the
attenuation of an (ideal) star grows only logarithmically with the number
of nodes. However, star networks are the most popular, judging from the
number of published papers.
Classifications of WDMA Networks
The classifications below give the terminology and the background
knowledge assumed in the overview of proposed networks and protocols
(medium access control protocols) in the next section. If not otherwise
stated, multi-access, single-hop broadcast-and-select networks (explained
below) are assumed in the following sections. More information on WDMA
networks is found in [Borella et al. 1998] [Brackett 1990] [Green 1993]
[Mestdagh 1995]. Predictions for future directions in WDMA networking are
found in [Brackett 1996] [Green 1996].
Broadcast-and-Select and Wavelength Routing
The difference between broadcast-and-select networks and wavelength
routing networks is whether or not there exists any wavelength dependent
routing in the network [Gerstel 1996]. The path followed from a transmitter
to a receiver in a wavelength routing network is determined by the selected
wavelength. As long as two paths do not have any common fiber links, they
can use the same wavelength simultaneously (wavelength reuse). This is
not possible in a broadcast-and-select network, where all receivers always
have all transmitted channels available on the incoming fiber. However, a
broadcast-and-select network does not need any wavelength selective
devices out in the network. Instead, the receivers decide (on the basis of the
protocol used) when to receive and/or which channel to tune in.
Several wavelength routing networks have been proposed, for example, the
three-level hierarchical network described in [Alexander et al. 1993].
Another hierarchical wavelength routing network is LIGHTNING [Dowd et
al. 1995] [Dowd et al. 1996]. The topology in LIGHTNING is a tree-
structure of passive optical stars where each level is separated by a lambda
partitioner. The lambda partitioner selects the wavelengths that are to stay
in the levels beneath the partitioner and the wavelengths that will pass to
the levels above. In this way, the number of wavelengths in each fiber has
its minimum at the top level (only top level wavelengths exist) and
maximum at the bottom level (wavelengths for all levels exist). All
wavelengths that are not passed to a certain level can be reused in each
cluster on the level below.
Single-Hop and Multihop Networks
In a single-hop network, all nodes can reach any other node in a single hop.
This means that the transmitted data are not passed through any
intermediate routing stages and remain in optical form all the way from the
source node to the destination node. A disadvantage of single-hop networks
is that each transmitter and/or receiver must be equipped with tunable
components. Advantages are flexibility and the absence of extra latency at
intermediate nodes.
Networks with and without Control Channel
A number of protocols for WDMA networks assume a separate control
channel. This control channel is normally used to reserve access on the data
channels. Hence, these protocols are sometimes called reservation protocols.
Protocols for networks with more than one control channel have also been
proposed, for example, N-DT-WDMA [Humblet et al. 1992] [Humblet et al.
1993]. Non-control channel based networks often assign a fixed home
channel to each node, either on the transmitter side or on the receiver side.
Protocols for these networks are then called pre-allocation protocols.
Protocols for networks without control channel but where access to the data
channels is divided into a control phase and a data phase are found in
[Sivalingam and Dowd 1995] [Jonsson et al. 1996]. These protocols are
called hybrid protocols.
Receiver and Transmitter Tunability
The most flexible networks are those with both tunable transmitters and
tunable receivers. However, protocol complexity and node cost decrease
when fixed wavelength units are used in either the transmitter or the
receiver. Using the classification scheme given in [Mukherjee 1992], WDM
networks can be divided into FT-FR, FT-TR, TT-FR, and TT-TR networks,
where FT/TT denotes a fixed/tunable transmitter and FR/TR a fixed/tunable
receiver.
Some networks have more than one transmitter and/or receiver in each
node. A network with i fixed transmitters, j tunable transmitters, m fixed
receivers, and n tunable receivers can be described as:
FT^i TT^j - FR^m TR^n
If the network is also control channel based, the prefix CC is used:
CC-FT^i TT^j - FR^m TR^n
In a packet switched network, the tuning latency is critical for the network
performance. This issue was addressed in the discussion on tunable
components in Section 2.3. Also, fast locking clock recovery circuits must be
used instead of the slow PLL based circuits used in point-to-point links.
Several methods to recover the clock signal on one or a few bits have been
proposed [Banu and Dunlop 1992] [Jonsson and Moen 1994] [Cerisola et al.
1995].
Random Access Protocols
In a network using a random-access protocol, all nodes compete for a
transmission channel in a random (uncontrolled) way [Halsall 1995].
Random-access protocols can be contrasted with protocols with dynamic or
static pre-assigned transmission patterns. The original single-channel
random-access protocol is the Aloha protocol, where transmission is done
with no regard to other nodes. If two messages from different nodes overlap
in time, both are corrupted. Several protocols are more or less pure
improvements of the Aloha protocol, e.g., Slotted Aloha, CSMA (Carrier-
Sense Multiple-Access), and CSMA/CD (CSMA with Collision Detection). In
slotted Aloha, all nodes are synchronized and transmissions are allowed to
be started only at the beginning of a time slot. In this way, the probability of
collision is reduced. In CSMA, the transmission medium is sensed before
transmission starts. If a carrier is sensed, the transmission is postponed
until the current message has reached its end. If two nodes sense the
medium at the same time and find it free, they both begin to send and a
collision will take place. In this case, the medium is busy for the whole
duration of the corrupted transmissions. This is avoided in the CSMA/CD
protocol, where a collision can be detected during transmission. If a collision
is detected by two nodes, both nodes stop and wait for a random time before
trying again, beginning with the carrier-sense mechanism. Common to the
random-access protocols is that they do not perform well at high traffic-
loads owing to the increased probability of collision but experience low
latency at light loads.
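The collision behavior of slotted Aloha can be illustrated with a small simulation. The sketch below is only illustrative (the function name and parameters are made up for this example, and retransmission back-off is ignored): each of a number of nodes transmits in a slot with probability p, and a slot carries a packet only if exactly one node transmits.

# Illustrative sketch: Monte Carlo estimate of slotted Aloha throughput.
import random

def slotted_aloha_throughput(nodes, p, slots=100_000, seed=1):
    random.seed(seed)
    successes = 0
    for _ in range(slots):
        transmitters = sum(1 for _ in range(nodes) if random.random() < p)
        if transmitters == 1:          # exactly one sender -> no collision
            successes += 1
    return successes / slots           # fraction of slots carrying a packet

if __name__ == "__main__":
    # Throughput peaks near 1/e (about 0.37) when the offered load is ~1 packet/slot.
    for p in (0.01, 1.0 / 16, 0.5):
        print(p, slotted_aloha_throughput(nodes=16, p=p))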
Time-Wavelength Assignment Schemes
Several algorithms have been proposed to solve different kinds of time
wavelength assignment problems [Rouskas and Ammar 1995]. One specific
problem is to schedule a traffic matrix (containing the communication
demands between every possible pair of nodes) on a limited number of
wavelength channels, minimizing the number of time slots needed [Pieris
and Sasaki 1994]. Other aspects may also be considered, for example,
minimizing the number of tuning periods, that is, the periods in which the
tuning of the transmitters and the receivers is not changed [Ganz and Gao
1992] [Ganz and Gao 1992B].

Name/Reference | Tunability classification | Number of wavelengths | Channel bandwidth | Organization/other info.
FOX [Arthurs et al. 1988] | TT-FR | 2 | 1 Gbit/s | Bellcore; two stars, one for each direction
HYPASS [Kobrinski et al. 1988B] | TT-FR & FT-TR | 8 | 1.2 Gbit/s | Bellcore; two stars, one for each direction
[Kaminow et al. 1988] | FT-TR | 2 | 45 Mbit/s | AT&T Bell Lab.
Lambdanet [Kobrinski et al. 1987] | FT-FR^M | 18 | 1.5 Gbit/s | Bellcore
Rainbow-I [Janniello et al. 1992] | FT-TR | 6 | 300 Mbit/s | IBM T. J. Watson; circuit switched traffic

Table 1. Experimental single-hop WDMA star networks. M is the number of nodes.
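The flavor of such time-wavelength assignment problems can be illustrated with a simple greedy heuristic. The sketch below is not the algorithm of [Pieris and Sasaki 1994] or [Ganz and Gao 1992]; it is only an assumed-for-illustration scheduler that packs a traffic matrix onto a limited number of channels under the constraint that a node transmits and receives on at most one channel per slot.

# Illustrative sketch: greedy scheduling of a traffic matrix onto W channels.
def greedy_schedule(traffic, num_channels):
    demands = [(i, j, traffic[i][j])
               for i in range(len(traffic))
               for j in range(len(traffic))
               if traffic[i][j] > 0]
    demands.sort(key=lambda d: d[2], reverse=True)    # longest demands first
    schedule = []                                      # slot -> list of (channel, src, dst)
    for src, dst, length in demands:
        for _ in range(length):
            placed = False
            for slot in schedule:
                busy_src = {s for _, s, _ in slot}
                busy_dst = {d for _, _, d in slot}
                if len(slot) < num_channels and src not in busy_src and dst not in busy_dst:
                    slot.append((len(slot), src, dst))
                    placed = True
                    break
            if not placed:
                schedule.append([(0, src, dst)])
    return schedule    # len(schedule) is the number of time slots used

traffic = [[0, 2, 1], [1, 0, 2], [2, 1, 0]]
print(len(greedy_schedule(traffic, num_channels=2)), "slots")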
Single-Hop WDMA Star Networks and Protocols
A number of single-hop broadcast-and-select WDMA star networks and
protocols for these kinds of networks have been proposed. They are
suggested for a wide range of applications but in general can all be used in
LANs and distributed computing systems. Several of these networks are
summarized in Table 1, Table 2, Table 3, and Table 4. Table 1 lists network
experiments, while Table 2 and Table 3 list MAC protocols for non-control
channel based and control channel based networks, respectively. Although
the protocols listed are for packet-switched traffic, protocols for circuit-
switched traffic have been proposed [Dono et al. 1990].
Table 2. Non-control channel based protocols for single-hop broadcast-and-select WDMA networks. M is the number of nodes and N is a positive integer greater than zero.
Name/Reference | Tunability classification | Num. of channels | Control ch. protocol | Data ch. protocol | Packet length
Aloha/Aloha [Habbab et al. 1987] | CC-TT-TR | N | Aloha | Aloha | Fixed
Slotted Aloha/Aloha [Habbab et al. 1987] | CC-TT-TR | N | Slotted Aloha | Aloha | Fixed
Aloha/CSMA [Habbab et al. 1987] | CC-TT-TR | N | Aloha | CSMA | Fixed
CSMA/Aloha [Habbab et al. 1987] | CC-TT-TR | N | CSMA | Aloha | Fixed
CSMA/N-Server Switch [Habbab et al. 1987] | CC-TT-TR | N | CSMA | N-Server Switch | Fixed
DT-WDMA [Chen et al. 1990] | CC-FT^2-FRTR | M + 1 | Uniform static TDMA | Distributed algorithm | Fixed
Improved Slotted Aloha/Aloha [Mehravari 1991] | CC-TT-TR | N | Slotted Aloha | Aloha | Fixed
Slotted Aloha/N-Server Switch [Mehravari 1991] | CC-TT-TR | N | Slotted Aloha | N-Server Switch | Fixed
Slotted Aloha [Sudhakar et al. 1991] | CC-TT-TR | N | Slotted Aloha | Aloha | Fixed
Reservation Aloha [Sudhakar et al. 1991] | CC-TT-TR | N | Slotted Aloha | Aloha | Fixed
DAS [Chipalkatti et al. 1992] | CC-FT^2-FRTR | M + 1 | Uniform static TDMA | Distributed algorithm | Fixed
Hybrid TDM [Chipalkatti et al. 1992] | CC-FTTT^N-FR^(N+1) | M + 1 | Uniform static TDMA | Two different protocols | Fixed
TDMA-C [Bogineni and Dowd 1992] | CC-TT-FRTR | M + 1 | Uniform static TDMA | Distributed algorithm | Variable
N-DT-WDMA [Humblet et al. 1993] | CC-FTTT-FRTR | 2M | Slotted | Slotted | Fixed
[Yan et al. 1996] | CC-FTTT^N-FR^(N+1) | M + 1 | Token | Distributed algorithm | Variable
MultiS-Net [Jia and Mukherjee 1996] | CC-TT-TR | N + 1 | Slotted random access | Distributed algorithm | Fixed

Table 3. Control channel based protocols for single-hop broadcast-and-select WDMA networks. M is the number of nodes and N is a positive integer greater than zero.
Some papers describing single-hop WDM star networks that do not fit into
the scope of Table 1, Table 2, or Table 3 are listed in Table 4. Several of
these papers are application suggestions and early contributions of
conceptual thoughts.
Name/Reference | Tunability classification | Other features
Photonic knockout switch [Eng 1988] | FT-FR^N | Application: centralized packet switches. A separate electrical control network is used.
Broadcast SYMFONET [Westmore 1991] | FT-FR^M | Application: shared memory multiprocessors. Synchronization issues and timing are discussed.
MCA [Wailes and Meyer 1991] | TT^N-TR^N | Application: massively parallel computers.
[Brackett 1991] | FT-TR | Application: centralized packet switches.

Table 4. Other proposed single-hop WDM star networks. M is the number of nodes and N is a positive integer greater than zero.
Of the protocols for non-control channel based networks (Table 2), several
are TDMA variants. For example, I-TDMA [Sivalingam et al. 1992] and I-
TDMA* [Bogineni et al. 1993] are multiple-channel variants of the
traditional static TDMA. Random access protocols have also been proposed
for non-control channel based networks. One example described in [Dowd
1991] is a variant of the Slotted Aloha protocol.
Numerous protocols for control channel based networks have been proposed
(Table 3). One of the first papers, [Habbab et al. 1987], describing protocols
for WDMA networks presents five random access protocols utilizing a
control channel. Several variants and improvements of these protocols have
later been proposed, for example, those described in [Mehravari 1991].
Several of the protocols for control channel based networks use uniform
static TDMA on the control channel and some form of distributed algorithm
to schedule access to the data channels. Some of the protocols support
variable sized packets, for example, the TDMA-C protocol [Bogineni and
Dowd 1992]. Protocols with other noteworthy features include: protocols
utilizing multiple control channels, for example, N-DT-WDMA [Humblet et
al. 1993], where N is the number of nodes and control channels; tell-and-go
protocols (data are sent immediately after the control information, without
waiting for the control information from the own and/or other nodes to
return), for example, the Aloha/Aloha protocol [Habbab et al. 1987]; and
protocols not requiring dedicated hardware for the control channel, for
example, MultiS-Net [Jia and Mukherjee 1996]. Protocols for single-hop
networks are reviewed in [Mukherjee 1992].
The focus on single-hop broadcast-and-select networks is motivated by the
simple architecture needed to support flexible traffic between end-nodes.
More WDM networks and protocols have been reported, e.g., those described
in [Jia et al. 1995] [Levine and Akyildiz 1995].
FOX
The FOX network reported in [Arthurs et al. 1988] is an interconnection
architecture for parallel computers with shared memory. Two stars are
used, one for communication from the processing elements to the
memory modules and one for the opposite direction. Both star networks
are TT-FR networks with a unique wavelength for each receiver. If two
processing elements transmit to the same memory module at the same time,
a collision will occur. Motivated by the assumption of a low cache miss rate,
collision detection and retransmission are proposed as a feasible solution to
this problem. Experiments with two wavelengths and less than 20 ns tuning
latency were reported.
Lambdanet
The Lambdanet WDMA star network presented in [Kobrinski et al. 1987] is
classified as FT-FR^M, where M is the number of nodes. Although Lambdanet
was designed to have one specific wavelength per node as a transmitter
home channel, two extra wavelengths were used in the 16-node experiment,
i.e., a total number of 18 wavelengths. Commercial DFB laser diodes were
selected to obtain a channel spacing of 2 nm, while a grating wavelength
demultiplexer was used in the receiver to select one of the incoming
wavelengths. Further experiments were reported in [Goodman et al. 1990],
where the channel bit-rate was increased from 1.5 Gbit/s to 2 Gbit/s.
DT-WDMA
The DT-WDMA (Dynamic Time-Wavelength Division Multi Access) protocol
for CC-FT2-FRTR networks, described in [Chen et al. 1990], divides the
access to both control channel and data channels into slots of equal size.
However, slots on the control channel are further divided into mini-slots,
one to each transmitter. When node i wants to transmit to node j, it waits
for its next mini-slot on the control channel. An address field in the control
packet is set with the address of node j. Another field is set with a delay
value related to the generation time of the message. The data message is
then sent in the data slot succeeding the control slot. After each control slot,
when all mini-slots have been received, a deterministic distributed algorithm
(separately computed in each node, with the same outcome) is run in each
node. On the basis of the distributed algorithm, node j will choose to tune in
to the node with the largest delay value among all nodes that have put the
address of node j in their address fields. All other messages destined to node
j will be lost. In a pipelined fashion, several messages can be sent before it is
known whether the first message was successful.
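The receiver-side arbitration described above can be sketched as follows. This is a simplification for illustration only; the actual algorithm in [Chen et al. 1990] also covers the delay-field encoding and the slot pipelining, and the function and data layout here are assumptions.

# Illustrative sketch of the DT-WDMA receiver arbitration. Every node sees the
# same list of control mini-slots and therefore computes the same outcome.
def choose_transmitter(control_minislots, own_address):
    # control_minislots: list of (source_node, destination_address, delay_value)
    candidates = [(delay, src) for src, dst, delay in control_minislots
                  if dst == own_address]
    if not candidates:
        return None                      # nothing addressed to this node
    # Tune the receiver to the sender whose message has waited the longest;
    # all other messages addressed to this node are lost.
    return max(candidates)[1]

# Node 3 receives from node 1 (delay 7); the packet from node 5 (delay 2) is lost.
print(choose_transmitter([(1, 3, 7), (5, 3, 2), (2, 4, 9)], own_address=3))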
A Distributed Adaptive Protocol
In [Yan et al. 1996] and [Yan et al. 1996B], a protocol, not named in the
papers, supporting soft real-time traffic is described. The QoS (Quality of
Service) associated with a real-time packet in the network is the probability
of missing the deadline. The distributed algorithm used tries to globally
minimize the number of packets not meeting their QoS by adaptively
changing the priority of the queued packets. The network architecture is
CC-FT TT^N-FR^(N+1), where the fixed transmitter and one of the fixed
receivers are dedicated to the control channel. The N tunable transmitters
and N of the fixed receivers are dedicated to the N data channels.
Even though the protocol has sophisticated methods for real-time messages,
it is targeted only at soft real-time traffic. Guarantees cannot be given that
a message will meet its deadline.
Interleaved TDMA
The I-TDMA (Interleaved TDMA) protocol described in [Sivalingam et al.
1992] is an extension of the traditional static uniform TDMA protocol. The
protocol assumes a non-control channel based network with tunable
transmitters and fixed receivers. The access to each channel is divided into
M slots, where M is the number of nodes. The assignment is fully static and
gives each node one slot per cycle and channel in which to transmit. Each
node has access to a maximum of one channel at a time. If there are M
channels in the network, each node always has access to exactly one
channel. An extension of I-TDMA, called I-TDMA*, is described in [Bogineni
et al. 1993]. The only difference is that I-TDMA* has C queues, where C is
the number of channels, for outgoing messages in each node instead of one
queue. Head-of-line blocking is hence avoided. The head-of-line problem
means that a node has a packet to send but cannot reach it in the queue
because there are other packets (whose destinations it is not possible to
transmit to for the moment) in front of it. As with traditional TDMA, these
protocols reach high bandwidth utilization but have relatively large
latencies at low traffic intensities.
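A minimal sketch of an interleaved static assignment of this kind is shown below; the exact slot permutation used by I-TDMA in [Sivalingam et al. 1992] may differ, so the offset rule here is only an assumption for illustration.

# Illustrative sketch of an interleaved static TDMA assignment. With M nodes
# and C channels, each channel cycles through all M nodes, and the channels
# are offset so that a node owns at most one channel in any given slot.
def owner(channel, slot, num_nodes):
    return (slot + channel) % num_nodes         # node allowed to transmit

def my_channel(node, slot, num_nodes, num_channels):
    for c in range(num_channels):
        if owner(c, slot, num_nodes) == node:
            return c                             # at most one channel per slot
    return None                                  # node idle in this slot

M, C = 4, 2
for slot in range(M):
    print(slot, [my_channel(n, slot, M, C) for n in range(M)])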
FatMAC
The FatMAC protocol is proposed for use in distributed shared memory
multiprocessors [Sivalingam 1994] [Sivalingam and Dowd 1995]. No control
channel is used, and the network is classified as TT-FR. By choosing a laser
diode array as the tunable transmitter, broadcast is made possible through
simultaneous activation of all laser diodes in the array. The access to the
channels is divided into cycles of variable length. The cycles have two parts,
a control phase followed by a data phase. Each node has a slot with
broadcast capability in the control phase, where it transmits its
transmission demands for the cycle. The packet is scheduled in the data
phase among other demanded transmissions, following the same order as
the control slots. The length of the data phase depends on the number of
demanded transmissions. Positive features of the network include: support
for variable length packets, no need for control channel, and collisionless
transmission.
A related protocol is TD-TWDMA (see Chapter 7), which also has a control
phase and a data phase instead of a separate control channel [Jonsson et al.
1996] [Jonsson et al. 1997]. The TD-TWDMA protocol is developed for
distributed real-time systems, however, and has features for those systems.
Conclusions
This chapter has presented an introduction to high-performance fiber-optic
networks. The emphasis has been on multiple-channel passive optical
networks, especially WDMA networks, because these networks have the
highest potential for meeting future bandwidth demands at a reasonable
cost. Networks for numerous applications have been proposed in the
literature, but most of the networks and protocols referred to in this chapter
can be used in distributed computing systems too. WDMA star networks are
foreseen to have a key role in future high-performance computer
communication networks, and examples of such networks and protocols for
them have been described. Many of the components demonstrated for
WDMA networks, as reviewed in this chapter, further indicate that high-
performance WDMA networks for end-user systems will be available in the
near future.
Interconnections in Parallel Computers
Interconnection networks are often divided into static (also called direct)
and dynamic (also called indirect) networks. Static networks have a defined
static topology, where the nodes are directly connected to nearest neighbors
via point-to-point links. This forms a static topology of the network, e.g., a
two-dimensional mesh. In dynamic networks, the traffic is routed through a
switched-based network. Therefore, we can say that the nodes are
indirectly connected to each other.
Not all networks fall into one of the two categories given above. Thus two
more categories can be added: shared-medium networks and hybrid
networks [Duato et al. 1997]. After a presentation of different parameters of
interconnection networks in Section 3.1, static, dynamic, shared-medium,
and hybrid networks are described in Sections 3.2, 3.3, 3.4, and 3.5,
respectively. Section 3.6 then presents different kinds of group
communication. A discussion of routing is given in Section 3.7, and the
chapter ends with an overview of different high-performance networks for
coarse grained systems in Section 3.8.
Design and Performance Parameters
The terminology used in the literature on interconnection networks varies
somewhat. This chapter defines a number of terms, some of which are used
in the thesis, while others are included only to give a hint of which
parameters can characterize interconnection networks. Various design
and performance parameters and desirable features of interconnection
networks for parallel computers are found in textbooks covering the area of
parallel computing [Almasi and Gottlieb 1994] [Casavant et al. 1996]
[Decegama 1989] [Hockney and Jesshope 1988] [Hwang 1993] [Hwang and
Briggs 1985] [Lawson et. al. 1992], compilations of papers [Scherson and
Youssef 1994] [Varma and Raghavendra 1994], and tutorial texts on
interconnection networks for parallel computers [Bhuyan et al. 1989] [Duato
et al. 1997] [Reed and Grunwald 1987] [Siegel 1990]. Furthermore, more
general network discussions and concept definitions are found in computer
communication textbooks [Halsall 1995] [Peterson and Davie 1996]
[Stallings 1997] [Tanenbaum 1996]. We will explain some of these below, of
which several have an influence on the analysis.
Latency and delay
Latency and delay are terms that can be defined in a
number of ways. One common definition, normally
called message delay, is the time from message
generation in the source node until the message is
fully received at the destination node and available
for use by the application running on it.
Transmission capacity
Transmission capacity is measured in bit/s or Byte/s
and denotes the maximum amount of data that can
be transferred per time unit over a link or
aggregately over a whole interconnection network.
Transmission capacity is often, but somewhat
misleadingly [Freeman 1998], called bandwidth.
Bisection bandwidth
The bisection of a system is the section that divides
the system into two halves with an equal number of
nodes. The bisection bandwidth is the aggregated
bandwidth over the links that cross the bisection. In
asymmetric systems, the number of links across the
bisection depends on where the bisection is drawn.
However, since the bisection bandwidth is a worst-
case metric, the bisection leading to the smallest
bisection bandwidth should be chosen [Hennessy and
Patterson 1996].
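As a worked example of this definition (the numbers are illustrative assumptions, not taken from any system discussed in this thesis): a two-dimensional mesh with N nodes and bidirectional links of bandwidth B has \(\sqrt{N}\) links crossing the bisection, so

\[
B_{\mathrm{bisection}} = \sqrt{N}\,B,
\qquad \text{e.g.}\quad N = 64,\; B = 1~\mathrm{Gbit/s}
\;\Rightarrow\; B_{\mathrm{bisection}} = 8~\mathrm{Gbit/s}.
\]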
network are avoided. Conflicts caused by, e.g., a
limited amount of buffers must still be considered,
however. Since N^2 connections (or cross points in a
crossbar) are needed, a conflict-free network will be
too expensive in larger systems.
(transmitted) order, each packet is stamped with a
sequence number. Connectionless service does not
involve connection establishment and is not an error-
free service. Because the protocol entity in a
destination PE is not aware of forthcoming packets,
it cannot ask for retransmission of packets that are
corrupted and discarded before arriving at it.
Incremental expandability
It is often desirable to be able to expand a system
with another small subsystem instead of, e.g., being
forced to double the number of nodes to maintain a
certain topology. It can also be the case that a
parallel computer system is optimized for a certain
size which can lead to, e.g., unused communication
links at smaller system sizes. The term modularity is
sometimes used instead of, or at least related to, the
term incremental expandability.
Static Networks
A certain static topology is chosen in static networks. Some common
topologies are linear array, ring, 2-dimensional mesh, 2-dimensional torus,
and binary hypercube (Figure 3). The parameters below are used to describe
static networks.
number of hops over the longest of these shortest
paths.
Node degree The node degree, the number of links that connect a
node to its nearest neighbors, can either be constant
for the whole network or differ between the nodes. As
an example, the boundary nodes in a two-
dimensional mesh have a node degree of 3 and corner
nodes a node degree of 2, while the rest of the nodes
have a node degree of 4. A constant node degree is an
example of a feature that might make expansions of
the system easier.
If we let N denote the number of nodes or PEs in a system and assume bi-
directional links, we can, with these parameters, characterize the different
topologies in the figure. The linear array has a node degree of two, except
for the end nodes, and a diameter of N − 1. The linear array topology is
employed in, e.g., REMAP-β [Bengtsson et al. 1993].
The node degree of the two-dimensional mesh is discussed above, while the
diameter is 2(√N − 1). If a mesh network is extended with wrap-around
connections, it is called a torus. A torus has lower diameter than the mesh,
√N for the two-dimensional case, and a uniform node degree. Examples of
parallel computers with a two-dimensional mesh interconnection network
are the distributed shared memory multiprocessor from MIT, the MIT
Alewife [Agarwal et al. 1995], and the MPP SIMD computer (with the
extension of reconfigurable function of the boundary nodes) [Batcher 1980]
[Batcher 1980B]. A three-dimensional mesh network is used in the J-
machine at MIT [Dally et al. 1993].
The MasPar MP-1 [Blank 1990] [Nickolls 1992], MP-2 [MasPar 1992], and
the embedded version by Litton/MasPar [Smeyne and Nickolls 1995] have a
two-dimensional torus network where each node is connected to its eight
nearest neighbors by the use of shared X connections. Moreover, the Fujitsu
AP3000 distributed memory multicomputer has a two-dimensional torus
network [Ishihata et al. 1997], while the CRAY T3D [Kessler and
Schwarzmeier 1993] [Koeninger et al. 1994] and T3E both use the three-
dimensional torus topology.
A binary hypercube has two nodes along the side in each dimension, i.e., a
total of 2^n nodes, where n is the dimension. The diameter is n, since the
maximum distance to be travelled is one hop in each dimension. The
constant node degree of a binary hypercube is also n. Examples of computers
with hypercube interconnection networks are the Cosmic Cube with a 6-
dimensional hypercube [Seitz 1985] and the Connection Machine CM-2 with
a 12-dimensional hypercube, where each node in the hypercube consists of
16 PEs [Thinking Machines 1991]. Replacement of 1024 wires in the CM-2
by two optical fibers has been demonstrated [Lane et al. 1989].
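The topology parameters quoted above can be summarized as follows (N is the number of nodes; the torus figure is the approximate value used in the text, and log2 N is the hypercube dimension):

\[
\begin{array}{lcc}
\text{Topology} & \text{Diameter} & \text{Node degree} \\
\text{Linear array} & N-1 & 2 \text{ (1 at the ends)} \\
\text{2-D mesh} & 2(\sqrt{N}-1) & 2\text{--}4 \\
\text{2-D torus} & \approx \sqrt{N} & 4 \\
\text{Binary hypercube} & \log_2 N & \log_2 N
\end{array}
\]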
Dynamic Networks
The crossbar is the most flexible dynamic network and can be compared
with a fully connected topology, i.e., point-to-point connections between all
possible combinations of two nodes. The drawback, however, is that the
cost/complexity of the switch grows as N^2, where N is the number of nodes.
Systems with a single true crossbar are therefore limited to small sizes.
Starfire from Sun Microsystems is a symmetric multiprocessor system with
a 16 x 16 crossbar network for data transactions [Charlesworth 1998] (for
the snoopy cache-coherence protocol, a bus is also used in the system). Other
computers with a crossbar network include the VPP500 [Miura et al. 1993]
and VPP700 [Uchida 1997] from Fujitsu.
Figure 4. Omega network for an eight-node system. One path through the network is highlighted.
Figure 5. Possible states of a 2 × 2 switch.
Figure 6. A rearrangeable Benes network.
The Connection Machine CM-5 is an example of a computer with a fat-tree
network [Hillis and Tucker 1993] [Leiserson et al. 1992]. The fat-tree
topology is also used in the Meiko CS-2 [Beecroft et al.
1994].
Shared-Medium Networks
A common way of implementing a shared-medium network is to use the bus
topology, but it can also be, e.g., a ring where only one node is allowed to
send at a time. The great advantage of a shared-medium network is the
easy implementation of broadcast, which is useful in many situations. The
disadvantage is that the bandwidth does not scale at all with the number of
nodes. The bus is commonly used in small systems, e.g., in Silicon Graphics
Power Challenge [Silicon 1994]. Multiple buses can be used to enhance
performance relative to single-bus systems [Mudge et al. 1987].
Figure 7. Fat-tree of switches where nodes are leaves in the tree.
Hybrid Networks
An example of a hybrid network is the hierarchical network in the Stanford
University DASH [Lenoski et al. 1992] [Lenoski et al. 1992B] [Lenoski et al.
1993]. A two-dimensional mesh network connects bus-based clusters. The
aim of the configuration is to get a scalable cache-coherent shared memory
multiprocessor. The cache-coherence protocols used are snoopy-on-the-bus
inside each cluster and a distributed directory-based protocol between the
clusters. The Paradigm instead uses a hierarchical bus network in its cache-
hierarchy implementation [Cheriton et al. 1991]. The CRAY APP in turn
groups the processing elements in groups of up to 12 processing elements
connected to a common bus. Up to seven such buses of processing elements
can be connected, via a crossbar, to a globally shared memory [Carlile 1993].
Group Communication
Many parallel programs can take advantage of special support for group
communication, or collective communication, i.e., communication where
many nodes (or processes) are collectively involved. The nodes involved in a
group communication operation are said to be members of a group. Some
kinds of group communication are:
• Multicast: One-to-many communication where one node sends the
same message to all members of the group. The special case in which
all nodes in the system are members of the group is called broadcast.
• Scatter: One-to-many communication where one node sends different
messages to different members of the group.
• Reduction (global combining): Many-to-one communication where
different messages from different members of the group are combined
into one message for delivery to one destination node. Some common
operators used when combining are SUM, OR, and AND.
• Gather: Many-to-one communication where different messages from
different members of the group are concatenated in a defined order
for delivery to one destination node.
• Reduce and spread: Variant of the reduction operator, where the
result is spread to all group members.
• Barrier synchronization: A synchronization point is defined in the
program code, at which all members must arrive before any of the
members may continue beyond the synchronization point. This is a
special case of "reduce and spread" where no data is involved.
• Scan: For each member of the group, m_i, where 1 ≤ i ≤ M and M
denotes the number of nodes, a reduction is made where the node is
chosen to be the destination node. Each such reduction is made from
a sub-group of N ≤ M nodes, e.g., nodes m_j, where i − N + 1 ≤ j ≤ i.
The possibility to define groups and call group communication routines is
supported in, e.g., PVM (Parallel Virtual Machine) [Geist et al. 1994]. The
underlying group communication mechanisms can be implemented in
several different ways: (i) in software using the same network as for
ordinary traffic, (ii) by the use of a more or less general network dedicated
for group communication as in the CM-5 [Leiserson et al. 1992], and (iii) by
dedicated hardware specialized for, e.g., barrier synchronization [O’Keefe
and Dietz 1990] [O’Keefe and Dietz 1990B].
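The semantics of some of the operations listed above can be sketched in a few lines. The sketch below only models the data combining, on one node and for one group; it says nothing about how the traffic is actually carried by the network, and the window-based scan variant mirrors the sub-group formulation above.

# Illustrative sketch of reduction, reduce-and-spread, and a windowed scan.
from functools import reduce
import operator

def reduction(values, op=operator.add):
    return reduce(op, values)                       # many-to-one combine

def reduce_and_spread(values, op=operator.add):
    result = reduction(values, op)
    return [result] * len(values)                   # result delivered to all members

def scan(values, window, op=operator.add):
    # For each member i, combine the values of the sub-group of up to
    # 'window' nodes ending at i (cf. the scan definition above).
    return [reduction(values[max(0, i - window + 1):i + 1], op)
            for i in range(len(values))]

vals = [1, 2, 3, 4]
print(reduction(vals), reduce_and_spread(vals), scan(vals, window=2))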
Routing
The routing decision, or path selection, in a packet-switched interconnection
network can be made either inside the network by routers or by the end
nodes, so called source routing. Source routing can be more easily used in
parallel computers than in, e.g., internet communication [Comer 1995],
because the topology does not change as often and is not normally so
complex and/or irregular. Source routing is used in, e.g., Myrinet [Boden et
al. 1995] and the IBM SP2 communication system [Stunkel et al. 1995],
where the header of a packet includes the desired switch-setting of each
router on the way from source to destination. Each router drops its
corresponding switch-setting field in the header when the packet is
forwarded.
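The header-stripping style of source routing described above can be sketched as follows. The header format and topology encoding are invented for this example; they are not the actual Myrinet or SP2 header layouts.

# Illustrative sketch of source routing with header stripping.
def source_route(packet_header, payload, routers):
    # packet_header: list of output-port numbers, one per router on the path.
    hop = 0
    while packet_header:
        out_port = packet_header.pop(0)     # each router consumes its own field
        hop = routers[hop][out_port]        # follow the selected output port
    return hop, payload                     # node where the payload is delivered

# Tiny example topology: router 0 -> {port 0: node 1, port 1: node 2}, etc.
routers = {0: {0: 1, 1: 2}, 1: {0: 3}, 2: {0: 3}, 3: {}}
print(source_route([1, 0], "data", routers))   # delivered at node 3 via node 2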
When a packet arrives at a router, it can be stored and error checked before
an attempt is made to forward it. This method is called store-and-forward
and may be simple to implement but requires some buffer memory and adds
significant latency for each router that is passed on the path from source to
destination. The alternative is to use cut-through switching, i.e., only the
header of the packet with the destination address must arrive before the
packet can be forwarded to an output port [Kermani and Kleinrock 1979].
This means that it is not necessary to store the whole packet and it will only
experience a low latency because the router begins to forward the packet
before it is fully received. If there is no suitable free output port, the rest of
the packet can be received and stored as in store-and-forward.
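A rough first-order comparison of the two switching methods, ignoring propagation delay and switching overheads, with a packet of L bits, a header of L_h bits, H routers on the path, and link bandwidth B, is

\[
T_{\text{store-and-forward}} \approx (H+1)\,\frac{L}{B},
\qquad
T_{\text{cut-through}} \approx H\,\frac{L_h}{B} + \frac{L}{B},
\]

so the cut-through latency is almost independent of the number of routers passed when the header is much shorter than the packet [Kermani and Kleinrock 1979].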
High-Performance Networks for Coarse-Grained Parallel Computers
There are several different general high-performance networks that can be
used to connect rather powerful and possibly heterogeneous computing
nodes, which might be physically separated by several tens of meters or
more. Both standards and ongoing research projects exist. Often, these
kinds of networks are used in NOWs (Networks of Workstations) or COWs
(Clusters of Workstations) [Anderson et al. 1995]. Another possibility is to
have a heterogeneous system of both workstations and supercomputers.
An interconnection system similar to Myrinet, but especially developed for
embedded systems, is RACEway from Mercury Computer Systems
[Kuszmaul 1995] [Einstein 1996] [Isenstein 1994] [Mercury 1998]. A
RACEway system is built up of six-port (bi-directional) crossbar switches to
get an active backplane. Several different topologies can be chosen, but the
typical one is a fat-tree of switches, where each switch has four children and
two parents. Circuit switching with source routing is used. Support for
real-time traffic is obtained by using priorities, where a higher priority
transmission preempts a lower priority transmission. The link bandwidth is
160 MByte/s.
Other high performance networks for rather coarse grained systems include:
• Nectar: switch-based source-routing network [Arnould et al. 1989]
[Steenkiste 1996]
• Fibre Channel: standardized network supporting different link speeds
and topologies, e.g., switch-based [Anderson and Cornelius 1992]
[Boisseau et al. 1994] [Emerson 1995] [Sachs and Varma 1996]
[Saunders 1996]
• HIPPI: standardized network where switches can be used to switch
point-to-point links, each with 800 Mbit/s or 1.6 Gbit/s, simplex or
duplex [Saunders 1996] [Tolmie and Renwick 1993]
• TNet: switch-based wormhole routing network [Horst 1995]
• SCI: standardized network supporting cache coherence in different
topologies of the network, e.g., ring or switch based [Gustavson and
Li 1996] [IEEE 1993]. Used in, e.g., a system from Sequent [Lovett
and Clapp 1996]
• Spider: short-distance (a few meters) switch-based network with 2 × 1
GByte/s full duplex links [Galles 1997], used in SGI’s Origin
computer systems [Laudon and Lenoski 1997]
• HAL’s Mercury Interconnect Architecture: network based on crossbars
with six 1.6 + 1.6 GByte/s full duplex ports [Weber et al. 1997]
• GigaRing: a 600 + 600 MByte/s dual ring network developed by Cray
Research for use as a supercomputer interconnect [Scott 1996].
Experiments with ATM networks in parallel and distributed computing
systems have also been reported [Eicken et al. 1995].
Part II: Review of Radar Signal Processing Systems and Suitable Interconnection Networks
A Sample Radar Signal Processing System
The signal processing system under consideration is primarily developed for
use in applications with a phased array antenna, i.e., an antenna with
multiple fixed antenna elements (and digital beam forming) instead of a
moveable antenna. The system has a number of different requirements
depending on the application, but the algorithms are usually well known.
They comprise mainly linear operations such as matrix-by-vector
multiplication, matrix inversion, FIR-filtering, DFT etc. In general, the
functions will work on relatively short vectors and small matrices, but at a
fast pace and with large sets of vectors and matrices.
One of the goals of our research is to find a good scalable architecture which
can give sufficiently high computing speed for this application without too
great a loss in generality, i.e., the use of efficient programmable computers
is preferred. A solution to this is to use the two-dimensional array SIMD
machine as a building block. The two-dimensional array is well known in
the literature, and a number of machines have been built, e.g., MasPar
[Blank 1990] [Nickolls 1992] [MasPar 1992], Connection Machine [Thinking
Machines 1991], and DAP [Hord 1990]. Numerous successful mappings of
algorithms on these machines have been done. However, these machines are
not efficient when the calculations are too small for the machine size, i.e., it
is difficult to fit a 32 by 32 matrix problem on a 65k processor machine.
Thus, it can be noted that the mesh is a promising architecture, but that the
size of the mesh should be in the order of the size of the data structures in
the calculations, i.e., the size of the matrices in the data set. To cope with
the large number of matrices in the data, many computation modules, each
with the two-dimensional array topology, can be interconnected to share the
load. A very powerful interconnection network is needed for this, because
each computation module (hereafter also called node) can produce a
sustained data flow of several Gbit/s. Guaranteed bandwidth must also be
supported in order not to disturb the dataflow.
This chapter briefly describes our proposal for a computer system, which is
a MSIMD mesh system, intended to meet the imposed requirements in
terms of computing power, generality, size, and power consumption. The
computation module architecture is presented in Section 4.1, and system
design issues are discussed in Section 4.2. The communication demands are
treated in Section 4.3. More detailed descriptions of the radar signal
processing systems developed are found in other publications by our
research group [Taveniku et al. 1996] [Taveniku et al. 1998] [Ahlander
1996] [Taveniku 1997].
Figure 8. Computation module with eight 8 × 8 meshes of PEs.
Computation Module Design
Several similar computation module designs have been developed and
evaluated in the scope of the projects of which this thesis has been part. One
is the computation module shown in a simplified form in Figure 8 [Taveniku
et al. 1996]. Each module consists of eight 8-by-8 meshes working in an
SIMD fashion. Internally in the meshes, the processors (PEs) are connected
with nearest neighbor connections (x-grid), together with row and column
broadcast lines. Computations and inter-PE communication can be
performed simultaneously. In addition to the PE meshes, the module holds
I/O buffers, memory, and control units.
System Design Issues
The overall system design is a scalable, moderately parallel MIMD system
with moderately parallel SIMD modules interconnected with a scalable,
optical real-time interconnection network. The applications are
implemented on the system as follows. A multi-mode radar application can
in general be described as a set of independent modes of operation.
Furthermore, each mode is described as a series of transformations on the
sampled data stream. In addition to this, there are control functions and
registration functions controlling and monitoring the system.
Inter-Module Communication Demands
As an example, we show how signal processing in a ground based
surveillance radar system is implemented using the approach described in
Section 4.2 [Taveniku et al. 1996]. In the system, 64 lobes are created that
use 64 receiver channels. The data flow in one mode of operation can be
described as shown in Figure 10, while the total computational demand is
40 GFLOPS. The nodes in this system are SIMD computers of the array-of-
meshes type (see Section 4.1) with a sustained performance of 4 GFLOPS.
The functions are mapped onto the nodes as shown, together with the
bandwidth demands, in Figure 11. The algorithms are then individually
mapped onto the specific processor array, in this case an 8-by-8-by-8 mesh
PE array as shown in Figure 8. The chain will only figure as a sample
Figure 10. Description of one mode of operation in the 64-channel ground-based radar system.
The figure shows how the work is split in a coarse grained MIMD fashion,
where each computational module is itself powerful (containing multiple
processors in this case). A data cube that initially comes from the antenna
contains data in three dimensions (channel, pulse, and distance). After the
processing of one stage, the new data cube is forwarded to the next node in
the chain. As a pipelined system, a module can start processing new data as
soon as it has sent the results of the former calculation to the next node. The
aggregated throughput demand is about 45 Gbit/s, including the data from
the antenna (Node 1) feeding the chain. As seen in the figure, the chain
contains multicast, one-to-many, and many-to-one communication patterns.
Corner-turning of the data cube is done when the PEs must process data
along another dimension of the data cube. The memory modules are used for
this task. A memory module stores incoming messages from the
communication system in such a way that the whole corner-turned data
cube is finally in memory. The memory modules may not be needed if the
computational modules have enough memory to store a whole data cube.
Also in this case, the communication system does a major part of the corner
turning.
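A corner turn is essentially a change of the dimension along which the data cube is laid out. The sketch below is a minimal illustration that ignores the distribution over PEs and memory modules; the index order is an assumption for the example.

# Illustrative sketch of a corner turn: data that arrive ordered along one
# dimension are stored so that they can be read out along another dimension.
# The cube is a nested list indexed as cube[channel][pulse][range].
def corner_turn(cube):
    channels, pulses, ranges = len(cube), len(cube[0]), len(cube[0][0])
    # After the corner turn the cube is indexed as turned[pulse][channel][range],
    # so a PE that works along the pulse dimension reads contiguous data.
    return [[[cube[c][p][r] for r in range(ranges)]
             for c in range(channels)]
            for p in range(pulses)]

cube = [[[(c, p, r) for r in range(2)] for p in range(3)] for c in range(4)]
turned = corner_turn(cube)
print(turned[1][2][0])   # element that was cube[2][1][0]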
Figure 11. Data flow between the modules in the signal processing chain.
Although control I/O and high speed data I/O are logically separated in the
module, a single network interface handles all intermodule communication.
Since each module itself can have a sustained output data rate of several
Gbit/s, a powerful interconnection network is needed. Another important
feature of the network is the ability to guarantee that the time constraints
are met, i.e., the data flow must not be disturbed by, for example, status
information that the network must also transport. A network that can
guarantee delivery of semi-static high bandwidth traffic at the same time
that it carries rapidly changing control and status traffic is therefore
valuable, or even required. We suggest two different ways of implementing
the communication network, both employing fiber-optic technology and
supporting guaranteed timely delivery. The first network is a WDM star
network which scales to large systems, and the second is a fiber-ribbon ring
network suitable for systems of a moderate size and/or systems with high
degrees of nearest neighbor communication. These two main network
architectures, together with different design alternatives, are described in
Chapters 7 through 10.
Configurations and Requirements of Signal Processing Systems
Future radar signal processing systems will have high computational
demands, which implies that parallel computer systems are needed. In turn,
the performance of parallel computers is highly dependent on the
performance of their interconnection networks. The next chapter reviews
optical interconnection networks from a signal processing perspective. The
networks are evaluated according to their suitability for mapping signal
processing chains in, e.g., radar systems. Different ways to map are
explained. These kinds of mappings are simplified in their nature to
facilitate an evaluation of the diverse range of networks. More detailed
discussions of algorithm mapping in similar signal processing systems are
found in [Liu and Prasanna 1998].
Pure Pipeline Chain
The simplest case considered is a single, straightforward pipeline chain.
Such a system is shown in Figure 12, where each shaded box represents a
computational module and the cubes represent the data flow. Each
computational module runs one or several pipeline stages, and the arcs
represent only the dataflow between modules. The physical topology can,
e.g., be a ring or a crossbar switch. The chain is a purified form of the
sample chain in Figure 11 (Page 77), without, e.g., multicasting to simplify
the discussion of performance.
Figure 12. A pure pipeline chain where one or more stages in the chain are mapped on each module.
Figure 13. If the SPMD model is used, all PEs work together on one stage in the signal processing chain at a time.
Same Program Multiple Data
A common way of mapping radar signal processing algorithms onto parallel
computers is to let each processor carry out the same operation but on
different sets of data, i.e., SPMD (Same Program Multiple Data). All the
PEs in Figure 13 then work together on one of the pipeline stages in Figure
11 (Page 77) at a time. After the processing of one stage, the data cube is
redistributed if necessary. Since this is not a pure pipelined system, it might
be harder to overlap communication and computation.
Although not considered here, there might occur communication during the
processing of one stage, e.g., nearest neighbor communication, if a static
topology like the two-dimensional mesh is used. The number of times it is
necessary to redistribute the data cube between the PEs may however be
reduced as compared with the number of times the data cube must be
transferred to the next module for processing in a pipeline chain. Instead, it
might be harder to overlap communication and computation, which implies
extra bandwidth requirements [Teitelbaum 1998]. When corner turning the
data cube, one half of the data cube must be transferred across the network
bisection [Teitelbaum 1998]. The bisection bandwidth is therefore an
essential performance parameter in this strategy.
Figure 14. Several parallel groups of PEs, each running a pipeline chain program.
Incoming data from the antenna are not shown in Figure 13 but are
assumed to be fed into the PEs in such a way that it does not affect the
communication pattern shown in the figure. For example, special I/O
channels can be used instead of the communication system carrying the
traffic indicated by the arcs in the figure.
Parallel Hardware for Multiple Modes
When there are several concurrent operation modes of the radar instead of
only one, these can be mapped in a number of different ways. One way is to
have several groups of processing elements, each living its own life and
dedicated for one mode. Assuming one of the two kinds of process mapping
described in Sections 5.1 and 5.2, we have two ways of mapping concurrent
modes totally in parallel.
Figure 14 shows a system of parallel pipeline chains, each chain with its own
computational modules. The incoming data from the antenna are multicast
to the first node in each chain, i.e., all modes operate on all data cubes
(instead of operating in an interleaved way, which is also possible).
Figure 15 shows a system with parallel groups of PEs. Each group executes
an SPMD program that corresponds to the dedicated working mode. In this
case also, some form of multicasting to spread the incoming data is
assumed.
Figure 15. Several parallel groups of PEs, each running an SPMD program.
Concurrent Modes on the Same Hardware
The system configurations in which several modes run concurrently on the
same set of nodes place a demand on the network to be reconfigurable. Two
different ways of mapping are considered here and described below.
The first way is to have concurrent modes where all nodes switch mode at
each new data batch. This means that the signal processing chain for each
mode is only fed with one of N data batches, where N is the number of
modes. Each node switches mode of operation on the order of once every ten
ms. It should hence be possible to reconfigure the network in a few hundred
microseconds since reconfiguration might be needed for each mode change.
The reason is that the communication pattern can differ from mode to mode.
As an example, a node concurrently running several pipeline chains like the
one shown in Figure 12 (but not pure pipeline) might perform a multicast in
one mode and one-to-many or one-to-one in the next mode. Similar
The second way to have concurrent modes on the same hardware is to let
each node switch mode several times per data batch. In this way, the signal
processing chain for each mode is fed with all data batches. Of course, in the
general case, this requires more processing power. Also, which is of greater
interest here, reconfiguration must be done faster. Depending on the
number of working modes and how the algorithms are mapped onto the
computational modules, reconfiguring the network in a few tens of
microseconds or less might be desirable.
One Data Cube per Processing Element
One can allow the incoming data to be distributed such that each data cube
is given to a different processor, as shown in Figure 16. All calculations in
the whole signal processing chain are then performed by the same processor
for each data cube, therefore requiring no communication before the results
are gathered. Although this way of mapping minimizes communication it
will typically not be a good solution because the computational latency will
be too great. For example, consider a system in which the data are
distributed to 100 processors, one data cube at a time to each processor. If
100 percent processor utilization is assumed and the CPI (Coherent
Processing Interval, i.e., the time between the start of two subsequent data
cubes) is ten ms, then the computational latency will be one s, which is not
Kind of communication | Communication distances
Intra chip | 0.1 - 2 cm
Intra MCM | 1 - 10 cm
Intra board | 5 - 50 cm
Inter board | 0.1 - 1 m
Inter cabinet | 0.5 - 10 m
Inter and intra room | 10 - 100 m
Intra and inter building | 100 m - 10 km

Table 5. Classification of system sizes and communication distances.
Traffic Types
For many of the cases described above, circuit switching might work well for
the data flow in the signal processing chain because the mode is not
changed very often, e.g., not for every single data cube. However, packet
switching is needed to carry short messages like control and status
messages. There is no time to set up a circuit for these kinds of messages,
and keeping circuits permanently connected is not viable for sporadic traffic.
Nevertheless, packet switching can be supported by a totally different
subsystem, e.g., an optical subsystem for circuit switching and an electrical
subsystem for packet switching.
System Sizes and Communication Distances
Different communication systems may fit different ranges of system sizes or
communication distances. One interconnection network might, for example,
not offer sufficient throughput over a long distance. Another might be
physically too large or too expensive compared with other solutions for small
computer systems. An evaluation is made to come to qualitative judgements
about the communication systems’ suitability for the different system sizes
and communication distances given in Table 5.
Evaluation of Optical Interconnection Systems
A number of proposals for optical or optoelectronic communication systems
suitable in a radar signal processing system are given in Sections 6.1 through
6.9, some of which are hybrids of several other systems. Although there
exist many other optical interconnection architectures that might be
candidates, only some selected groups or concepts are considered here, to
give a reasonably broad view of possible solutions. Somewhat more emphasis
is placed on pure fiber-optic solutions than on, e.g., free-space systems. The
survey ends with a summarizing evaluation in Section 6.10. The
classification made in the chapter is to some extent influenced by work
referred to in Section 1.5. There are some previous surveys and tutorial
texts in the field of optical interconnects [Goldberg 1997] [Kurokawa and
Ikegami 1996], the latter focusing on parallel computers and ATM switches.
Fiber-Ribbon Ring Network
Bit-parallel transfer can be utilized when fiber-ribbon cables/links are used
to connect the nodes in a point-to-point linked ring network. In such a
network, one of the fibers in each ribbon is dedicated to carry the clock
signal. Therefore, no clock-recovery circuits are needed in the receivers.
Other fibers can be utilized for, e.g., frame synchronization.
Figure 17. Example of spatial bandwidth reuse in a six-node ring, where several transmissions, one of them a multicast, take place simultaneously on different segments of the ring.
Several transmissions can take place simultaneously in ring networks with
support for spatial bandwidth reuse (sometimes called pipeline rings)
[Wong and Yum 1994]. This feature can be effectively
called pipeline rings) [Wong and Yum 1994]. This feature can be effectively
used in signal processing applications with a pipelined dataflow, i.e., where
most of the communication is to the nearest downstream neighbor. Two fiber-
ribbon pipeline ring networks have recently been reported [Jonsson 1998B].
The first network has support for circuit switching on 8+1 fibers (data and
clock) and packet-switching on an additional fiber [Jonsson et al. 1997B].
The second network is more flexible but is a little more complex, and has
support for packet switching on 8+1 fibers and uses a tenth fiber for control
packets (see Figure 18) [Jonsson 1998]. The control packets carry MAC
information for the collision-less MAC protocol with support for slot
reserving. Slot reserving can be used to get RTVCs (Real-Time Virtual
Channels) [Arvind et al. 1991] for which guaranteed bandwidth and a worst-
case latency are specified (compare with circuit switching).
WDM Star Network
A passive fiber-optic star distributes all incoming light on the input ports to
all output ports. A network with the logical function of a bus is obtained
Figure 18. Control channel based network built up with fiber-ribbon point-to-point links. In each link, eight fibers form the packet-switched data channel, one fiber carries the control channel, and one fiber carries the clock.
when connecting the transmitting and receiving side of each node to one
input and output fiber of the star, respectively. By using WDM, multiple
wavelength channels can carry data simultaneously in the network
[Brackett 1990]. In other words, each channel has a specific color of light. A
flexible WDMA network requires tunable receivers and/or transmitters, i.e.,
it should be possible to send/listen on an arbitrary channel [Mukherjee
1992].
Figure 19. WDM star network with wavelength channels λ1, λ2, ..., λN.
e.g., during the processing of a data cube in a radar system with a pipelined
mapping of the processing stages. To support some more rapidly changing
traffic patterns, the nodes can be extended with transmitters and receivers
fixed-tuned to a special broadcast channel. This configuration can be
compared to having support for both circuit switching and packet switching.
Table 6. A feasible allocation scheme of the wavelength channels in a WDM star network for the sample radar system.
Figure 20. WDM star multihop network with four wavelength channels, λ1 to λ4.
WDM Ring Network
A WDM ring network utilizes ADMs in all nodes to insert, listen, and
remove wavelength channels to/from the ring. In the WDMA ring network
described in [Irshid and Kavehrad 1992], each node is assigned a node-
unique wavelength on which to transmit. The other nodes can then tune in
an arbitrary channel on which to listen. This configuration is logically the
Figure 21. Multihop topology (transmitter-side and receiver-side wavelength assignments for four nodes).
same as that for the WDM star network with fixed transmitters and tunable
receivers. The distributed crossbar again gives good performance for general
communication patterns, such as in a radar system with the SPMD model.
As discussed for the WDM star network, components with long tuning
latencies (ADMs in the case of a ring) can be used for traffic patterns that do
not change rapidly. An additional broadcast wavelength dedicated for
packet switching keeps the network flexible. A single 6 Gbit/s channel (in
addition to the broadcast channel) is sufficient for the signal processing
chain shown in Figure 11, if the ADMs in Nodes 1, 2, 3, 6, 7, 8, and 9
terminate the channel for wavelength reuse.
Integrated Fiber and Waveguide Solutions
Fibers or other kinds of waveguides (hereafter commonly denoted as
channels) can be integrated to form a more or less compact system of
channels. Fibers can be laminated to form a foil of channels, for use as
intra-PCB (Printed Circuit Board) or back-plane interconnection systems
[Eriksen et al. 1995] [Robertsson et al. 1995] [Shahid and Holland 1996].
Fiber-ribbon connectors are applied to the fiber end-points of the foil. An
example is shown in Figure 22.
Figure 22. A foil of fibers connects four computational nodes. In addition, a clock node distributes clock signals to the computational nodes.
Another way is to follow the proposed use of an array of passive optical stars
to connect processor boards in a multiprocessor system via fiber-ribbon
Figure 23. An array of passive optical stars connects a number of nodes via fiber-ribbon cables.
links, for which experiments with 6 x 700 Mbit/s fiber-ribbon links were
done (see Figure 23) [Parker 1991] [Parker et al. 1992]. As indicated above,
such a configuration can be integrated by the use of polymer waveguides.
The power budget can, however, be a limiting factor to the number of nodes
and/or the distance. Advantages are simple hardware owing to bit-parallel
transmission (like other fiber-ribbon solutions) and the broadcast nature.
Many-to-many communication patterns, as used when corner-turning in
SPMD mode, map easily on the broadcast architecture as long as the star
array does not become a bottle-neck. In a similar system, the star array is
exchanged by a chip (with optoelectronics) that has one incoming ribbon
from each node and one output ribbon [Lukowicz et al. 1998]. The output
ribbon is coupled to an array of 1 × N couplers so that each node has a
ribbon connected to its receiver. The chip couples the incoming traffic
together in a way that simulates a bus. At contention, the chip can
temporarily store packets.
Figure 24. Fully connected topology, where each node has an array of N − 1 laser diodes and an array of N − 1 photodiodes, where N is the number of nodes.
Other similar systems include the integration of fibers into a PCB for the
purpose of clock distribution [Li et al. 1998]. Distribution to up to 128 nodes
was demonstrated. The fibers are laminated on one side of the PCB, while
integrated circuits are placed on the reverse side. The end section of each
fiber is bent 90 degrees to lead the light through a so-called via hole to the
reverse side of the PCB.
Optical Interconnections and Electronic Crossbars
Communication systems such as in Myrinet [Boden et al. 1995], where
arbitrary switched topologies can be built by using electrical switches, can
support a number of different traffic patterns possible in radar systems.
Fiber-ribbons can be used to increase bandwidth, compared to electrical
systems, while still sending in bit-parallel mode. It is possible to have bit
rates in the order of 1 Gbit/s over each fiber in the ribbon over tens of
meters using standard fiber-ribbons. As noted in Section 6.4, foils of fibers
or waveguides (e.g., arranged as ribbons) can be used to interconnect nodes
and crossbars on the PCB and/or back-plane level.
cable) can also be dedicated to other purposes such as frame synchronization
and flow control. Significantly higher bandwidth distance products can be
achieved when using bit-parallel WDM over dispersion shifted fiber instead
of fiber-ribbons [Bergman et al. 1998] [Bergman et al. 1998B]. If, however,
there is only communication over shorter distances (e.g., a few meters), the
bandwidth distance product is not necessarily a limiting factor.
Transmission experiments with an array of eight pie-shaped VCSELs
arranged in a circular area with a diameter of 60 µm, to match the core of a
multimode fiber, have been reported [Coldren et al. 1998]. Other work on
the integration of components for short distance (non telecom) WDM links
has been reported, e.g., a 4 × 2.5 Gbit/s transceiver with integrated splitter,
combiner, filters, etc. [Aronson et al. 1998].
Systems with Smart Pixel Arrays
In smart pixel based systems, the interconnection network cannot normally
be seen as a stand-alone subsystem. Instead, processors and optoelectronic
devices for communication are integrated on the same substrate [Neff et al.
1996]. Typically, smart pixels are organized in a two-dimensional array
(e.g., on a chip), where each smart pixel consists of a processor, a laser diode,
and a photodiode. Other, but similar, configurations exist, for example,
where incoming light is modulated by a modulator in the smart pixel.
Figure 25. Row of smart pixel arrays.
A system where smart pixel arrays are connected in a ring has been
reported [Chen et al. 1998] [Chen et al. 1998B]. Each array operates in
SIMD mode on two-dimensional data. A modified CSMA/CD protocol is used
for arbitration in the ring where some of the pixels in each array are used
for data and some for addressing and clocking.
Optical and Optoelectronic Switch Fabrics
The architecture with optical interconnections and electronic crossbars is
flexible and powerful. Optics and optoelectronics can however also be used
internally in a switch fabric, i.e., more than just in the I/O interface. A
broad spectrum of solutions has been proposed, and some examples are
given below.
Figure 26. Example of a planar free space system. The direction of the beam is steered by the optical element on the way between two chips.
Planar Free Space Optics
By placing electronic chips (including optoelectronic devices) and optical
elements on a substrate where light beams can travel, we get a planar free
space system (Figure 26) [Jahns 1994] [Jahns 1998] [Sinzinger 1998].
Electronic chips are placed in a two-dimensional plane, while light beams
travel in a three-dimensional space. In this way, optical systems can be
integrated monolithically, which brings compact, stable and potentially
inexpensive systems [Jahns 1998].
The interconnection pattern in a planar free space system can, for example,
be chosen with respect to a pipelined dataflow between chips. Another
possibility is to have a more general topology, such as the two-dimensional
mesh, or to have special optical or optoelectronic devices dedicated to switch
functions. The latter configurations may be the best choice if the SPMD
program model is used.
Free Space Optical Backplanes
Several different optical backplanes have been proposed, three of which are
discussed below. As shown in Figure 27a, using planar free space optics is
one means of transporting optical signals between PCBs. Holographic
gratings can be used to insert and extract the optical signals to/from the
waveguide, which may be a glass substrate [Zhao et al. 1995]. Several
beams or bus lines can be used, i.e., each arrow in the figure represents
several parallel beams [Zhao et al. 1996].
Figure 27. Optical backplane configurations: (a) with planar free space optics, (b) with smart pixel arrays, and (c) with a mirror.
Network | Pipeline | SPMD | Notes
Fiber-ribbon pipeline ring | Good | Moderate | Good for SPMD too if enough bandwidth
WDM star, rapid tuning | Good | Good | Flexible passive network
WDM star, slow tuning and broadcast channel | Good | Poor | WDM star alternative that may be cheaper
Multi-hop WDM star | Moderate | Moderate | Can be optimized for pipelined mapping
WDM ring with rapid tuning | Good | Good | More channels might be needed for SPMD
WDM ring, slow tuning and broadcast channel | Good | Poor | WDM ring alternative that may be cheaper
Fiber ribbons and array of stars | Moderate | Moderate | Power and bandwidth limited
Fully connected topology with broadcast driving | Moderate | Moderate | Bandwidth limited; cost grows with N²
Fully connected topology with flexible driving | Good | Good | Cost grows with N²
Optical fibers and electronic crossbars | Good | Good | Optoelectronics also needed in switch
Table 7. Performance summary of some of the networks discussed, with respect to pipeline and SPMD mapping.
Of the three types of optical backplanes discussed, the one with smart pixel
arrays seems to be the most powerful. On the other hand, a simple passive
optical backplane may have other advantages. Other optical backplanes
have been proposed, e.g., a bus where optical signals can pass through
transparent photo detectors or be modulated by spatial light modulators
[Hamanaka 1991].
Summary
Some of the networks that incorporate fiber-optics are summarized in Table
7 with remarks on their suitability/performance for the two basic cases of
mapping (pipeline and SPMD). The pipeline mapping fits on a larger variety
of networks because of the absence of all-to-all traffic patterns. Limiting
factors to the use of SPMD mapping vary from network to network; they may be
the tuning speed when switching among many different sources and
destinations, shared resources that become bottlenecks, or a topology that
favors nearest-neighbor communication.
Network | Pipeline | SPMD | Notes
Fiber-ribbon pipeline ring | Moderate | Poor | Extra bandwidth needed to distribute input data
WDM star, rapid tuning | Good | Good | Broadcast support is used
WDM star, slow tuning and broadcast channel | Good | Poor | Broadcast support is used
Multi-hop WDM star | Moderate | Moderate | Depends on which virtual topology is chosen
WDM ring with rapid tuning | Good | Good | Broadcast support is used
WDM ring, slow tuning and broadcast channel | Good | Poor | Broadcast support is used
Fiber ribbons and array of stars | Moderate | Moderate | Broadcast support is used
Fully connected topology with broadcast driving | Moderate | Moderate | Broadcast support is used
Fully connected topology with flexible driving | Good | Good | Broadcast support is used
Optical fibers and electronic crossbars | Good | Good | Broadcast support is used
Table 8. Performance summary of some of the networks discussed, with respect to pipeline and SPMD mapping, when several concurrent modes run in parallel, each on dedicated hardware.
antenna must, in the pipeline case, be multicasted to the first node in each
group of nodes dedicated to a working mode. In the case of SPMD, the
incoming data are distributed by multiple multicast transmissions, each
carrying a subset of the data cube.
Kind of communication | WDM star/ring | Fiber ribbon | Foil of fibers | Free space
Intra chip | | | | Good
Intra MCM | | | | Good
Intra board | | | Good | Good
Inter board | Expensive | Good | Good | (Good)
Inter cabinet | Expensive | Good | |
Intra and inter room | Good | Good | |
Intra and inter building | Good | Moderate | |
Table 9. Suitability in different system sizes. Empty cells in a column mean that the technology/network is not suitable for the corresponding system sizes.
Not all optical interconnection networks are quite as easy to categorize into
the groups above, e.g. those described in [Chiarulli et al. 1994] [Teza et al.
1995] [Louri et al. 1996], the latter describing a network using both free-
space and fiber-ribbon interconnects. In another system, plastic modules
performing different functions are snapped together to form a free space
optical interconnection system [Neilson and Schenfeld 1998]. Examples of
module functions are relaying and beam splitting of a two-dimensional
array of incoming beams.
Part III: Proposed Network Architectures and Protocols
FT-TR WDM Star Network
High-performance interconnection networks can be foreseen to have a
central role in future distributed real-time systems. If a number of
computation modules, each very powerful or even parallel, are used to
obtain a massively parallel or distributed system, a modular interconnection
network able to carry a huge amount of data is needed. Other key features
of the network are time deterministic latency and guarantees to meet
deadlines. Application examples are future radar signal processing systems,
distributed multimedia systems, and image processing systems. A typical
system is the radar signal processing system described in [Jonsson et al.
1996] [Taveniku et al. 1996], where each module consists of a SIMD
computer and a network interface. In this way, an MSIMD computer system
is formed. Other applications in which the MSIMD architecture with a high
performance interconnection network may be required are described in
[Davis et al. 1992] [Svensson and Wiberg 1993]. It should be noted that other
MIMD systems can have network requirements similar to those of MSIMD systems.
The WDM technique [Hill 1989] [Green and Ramaswami 1990] offers
extremely high aggregated bandwidth by the use of multiple wavelength
channels in fiber-optic communication systems, and is expected to have a
key role in future computer networks. Each channel has a specific
wavelength, i.e., color, of the light. A promising network architecture for
high performance systems, for which commercial components have already
appeared, is the WDM star network with multiple Gbit/s channels. This is
based on a passive optical star which implements a fiber-optic multi-access
network [Mestdagh 1995] where all incoming packets, in the form of trains
of light pulses, to the star are distributed to all nodes in the network by
splitting the light.
Protocol Overview and Related Networks
Circuit Switched Networks
A number of protocols for WDM star networks have been proposed.
However, the area of real-time protocols for these networks is relatively
unexplored, with a few exceptions. The Rainbow network, described in
[Dono et al. 1990] [Janniello et al. 1992], and the 1DT-WDMA protocol,
described in [Humblet et al. 1993], support guaranteed bandwidth for circuit
switched and virtual circuit switched traffic, respectively. However, the
bandwidth utilization is reduced when there is no traffic on an established
connection, because the bandwidth cannot be reused by other nodes. The
I-TDMA [Sivalingam et al. 1992] and I-TDMA* [Bogineni et al. 1993]
protocols are other examples where guaranteed bandwidth cannot be
dynamically reused. These two protocols are multichannel extensions to
static uniform TDMA.
Packet Switched Real-Time WDM Networks
A real-time protocol for packet switched communication in WDM star
networks is described in [Yan et al. 1996B]. The QoS associated with a real-
time packet in the network is related to the probability of missing its
deadline. The protocol tries to globally minimize the number of packets not
managing to keep their QoS by adaptively changing the priority of queued
packets. Although dynamic real-time properties are supported, whether a
packet transmission succeeds or fails depends on the global state of
the network, and transmission success cannot be guaranteed in advance.
• A star network with support for virtual circuits, both with and
without (best effort) guaranteed bit-rate, but with the need of a
dedicated control channel and a central scheduling node and without
support for aperiodic traffic [Kam et al. 1998] [Kam et al. 1998B].
• A WDM bus network with support for real-time streams but not for
guarantees to separate messages [Cho et al. 1995].
• A network in which all nodes track already guaranteed but not sent
messages to be able to decide whether to guarantee new traffic.
However, there is no support for RTVCs or the like, and a control
channel with M mini slots per data slot, where M is the number of
nodes, is needed (can be difficult to implement owing to, e.g., clock
synchronization problems) [Chen and Georganas 1993].
A New Protocol with Real-Time Support
This chapter gives a proposal for a medium access protocol for packet
switched real-time communication in WDM star networks. The protocol,
which is time deterministic, is called TD-TWDMA and supports guaranteed
real-time services for single destination, multicast, and broadcast
transmission. Slot reservation is also supported, and bandwidth is used
efficiently owing to a simple slot release method. The protocol uses both
time and wavelength division multiplexing and is targeted at distributed
real-time systems, especially very high performance systems, e.g., systems with
parallelism within the nodes. The deterministic properties of the protocol
are theoretically analyzed, while computer simulations show the
performance in a network with general traffic.
Figure 28. Multiple passive optical stars topology.
Other hierarchical WDM star networks include the wavelength flat (all
nodes share the same wavelength space) tree-of-stars network [Dowd et al.
1993], the tree-of-stars network (called LIGHTNING) that has wavelength
routing elements between each level [Dowd et al. 1996] and the multiple
star network where each node is directly connected to both a local star and a
remote star [Ganz and Gao 1992B]. The star-of-stars network proposed in
this thesis can be seen as a two-level tree-of-stars network.
108
λ 1 λ 2 λ Ν
)LJXUH 3DVVLYH RSWLFDO VWDU QHWZRUN ZKHUH WKH QRGHV DUH HTXLSSHG ZLWK
IL[HGWUDQVPLWWHUVDQGWXQDEOHUHFHLYHUV
Protocol and Real-Time Features
In dynamic distributed real-time systems, messages may be classified into
two categories [Arvind et al. 1991]: best effort messages and guarantee
seeking messages. While best effort messages normally have soft deadlines,
such that the system need only try its best to meet the deadlines, guarantee
seeking messages have harder timing constraints. If the communication
system cannot guarantee the timing constraints of a guarantee seeking
message, the owner of the message should be aware of it immediately.
The main function of the TD-TWDMA protocol is to allocate time slots for
either guarantee seeking messages or best effort messages. The allocation is
done using a deterministic distributed slot allocation algorithm. The
algorithm temporarily changes the predetermined scheme according to the
current slot demands from each node. These slot demands are transmitted
in advance on the same channels as the data are transmitted on and they
contain information about which guaranteed slots should be kept and which
should be released. These types of WDM networks without a separate
control channel are denoted as non control channel based networks.
Networks in which a separate control channel is used to reserve access to
the data channels are denoted as control channel based networks. Other non
control channel based networks than the one presented in this thesis are
found in [Dono et al. 1990] [Ganz and Koren 1991] [Ganz and Gao 1992],
while control channel based networks are found in [Bogineni and Dowd
1992] [Habbab et al. 1987] [Chen et al. 1990] [Chipalkatti et al. 1992].
FatMAC, presented in [Sivalingam and Dowd 1995], is another non control
channel protocol for WDM star networks where the access to the channels is
divided into a control phase and a data phase. However, FatMAC has no
support for real-time services.
Protocol Description
The network and protocol proposed will now be described in detail. The
notation used when describing the network is found in Table 10. This
notation is used in the next chapter as well. Because each transmitter in the
network has a specific wavelength, the number of wavelengths, C, hereafter
denoted as channels, equals the number of nodes, C = M. The transmitter
M: Number of nodes in the network
C: Number of channels (wavelengths) in the network
m_i: Designation of node i
B: Effective bandwidth of each channel in the network
γ: Slot length, including inter-slot gap
µ: Computing time of the distributed slot allocation algorithm
S: Total number of slots in a cycle
P: Number of cycles deadline guarantees can be forecast
K: Number of best effort multicast/broadcast packets to be sent in the next cycle
s_i: Designation of the i:th slot in a cycle
v_ij: Index of the default high priority owner (transmitter) of slot i in the receiver cycle in node j (v_ij is an element in matrix V with size M × M(M − 1))
w_ij: Index of the default low priority owner (transmitter) of slot i in the receiver cycle in node j (w_ij is an element in matrix W with size M × M(M − 1))
x_ij: Element in the control slot matrix X with size M × M(M − 1), sent from one node, where status is given of slot i in the receiver cycle in node j
y_i: Tells which queue to take a sub-message from for transmission in slot i (y_i is an element in matrix Y with size M(M − 1))
z_i: Tells which channel to tune in for receiving in slot i (z_i is an element in matrix Z with size M(M − 1))
g_ij: Number of high priority slots in the i:th next cycle, where j is the corresponding queue (g_ij is an element in matrix G with size (M + 1) × P)
h_i: Current number of sub-messages in each guarantee seeking queue (h_i is an element in matrix H with size M + 1)
τ: Packet latency: the delay from the moment a packet arrives at the transmitter buffer until the moment the transmission begins
Table 10. Notations for descriptions of the network architecture and protocol.
and receiver parts of the transceiver are independent and can work
concurrently.
There are 2M queues in each transmitter, M queues for best effort messages
and M for guarantee seeking messages. For each of the two types of
messages, one queue is for broadcast and M − 1 queues are for single
destination messages (one for each of the other nodes). The broadcast
queues are used for both true broadcast messages and for multicast
messages (messages destined to more than one node but not to all). We
Figure 30. The transmitter cycle (lower table) is filled by taking all owned slots in the receiver cycles (upper table) in all other nodes. A multicast is possible in slots that the node owns in more than one receiver cycle.
define the size of an entry in the queues as that of a packet, i.e., a part of a
message corresponding to one slot.
Section 7.2.1 explains the function of the cycles, while the distributed slot
allocation algorithm is presented in Section 7.2.2.
Transmitter and Receiver Cycles
Because every transmitter has its own home channel, the only possible
conflict is when two or more nodes wish to transmit to the same node at the
same instant. To prevent conflicts, slots in the network are therefore
allocated in the receiver cycles, where each slot will have a specific owner.
This allocation is done by the distributed slot allocation algorithm.
One cycle is also running in each transmitter, but this is only to tell when
and to whom the node is allowed to transmit. The transmitter cycle reflects
the slots that the node owns in the receiver cycle of every other node
(exemplified in Figure 30). In the receiver cycles, shown in the upper table
in the figure, each slot in each receiver cycle can be assigned to only one
transmitter at the same time. The lower table shows how the slots of a
transmitter cycle are built up by copying its entries in the corresponding
slots in each receiver cycle.
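To make the relation between receiver and transmitter cycles concrete, the following Python sketch builds a node's transmitter cycle from the high priority ownership of the receiver cycles. It is an illustration only; the matrix layout, example values, and function name are assumptions, not taken from the thesis.

```python
# Sketch (not from the thesis): deriving a transmitter cycle from receiver cycles.
# V[j][i] is the index of the transmitter that owns data slot i in the receiver
# cycle of node j (nodes and slots are 0-indexed here for simplicity).

def transmitter_cycle(V, k):
    """Return, for node k, the receivers it may address in each data slot."""
    num_nodes = len(V)
    num_slots = len(V[0])
    cycle = []
    for i in range(num_slots):
        # Node k may transmit in slot i to every node whose receiver slot it owns.
        receivers = [j for j in range(num_nodes) if j != k and V[j][i] == k]
        cycle.append(receivers)      # an empty list means node k must stay silent
    return cycle

# Four-node example with three data slots (illustrative numbers only).
V = [
    [1, 2, 3],   # receiver cycle of node 0
    [0, 2, 3],   # receiver cycle of node 1
    [0, 1, 3],   # receiver cycle of node 2
    [0, 1, 2],   # receiver cycle of node 3
]
print(transmitter_cycle(V, 0))   # [[1, 2, 3], [], []] -> slot 0 allows a broadcast
```

A slot that appears in several receiver cycles with the same owner is exactly the situation in which the owner can multicast, as noted in Figure 30.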
Figure 31 shows how a receiver cycle is partitioned into data slots and
control slots. Each node m_i, 1 ≤ i ≤ M, is assigned one of the M control slots,
where it broadcasts control information to all other nodes m_j, 1 ≤ j ≤ M and
j ≠ i.
Figure 31. A receiver cycle is partitioned into data slots and control slots.
To reduce the latency, the control slots are placed as late as possible in the
cycle. However, there must be time for the distributed slot-allocation
algorithm to be calculated before beginning the next cycle.
Distributed Slot Allocation Algorithm
The TD-TWDMA protocol consists of three steps:
1. Each node transmits a control slot
2. Each node separately runs the distributed slot allocation algorithm
3. The nodes transmit and receive data slots
The ways in which the contents of the control slots are calculated and
incoming messages and buffers are handled partly determine the real-time
services and are described in Section 7.3. When describing the distributed
slot allocation algorithm, it is assumed for simplicity that all broadcasted
control slots are received before the algorithm is started. As described in
Section 7.3, however, this is not a requirement in a real implementation.
Figure 32. Allocation scheme for the receiver cycles in a four-node system, with a high priority and a low priority owner row for each receiver.
The total number of slots in a cycle is set to S = M², and the number of data
slots is M(M − 1). Each pair of rows represents one receiver cycle, where
each number is the index of the transmitter that owns the corresponding
slot. The high priority row is the default scheme, but if the high priority slot
owner does not need the slot it is temporarily released for the current cycle,
i.e., the cycle where the data should have been sent. This is done by a
release message (described below) contained in the owner's control slot. The
low priority owner will then get the slot. If neither the high priority nor the
low priority owner needs the slot, it will be unused. This is the cost of
having a simple algorithm. However, compared to a static TDMA system,
efficient bandwidth utilization is achieved with the slot release method.
for 1 ≤ i ≤ M(M − 1), 1 ≤ j ≤ M and i ≠ j, while the low priority owners are
determined by
The high priority slots are coordinated to allow for broadcasts but, as
described in Section 7.3.4, this is not always the case.
The release message mentioned above is actually a specified value in the
control slot matrix, X, that each node transmits in the control slot. The
matrix has the same size as matrix V, and each element in the matrix gives
information about the corresponding high priority slot. All elements x_ij,
1 ≤ i ≤ S, 1 ≤ j ≤ M, in the matrix are set to zero except for those
corresponding to high priority slots belonging to the node, which should not
be released. Those elements are instead set to one. This means that a zero
in the position of a high priority slot belonging to the node will release the
slot.
When all the X matrices (one from each node) have been gathered by a node,
two new matrices are composed, Y and Z. Each element in Y is used to
determine what queue to take a packet from, for transmission in a slot,
while an element in Z determines what channel to tune in at the beginning
of a slot. The two matrices have elements corresponding to both data slots
and control slots during one cycle but, for clarity, the control slot
elements are not treated here. The way the elements in Y are set and used
to determine the current queue is described in Section 7.3. When the
distributed slot allocation algorithm is run, an element states the kind of
slot and is defined as:
for slot i where 1 ≤ i ≤ M(M − 1). Each element is set using the following
algorithm for node m_k:
refers to the element x_il = 1 in matrix X from node m_k, i.e., a high
priority single destination slot belonging to the node
Each element z_i, 1 ≤ i ≤ M(M − 1), in matrix Z is set using the following
algorithm for node m_k:
Since each node can independently perform the computations of the slot
allocation scheme, it is called a distributed algorithm. No extra latency is
required to return the result of the algorithm, which would have been the case
if, e.g., a master node had calculated it.
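The per-slot outcome of the algorithm can be summarized as follows: the high priority owner keeps a slot it has not released; otherwise the low priority owner may use it; otherwise the slot stays unused. The Python sketch below illustrates this decision rule under simplifying assumptions (0-indexed nodes and slots, a single three-dimensional structure for the received X bits); it is not the exact encoding of the Y and Z matrices used in the thesis.

```python
# Sketch of the slot release decision (illustrative only, not thesis code).

def slot_owner(i, j, V, W, X):
    """Return the node allowed to transmit in data slot i of node j's receiver cycle.

    V[j][i], W[j][i]: default high and low priority owners of the slot.
    X[k][j][i]      : 1 if high priority owner k announced that it keeps the slot.
    """
    high, low = V[j][i], W[j][i]
    if X[high][j][i] == 1:      # the high priority owner did not release the slot
        return high
    return low                  # released: the low priority owner may use it
                                # (if it has nothing to send, the slot is unused)

# Three-node, two-slot example with arbitrary values.
V = [[1, 2], [0, 2], [0, 1]]          # high priority owners, per receiver and slot
W = [[2, 1], [2, 0], [1, 0]]          # low priority owners
X = [[[0, 0], [1, 0], [0, 0]],        # node 0 keeps slot 0 towards receiver 1
     [[1, 0], [0, 0], [0, 0]],        # node 1 keeps slot 0 towards receiver 0
     [[0, 0], [0, 0], [0, 0]]]        # node 2 releases all its slots
print(slot_owner(0, 0, V, W, X))      # 1: the high priority owner kept the slot
print(slot_owner(1, 0, V, W, X))      # 1: released, so the low priority owner gets it
```

Each node evaluates the same rule for its own transmitter cycle (giving Y) and, as a receiver, tunes to the home wavelength of the resulting owner (giving Z).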
Real-Time Services
In the description of how guarantee seeking and best effort messages are
passed through the transmitter to obtain real-time services, the following
parts will be explained:
Points 1 and 2 will first be explained. In 7.3.2, the third and fourth points
are discussed, and the last point is described in 7.3.3. Finally, slot reserving
and RTVCs are introduced in 7.3.4.
Arriving Messages
The ability to give deadline guarantees for a message relies on knowing
when there are guaranteed slots in the forthcoming cycles. Element g_ij,
1 ≤ i ≤ P, 1 ≤ j ≤ M + 1, in matrix G holds the number of high priority slots
belonging to the node in the i:th cycle next to the currently running one,
where j = 1 for broadcast slots and 2 ≤ j ≤ M + 1 for single destination slots
to node m_(j−1). P is chosen at system design and tells us how far in the future,
relative to the current time, deadlines can be guaranteed. Broadcast and
multicast do not have to be treated separately here, since the protocol does
not allow the default scheme to be changed (by reservation) to have high
priority multicast to fewer than all other nodes.
By scanning g_ij, starting with i = 1 and with j set to the actual type of the
arriving message, we can see whether there are enough high priority slots
before the deadline to guarantee that the message will be sent in time. If not, the
message is rejected immediately and the owner will have time to
handle the situation. If, instead, a guarantee can be given, each element in
G corresponding to required slots is decremented by the number of required
slots for that element, i.e., the sum of all decrements equals the number of
packets in the message. In the case of a single destination message with too
few available slots, the broadcast elements are also scanned. Packets are
always put in the guarantee seeking queue that corresponds to the element
in G that was decremented, in the order of transmission, to facilitate
reassembling the message.
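A minimal sketch of this admission test is given below. It assumes the deadline is expressed as a number of future cycles and the message length as a number of packets; the fallback to broadcast slots for single destination messages is only mentioned in a comment. Names and data layout are illustrative, not the thesis implementation.

```python
# Illustrative admission control for a guarantee seeking message.
# G[i][j] holds the number of still unassigned high priority slots of queue
# kind j (0 = broadcast, 1..M = single destination) in the i:th next cycle.

def try_guarantee(G, kind, packets, deadline_cycles):
    """Reserve 'packets' high priority slots of 'kind' within 'deadline_cycles'.

    Returns a per-cycle reservation plan, or None if the deadline cannot be met
    (the message must then be rejected immediately so the owner can react).
    A real implementation would also fall back to broadcast slots for single
    destination messages, as described above.
    """
    plan, remaining = [], packets
    for i in range(min(deadline_cycles, len(G))):
        take = min(G[i][kind], remaining)
        if take:
            plan.append((i, take))
            remaining -= take
        if remaining == 0:
            for cycle, n in plan:        # commit: decrement the reserved slots
                G[cycle][kind] -= n
            return plan
    return None                          # not enough guaranteed slots in time

# Example: P = 3 future cycles; broadcast kind 0 has 2, 1, and 4 free slots.
G = [[2, 1], [1, 0], [4, 2]]
print(try_guarantee(G, kind=0, packets=3, deadline_cycles=2))  # [(0, 2), (1, 1)]
```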
Best effort messages may be transmitted in order, according to, e.g., the
EDF (Earliest Deadline First) algorithm, but in the scope of this thesis they
are assumed not to have deadlines specified and to be transmitted in order
of arrival. An arriving best effort message is hence simply put in the correct
best-effort destination-queue. Multicast messages are put in the broadcast
queue.
Control Slots
When it is time to send a control slot, the following algorithm is executed in
node m_k, 1 ≤ k ≤ M:
if g_11 > 0
    K = MIN(i, g_11), where i is the number of packets in the best effort
    broadcast queue
endif
count the packets in each guarantee seeking queue to set h_i, 1 ≤ i ≤ M + 1,
i ≠ k + 1
x_ij = 0, 1 ≤ i ≤ M(M − 1), 1 ≤ j ≤ M
for j = 1 to M, j ≠ k
    scan x_ij starting with i = 1 and set x_ij = 1 in h_(j+1) positions (if possible),
    where the corresponding element in V is v_ij = k and v_il ≠ k, 1 ≤ l ≤ M,
    l ≠ j
endfor
do the same scan of x_ij for broadcast high priority slots, first for the h_1
guarantee seeking packets and then for the K best effort packets, setting
x_ij = 1 for each corresponding destination that is set in the address field
of a multicast (all destinations for broadcast), i.e., releasing slots to
destinations not specified
send control slot
g_ij = g_(i+1)j, 1 ≤ i ≤ P − 1, 1 ≤ j ≤ M + 1
g_ij, i = P, 1 ≤ j ≤ M + 1, are set to the corresponding number of default
(after reservation) high priority slots for each corresponding kind of
destination
When all control slots have been received, the elements in Y are calculated
as described in Section 7.2.2. However, each element is modified to state
the queue from which a packet should be taken when transmitting in the
corresponding slot, and is, for Slot i, defined as:
Data Slots
At the beginning of each data slot, y_i is used to determine the queue from
which to take a packet. By the definition of the protocol, there will always
be a packet in the addressed queue when it is a guarantee seeking queue or
the best effort broadcast queue. However, the best effort single destination
queues may be empty. In that case, an empty message is sent, telling the
receiver only that there was nothing to send. A packet generated
immediately before a best effort single destination queue is checked will be
sent in that slot. The minimum latency for best effort single destination
messages is therefore zero.
Slot Reserving
A node can reserve slots to increase its guaranteed bandwidth. A maximum
of M(M − 2) slots per cycle, Slots 5 to 12 in the example in Figure 32, are
allowed to be reserved. Slots s_i, 1 ≤ i ≤ M, are not allowed to be reserved, and
slots s_i, M² − M + 1 ≤ i ≤ M², are control slots. When reserving slots, the
corresponding high priority entries (elements in matrix V) in the receiver
cycle, or cycles if broadcast is used, are exchanged with the index of the
reserving node. To reduce complexity in the transmitters, the corresponding
slots in all receivers must be reserved (i.e., for broadcast) if the slots are to
be used for multicast messages.
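As a small illustration of these rules, the check below (a sketch, not thesis code) tests whether a 1-based slot index may be reserved in an M-node system and confirms the reservable count for the four-node example in Figure 32.

```python
# Sketch: which slots may be reserved in a cycle of S = M*M slots (1-based indices).
# The first M slots are never reservable, the last M slots are control slots, and
# at most M*(M - 2) slots per cycle may be reserved in total.

def reservable(slot, M):
    return M < slot <= M * M - M          # i.e. slots M+1 .. M^2 - M

M = 4
print([s for s in range(1, M * M + 1) if reservable(s, M)])  # [5, 6, ..., 12]
print(M * (M - 2))                                            # 8 slots, as in Figure 32
```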
Slot reservation can also be used for the allocation of RTVCs where an
RTVC, with its reserved slots, is typically dedicated to a specific application
task. An RTVC then has a specified worst-case latency and a guaranteed
bandwidth (see Section 7.5). The allocation of RTVCs can also be used as a
service-primitive when reserving slots to increase the bandwidth for
guarantee seeking traffic. Each such RTVC is then dedicated to be used for
guarantee seeking messages generated by the task owning the RTVC. The
mechanism for guarantee seeking traffic described above must then be
replicated for each RTVC but is somewhat simpler because only slots
reserved for the RTVC in question must be considered.
Implementation Aspects
Clock Synchronization
The TDMA method used requires synchronized nodes. Tighter
synchronization reduces the necessary gap between the time slots. A
method to achieve tighter synchronization in TDMA star networks is to
account for the node-to-star propagation delay so that all packets
transmitted in the same slot pass the star simultaneously [Jonsson et al.
1995] [Bengtsson et al. 1994]. A discussion of timing and dispersion in WDM
star networks can be found in [Semaan and Humblet 1993]. In this thesis,
the interconnection distances are assumed to be short, and the propagation
delay is hence neglected.
Clock Recovery
A method to reduce the effect of clock-recovery and tuning latencies in the
receivers will increase performance. By duplicating the optoelectronic and
clock-recovery parts of the receiver, the time for clock-recovery and
wavelength tuning can even be eliminated. This is done by locking one
clock-recovery circuit to the currently used channel while the other one
recovers bit synchronization for the channel that will be used in the next
slot. In this way, the tuning time of the receiver only needs to be shorter
than the duration of one slot, minus the clock-recovery time. For this
reason, we assume that the tuning latency in the receivers can be neglected.
Computational Complexity
For each slot, the outcome of the distributed slot allocation algorithm
depends only on the control slot information from the corresponding high
priority owner. Therefore, the algorithm can start as soon as the first control
slot is received. A gap of only one data slot, between the control slots and
the beginning of the next cycle, to finish the computation is hence assumed.
The small gap positively affects the latency. Only table indexing is then
used in the data slots, in transmitters to choose between buffered messages
and in receivers to choose channel tuning.
The protocol and its real-time services may seem rather computationally
intensive because, for the computation of each slot allocation, operations are
done on every element in a matrix column or in a whole matrix. In a real
implementation, however, the searches and scans of matrices V and W can be
exchanged for simpler operations. The reason is that these matrices do not
change for each cycle and, e.g., a matrix search must only be done when the
default allocation scheme is changed. Similar reasoning can be used
concerning matrix G, because only a few elements are changed in each cycle.
When all X matrices are received, they can be composed into one X matrix
to save memory space by doing an element-wise OR. Since all elements in X
are binary, the composition can be accelerated using bit-parallel operations.
Each element in X corresponds to the element with the same indices in
matrix V (and W) and, even if the elements in X change values in each cycle,
their relation to V (and W) does not change. Hence, similar reasoning as is
used with V and W to reduce the computational complexity can be used here
as well. The counting of packets in the queues can be eliminated by
continuously updating counters when packets are inserted or removed.
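The element-wise OR composition can be sketched as follows (illustrative Python, assuming each row of an X matrix, one bit per slot, is packed into an integer so that one OR operation combines many slot-status bits at once).

```python
# Sketch of composing the received X matrices with bit-parallel operations.

def compose_x(x_matrices):
    """OR together the X matrices from all nodes, row by row."""
    rows = len(x_matrices[0])
    composed = [0] * rows
    for x in x_matrices:
        for r in range(rows):
            composed[r] |= x[r]          # one OR covers a whole row of slot bits
    return composed

# Two nodes, two rows of 12 slot bits each (arbitrary example values).
x_node1 = [0b000011000000, 0b000000000011]
x_node2 = [0b110000000000, 0b000000110000]
print([bin(r) for r in compose_x([x_node1, x_node2])])
```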
Electronic Stars
Electronic stars are also consistent with the network architecture; these
give a network which can be implemented with cheap components today or
in the near future depending on size and bandwidth (see Figure 33)
[Jonsson et al. 1995]. The switching circuit should always be able to
simultaneously assign a path to an arbitrary output channel from every
single incoming channel (true crossbar). The switch should also be able to
map one input channel to many output channels, multicasting, or one input
to all outputs, broadcasting (these features are automatically implemented
with a true crossbar switch). As shown in the figure, the control unit of the
star is connected to the switch in the same way as the other channels are.
By this connection, the control unit also receives control packets and can,
Figure 33. Electronic star.
with the information in these packets, configure the star according to the
TD-TWDMA protocol. The electronic star simulates the passive optical star
in a way almost invisible to the nodes. Therefore, the nodes still run the
distributed slot allocation algorithm.
Although the electronic star brings additional cost in, e.g., optoelectronic
transceivers in the star, the point-to-point links have a positive effect. The
difference as compared to the passive optical star is the simplified clock-
recovery because bit synchronization can be maintained by continuously
sending information, either data or synchronization patterns, over the serial
link (see, e.g., [TriQuint 1992]). This is possible because each fiber is
actually a point-to-point connection and not a shared medium.
Deterministic Performance
The real-time services rely on time-deterministic communication. The
deterministic bandwidth and the worst-case packet latency for this
guaranteed bandwidth is analyzed below. As stated above, the analysis
treats intra-cluster communication.
When a specific node is the only transmitter in the network, the bandwidth
it receives is B(M − 1)/M, where B is the channel bandwidth. This gives 87,
94, and 97 percent of the channel bandwidth for 8, 16, and 32 nodes,
respectively. This bandwidth is then divided into B(M − 1)/M², which can
Figure 34. Latency calculation.
which consists of: (i) the worst-case delay before the own control slot
appears, i.e., one cycle as shown in the figure, and (ii) the delay from the control
slot to the first owned data slot. By fine grain interleaving of the slots,
where each node is assured to have one of the first M slots, we can minimize
the second part of the latency so that node m_i, 1 ≤ i ≤ M, will never have a
slot later than slot s_i as its first slot in the cycle. As explained above, µ can
be assumed to be the duration of one slot. This gives, for a network with a
slot duration of 1 µs, a worst-case latency of 73 µs, 0.27 ms, and 1.1 ms for
8, 16, and 32 nodes, respectively. Note that the actual latency may be
significantly lower, especially when many low priority slots are available,
since those slots can be used without first waiting for control slot
transmission.
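The figures quoted above follow from the cycle structure: one full cycle of S = M² slots, at most M further slots until the first owned data slot, and µ = γ for the allocation algorithm. The small sketch below reproduces the numbers under that interpretation (the closed form is an assumption derived from the reasoning above, not a formula quoted verbatim from the thesis).

```python
# Sketch: worst-case latency and bandwidth figures for the TD-TWDMA star,
# assuming tau_max is roughly (M^2 + M + 1) * gamma.

GAMMA = 1e-6   # slot duration: 1 us, as in the example above

for M in (8, 16, 32):
    worst_case = (M * M + M + 1) * GAMMA      # one cycle + first owned slot + mu
    rx_fraction = (M - 1) / M                 # share of B seen by a lone receiver
    det_fraction = (M - 1) / (M * M)          # deterministic share per node pair
    print(f"M={M:2d}: worst case {worst_case * 1e6:6.0f} us, "
          f"receive fraction {rx_fraction:.3f}, "
          f"deterministic fraction {det_fraction:.3f}")
```

Running it gives 73 µs, 273 µs, and 1057 µs, matching the latencies above, and the deterministic fractions 10.9, 5.9, and 3.0 percent quoted in the simulation section.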
if s_j is the first slot in the cycle reserved for the RTVC. The bandwidth of an
RTVC is NB/M², where N is the number of slots reserved for the RTVC.
Case Study
The signal processing system shown in Figure 11 on page 77 was chosen as
a case study, where the bandwidth demands on the inter-module
communication are also given. Although the data flow has a very simple
structure, there can exist control messages in other directions than the data
flow. Also, the algorithms, and hence the data flow, can be changed when
switching to a different working mode. A general network is therefore
required. For simplicity, the maximum data flow from one module is
assumed to be 6.0 Gbit/s, excluding, e.g., error checking codes. The latency
budget for the whole chain is 100 ms, in which 10 ms is included for inter-
module communication. With the ten pipeline stages (some of the stages
include several nodes) shown in the figure, including the antenna, the
maximum allowed latency per link is τ_max = 1.0 ms.
With a gap of one data slot between the control slots and the next cycle, i.e.,
µ = γ, and a worst-case latency of τ_max = 1.0 ms, M_max = 30 is found to be
the maximum number of nodes. The value of M_max can be
increased by decreasing γ, since the number of bits in one slot is rather high.
At an effective bandwidth of 6.0 Gbit/s, there will be 6000 bits in each slot,
which can be compared to ATM with its 424-bit cells. Clustering the system
into a star-of-stars topology can also increase M_max [Jonsson et al. 1996].
Simulation Results
In addition to the worst-case analysis (deterministic performance), the
average performance of the network was analyzed through computer
simulations for general traffic. Single-star networks with 8, 16, and 32
nodes were simulated. Other assumptions were:
• A gap of one data slot between the last control slot and the next cycle,
i.e., µ = γ. To give an example of a real system, the slot duration was set
to γ = 1 µs. This corresponds to a 1 kbit packet if the data rate is 1 Gbit/s.
• All guarantee seeking messages have a deadline at 5 ms from the
moment of generation.
• Uniform traffic was assumed, i.e., all nodes had equal probability of
message generation and uniformly distributed destination addresses.
Messages were generated according to a Poisson process, and all
messages were of single destination type.
• Message lengths were exponentially distributed between one and ten
slots (discrete number of slots), with a length of one slot as the highest
probability.
• For simplicity, the propagation delay was neglected and no slot
reservation was done.
• Infinite queue lengths were assumed.
• The packet generation rate is the message generation rate multiplied by the
mean message length.
• Latency is defined as the time elapsed from the moment of arrival of a
message until the last packet of the message leaves the transmitter.
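The traffic model in the list above can be sketched as follows (a simplified Python version; the thesis does not give the exact distribution parameters, so the arrival rate, the random seed, and the truncated geometric approximation of the 1 to 10 slot message lengths are assumptions).

```python
# Sketch of the simulated traffic: Poisson message arrivals, uniformly chosen
# single destinations, and message lengths of 1-10 slots with one slot most likely
# (approximated by a truncated geometric distribution - an assumption, since the
# thesis only states "exponentially distributed between one and ten slots").

import random

def generate_messages(num_nodes, rate, sim_time, p=0.4, seed=1):
    random.seed(seed)
    t, messages = 0.0, []
    while True:
        t += random.expovariate(rate)               # Poisson arrival process
        if t > sim_time:
            break
        src = random.randrange(num_nodes)
        dst = random.choice([n for n in range(num_nodes) if n != src])
        length = 1
        while length < 10 and random.random() > p:  # 1..10 slots, 1 most probable
            length += 1
        messages.append((t, src, dst, length))
    return messages

msgs = generate_messages(num_nodes=8, rate=1000.0, sim_time=0.1)
print(len(msgs), msgs[:2])
```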
First, the deadline missing percentage (percentage of the messages that are
not accepted for transmission) versus packet generation rate of guarantee
seeking messages is plotted in Figure 35. At moderate traffic intensities, no
messages miss their deadlines for the given assumptions. The deterministic
bandwidth fraction is (M − 1)/M². This gives 10.9, 5.9, and 3.0 percent for 8,
16, and 32 nodes, respectively. As shown in the figure, guarantee seeking
messages begin to be partly rejected around these values. The plot is
independent of the amount of best effort traffic.
Figure 35. Real-time performance, plotted as the fraction of messages that miss their deadlines (deadline missing percentage) versus traffic intensity (guarantee-seeking packet generation rate), for 8, 16, and 32 nodes.
In Figure 36, the mean latency for guarantee seeking messages is plotted
against packet generation rate. Again, the performance is independent of
the amount of best effort traffic. Because messages that cannot be
guaranteed to meet their deadlines are discarded, the guarantee seeking
latency is upper bounded. The latency is rather uniform at low traffic
intensities, but starts to grow earlier for larger networks because of the
lower fraction of deterministic bandwidth.
The plot in Figure 37 shows the latency for best effort messages versus total
packet generation rate (guarantee seeking plus best effort). The traffic
consisted of 10 % guarantee seeking messages and 90 % best effort
messages. The fraction of the bandwidth available for data packets is
(M − 1)/M, which gives 87.5, 93.8, and 96.9 percent for 8, 16, and 32 nodes,
respectively. However, when the guarantee seeking traffic becomes
saturated, some of the total number of generated packets are discarded
guarantee seeking messages. Below saturation, the best effort latency is
almost uniform.
Figure 36. Latency for guarantee seeking messages plotted versus traffic intensity (mean guarantee-seeking latency in ms versus guarantee-seeking packet generation rate), for 8, 16, and 32 nodes.
Figure 37. Mean latency for best-effort messages in a network with 10 % guarantee seeking and 90 % best effort messages, plotted versus total packet generation rate, for 8, 16, and 32 nodes.
Figure 38. Mean latency for best-effort messages plotted versus bandwidth utilization, for 8, 16, and 32 nodes. The traffic consisted of 10 % guarantee seeking and 90 % best effort messages.
The mean best effort latency is plotted again in Figure 38, this time versus
the bandwidth utilization. The same ratio between guarantee seeking
messages and best effort messages was used. The plot looks very similar to
the previous best effort latency plot in Figure 37. This is a consequence of
the almost linear relationship between packet generation rate and
bandwidth utilization below saturation.
Summary
A medium access protocol for WDM star networks in distributed real-time
computer systems has been proposed. To obtain scalability, the network
architecture can be extended to a star-of-stars configuration. At the same
time as dynamic real-time properties are supported, the protocol gives
almost uniform latency nearly up to the theoretical saturation point.
Deadline guarantees are supported, where the underlying deterministic
bandwidth can be changed dynamically through slot reserving. An efficient
bandwidth utilization is achieved by means of a simple slot release method.
WDM Star-of-Stars Network
By using electronic gateway nodes we retain the popular WDM star network
architecture in each cluster, for which cheap components can be expected to
appear in the future [Jonsson and Svensson 1997]. With electronic gateway
nodes, we also achieve wavelength reuse in each cluster.
The same MAC protocol, TD-TWDMA (see the previous chapter) [Jonsson et
al. 1996] is used in every cluster and in the backbone. Real-time services are
implemented for inter-cluster communication in a similar way as for intra-
cluster communication. However, the larger number of nodes sharing the
backbone slots must be considered when calculating the deterministic
performance. We also present a new method for clock synchronization to
reduce the worst-case latency in this kind of multi-cluster network using
TDM. The same notation as was used in the previous chapter (see Table 10
on page 111) is used when describing the multi-cluster extension of the
network, but with the additional notation found in Table 11.
Network Architecture and Protocol
Although the same MAC protocol is used separately in each cluster, the
clusters can be coordinated to improve performance and to get time-
deterministic communication for inter-cluster communication as well.
Figure 39. Gateway node with higher channel bandwidth on the backbone side (right side).
The cycle length in the backbone is always the same as that in the clusters,
both when measured in time and when measured in number of slots.
Figure 40. Each backbone slot is divided into R subslots, with the same pair of gateway nodes as source and destination.
Deterministic Performance
Deterministic performance is important, e.g., for the ability to give
guarantees for guarantee seeking messages. The worst-case latency for
inter-cluster communication between two end-nodes is analyzed for two
cases: (i) full slot reservation by other nodes and (ii) no slot reservation.
When reservation is considered, full reservation is assumed in all clusters
and in the backbone by other nodes than the analyzed transmitting node. In
both cases, the network size, M_total, is assumed to be the total number of
nodes in the network as seen by the MAC protocol. Also, it is assumed that
each cluster has the same number of nodes as the number of clusters in the
network:
M_total = Σ_{i=1}^{L} M_i = L²     (8)
The gap between the last control slot and the next cycle is assumed to be
one data slot, i.e., µ = γ. We can then get the worst-case latency of a single
cluster from Equation 5:
The guaranteed minimal bandwidth per source node, proportional to the
number of high priority slots excluding reserved slots, is also analyzed. It is
assumed in the analysis that no manipulation is made of the allocation
scheme in order to utilize slots not having any function. These slots are
those which are allocated for traffic between two transceiver modules in the
same gateway node.
The second transit is between the gateway nodes of the source cluster and
the destination cluster, through the backbone. Here, slots from all Q = L − R
ordinary nodes in the source cluster must, in the worst case, be multiplexed
over several high priority backbone slots. At full slot reservation, a node has
only one high priority slot per cycle in which to transmit. Therefore, L − R − 1
extra cycles, in addition to the normal intra-cluster latency (Equation 9),
are needed. The source gateway node is responsible for carrying out this
multiplexing using the round robin scheduling strategy. Hence, the source
end-node is guaranteed its part of the bandwidth. The worst-case latency for
the second transit is
In the destination cluster, for transfer from the gateway node to the end-
node, slots from all (L − R)(L − 1) ordinary nodes outside the cluster must, in
the worst case, be multiplexed. The latency decreases when R > 1, because
the multiple transceiver modules in the gateway node work in parallel:
τ3 = γ((Trunc[(L − R)(L − 1)/R] + 1)L² + L + 1)     (12)
τmax = τ1 + τ2 + τ3 = γ(((L − R)(L − 1)/R + 2 + L − R)L² + 3L + 3)     (13)
Figure 41. Worst-case latency when full slot reservation by other nodes is assumed. The x-axis represents the number of ordinary nodes, and the slot length is assumed to be γ = 1.0 µs. Curves are shown for R = 1, 2, and 4.
and is plotted in Figure 41, in which the horizontal axis represents the total
number, L(L − R), of ordinary nodes in the network and each curve
represents a specific value of R. To give an example of a real system, the slot
length is assumed to be γ = 1.0 µs in all latency plots. Figure 41 indicates
how latency decreases with R.
L − 1 ≥ L − R     (14)
Figure 42. Worst-case latency when no slot reservation is assumed. The x-axis represents the number of ordinary nodes, and the slot length is assumed to be γ = 1.0 µs. Curves are shown for R = 1, 2, and 4.
In the third transit, to the destination end-node, the gateway node can
multiplex R(L − 1) slots per cycle time. The latency is
τ3 = γ((L/R)L² + L + 1)     (15)
τmax = τ1 + τ2 + τ3 = γ((L/R + 2)L² + 3L + 3)     (16)
Figure 42 shows how the worst-case latency when no slot reservation is used
varies with the total number of ordinary nodes. The latency is significantly
lower as compared to the case of full reservation. This effect is related to the
higher number of slots in the gateway nodes that can be used for
multiplexing incoming messages.
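To see how the two bounds behave, the sketch below evaluates Equations 13 and 16, as reconstructed above, for γ = 1.0 µs, in the spirit of Figures 41 and 42 (the chosen cluster size L = 10 and the omission of the Trunc operation are assumptions made for the illustration).

```python
# Sketch: worst-case inter-cluster latency from Equation 13 (full reservation by
# other nodes) and Equation 16 (no reservation), for slot length gamma = 1.0 us.

GAMMA = 1e-6

def tau_full_reservation(L, R):
    return GAMMA * (((L - R) * (L - 1) / R + 2 + L - R) * L * L + 3 * L + 3)

def tau_no_reservation(L, R):
    return GAMMA * ((L / R + 2) * L * L + 3 * L + 3)

L = 10                          # 10 clusters of 10 nodes, L*(L - R) ordinary nodes
for R in (1, 2, 4):
    print(f"R={R}: full {tau_full_reservation(L, R) * 1e3:.2f} ms, "
          f"none {tau_no_reservation(L, R) * 1e3:.3f} ms")
```

For R = 1 this gives roughly 9.2 ms and 1.2 ms, respectively, which is consistent with the latency ranges shown in Figures 41 and 42.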
Transit | Full reservation | No reservation
From source end-node to gateway node | 1/L² | (L − 1)/L²
Through backbone | R/(L²(L − R)) | R(L − 1)/(L²(L − R))
From gateway node to destination end-node | R/(L²(L − R)(L − 1)) | R/(L²(L − R))
The real-time services offered by the MAC layer rely on the deterministic
latency and bandwidth. A guarantee can be stated if the known minimum
number of high priority slots along the whole path through the network is
sufficient to transfer the message in time. In the calculation of this
guarantee, the deterministic latency, the deterministic bandwidth, and
the deadline of the message are used. A slot of a guaranteed message is
tagged to indicate its priority over other slots. The gateway nodes then
always transmit the tagged messages before buffered best effort messages.
Clock Synchronization Aspects
The nodes are synchronized to account for the propagation delay between
transmitting and receiving nodes, i.e., receivers are synchronized to
Figure 43. Node bandwidth when full slot reservation by other nodes is assumed, in number of high priority slots per total number of slots in a cycle. The x-axis represents the number of ordinary nodes; curves are shown for R = 1, 2, and 4.
transmitters, proportional to the propagation delays, and thus slots are first
expected when they have traveled through the network [Jonsson et al. 1995]
[Bengtsson et al. 1994]. This type of synchronization is always used inside a
cluster.
The midpoint of the backbone star is used as the reference point. Each
cluster is then synchronized to the backbone by its gateway node. In a
gateway node, the receiver on the cluster side is synchronized so tightly to
the transmitter on the backbone side that incoming messages are directly
forwarded when possible. In this way, the information extracted from an
incoming control slot (about next hop, i.e., the gateway node of the
136
Figure 44. Node bandwidth when no slot reservation is assumed, in number of high priority slots per total number of slots in a cycle. The x-axis represents the number of ordinary nodes; curves are shown for R = 1, 2, and 4.
The improved worst-case latency (Equation 13) when full slot reservation is
assumed is
τmax = γ(((L − R)(L − 1)/R + 1 + L − R)L² + 2L + 2)     (17)
Figure 45. With the synchronization scheme used, incoming traffic to a gateway node can be forwarded immediately, except for the case of internal delay in the gateway node.
τmax = γ((L/R + 1)L² + 2L + 2)     (18)
Figure 46. Worst-case latency comparison between usage and no usage of the proposed synchronization scheme. No slot reservation is assumed. The x-axis represents the number of ordinary nodes, and the slot length is assumed to be γ = 1.0 µs. Curves are shown for R = 1 and R = 4, with and without synchronization.
Summary
We have shown how to calculate the worst-case latency for inter-cluster
communication in a WDM star network using the TD-TWDMA protocol. The
analysis shows how the latency decreases for larger networks when the
ratio between the backbone bandwidth and the cluster bandwidth increases.
A calculation of the minimum deterministic bandwidth obtained in the case
in which no reservation is used by the analyzed node was presented, and a
synchronization scheme was proposed. For reserved traffic, or at low traffic
loads, the improvement in the worst-case latency when using this synchronization
scheme is 33 percent, as compared to the case in which the clusters are not
synchronized with one another.
9. Control Channel Based Fiber-Ribbon Pipeline Ring Network
This chapter and the next present two ring networks suitable for
different situations, both based on fiber-ribbon links. The proposed networks
are pipeline ring networks based on optical fiber-ribbon point-to-point links.
In a pipeline ring network, several packets can be traveling through the
network simultaneously, thus achieving an aggregated throughput higher
than the capacity of a single link. Motorola OPTOBUS bi-directional links
[Schwartz et al. 1996] with ten fibers per direction are used (similar links
can be used, too) but the links are arranged in a unidirectional ring
architecture (see Figure 47) where only M/2 bi-directional links are
needed to close a ring of M nodes. Fiber-ribbon links offering an aggregated
bandwidth of several Gbit/s have already reached the market [Bursky 1994].
The increasingly good price/performance ratio for fiber-ribbon links
indicates a great potential for success of the proposed kind of networks.
is given in Section 9.2. The CC-FPR protocol is presented in Section 9.3.
Section 9.4 describes different user services. Section 9.5 discusses
implementation aspects, and a case study is presented in Section 9.6 to
show the efficiency of the networks. This is followed by a performance
analysis and a summary in Sections 9.7 and 9.8, respectively.
9.1 Network Overview
The first network (described in this chapter) is called CC-FPR [Jonsson
1998] [Jonsson 1998B] and has special features for both parallel processing
in general and for distributed real-time systems [Jonsson et al. 1999]
[Jonsson et al. 1999B]. The physical ring network is divided into two rings:
a data ring and a control ring. In each fiber-ribbon link, eight fibers carry
data and one fiber is used to clock the data, byte by byte. Together, these
fibers form a data channel that carries data packets. The access is divided
into slots as in an ordinary TDMA network. The tenth fiber is dedicated to
bit-serial transmission of control packets that are used for the arbitration of
data transmission in each slot. The clock signal on the dedicated clock fiber,
which is used to clock data, also clocks each bit in the control packets.
Separating clock and control fibers simplifies the transceiver hardware
implementation, which is verified by the current prototype development.
The control channel is also used for the implementation of low level support
for barrier synchronization, global reduction, and reliable transmission.
9.2 Related Networks
Two kinds of spatial reuse in ring networks can be identified. The intention
of the first type is to overcome performance degradation owing to long
propagation delays around the ring as compared to the frame length.
Performance is increased if another node (or nodes) can start transmission
before the current transmission is finished. One example is early token
release in the 16 Mbit/s token ring network, where the token is released
when the last bit of the frame has left the sending node. When early token
release is QRW used, the sending node does not release the token until the
frame has traveled around the ring and is completely removed from the
ring. Another example is slotted ring (e.g., the Cambridge network [Greaves
and Zielinski 1993] [Hopper and Needham 1988]) where several time slots,
in which each can host a frame, travel around the ring at the same time.
The first type of spatial reuse increases the utilization, but it cannot exceed
1 as long as data frames are not removed for reuse of the bandwidth for new
frames. This states the definition of the second type of spatial reuse. For
example, suppose the frame contained in a slot in a slotted ring is removed
by the destination node [Adams 1994]. The destination node, or another
node downstream, can then reuse the slot. If the distribution of source and
destination nodes is totally uniform, the average utilization can
theoretically reach 2. This means that the average source to destination
distance is half the ring, which gives an average of two transmitted frames
per time unit. The term "pipeline ring network" is used in this thesis to
denote a network with spatial reuse of the second type. It should be noted
that spatial reuse of the bandwidth can be implemented in dual uni-
directional bus networks as well [Garrett and Li 1991] [Ray and Jiang 1995]
[Ray and Mukherjee 1995].
Other pipeline ring networks are described in [Chen et al. 1991] [Ofek 1994]
(MetaRing), [Imai et al. 1994] (ATMR), [Wong and Yum 1994], [Jafari et al.
1980], and [Xu and Herzog 1988] and further references are given in [Wong
and Yum 1994]. Advantages of the CC-FPR network over these other
networks include the use of high bandwidth fiber-ribbon links and the close
relation between a dedicated control channel and a data channel without
disturbing the flow of data packets. In other words, control and data are
overlapped in time. With less header overhead in the data packets the slot
length can be shortened to reduce latency without too great a sacrifice of
bandwidth utilization. The separate clock and control fibers also simplify
the transceiver hardware implementation. Another fiber-ribbon ring
network is the USC PONI (formerly called POLO) network, proposed to be
used in multimedia applications [Sano and Levi 1998] [Raghavan et al.
1999]. However, spatial reuse of bandwidth is not a function of the USC
PONI, i.e., it is not a pipeline ring network.
(Figure 48: slots of two consecutive cycles along the time axis; the slot-initiator
role rotates P1, P2, P3, P0, one slot per node and cycle.)
A network with a similar slot reuse mechanism for reserved slots as for the
CC-FPR network (see Section 9.4.1) has been reported [Marsan et al. 1997].
However, the network does not support concurrent transmissions in
different segments of a single ring channel.
Other high performance ring networks include the WDM passive ring
[Irshid and Kavehrad 1992] and the hierarchical WDM ring [Louri and
Gupta 1997], which are more closely related to the WDM star network and
star-of-stars network described in Chapters 7 and 8, respectively.
9.3 The CC-FPR Protocol
Throughout Section 9.3, slot reserving, as described in Section 9.4.1, is
assumed not to be used. Before we explain the arbitration mechanism, we
will describe how data packets travel on the ring.
The access to the network is cyclic; each cycle consists of M time slots, where
M is the number of nodes. Each node is denoted Pi, 1 ≤ i ≤ M. Each slot
always has one node that is responsible for initiating the traffic around the
ring. This node is called the slot initiator (SI). Each node is slot initiator in
one slot per cycle, as shown in Figure 48. At the end of the slot, the role of
slot initiator is asynchronously handed over to the next node downstream.
This can be done implicitly simply by sensing the end of the slot, i.e., the
last bit.
(Figure 47: unidirectional ring of six nodes, Node 1 through Node 6, connected
by fiber-ribbon links.)
The CC-FPR medium access protocol is based on the use of a control packet
that, for each slot, travels almost one round (over M − 1 links) on the control
channel ring, as shown in Figure 49. The node that will be the slot initiator
in the next slot initiates the transmission of the control packet, as shown in
the figure. We denote this node SI+1. In the time domain, the control packet
always travels around the ring in the time slot preceding the one for which
it controls the arbitration (see Figure 50). Accordingly, the control packet
always passes each node one time slot before the data packet to which it is
related.
The contents of the control packet are shown in Figure 51. The control
packet consists of a start bit followed by an M-bit long link reservation field
and an M-bit long destination field, where M is the number of nodes. Each
bit in the link-reservation field states whether the corresponding link is
reserved for transmission in the next slot. In the same way, each bit in the
destination field states whether the corresponding node has a data packet
destined to it in the next slot. Additional information, such as flow control,
is also included in the control packet; for clarity, this is not shown in the
figure.
Figure 50. In each slot a node passes/transmits one control packet and one
data packet, where the control packet is used for the arbitration of the next
slot.
Because all of the nodes succeeding the slot initiator repeat the procedure of
checking the control packet, multiple transmissions in different segments of
the ring can occur in the same slot. The transceivers are designed to allow
for both reception and transmission at the same time (see Figure 53), which
increases the possibility of spatial bandwidth reuse. An example of how the
control packets travel around a five-node network is shown in Figure 54.
(Figure: conceptual view of the control channel part of the transceiver:
control packets coming from the upstream neighbor pass a register, where a
prepared control packet can be loaded, on their way to the downstream
neighbor.)
The reason why the control packet travels over only the first M − 1 links
after SI+1 is that the clock signal is interrupted by the SI (see Figure 49).
The node that initiated the transmission of the control packet, SI+1, does
not return the packet. Consequently, it will not be informed of whether or
not there is a data packet destined to it in the next slot. However, the node
Figure: Conceptual view of the data channel part of the transceiver (data
packets from the upstream neighbor pass a receive buffer and a transmit
buffer on the way to the downstream neighbor).
Outgoing control packet:

Node  Link   Dest.  Transmission allocated
1     11000  00100  To Node 3
2     11000  00100  Could not allocate
3     11000  00100  Could allocate transmission to Nodes 4, 5, and 1 but had nothing to send
4     00011  10001  Multicast to Nodes 5 and 1
5     00011  10001  Could not allocate

Figure 54. A control packet travels around a network with five nodes. Node 1
is the slot initiator.
It is essential for desirable performance that the delay of the control packet
in each node it bypasses be minimal, especially in large networks. One
method is to organize the bits in the link-reservation field in the control
packet for each slot, so that they appear in the same order as that in which
the control packet travels. In other words, the first bit corresponds to the
outgoing link from the slot initiator. Thus, when a node wishes to change
the contents of a control packet, it does not have to store the whole packet
before checking and possibly overwriting it. Instead, it can retransmit the
packet bit by bit and exchange the remaining part of the packet (if
transmission is possible) after reading the bit in the link reservation field
corresponding to its outgoing link. The node’s bit in the destination field in
the incoming control packet must, however, be checked before it is thrown
away. Using this method, the delay in each node can be reduced to only one
or a few bits.
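The per-node handling of the control packet can be summarized by the following sketch (a simplified model, not the prototype implementation: fields are held as bit lists, node and link indices are counted from the slot initiator, and the names are illustrative):

```python
def process_control_packet(node, slot_initiator, links, dests, tx_queue, M):
    """One node's handling of the incoming control packet in a ring of M
    nodes.  links[i] == 1 means that the i-th link after the slot initiator
    is already reserved for the next slot; dests[j] == 1 means that node j
    will receive a packet in that slot."""
    will_receive = dests[node] == 1          # must be read before forwarding

    grant = None
    if tx_queue:
        dst = tx_queue[0]                    # head-of-line destination
        hops = (dst - node) % M              # links needed to reach dst
        first = (node - slot_initiator) % M  # index of this node's outgoing link
        needed = [(first + h) % M for h in range(hops)]
        if all(links[i] == 0 for i in needed):
            for i in needed:                 # reserve the segment ...
                links[i] = 1
            dests[dst] = 1                   # ... and tag the destination
            grant = dst
    return will_receive, grant               # the packet is then sent downstream

# Example: five nodes; node 2 (one hop after slot initiator 1) wants to reach node 4.
links, dests = [0] * 5, [0] * 5
print(process_control_packet(2, 1, links, dests, tx_queue=[4], M=5))  # (False, 4)
```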
9.4 User Services
The user services described below are: real-time virtual channels (Section
9.4.1), guarantee seeking messages (Section 9.4.2), best effort messages
(Section 9.4.3), barrier synchronization (Section 9.4.4), global reduction
(Section 9.4.5), and reliable transmission (Section 9.4.6).
(Figure: five-node ring, Node 1 through Node 5, connected by Links 1 through 5.)
9.4.1 Real-Time Virtual Channels
Many computer systems have real-time demands where the network must
offer logical connections with guaranteed bandwidth and bounded latency.
This can be done in the network by using slot reserving. We refer to such
connections as RTVCs. Either the whole ring is reserved for a specific node
in a slot, or several segments of the ring are dedicated to some specific
nodes.
When a node wishes to reserve a slot for an RTVC, it searches for slots
where the required links are free to allow allocation of a new segment. First,
the node’s own slots (i.e., where the node itself is the slot initiator) are
searched. If enough slots (actually only a segment in each slot) cannot be
allocated for the reservation, the search is continued in other slots.

(Figure: The bandwidth utilization depends on the ratio of the total
propagation delay around the ring to the cycle length. The boxes with bold
text show the link through which each slot first propagates.)

In this case, the node broadcasts a packet containing a request to all other nodes to
allocate the desired segment in their slots. The packet contains information
about the links required and the number of slots needed. Each node then
checks whether any of its own slots have the required free links. All nodes
send a packet back to the requesting node to notify it which slots, if any,
have been allocated. When the requesting node has received the answers, it
decides whether it is satisfied with the number of allocated slots. If not, it
sends a release packet. Otherwise, it can start using the reserved slots
immediately. However, it should still send a release packet if more slots
than needed were allocated. The same medium access method as for
ordinary slots is used for reservation slots, but with the restriction not to
pass links in reserved segments of the ring. A reserved segment of the ring
that is temporarily not in use can, however, be reused by nodes downstream of
the owner.
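A compact sketch of this reservation dialogue is given below (the data model and names are illustrative and the broadcast request/answer exchange is collapsed into direct look-ups; it is not the protocol implementation):

```python
def reserve_rtvc(slots, owner, me, needed_links, slots_needed):
    """RTVC set-up sketch.  'slots' maps slot id -> set of links already
    reserved in that slot, 'owner[s]' is the slot initiator of slot s, and a
    segment can be allocated in a slot only if none of its links is taken."""
    def free(s):
        return not (slots[s] & set(needed_links))

    own = [s for s in slots if owner[s] == me and free(s)]       # search own slots first
    others = [s for s in slots if owner[s] != me and free(s)]    # then ask the other nodes

    chosen = (own + others)[:slots_needed]
    if len(chosen) < slots_needed:
        return None                    # too few slots: a release packet would be sent
    for s in chosen:                   # surplus allocations would also be released
        slots[s] |= set(needed_links)
    return chosen

# Four slots in a five-node ring; node 2 wants links 2 and 3 in two slots.
slots = {0: set(), 1: {2}, 2: set(), 3: {0, 1}}
owner = {0: 2, 1: 2, 2: 4, 3: 4}
print(reserve_rtvc(slots, owner, me=2, needed_links=[2, 3], slots_needed=2))  # [0, 2]
```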
9.4.2 Guarantee Seeking Messages
Guarantee seeking messages normally have hard timing constraints. If the
communication system cannot guarantee the timing constraints of a
guarantee seeking message, the owner of the message should be made
aware of it immediately. In the CC-FPR network, a guarantee is given only
if enough deterministic bandwidth (slots) owned by the node is free before
the deadline of the message. The available deterministic bandwidth
corresponds to ordinary slots where the node is the slot initiator and which
have not already been claimed by other guarantee seeking messages queued in
the node.
Each transmitter has M − 1 queues for guarantee seeking messages, one for
each possible destination (the node itself excluded). When a multicast
packet arrives for queuing, it is put in the queue corresponding to the
multicast destination furthest away from the source node downstream. In
this way, multicast packets are treated in the same way as single-
destination packets, and multiple multicast packets can travel in the
network at the same time whenever possible.
9.4.3 Best Effort Messages
Best effort messages can be sent in all ordinary slots where the node is SI
but in which the node does not have any guarantee seeking messages that
can be sent. In competition with the other nodes according to the CC-FPR
protocol, other ordinary slots can also be used but, again, only as long as no
guarantee seeking messages can be sent. The same method is used for
reservation slots that are not (or are only partly) reserved for the moment or
that are reserved but not used in the current slot.
9.4.4 Barrier Synchronization
Barrier Synchronization (BS) is an operation to control the flow of processes
in a distributed processing system. A logical point in the control flow of an
algorithm is defined, at which all processes in a processing group must
arrive before any of the processes in the group are allowed to proceed
further. When, during execution, a BS point is encountered in the
application program, the node broadcasts the encountered BS_id in the
control packet when the node is SI+1. In this way, all nodes are notified that
the node has reached the BS point. Nodes not belonging to the BS group can
ignore the broadcast, but nodes belonging to the same group, i.e., nodes having
the same id, will make a note in an internal table. The control packet contains a
field in which BS_id can be sent (see Figure 57). The id field contains eight
bits, which permits ids ranging from one to 255. When the field is zero, no
BS command is sent.
When a node participating in the BS group has received the correct BS_id
from all the participants, it knows that all the other nodes are at the same
executing point and may proceed. The worst case latency for a node that
reaches the BS point until it can broadcast this to the other nodes is one
The control packet fields and their widths are:

Start bit: 1 bit
Link reservation field: M bits
Destination field: M bits
ACK/NACK field: M bits
Flow control field: 4*(M-1) bits
ID field: 8 bits
Type bit: 1 bit
Data field: 320 bits

Figure 57. Detailed description of the control packet contents.
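Summing the field widths in Figure 57 gives the control packet length directly as a function of the number of nodes M; the small sketch below simply evaluates that sum (illustrative only):

```python
def control_packet_bits(M):
    """Control packet length per Figure 57: start bit, three M-bit fields
    (link reservation, destination, ACK/NACK), 4*(M-1) flow-control bits,
    an 8-bit ID field, a type bit, and a 320-bit data field."""
    return 1 + M + M + M + 4 * (M - 1) + 8 + 1 + 320

print(control_packet_bits(5))   # 361 bits for the five-node example network
```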
cycle. This assumes one slot per node and that each node is SI+1 only once
per cycle. Clearly, the implications of sending BS information in the control
channel are both bounded latency and better bandwidth utilization in the
data channel. The whole BS mechanism is handled by the communication
interface, transparent to the calling user processes.
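The bookkeeping behind this mechanism can be pictured as in the sketch below (class and method names are illustrative; the control channel transport itself is left out):

```python
class BarrierTable:
    """Per-node barrier bookkeeping: note every BS_id seen from members of
    the own group and allow the process to proceed once all members
    (the node itself included) have reported that id."""

    def __init__(self, my_id, group):
        self.my_id = my_id
        self.group = set(group)        # node ids participating in the barrier
        self.seen = {}                 # BS_id -> set of nodes that reached it

    def on_control_packet(self, sender, bs_id):
        if bs_id == 0 or sender not in self.group:
            return                     # id 0 means "no BS command"; ignore outsiders
        self.seen.setdefault(bs_id, set()).add(sender)

    def may_proceed(self, bs_id):
        return self.seen.get(bs_id, set()) >= self.group

# Node 2 in group {1, 2, 4}: it broadcasts BS_id 7 when it is SI+1 and then
# notes the same id from the other members.
bt = BarrierTable(my_id=2, group=[1, 2, 4])
for sender in (2, 1, 4):
    bt.on_control_packet(sender, bs_id=7)
print(bt.may_proceed(7))   # True
```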
9.4.5 Global Reduction
Global reduction is similar to barrier synchronization, where data are
collected from distributed processes when they signal their arrival at the
synchronization point. A global operation, e.g., sum or product, is performed
on the collected data so that only a single value is returned. At the end of
the GR, all participating nodes have access to the same data. As in the case
of BS, we assume that the programmer (or the compiler) statically allocates
the necessary parameters, off-line before run-time.
The type bit states whether the control packet contains a BS or a GR
command and hence whether data are contained in the data field (see
Figure 57). The data field is currently only used for data reduction.
9.4.6 Low Level Support for Reliable Transmission
The proposed network has low level support for reliable transmission
[Bergenhem and Olsson 1999]. Network control information,
acknowledge/negative acknowledge and flow control, is sent in the control
channel instead of in ordinary data packets. This results in less or no
overhead in the data channel, i.e., better bandwidth utilization. The field in
the control packet named ACK/NACK contains bits that correspond to the M
packets that may have been received by the current SI+1 during the
previous M slots. The ACK/NACK information is therefore always sent
when a node is SI+1. If a packet was correctly received (correct checksum), a
“1” is written in the position of the ACK/NACK field that corresponds to the
slot in which the packet was received. If a faulty packet or no packet was
received, a “0” is written. All nodes must keep track of their transmissions
and can therefore resolve the meaning of the bits in the ACK/NACK field. In
this way, the nodes can be notified as to whether their packet was correctly
received or must be retransmitted. The latency for a node to send and
receive ACK/NACK is bounded, which is desirable.
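How a node could fill in its ACK/NACK field when it becomes SI+1 is sketched below (the receive log is an illustrative data structure; the actual field layout follows Figure 57):

```python
def build_ack_field(rx_log, M):
    """ACK/NACK field sketch: one bit per slot of the previous M slots,
    '1' if a packet with a correct checksum was received in that slot and
    '0' otherwise."""
    assert len(rx_log) == M
    return [1 if ok else 0 for ok in rx_log]

# A node that received correct packets in slots 0 and 3 of a five-slot cycle:
print(build_ack_field([True, False, False, True, False], M=5))   # [1, 0, 0, 1, 0]
```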
The 4(M − 1) bits in the flow control field relate to independent logical
connections. Put simply, each node can have up to four logical connections,
with low-level support for flow control, from each other node. The SI+1 sets
the bit corresponding to a logical connection to "1" if it is to be halted
temporarily; otherwise, the bit is set to "0".
9.5 Implementation Aspects
In addition to high bandwidth, a fiber-ribbon cable also offers a ten-fold
increase in packing density as compared to electrical cables, and the cables
are less rigid [Karstensen et al. 1995]. Furthermore, it is not
necessary for the designer to be concerned about electromagnetic emissions.
These properties make possible new components such as the single-chip
optoelectronic switch core reported in [Szymanski et al. 1998]. In addition to
the switch function, the chip terminates 32 OPTOBUS 800 Mbit/s-per-fiber
transceivers. This translates to an aggregated bandwidth of 204 Gbit/s
through the switch when eight of the ten fibers on each link are used for
data. Such a switch can connect multiple ring clusters when building large
networks. The high bit rate of a fiber-ribbon link makes it possible to reduce
the slot duration in the proposed networks while still keeping the same number
of bits in a packet as on a slower link based on an electrical cable. This reduces
the latency.
In scaling up the bandwidth of a fiber-ribbon link where a dedicated fiber
carries the clock signal, the main problem is channel-to-channel skew. The
skew is mainly the result of differences in propagation delay between
different fibers and variations of lasing delay time among different laser
diodes [Kurokawa et al. 1998]. The 400 Mbit/s OPTOBUS has a specified
maximum skew of 200 ps, excluding the fiber-ribbon cable for which 6 ps/m
is assumed for standard ribbons [Schwartz et al. 1996]. Since the data
stream passing a node in the ring network is, at least, passing a pipeline
register, the channel-to-channel skew is not accumulated over several
nodes. The distance limit between two adjacent nodes, owing to skew, is
of the same order of magnitude as in today's LANs (a few hundred
meters). In parallel and distributed computer systems, so-called System
Area Networks (SAN), the maximum required distance is normally lower
than this limit. It may however be difficult to increase the bit rate to several
Gbit/s per fiber without significantly reducing the distance. It can also be
argued that it should be possible to construct networks with more physically
distributed nodes, since this may be valuable in some applications. The
latency of the network is not dependent on the distance, except that the
propagation delay is added and the throughput is reduced as mentioned
above. This motivates the discussion below of techniques to reduce the
effect of the skew.
One technique is to actually reduce the skew, either by using low skew
ribbons or employing skew compensation. Fiber-ribbons with about 1 ps/m
skew [Siala et al. 1994] and below [Kanjamala and Levi 1995] have been
developed, which essentially increases the possible bandwidth distance
product. All the fibers in the same ribbon were sequentially cut to reduce
the variation of refractive index among the fibers. In the fiber-ribbon link
described in [Wong et al. 1995], a dedicated fiber carries a clock signal used
to clock data on 31 fibers. The transmitter circuitry for each channel has a
programmable clock skew adjustment to adjust the clock in 80-ps
increments.
Another technique is to extract the clock signal from the bit flow on each
fiber instead of using a separate fiber to carry the clock signal. The
disadvantage is increased hardware complexity when adding a clock
recovery circuit and a buffer circuit for each channel in the receiver. A
hybrid solution is to skip the separate clock channel and encode clock
information on the data channels while still sending in bit-parallel mode, as
reported in [Yoshikawa et al. 1997] [Yoshikawa et al. 1997B]. In this case, a
deskew unit relying on FIFO registers (First In First Out) ensures that
parallel data words that are output from the receiver are identical to those
which were sent. A possible ± 15-ns deskew was reported. A similar system
is reported in [Fujimoto et al. 1998].
The techniques mentioned above introduce either increased hardware
complexity or a more sophisticated fiber-ribbon manufacturing process. If
the manufacturing process allows for adding more fibers in each ribbon, this
may be a cheaper alternative. For example, 12 channel links with 1 Gbit/s
per channel [Karstensen et al. 1995] and 2 Gbit/s per channel [Karstensen
1995] have been reported, and array modules supporting 12 × 2.4 Gbit/s for,
e.g., fiber-ribbon links were described in [Peall 1995]. A fiber-ribbon link
with 32 fibers, each with a bit rate of 500 Mbit/s, was described in [Wong et
al. 1995], and researchers at NEC have developed a module in which 8 × 2
lasers are coupled to two fiber-ribbons [Kasahara 1998]. Instead of fiber-
ribbons, fiber imaging guides (FIGs) with thousands of pixels can be used
[Li et al. 1995]. In the system described in [Li et al. 1998B], both a 14000-
pixel FIG and a 3500-pixel FIG were coupled to an 8 × 8 VCSEL array in
different setups.
9.6 Case Study
A typical application with high throughput requirements and a pipelined
data flow between the computational modules is found in future radar signal
processing systems. In Figure 58, a signal processing chain, similar to those
described in [Jonsson et al. 1996] [Taveniku et al. 1996] is shown together
with its bandwidth demands. Each computational module in the figure
contains multiple processors. The chain is a good example, containing
multicast, one-to-many, and many-to-one communication patterns. The
aggregated throughput demand is 30 Gbit/s. Only the throughput
requirements are treated here; all details on the chain are covered in the
two papers referred to above. The data flow must not be disturbed by, for
example, status information that the network must also transport. Slot
reserving is therefore a good choice for the data flow in the signal processing
chain when using the CC-FPR network.
We assume links with ten fibers and 800 Mbit/s per fiber in the case study.
In the CC-FPR network, this translates to a bandwidth of 6.4 Gbit/s for data
traffic on eight of the fibers. For simplicity, we assume an efficient
bandwidth of 6.0 Gbit/s after, for example, check-sums have been excluded.
Figure 58 shows 13 nodes. In addition, the antenna is seen as one node
(feeds the first node in the chain with data) and one master node is
responsible for supervising the whole system and interacting with the user.
We denote the antenna as node P1, the modules shown in the figure as nodes
Pi, 2 ≤ i ≤ 14, and the master node as P15. The numbers of the modules are
also indicated in the figure. Hence, the number of ordinary slots is 15, but
the cycle is extended so that it also contains 30 slots for reservation.
Accordingly, there are 45 slots in a cycle, where one slot per cycle
corresponds to a bandwidth of 133 Mbit/s at a total efficient bandwidth of
6.0 Gbit/s.
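A quick check of the figures above (using the assumed effective bandwidth of 6.0 Gbit/s and 45 slots per cycle):

```python
effective_bw = 6.0e9          # assumed effective data bandwidth [bit/s]
slots_per_cycle = 45          # 15 ordinary + 30 reservation slots

per_slot_bw = effective_bw / slots_per_cycle
print(per_slot_bw / 1e6)      # ~133 Mbit/s per slot and cycle

# The heaviest outgoing flow in the chain, 4.0 Gbit/s, therefore needs a
# segment of the ring in all 30 reservation slots:
print(4.0e9 / per_slot_bw)    # 30.0 slots
```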
Figure 58. Data flow between the modules in the radar signal processing
chain (the figure shows, among other things, the Beam Forming and Pulse
Compression modules, flows of 2 × 2.0 Gbit/s and 4 × 500 Mbit/s, and the
module numbers 10 through 14).
A feasible allocation scheme of the slots is shown in Figure 59. For clarity,
all of the reservation slots are placed after the ordinary slots. In a real
implementation, however, there are two ways of spreading the reservation
slots, as explained below in Section 9.7. Care must be taken, however, in
distributing the reservation slots, the reason being that, when there are
intermediate nodes between the source and destination nodes, allocation is
not possible in those slots in which one of the intermediate nodes is the slot
initiator.
The maximum data flow from one module is 4.0 Gbit/s, which corresponds to
having a segment of the ring reserved in all of the 30 reservation slots.
Slots for both of the two 2 Gbit/s data flows to the pulse compression nodes
can be allocated, since one of the two data flows is tapped before adding the
data flow produced from the same node. The incoming data flow to the
CFAR (Constant False Alarm Ratio) nodes is multicasted to all of these
Links \ Data slots (each entry gives the owner of the link in that slot):
1−2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
2−3 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
3−4 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
4−5 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 3 3 3 3
5−6 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 5 5 5 5
6−7 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6
7−8 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7
8−9 ..... 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 - - - ..... -
9 − 10 - - - - - - - - - - - - - - - 9 9 9 9
10 − 11 10 10 10 10 - - - - - - - - - - - 9 9 9 9
11 − 12 10 10 10 10 11 11 11 11 - - - - - - - 9 9 9 9
12 − 13 10 10 10 10 11 11 11 11 12 12 12 12 - - - 9 9 9 9
13 − 14 10 10 10 10 11 11 11 11 12 12 12 12 13 13 13 13 - - -
14 − 15 - - - - - - - - - - - - - - - - - - -
15 − 1 - - - - - - - - - - - - - - - - - - -
Figure 59. One cycle of slots, where Slots 16 through 45 are reserved. Each
number in the table indicates, for Slots 16 through 45, the owner of the
corresponding link. Neighboring segments in each slot have different
background shading. The slot initiators are indicated in Slots 1 through 15.
nodes. Although this multicast data flow must remain unchanged until the
last CFAR node, it can coexist with the data flow produced from the CFAR
nodes. The reason for this is that the multicast bandwidth is only 2 Gbit/s.
The rest of the data flows are pure pipeline flows and fit easily on the
network as long as the calculations are mapped on the nodes according to
the pipeline order.
9.7 Performance Analysis
To be able to give guaranteed real-time services, worst-case performance
must be determined. An analysis of how worst-case latency and
deterministic throughput vary with the network design parameters is given
below. Each node is assumed to have Nord ordinary slots and Nres slots for
which reservation is allowed, per cycle. Two ways of circulating the role of
being slot initiator are treated and compared to each other (see Figure 60):
Case A, where each node is slot initiator in Nord + Nres slots in sequence, and
Case B, where the slots are interleaved in such a way that each cycle is in
practice divided into Nord + Nres sub-cycles, where each node is slot initiator
once per sub-cycle. Throughout the section, the total propagation time
around the ring is denoted as Tprop, the skew between a control packet and
the data packet it arbitrates for as Tskew, and M is the number of nodes in
the network. The value of Tskew is assumed to be equal to the duration of one
slot, Tslot, which is set to 1 µs in this analysis.

Figure 60. Slot distribution according to Cases A and B. In the example,
Nres = 2, Nord = 1, and M = 3 (O = ordinary slot, R = reservation slot,
Pi = slot-initiator node).
The equation holds for best effort traffic as well if no other (best effort or
guarantee seeking) traffic is queued in the node to be sent before. The
latency is plotted in Figure 61 against the number of nodes for different
combinations of Nord and Nres, and for different values of Tprop. The same
worst-case latency, but here for Case B, is:
and is plotted in Figure 62. As seen in the figures, Case B can be more
sensitive to large propagation delays. With the same assumptions as above,
further comparison between Cases A and B is shown in Figure 63. Case B
Figure 61. Worst-case latency when the only slots a node gets are those
ordinary slots for which it is the slot initiator. Case A is assumed (latency
in µs versus the number of nodes; curves for Nord = 1 with Nres = 0 or 3,
and Tprop = 2.5 µs (500 m) or 25 µs (5 km)).
can offer lower latency in some situations in which Nord > 1, especially for
large values of M.
Figure 62. Worst-case latency when the only slots a node gets are those
ordinary slots for which it is the slot initiator. Case B is assumed (same
parameter combinations and axes as in Figure 61).
experienced by a node when it is handing over the role of being slot initiator
to the next node. Because the worst-case latency appears when a node just
misses a slot where it is the slot initiator (the last one if there are several in
a sequence), the whole Tprop is always a part of the latency, even if Q < M.
Because of the similarity between Equations 19 and 21, a plot of Equation
21 would look the same as the plot of Equation 19 in Figure 61, except that
Q is represented on the x-axis. The same worst-case latency, but here for
Case B, is:
and varies with respect to both Q and M but not with respect to Nord.
Figure 63. Comparison of worst-case latency for Cases A and B (latency in µs
versus the number of nodes).
where Ngap_i is the number of slots between the start of slot si and the start
of slot si+1 (the first slot of the next cycle for the last slot of a cycle). The
number of slots during this time for which the source node is handing over the
role of being slot initiator to the next node is denoted NSI_pass_i.
Figure 64. Maximum aggregated throughput. Case A is assumed (throughput
versus the number of nodes; curves for Nord + Nres = 1 or 4, and
Tprop = 2.5 µs (500 m) or 25 µs (5 km)).
Smax = M P Tslot / (M Tslot + Tprop)    (25)
and is plotted in Figure 65. As can be seen, the maximum throughput for
Case B does not vary with respect to Nord and Nres. When Nord + Nres = 1,
Case A and Case B give the same aggregated throughput. If Nord + Nres > 1,
however, Case A gives better aggregated throughput.
Smax = Tslot / ((Nord + Nres) M Tslot + Tprop)    (26)
Figure 65. Maximum aggregated throughput. Case B is assumed (throughput
versus the number of nodes; curves for Tprop = 0.025 µs (5 m), 0.25 µs (50 m),
2.5 µs (500 m), and 25 µs (5 km)).
and
Smax = Tslot / ((Nord + Nres)(M Tslot + Tprop))    (27)
9.8 Summary
We have presented a fiber-ribbon based ring network with services for
parallel and distributed real-time systems. A key component of the network
architecture is the flexible control channel that can be configured to be used
for different types of network control. Examples of this are the low level
support for barrier synchronization, global reduction, and reliable
transmission. A great advantage is that the low level support can be
implemented with little or no modifications to existing hardware. High
throughputs can be achieved in the network, especially in systems with
some kind of pipelined dataflow between the nodes. The network offers real-time
services for logical connections with guaranteed performance (RTVCs) and for
individual messages (best effort and guarantee seeking messages). An
analysis of worst-case latency and deterministic throughput has been
provided for two variants of time slot organization. One offers higher
throughput while the other offers lower latency in some situations. Also
worth mentioning is that the network can be built today using fiber-optic
off-the-shelf components and that this is ongoing work.
10. Packet- and Circuit-Switched Fiber-Ribbon Pipeline Ring Network
The physical ring of the second ring network proposed in this thesis is sub-
divided into two networks carrying different kinds of traffic [Jonsson et al.
1997B]. Nine of the fibers are used for time multiplexed circuit switched
traffic; eight fibers are for data and one is for clocking. The tenth fiber is
dedicated to packet switched traffic using, for example, a token ring
protocol. This fiber also carries control messages to reconfigure the TDMA
schedule (i.e., circuit establishment) for the other nine fibers. This network
is a good choice when the main data flow in the network does not change
rapidly. Compared to the CC-FPR network, the network for both packet and
circuit switched traffic is slightly simpler at the expense of somewhat
reduced support of dynamic traffic patterns. However, in many systems,
only a fraction of the traffic is, e.g., "bursty".
Circuit switched and packet switched traffic are discussed in Sections 10.1
and 10.2, respectively. Section 10.3 describes circuit establishment, while a
case study is provided in Section 10.4. Mode changes are treated in Section
10.5, and a summary is given in Section 10.6.
10.1 Circuit Switched Traffic
For circuit switched traffic, the first nine fibers in each link form a high
speed channel. All of the high speed channels together form a high speed
ring network for circuit switched traffic. The access is divided into slots as
in an ordinary TDMA network. However, in each slot, the network can be
divided into segments as in the CC-FPR network. Also, for each slot, there is
always a slot initiator node. The same kind of asynchronous slot
synchronization method is also used.
The access is cyclic, and each cycle consists of K slots. In a typical case, K is
a multiple of M, where M is the number of nodes, and each node is the slot
initiator in K/M slots. An example of an agreed schedule for a network with
K = M = 5 slots per cycle is shown in Figure 66. Each column represents one
time slot and contains information on how the ring is segmented in that
slot. Each number in a column is the node index of the owner of the
corresponding link. The bold type numbers indicate the current slot
initiator. In each segment and slot, one, and only one, node can be the owner
of the links and hence has the right to use the segment links for
transmission. In the first slot in the example, node P1 (slot initiator) owns
the link between itself and node P2. Hence, it can transmit to node P2 but
not to any other node. In the same slot, node P2 can transmit to any of
Links \ Data slots (each entry gives the owner of the link in that slot):
1−2 5 - 5 5
2−3 2 2 5 5
3−4 2 2 3 5
4−5 2 2 3 -
5−1 2 5 3 5
Figure 66. Example of an allocation scheme for the links in a five-node
system. The slot initiators are in bold type, and different segments have
different background shading.
nodes P3, P4, P5, or P1. The choice is made by the process that owns the
circuit (logical connection) to which the slot segment is associated. A
multicast to two or more of these nodes is also possible.
In the third slot, the link between node P1 and node P2 is free. Although the
link is free, node P1 must not disturb the asynchronous slot synchronization
technique and therefore transmits an empty packet to node P2. In the fifth
slot, node P5 has the capability of transmitting a broadcast packet (a packet
to all other nodes in the ring).
The maximum aggregated throughput is

Smax = K P Tslot / (K Tslot + Tprop)    (28)

where K is the number of slots per cycle, P is the average number of packet
transmissions in each slot, Tslot is the duration of one slot, and Tprop is the
total propagation delay around the ring.
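Evaluating Equation 28 for some illustrative numbers (the slot length and ring propagation delay follow the values used in the analysis of Chapter 9, while the value of P is only an assumption):

```python
def max_throughput(K, P, T_slot, T_prop):
    """Maximum aggregated throughput of Equation 28, normalized to the
    capacity of a single link."""
    return K * P * T_slot / (K * T_slot + T_prop)

# 45 slots of 1 us, an average of 1.5 transmissions per slot, and a 2.5 us
# total propagation delay around the ring (roughly 500 m of fiber):
print(max_throughput(K=45, P=1.5, T_slot=1e-6, T_prop=2.5e-6))   # ~1.42
```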
10.2 Packet Switched Traffic
The tenth fibers from all of the links are combined to form a ring network
totally dedicated to packet switched traffic. An ordinary ring protocol can be
used. However, there are two requirements: (i) it must be possible to halt
the protocol when special packets for circuit establishment are to be
transmitted (see Section 10.3), and (ii) the latency must be upper bounded
to assure transmission of the packets for circuit establishment. When using,
e.g., a token ring protocol on the packet network, this network will support
low latency communication for sporadic packets at moderate traffic rates. At
the same time, it is assured that the circuit switched traffic (often real-time
traffic) is not disturbed by packet switched traffic.
10.3 Circuit Establishment
When a node is to establish a new circuit, it searches for slots where the
required links are free so that the allocation of a new segment can be made.
First, the node’s own slots (i.e., where the node itself is the slot initiator) are
searched. When too few slots (actually only a segment in each slot) for the
circuit can be allocated, the search is continued in other slots. In that case, a
special request packet is transmitted on the packet network to ask other
nodes to allocate the desired segment in their slots. This packet is
immediately followed by a collect packet to collect information on the success
of the slot segment allocations.
When the requesting node receives the collect packet after one round, it
decides whether the number of allocated slots is sufficient. If not, it sends a
release packet. Otherwise, it can start using the established circuit
immediately.
10.4 Case Study
With the same assumptions as are made in the case study described in
Section 9.6 (Page 155), we will now show that the second pipeline ring
network is also feasible for the chosen radar signal processing application.
Since we still assume links with ten fibers and 800 Mbit/s per fiber, a
bandwidth of 6.4 Gbit/s is dedicated to circuit switched traffic on eight of the
fibers, while 800 Mbit/s is dedicated to packet switched traffic on one fiber.
We also assume there are still 45 slots per cycle, which gives a bandwidth of
133 Mbit/s per cycle and slot. In the case of the second network, this means
that each cycle is divided into K = 45 slots for circuit switched traffic. The
allocation scheme in Figure 59 on Page 157 then also holds for this network,
leaving Slots 1 through 15 free. Another possibility is to have K = 12 slots
per cycle. In that case, one slot corresponds to 500 Mbit/s, and all bandwidths
in Figure 58 on Page 156 are divisible by the slot bandwidth.
10.5 Mode Changes
It is possible to use a number of different working modes in a radar system.
The task of one mode can, for example, be to scan the whole working range,
while the task of another mode may be to track a certain object. Normally,
the algorithm mapping and communication patterns are different for two
different modes. The change of the circuits at mode changes can be
performed in two different ways: (i) switching between the various slot
allocation schemes for a known set of modes, schemes that are statically
stored in each node, and (ii) dynamically changing the slot allocation
scheme in each node at a mode change by establishing new circuits, as
described in Section 10.3.
In the second case, the mode change is initiated by the master node in the
same manner as a broadcast packet. However, each of the nodes involved is
then responsible for requesting its required bandwidth. Each node also
sends its own acknowledge (or negative acknowledge if it failed to establish
the required circuits) packet to the master node, indicating that it is
prepared for the mode change.
When all the nodes have been prepared for the mode change, the system
will change to the new mode in the next batch. The packets coming from the
antenna in the new batch will be tagged to indicate the new mode. In that
way, the nodes will be triggered to change to the new mode. Nodes that are
placed later in the signal processing chain are triggered by the packets
generated by preceding nodes. Even if a node has a totally different job to
do (and a different communication pattern) in the new mode, it can be
triggered in this way. This is possible as long as two different batches are
data independent.
10.6 Summary
We have presented a ring network in which very high throughputs can be
achieved, especially in systems having some kind of pipelined dataflow
between the nodes. The network supports packet switched traffic at the
same time as guaranteed bandwidth is supported through circuit switching.
In a typical system, circuits can be set up for time critical dataflows,
guaranteeing that they are not disturbed by, e.g., control information. These
network features are highly appreciated in, e.g., radar signal processing
systems. It is also worth mentioning that the network can be built today
using fiber-optic off-the-shelf components.
Conclusions and Future Work

Conclusions
Two different kinds of networks have been in focus, WDM star networks and
fiber-ribbon ring networks. Protocols and real-time services were proposed
and evaluated for both networks (see Chapters 7 and 9, respectively).
Because the WDM star network implements a distributed crossbar it has
better support for general communication patterns, compared to the ring.
From the perspective of fault tolerance (which, however, has not been
addressed in this thesis), the passive optical star seems to be more reliable
than a uni-directional ring, due to its passive nature. On the other hand, the
ring network allows for a simpler transceiver design as a consequence of the
relaxed clock synchronization (because of the asynchronous slot-synchronization
method), the lack of tuning latencies, and the fact that no propagation delay
measurements have to be made. In practice, this can lead
to shorter time-slots which, in turn, reduces the latencies. Two other
important factors are what kind of traffic (communication patterns etc.)
should be carried by the network and how the different technologies will
mature.
Ongoing and Future Work
A network demonstrator of the control channel based ring network is
currently being built, and some initial tests with two nodes have been
performed (see Figure 67). Some building blocks included in a node are:
Figure 67. Network demonstrator with two nodes. The cable bundle in the
upper part of the figure contains two fiber-ribbons ending in each node's
OPTOBUS module.
• Dual-ported memories and various control logic (the big cards on which
other cards are mounted − see Figure 67). There are three memory
banks for transmission, reception, and protocol interactions between
the two processors, respectively.
initiated and include collaboration with research groups on photonics
and micro-electronics at Chalmers University of Technology.
Finally, it should be noted that the work has been valuable for the
industrial partner, Ericsson Microwave Systems, and may have impact on
future computer system designs for, e.g., radar signal processing.
References
[Acampora and Karol 1989] A. S. Acampora and M. J. Karol, "An overview of lightwave packet networks," IEEE Network, pp. 29-41, Jan. 1989.

[Anderson et al. 1995] T. E. Anderson, D. E. Culler, and D. A. Patterson, "A case for NOW (Networks of Workstations)," IEEE Micro, vol. 15, no. 1, pp. 54-64, Feb. 1995.

[Beecroft et al. 1994] J. Beecroft, "Meiko CS-2 interconnect ELAN-ELITE design," Parallel Computing, vol. 20, no. 10, pp. 1627-1638, 1994.

[Bhuyan et al. 1989B] L. N. Bhuyan, D. Ghosal, and Q. Yang, "Approximate analysis of single and multiple ring networks," IEEE Transactions on Computers, vol. 38, no. 7, pp. 1027-1040, July 1989.

[Bogineni and Dowd 1992] K. Bogineni and P. W. Dowd, "A Collisionless Multiple Access Protocol for a Wavelength Division Multiplexed Star-Coupled Configuration: Architecture and Performance Analysis," Journal of Lightwave Technology, vol. 10, no. 11, pp. 1688-1699, Nov. 1992.
[Borella et al. 1998B] A. Borella, G. Cancellieri, and F. Chiaraluce, Wavelength Division Multiple Access Optical Networks. Artech House, Inc., Norwood, MA, USA, pp. 278-282, 1998, ISBN 0-89006-657-4.

[Carlile 1993] B. R. Carlile, "Algorithms and design: the CRAY APP shared-memory system," Proc. Annual IEEE International Computer Conference (COMPCON Spring '93) Digest, San Francisco, CA, USA, pp. 312-320, 1993.

[Chamberlain et al. 1998] R. D. Chamberlain, M. A. Franklin, R. B. Krchnavek, and B. H. Baysal, "Design of an optically-interconnected multiprocessor," Proc. International Conference on Massively Parallel Processing using Optical Interconnections (MPPOI '98), Las Vegas, NV, USA, June 15-17, 1998, pp. 114-122.

[Chen et al. 1990] M. Chen, N. R. Dono, and R. Ramaswami, "A media-access protocol for packet-switched wavelength division multiaccess metropolitan area networks," IEEE Journal on Selected Areas in Communications, vol. 8, no. 6, pp. 1048-1057, Aug. 1990.

[Chiang et al. 1996] T.-K. Chiang, S. K. Agrawal, D. T. Mayweather, D. Sadot, C. F. Barry, M. Hickey, and L. G. Kazovsky, "Implementation of STARNET: a WDM computer communication network," IEEE Journal on Selected Areas in Communications, vol. 14, no. 5, pp. 824-839, June 1996.

[Chlamtac and Ganz 1988] I. Chlamtac and A. Ganz, "A multibus train communication (AMTRAC) architecture for high-speed fiber optic networks," IEEE Journal on Selected Areas in Communications, vol. 6, no. 6, pp. 903-912, July 1988.

[Cho et al. 1995] W. Cho, C. Shim, M.-L. Song, J.-Y. Lee, and S. B. Lee, "MCDQDB (multi-channel DQDB) using WDM (wavelength division multiplexing)," Proc. IEEE Global Telecommunications Conference (GLOBECOM), vol. 3, pp. 2205-2209, Nov. 1997.
[Cremer et al. 1992] C. Cremer, N. Emeis, M. Schier, G. Heise, G. Ebbinghaus, and L. Stoll, "Grating spectrograph integrated with photo diode array in InGaAsP/InGaAs/InP," IEEE Photonics Technology Letters, vol. 4, no. 1, pp. 108-110, Jan. 1992.

[Dally et al. 1998] W. J. Dally, M.-J. E. Lee, F.-T. An, J. Poulton, and S. Tell, "High-performance electrical signaling," Proc. International Conference on Massively Parallel Processing using Optical Interconnections (MPPOI '98), Las Vegas, NV, USA, June 15-17, 1998, pp. 11-16.

[Davies and Ghani 1983] P. Davies and F. A. Ghani, "Access protocols for an optical-fibre ring network," Computer Communications, vol. 6, no. 4, pp. 185-191, Aug. 1983.

[Delorme et al. 1996] F. Delorme, S. Grosmaire, G. Alibert, S. Slempkes, and A. Ougazzaden, "Simple multiwavelength device fabrication technique using a simple-grating holographic exposure," IEEE Photonics Technology Letters, vol. 8, no. 7, pp. 867-869, July 1996.

[Dong et al. 1998] L. Dong, R. Melhem, and D. Mossé, "Time slot allocation for real-time messages with negotiable distance constraints," Proc. Real-Time Technology and Applications Symposium, pp. 131-136, 1998.

[Dowd and Chu 1994] P. W. Dowd and J. Chu, "Photonic architectures for distributed shared memory multiprocessors," Proc. Massively Parallel Processing using Optical Interconnections (MPPOI '94), Cancun, Mexico, Apr. 26-27, 1994, pp. 151-161.

[Eicken et al. 1995] T. von Eicken, A. Basu, and V. Buch, "Low-latency communication over ATM networks using active messages," IEEE Micro, vol. 15, no. 1, pp. 46-53, Feb. 1995.

[Eng 1988] K. Y. Eng, "A photonic knockout switch for high-speed packet networks," IEEE Journal on Selected Areas in Communications, vol. 6, no. 7, pp. 1107-1116, Aug. 1988.

[Fibre Systems 1998] "Parallel optics can feed the clamour for speed," Fibre Systems, vol. 2, no. 4, pp. 27-28, May 1998.

[Finn and Mason 1996] N. Finn and T. Mason, "ATM LAN emulation," IEEE Communications Magazine, no. 6, pp. 96-100, June 1996.
[Flynn 1966] M. J. Flynn, "Very high-speed computing systems," Proceedings of the IEEE, vol. 54, no. 12, pp. 1901-1909, Dec. 1966.

[Frenkel and Lin 1988] A. Frenkel and C. Lin, "Inline tunable etalon filter for optical channel selection in high density wavelength division multiplexed fibre systems," Electronics Letters, vol. 24, no. 3, pp. 159-161, Feb. 4, 1988.

[Ganz and Gao 1992B] A. Ganz and Y. Gao, "Traffic scheduling in multiple WDM star systems," Proc. IEEE International Conference on Communications (ICC '92), Chicago, IL, USA, June 1992, pp. 1468-1472.

[Ganz and Koren 1991] A. Ganz and Z. Koren, "WDM passive star - protocols and performance analysis," Proc. INFOCOM '91, Bal Harbour, FL, USA, Apr. 1991, pp. 991-1000.

[Garrett and Li 1991] M. W. Garrett and S.-Q. Li, "A study of slot reuse in dual bus multiple access networks," IEEE Journal on Selected Areas in Communications, vol. 9, no. 2, pp. 248-256, Feb. 1991.

[Goodman et al. 1988] M. S. Goodman, E. Arthurs, J. M. Cooper, H. Kobrinski, and M. P. Vecchi, "Demonstration of fast wavelength tuning for a high performance packet switch," Proc. of the European Conference on Optical Communication (ECOC '88), pp. 255-258, 1988.

[Guilfoyle et al. 1998] P. S. Guilfoyle, J. M. Hessenbruch, and R. V. Stone, "Free-space interconnects for high-performance optoelectronic switching," Computer, vol. 31, no. 2, pp. 69-75, Feb. 1998.

[Ha and Pinkston 1995] J.-H. Ha and T. M. Pinkston, "The SPEED cache coherence protocol for an optical multi-access interconnect architecture," Proc. International Conference on Massively Parallel Processing using Optical Interconnections (MPPOI '95), San Antonio, TX, USA, Oct. 23-24, 1995, pp. 98-107.

[Hahn 1995] K. H. Hahn, "POLO – Parallel optical links for gigabyte/s data communications," Proc. LEOS '95, San Francisco, CA, USA, Oct. 30 – Nov. 2, 1995, vol. 1, pp. 228-229.

[Hahn 1995B] K. H. Hahn, "POLO − parallel optical links for Gigabyte data communications," Proc. of the Electronics Components and Technology Conference (ECTC '95), pp. 368-375, 1995.

[Hahn et al. 1995] K. H. Hahn et al., "POLO: parallel optical links for workstation clusters and switching systems," Conference on Optical Fiber Communication (OFC '95) Technical Digest, pp. 112-112, 1995.
[Halsall 1995] F. Halsall, Data Communications, Computer Networks and Open Systems, fourth edition, Addison-Wesley Longman Ltd., Essex, UK, 1995, ISBN 0-201-42293-X.

[Hockney and Jesshope 1988] R. W. Hockney and C. R. Jesshope, Parallel Computers: Architecture, Programming and Algorithms. Adam Hilger, Bristol, UK, 1988, ISBN 0-85274-812-4.

[Huang and Sheu 1996] N.-F. Huang and S.-T. Sheu, "A wavelength reusing/sharing access protocol for multichannel photonic dual bus networks," Journal of Lightwave Technology, vol. 14, no. 5, pp. 678-692, May 1996.

[Huang and Sheu 1997] N.-F. Huang and S.-T. Sheu, "An efficient wavelength reusing/migrating/sharing protocol for dual bus lightwave networks," Journal of Lightwave Technology, vol. 15, no. 1, pp. 62-75, Jan. 1997.

[Hwang 1993] K. Hwang, Advanced Computer Architecture. McGraw-Hill, Inc., 1993, ISBN 0-07-031622-8.

[IEEE 1993] IEEE Standard for Scalable Coherent Interface (SCI). IEEE, New York, NY, USA, 1993, ISBN 1-55937-222-2.

[Jain 1993] R. Jain, "FDDI: current issues and future plans," IEEE Communications Magazine, no. 9, pp. 98-105, Sept. 1993.

[Jonsson 1998B] M. Jonsson, "Two fiber-ribbon ring networks for parallel and distributed computing systems," Optical Engineering, vol. 37, no. 12, pp. 3196-3204, Dec. 1998.

[Juma 1996] S. Juma, "Bragg gratings boost data transmission rates," Laser Focus World, vol. 32, no. 11, pp. S5-S9, Nov. 1996.

[Karstensen et al. 1998] H. Karstensen, J. Wieland, R. Dal'Ara, and M. Blaser, "Parallel optical link for multichannel interconnections at gigabit rate," Optical Engineering, vol. 37, no. 12, pp. 3119-3123, Dec. 1998.

[Kirkby 1990] P. A. Kirkby, "Multichannel wavelength-switched transmitters and receivers - new component concept for broad-band networks and distributed switching systems," Journal of Lightwave Technology, vol. 8, no. 2, Feb. 1990.
[Krisnamoorthy et al. 1996] A. V. Krisnamoorthy, J. E. Ford, K. W. Goosen, J. A. Walker, B. Tseng, S. P. Hui, J. E. Cunningham, W. Y. Jan, T. K. Woodward, M. C. Nuss, R. G. Rozier, F. E. Kiamilev, and D. A. B. Miller, "The AMOEBA chip: an optoelectronic switch for multiprocessor networking using dense-WDM," Proc. International Conference on Massively Parallel Processing using Optical Interconnections (MPPOI '96), Maui, HI, USA, Oct. 27-29, 1996, pp. 94-100.

[Larson and Harris 1995] M. C. Larson and J. S. Harris, Jr., "Broadly-tunable resonant-cavity light-emitting diode," IEEE Photonics Technology Letters, vol. 7, no. 11, pp. 1267-1269, Nov. 1995.

[Larson and Harris 1996] M. C. Larson and J. S. Harris, Jr., "Wide and continuous wavelength tuning in a vertical-cavity surface-emitting laser using a micromachined deformable-membrane mirror," Applied Physics Letters, vol. 68, no. 7, pp. 891-893, Feb. 12, 1996.

[Laudon and Lenoski 1997] J. Laudon and D. Lenoski, "The SGI Origin: a ccNUMA highly scalable server," Proc. International Symposium on Computer Architecture (ISCA '97), Denver, CO, USA, June 2-4, 1997.

[Lenoski et al. 1993] D. Lenoski, J. Laudon, T. Joe, D. Nakahira, L. Stevens, A. Gupta, and J. Hennessy, "The DASH prototype: logic overhead and performance," IEEE Transactions on Parallel and Distributed Systems, vol. 4, no. 1, pp. 41-61, Jan. 1993.

[Li et al. 1998] Y. Li, J. Popelek, J.-K. Rhee, L. J. Wang, T. Wang, and K. Shum, "Demonstration of fiber-based board-level optical clock distributions," Proc. International Conference on Massively Parallel Processing using Optical Interconnections (MPPOI '98), Las Vegas, NV, USA, June 15-17, 1998, pp. 224-228.

[Li et al. 1998C] Y. Li, J. Ai, and T. Wang, "100×100 opto-electronic cross-connector using OPTOBUS," Proc. Optics in Computing (OC '98), Brugge, Belgium, June 17-20, 1998, pp. 282-284.

[Liu and Prasanna 1998] W. Liu and V. K. Prasanna, "Utilizing the power of high-performance computing," IEEE Signal Processing Magazine, vol. 15, no. 5, pp. 85-100, Sept. 1998.

[Loeb and Stilwell 1990] M. L. Loeb and G. R. Stilwell, "An algorithm for bit-skew correction in byte-wide WDM optical fiber systems," Journal of Lightwave Technology, vol. 8, no. 2, pp. 239-242, Aug. 1988.
200
[Lohmann et al. 1986] A. W. Lohmann, W. Stork, and G. Stucke, ”Optical
perfect shuffle," $SSOLHG2SWLFV, vol. 25, no. 10, pp. 1530-1531, May 15, 1986.
[Louri and Gupta 1996] A. Louri and R. Gupta, ”Hierarchical optical ring
interconnection (HORN): a scalable interconnection-network for
multiprocessors and massively parallel systems,” 3URF 0DVVLYHO\ 3DUDOOHO
3URFHVVLQJXVLQJ2SWLFDO,QWHUFRQQHFWLRQV0332,ª, Maui, HI, USA, Oct.
27-29, 1996, pp. 247-254.
[Louri and Gupta 1997] A. Louri and R. Gupta, ”Hierarchical optical ring
interconnection (HORN): scalable interconnection network for
multiprocessors and multicomputers," $SSOLHG2SWLFV, vol. 36, no. 2, pp. 430-
442, Jan. 10, 1997.
201
[Maode et al. 1998] M. Maode, B. Hamidzadeh, and M. Hamdi, "Efficient
scheduling algorithms for real-time service on WDM optical networks," 3URF
RI WKH WK ,QWHUQDWLRQDO &RQIHUHQFH RQ &RPSXWHU &RPPXQLFDWLRQV DQG
1HWZRUNV,&1ª, Lafayette, LA, USA, Oct. 12-15, 1998.
[MasPar 1992] "The design of the MasPar MP-2: a cost effective massively
parallel computer," White paper, MasPar Computer Corporation, Sunnyvale,
CA, USA, 1992.
[Mestdagh 1995] D. J. G. Mestdagh, Fundamentals of Multiaccess Optical
Fiber Networks. Artech House, Inc., 1995, ISBN 0-89006-666-3.
[Neff 1994] J. A. Neff, "Optical interconnects based on two-dimensional
VCSEL arrays," Proc. Massively Parallel Processing using Optical
Interconnections (MPPOI'94), Cancun, Mexico, Apr. 26-27, 1994, pp. 202-
212.
[Nishikido et al. 1995] J. Nishikido, S. Fujita, Y. Arai, Y. Akahori, S. Hino,
and K. Yamasaki, "Multigigabit multichannel optical interconnection
module for broadband switching system," Journal of Lightwave Technology,
vol. 13, no. 6, pp. 1104-1110, June 1995.
[Okamoto et al. 1992B] K. Okamoto, H. Okazaki, Y. Ohmori, and K. Kato,
"Fabrication of large scale integrated-optic N×N star couplers," IEEE
Photonics Technology Letters, vol. 4, no. 9, pp. 1032-1035, Sep. 1992.
[Peall 1995] R. G. Peall, "Development in multi-channel optical
interconnects under ESPRIT III SPIBOC," Proc. LEOS'95, San Francisco,
CA, USA, Oct. 30 - Nov. 2, 1995, vol. 1, pp. 222-223.
[Prucnal et al. 1994] P. R. Prucnal, I. Glesk, and J. P. Sokoloff,
"Demonstration of all-optical self-clocked demultiplexing of TDM data at
250 Gb/s," Proc. Massively Parallel Processing using Optical
Interconnections (MPPOI'94), Cancun, Mexico, Apr. 26-27, 1994, pp. 106-
117.
[Prylli and Tourancheau 1998] L. Prylli and B. Tourancheau, "BIP: a new
protocol designed for high performance networking on Myrinet," Proc. of the
1st Workshop on Personal Computer based Networks of Workstations (PC-
NOW'98), Orlando, FL, USA, Apr. 3, 1998.
[Ray and Jiang 1995] S. Ray and H. Jiang, "A reconfigurable optical bus
structure for shared memory multiprocessors with improved performance,"
Proc. 2nd International Conference on Massively Parallel Processing using
Optical Interconnections (MPPOI'95), San Antonio, TX, USA, Oct. 23-24,
1995, pp. 108-115.
[Ray and Mukherjee 1995] S. Ray and S. Mukherjee, "On optimal placement
of erasure nodes on a dual bus network," Proc. INFOCOM'95, Boston, MA,
USA, Apr. 2-6, 1995, pp. 883-890.
[Reif and Yoshida 1994] J. H. Reif and A. Yoshida, "Free space optical
message routing for high performance parallel computers," Proc. Massively
Parallel Processing using Optical Interconnections (MPPOI'94), Cancun,
Mexico, Apr. 26-27, 1994, pp. 37-44.
[Reilly 1994] P. Reilly, "PDH, broadband ISDN, ATM, and all that: a guide
to modern WAN networking, and how it evolved," Silicon Graphics, Inc.,
1994.
[Rom and Sidi 1990] R. Rom and M. Sidi, Multiple Access Protocols:
Performance and Analysis. Springer-Verlag, Inc., 1990, ISBN 0-387-97253-
6.
[Ross 1989] F. E. Ross, "An overview of FDDI: the fiber distributed data
interface," IEEE Journal on Selected Areas in Communications, vol. 7, no. 7,
pp. 1043-1051, Sept. 1989.
[Sachs and Varma 1996] M. W. Sachs and A. Varma, "Fibre channel and
related standards," IEEE Communications Magazine, no. 8, pp. 40-50, Aug.
1996.
[Sano and Levi 1998] B. J. Sano and A. F. J. Levi, "Networks for the
professional campus environment," in Multimedia Technology for
Applications, B. Sheu and M. Ismail, Eds., McGraw-Hill, Inc., pp. 413-427,
1998, ISBN 0-7803-1174-4.
[Sano et al. 1996] B. Sano, B. Madhavan, and A. F. J. Levi, "8 Gbps CMOS
interface for parallel fiber-optic interconnects," Electronics Letters, vol. 32,
pp. 2262-2263, 1996.
[Scott 1996] S. Scott, "The GigaRing channel," IEEE Micro, vol. 16, no. 1,
pp. 27-34, Feb. 1996.
[Seitz 1985] C. L. Seitz, "The Cosmic Cube," Communications of the ACM,
vol. 28, no. 1, pp. 22-33, Jan. 1985.
[Seitz and Su 1993] C. L. Seitz and W.-K. Su, "A family of routing and
communication chips based on the Mosaic," Research on Integrated Systems:
Proceedings of the Symposium, pp. 320-337, 1993.
[Semaan and Humblet 1993] G. Semaan and P. Humblet, "Timing and
dispersion in WDM optical star networks," Proc. INFOCOM'93, San
Francisco, CA, USA, 1993, pp. 573-577.
[Sivalingam 1994] K. M. Sivalingam, "High-speed communication protocols
for all-optical wavelength division multiplexed computer networks,"
Doctoral thesis, Department of Computer Science, State University of New
York at Buffalo, Buffalo, NY, USA, June 1994.
[Sorel et al. 1996] Y. Sorel, A. O'Hare, J.-F. Kerdiles, and P.-L. François,
"160 Gb/s in 28-nm WDM transmission over 238 km of DSF with standard
fiber compensation," IEEE Photonics Technology Letters, vol. 8, no. 5, pp.
727-729, May 1996.
[Stenström 1990] P. Stenström, "A survey of cache coherence schemes for
multiprocessors," Computer, vol. 23, no. 6, pp. 12-24, June 1990.
[Svensson and Wiberg 1993] B. Svensson and P.-A. Wiberg, "Autonomous
systems demand new computer system architectures and new development
strategies," Proc. of the 19th Annual Conference of the IEEE Industrial
Electronics Society (IECON'93), Maui, Hawaii, USA, Nov. 15-19, 1993, pp.
27-31.
[Taveniku et al. 1998] M. Taveniku, A. Ahlander, M. Jonsson, and B.
Svensson, "The VEGA moderately parallel MIMD, moderately parallel
SIMD, architecture for high performance array signal processing," Proc.
12th International Parallel Processing Symposium & 9th Symposium on
Parallel and Distributed Processing (IPPS/SPDP'98), Orlando, FL, USA,
Mar. 30 - Apr. 3, 1998, pp. 226-232.
[Tomašević and Milutinović 1994B] M. Tomašević and V. Milutinović,
"Hardware approaches to cache coherence in shared-memory
multiprocessors, Part 2," IEEE Micro, vol. 14, no. 5, pp. 61-66, Oct. 1994.
[Tseng and Chen 1982] C.-W. Tseng and B.-U. Chen, "D-Net, a new scheme
for high data rate optical local area networks," Proc. IEEE Global
Telecommunications Conference (GLOBECOM'82), pp. 949-955, 1982.
[Tyan et al. 1996] H.-Y. Tyan, C.-J. Hou, B. Wang, and C.-C. Han, "On
supporting time-constrained communications in WDMA-based star-coupled
optical networks," Proc. 17th IEEE Real-Time Systems Symposium, pp. 175-
184, 1996.
[Wang et al. 1997] B. Wang, C.-J. Hou, and C.-C. Han, "On dynamically
establishing and terminating isochronous message streams in WDMA-based
local area lightwave networks," Proc. INFOCOM'97, Kobe, Japan, Apr. 7-11,
1997, pp. 1261-1269.
[Wong and Yum 1994] P. C. Wong and T.-S. P. Yum, "Design and analysis of
a pipeline ring protocol," IEEE Transactions on Communications, vol. 42, no.
2/3/4, pp. 1153-1161, Feb./Mar./Apr. 1994.
[Wooten et al. 1996] E. L. Wooten, R. L. Stone, E. W. Miles, and E. M.
Bradley, "Rapidly tunable narrowband wavelength filter using LiNbO3
unbalanced Mach-Zehnder interferometers," Journal of Lightwave
Technology, vol. 14, no. 11, pp. 2530-2536, Nov. 1996.
[Zaccarin and Kavehrad 1993] D. Zaccarin and M. Kavehrad, "An optical
CDMA system based on spectral encoding of LED," IEEE Photonics
Technology Letters, vol. 4, no. 4, pp. 479-482, Apr. 1993.
[Zhao et al. 1995] C. Zhao, T.-H. Oh, and R. T. Chen, "General purpose
bidirectional optical backplane: high-performance bus for multiprocessor
systems," Proc. 2nd International Conference on Massively Parallel
Processing using Optical Interconnections (MPPOI'95), San Antonio, TX,
USA, Oct. 23-24, 1995, pp. 188-195.
Abbreviations
ADM Add-Drop Multiplexer
APD Avalanche Photo Diode
ATM Asynchronous Transfer Mode
AWG Arrayed Waveguide Grating
BS Barrier Synchronization
CAN Controller Area Network
CBR Constant Bit Rate
CC-FPR Control Channel based Fiber-ribbon Pipeline Ring
CDMA Code Division Multiple Access
CFAR Constant False Alarm Rate
CM Computation Module
COW Cluster of Workstations
CPI Coherent Processing Interval
CSMA Carrier-Sense Multiple-Access
CSMA/CD CSMA with Collision Detection
DBR Distributed Bragg Reflector
DFB Distributed FeedBack
DS-SS Direct Sequence Spread Spectrum
DT-WDMA Dynamic Time-Wavelength Division Multi Access
EDF Earliest Deadline First
FDDI Fiber Distributed Data Interface
FDM Frequency Division Multiplexing
FIFO First In First Out
FIG Fiber Imaging Guide
FWHM Full Width Half Maximum
GR Global Reduction
ILP Instruction Level Parallelism
I-TDMA Interleaved TDMA
LED Light Emitting Diode
MAC Medium Access Control
MAGIC Multistripe Array Grating Integrated Cavity
MIMD Multiple Instruction streams Multiple Data streams
MSIMD Multiple SIMD arrays
NOW Network of Workstations
PCB Printed Circuit Board
PE Processing Element
PFDM Pulse Frequency Division Multiplexing
PIN P-Intrinsic-N
PON Passive Optical Networks
PVM Parallel Virtual Machine
QoS Quality of Service
RTVC Real-Time Virtual Channels
SAN System Area Network
SCMA SubCarrier Division Multiple Access
SDM Space Division Multiplexing
SI Slot Initiator
SIMD Single Instruction stream Multiple Data streams
SLM Spatial Light Modulator
SPMD Same Program Multiple Data
TDMA Time Division Multiple Access
TTP Time-Triggered Protocol
UBR Unspecified Bit Rate
VBR Variable Bit Rate
VCSEL Vertical Cavity Surface Emitting Laser
WDDI Wavelength Distributed Data Interface
WDM Wavelength Division Multiplexing
WDMA Wavelength Division Multiple Access