A Survey of Interconnection Networks
Tse-yun Feng
The Ohio State University
Design decisions. In selecting the architecture of an interconnection network, four design decisions can be identified.10 They concern operation mode, control strategy, switching method, and network topology.

Figure 1. An overview of concurrent processing systems.
Figure 4. Examples of static network topologies: (a) one dimensional; (b-f) two dimensional; and (g-j) three dimensional.
Static. Topologies in the static category can be classified according to the dimensions required for layout: specifically, one-dimensional, two-dimensional, three-dimensional, and hypercube, as shown in Figure 3. Examples of one-dimensional topologies include the linear array used for some pipeline architectures (Figure 4a).13 Two-dimensional topologies include the ring,14,15 star,16 tree,17 near-neighbor mesh,18 and systolic array.13 Examples are shown in Figure 4b-f. Three-dimensional topologies include the completely connected,19 chordal ring,20 3-cube,21 and 3-cube-connected-cycle22 networks depicted in Figure 4g-j. A D-dimensional, W-wide hypercube contains W nodes in each dimension, and there is a connection to a node in each dimension. The near-neighbor mesh and the 3-cube are actually two- and three-dimensional hypercubes, respectively. The cube-connected-cycle is a deviation of the hypercube. For example, the 3-cube-connected-cycle shown in Figure 4j is obtained by replacing each node of the 3-cube by a 3-node cycle. Each node in the cycle is connected to the corresponding node in another cycle.

Dynamic. There are three topological classes in the dynamic category: single-stage, multistage, and crossbar (see Figure 5).

Single-stage. A single-stage network is composed of a stage of switching elements cascaded to a link connection pattern. The shuffle-exchange network23 is a single-stage network based on a perfect-shuffle connection cascaded to a stage of switching elements, as shown in Figure 5a. The single-stage network is also called a recirculating network because data items may have to recirculate through the single stage several times before reaching their final destination.
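To make the perfect-shuffle connection of the single-stage network concrete, here is a minimal sketch (the function names and the 8-line example are mine, not the article's): the shuffle rotates the binary address of a line left by one position, and each exchange element pairs the two lines that differ in the lowest address bit.

def shuffle(i, n_bits):
    # Perfect shuffle: rotate the n_bits-bit address of line i left by one.
    n = 1 << n_bits
    return ((i << 1) | (i >> (n_bits - 1))) & (n - 1)

def exchange(i):
    # A 2 x 2 exchange element pairs the two lines differing in the lowest bit.
    return i ^ 1

# On an 8-line network (n_bits = 3) the shuffle sends lines
# 0 1 2 3 4 5 6 7 to positions 0 2 4 6 1 3 5 7.
print([shuffle(i, 3) for i in range(8)])

Repeating the shuffle followed by an optional exchange is what lets data recirculate through the single stage until they reach their destinations.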
Figure 5. Examples of dynamic network topologies: (a) single stage; (b-i) multistage; and (j) crossbar.
Multistage. A multistage network consists of more than one stage of switching elements and is usually capable of connecting an arbitrary input terminal to an arbitrary output terminal. Multistage networks can be one-sided or two-sided. The one-sided networks, sometimes called full switches, have input-output ports on the same side. The two-sided multistage networks, which usually have an input side and an output side, can be divided into three classes: blocking, rearrangeable, and nonblocking.

In blocking networks, simultaneous connections of more than one terminal pair may result in conflicts in the use of network communication links. Examples of this type of network, which has been extensively investigated, include the data manipulator,24 baseline,25,26 SW banyan,27 omega,28 flip,29 indirect binary n-cube,30 and delta.31 A topological equivalence relationship has been established for this class of networks in terms of the baseline network.25,26 A data manipulator and a baseline network are shown in Figures 5b and 5c.

A network is called a rearrangeable nonblocking network if it can perform all possible connections between inputs and outputs by rearranging its existing connections so that a connection path for a new input-output pair can always be established. A well-defined network, the Benes network12 shown in Figure 5d, belongs to this class. The Benes rearrangeable network topology has been extensively studied.

A generalized-connection network topology is generated to pass any of the N^N mappings of inputs onto outputs, where N is the number of inputs or outputs (see Figure 5f).

In a one-sided network (or full switch), one-to-one connection is possible between all pairs of terminals.40,41 A cellular implementation, a baseline topology construction, and a Clos construction are shown in Figure 5g-i.

Crossbar. In a crossbar switch every input port can be connected to a free output port without blocking. Figure 5j shows a schematic which is similar to one used in C.mmp.42 A crossbar switch called a versatile line manipulator has also been designed and implemented.43,44

Communication protocols

The switching methodology and the control strategy are implemented in switching elements (or switching points) according to required communication protocols. The communication protocols can be viewed on two levels. The first level concerns switching control algorithms, which generate the necessary control settings on switching elements to ensure reliable data routings from source to destination.

Routing techniques. The routing techniques depend on the network topology and the operation mode used. More or less, each multiple-processor system needs a routing algorithm. Here, we use several well-defined routing algorithms as examples.

Near-neighbor mesh. Bitonic sort has been adapted by several authors45-47 for the routing of an n x n mesh-connected, single instruction-multiple data stream system. The procedure developed by Nassimi47 is as follows:

Procedure SORT(n, n)
1) K ← 1; S ← 1
2) While K < n do
   a) consider the n x n processor array as composed of many adjacent K x 2K subarrays
   b) do in parallel for each K x 2K subarray: HORIZONTAL_MERGE(K, 2K)
   c) S ← S + 1
   d) consider the n x n processor array as composed of many adjacent 2K x 2K subarrays
   e) do in parallel for each 2K x 2K subarray: VERTICAL_MERGE(2K, 2K)
   f) S ← S + 1; K ← 2*K
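A minimal executable sketch of SORT's control structure follows, assuming n is a power of two. On the real mesh, HORIZONTAL_MERGE and VERTICAL_MERGE are carried out with compare-exchange steps between neighboring processors (refs. 45-47); here each merge is stood in for by simply sorting the subarray into row-major order, so the sketch only illustrates how the K x 2K and 2K x 2K subarrays are scheduled as K doubles.

def merge_subarray(a, r0, c0, rows, cols):
    # Stand-in for a mesh merge step: sort the subarray into row-major order.
    block = sorted(a[r][c] for r in range(r0, r0 + rows)
                           for c in range(c0, c0 + cols))
    it = iter(block)
    for r in range(r0, r0 + rows):
        for c in range(c0, c0 + cols):
            a[r][c] = next(it)

def sort_mesh(a):
    n = len(a)          # a is an n x n array of keys, n a power of two
    k, s = 1, 1         # s mirrors the step counter S of the procedure
    while k < n:
        for r0 in range(0, n, k):                    # adjacent K x 2K subarrays
            for c0 in range(0, n, 2 * k):
                merge_subarray(a, r0, c0, k, 2 * k)       # HORIZONTAL_MERGE(K, 2K)
        s += 1
        for r0 in range(0, n, 2 * k):                # adjacent 2K x 2K subarrays
            for c0 in range(0, n, 2 * k):
                merge_subarray(a, r0, c0, 2 * k, 2 * k)   # VERTICAL_MERGE(2K, 2K)
        s += 1
        k *= 2
    return a

Calling sort_mesh on any 4 x 4 list of lists returns it sorted in row-major order; the final vertical merge, when 2K equals n, finishes the job just as in the procedure above.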
Figure 8. Sorting with shuffle-exchange. The example input, indexed 0 through 15, is the sequence 14 12 5 7 15 8 9 13 4 3 10 6 1 0 2 11. (Adapted from The Art of Computer Programming, Vol. 3: Sorting and Searching, by D. E. Knuth; Addison-Wesley, Reading, Mass., © 1973.)
... topologically equivalent blocking multistage networks.25 Basically, two types of routing are available: recursive routing and destination-tag routing.25,28,51 The recursive routing algorithm determines the control pattern according to permutation names. For some permutations useful in parallel processing, the control pattern can be calculated recursively, on the fly, as the data pass through the network. Six categories of such permutations have been identified. For our purpose, we describe one here and show the recursive routing algorithm.

The flip permutation function29 F_k(n) maps each n-bit line number X onto X ⊕ k, where k is an n-bit constant. Let [L; R] denote the cascaded matrix whose left part and right part are L and R, respectively, and let V(n-1)(b) be the 2^(n-1)-bit vector whose components are all equal to b. The control pattern K(n) of the flip function can then be expressed in terms of the following recursive formula:

K(n)(F_k(n)) = [V(n-1)(k_0); K(n-1)(F_k'(n-1))]

where k' is k with its lowest bit k_0 removed, and

K(1)(F_k(1)) = [V(n-1)(k)].

For example, the permutation p given by p(X ⊕ 4) = X can be described by the flip function F_4(3). Accordingly, we have

K(3)(F_4(3)) = [V(2)(0); V(2)(0); V(2)(1)].

Hence

K(3)(p) = 0 0 0 0
          0 0 0 0
          1 1 1 1
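A minimal sketch of this recursive control-pattern computation (the function name is mine, and the pattern is returned as one list of 2^(n-1) control bits per stage, following the reconstruction above):

def flip_control_pattern(k, n):
    # Control pattern K(n) for the flip permutation F_k(n) on a 2^n-line network.
    # Unrolling K(n)(F_k) = [V(n-1)(k_0); K(n-1)(F_k')] gives one stage of
    # identical control bits per bit of k, starting with the lowest bit.
    width = 1 << (n - 1)            # 2^(n-1) switching elements per stage
    return [[(k >> j) & 1] * width for j in range(n)]

# The example above: p(X XOR 4) = X on an 8-line network (n = 3, k = 4)
# yields [[0, 0, 0, 0], [0, 0, 0, 0], [1, 1, 1, 1]].
print(flip_control_pattern(4, 3))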
... complexity in terms of a parallel processing technique,53 a heuristic method,37 or a recursive formula.35 Here, we demonstrate the very basic routing algorithm, called the looping algorithm. The basic principle, in terms of the permutation to be realized by the Benes binary network shown in Figure 5d, is

p = ( 0 1 2 3 4 5 6 7
      3 7 4 0 2 6 1 5 )

The looping algorithm starts by recording the permutation p, as shown in Figure 12. The two output numbers of a switching element in the output stage are shown in the same column, and the two input numbers of a switching element in the input stage are shown in the same row. We then choose an arbitrary entry in the chart as a starting point. For example, electing to start at row 23 and column 01, we then look for a same-row or same-column entry to form a loop and, in Figure 12, choose row 23 and column 45. The process continues until we obtain a loop by re-entering row 23 and column 01. The loop's member entries are then assigned "a" and "b" alternately. The second loop can be formed in the same way. Then, we assign input and output lines named "a" to subnetwork a and those named "b" to subnetwork b. The control of the input and output switching elements must be set as depicted in Figure 13. This looping algorithm can be applied recursively to the two subnetworks.

A 2 x 2 switching element can be set by a control line into a direct-connection or crossed-connection state. Assume that I0, I1, O0, O1, and C represent the two inputs, the two outputs, and the switching element control. The switching element's output function can be expressed as follows:

O0 = C'·I0 + C·I1, and
O1 = C'·I1 + C·I0

where a prime denotes the complement, C = 0 means a straight connection, and C = 1 a crossed connection (see Figure 14).
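A minimal sketch of that output function for one-bit data (names are mine):

def switch_element(i0, i1, c):
    # c = 0: straight connection (O0 = I0, O1 = I1);
    # c = 1: crossed connection  (O0 = I1, O1 = I0).
    o0 = ((1 - c) & i0) | (c & i1)
    o1 = ((1 - c) & i1) | (c & i0)
    return o0, o1

# Example: with c = 1 the two inputs are exchanged.
print(switch_element(0, 1, 1))   # prints (1, 0)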
Figure 11. Distributed routing on a baseline network.
Figure 14. A 2 x 2 switching element.
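The destination-tag routing mentioned earlier25,28,51 is easy to see with a cascade of such 2 x 2 elements. A minimal sketch follows; it assumes the standard omega-style arrangement (a perfect shuffle in front of each of the n stages), which is one common construction and not necessarily the layout of any figure in this article. The element in stage k simply uses bit n-1-k of the destination address as its control.

def route_by_destination_tag(src, dst, n_bits):
    # Trace one message through n_bits stages; each stage is a perfect shuffle
    # followed by a rank of 2 x 2 elements set by one destination-tag bit.
    n = 1 << n_bits
    line = src
    for k in range(n_bits):
        line = ((line << 1) | (line >> (n_bits - 1))) & (n - 1)   # perfect shuffle
        bit = (dst >> (n_bits - 1 - k)) & 1                       # tag bit for stage k
        line = (line & ~1) | bit                                  # element output
    return line                                                   # always equals dst

# Example: on an 8-line network, a message from source 2 reaches destination 6.
print(route_by_destination_tag(2, 6, 3))   # prints 6

No global control computation is needed; each element looks only at the tag carried by the message, which is what makes this style of routing attractive for blocking multistage networks such as the omega and baseline.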
Dimond 2 x 2 switching element. A switching element with two input and output ports, called Dimond for dual interconnection modular network device,58 allows modular construction of interconnection networks. A packet of messages (containing routing information) arriving at a Dimond is switched to a designated output port, where it is stored in a register. Figure 15 shows an implementation of Dimond which requires one control clock for all interconnected switching elements. The central clock has two phases. In the first clock phase, it is determined which inputs have to be copied into which registers. The copy allowances so determined are stored in four flip-flops: C00, C01, C10, and C11 (C01 is the allowance for copying in0 into reg1). In addition, output signals of copy acknowledgments (cack0 and cack1) and internal control signals (cross, fill0, and fill1) are generated. More precisely, we have the following, where a prime denotes the complement:

C00 = creq0 · des0' · stat0' · (creq1' + des1 + prio')
C01 = creq0 · des0 · stat1' · (creq1' + des1' + prio')
C10 = creq1 · des1' · stat0' · (creq0' + des0 + prio)
C11 = creq1 · des1 · stat1' · (creq0' + des0' + prio)

cack0 = C00 + C01
cack1 = C10 + C11
cross = C00 + C11
fill0 = C00 + C10
fill1 = C01 + C11

where prio is the priority line indicating the index (0, 1) of the input served first in the event of conflict, and des0 and des1 are the destination lines for in0 and in1. In the second clock phase, two actions are performed concurrently. Inputs are copied into the output registers, if required, and the status flip-flops stat0 and stat1 (the status of reg0 and reg1, respectively) are adapted. More precisely, we have the following:

fill0 → reg0 := cross·in0 + cross'·in1
fill1 → reg1 := cross'·in0 + cross·in1
fill0 · rel0' → stat0 := 1
rel0 → stat0 := 0
fill1 · rel1' → stat1 := 1
rel1 → stat1 := 0
infa0 = stat0
infa1 = stat1

The interconnection of two Dimonds is shown in Figure 16, which depicts the relation of the handshaking lines.

Figure 15. A 2 x 2 dual interconnecting modular network device (Dimond58) for packet switching.
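A minimal sketch of the first-phase copy-allowance logic as reconstructed above. The complement placement in those equations is inferred from the surrounding description, so treat this as an illustration of the arbitration idea rather than as the DIMOND specification; all names are mine.

def dimond_phase1(creq, des, stat, prio):
    # creq[i]: input i requests a copy; des[i]: register index wanted by input i;
    # stat[j]: register j is already occupied; prio: index of the input served
    # first when both inputs compete for the same register.
    c = [[0, 0], [0, 0]]     # c[i][j]: allowance to copy input i into register j
    for i in (0, 1):
        for j in (0, 1):
            rival = 1 - i
            competes = creq[rival] and des[rival] == j
            wins = (not competes) or (prio == i)
            c[i][j] = int(creq[i] and des[i] == j and not stat[j] and wins)
    cack = [c[0][0] or c[0][1], c[1][0] or c[1][1]]   # acknowledgment per input
    fill = [c[0][0] or c[1][0], c[0][1] or c[1][1]]   # register j gets filled
    return c, cack, fill

# Example: both inputs request register 0 while it is empty; with prio = 0,
# only input 0 is granted the copy (c[0][0] = 1).
print(dimond_phase1(creq=[1, 1], des=[0, 0], stat=[0, 0], prio=0))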
64 x 64 switching element. A centralized-control and
circuit-switching 64 x 64 versatile data manipulator11 (see
Figure 17) is operating in conjunction with the Staran
computer at the Rome Air Development Center.44 The
data manipulator operates under the control of the Staran
computer's parallel input-output unit. The contents of
the input and output masks, of the address control regis-
ter, and of the input and output control registers, as well
as the data to be manipulated, are entered via the 256-bit
wide PIO buffer interface. The manipulated data leave
the data manipulator via the same interface. The data
manipulator's instruction repertoire allows one to load
the various address registers and masks and to start and
stop data manipulation. Self-test is performed by loading
address and input-data registers, allowing verification of
correct operation without assistance from the Staran
computer. There are 64 x 64 cells in the basic crossbar cir-
cuit. The output gate of cell (i,j) is controlled by the ith
address control register through a decoder. The decoder
has 64 outputs to control the 64 output gates in a basic-
crossbar-circuit row.
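A minimal sketch of this row-addressed crossbar setting (my own modeling; addr[i] stands for the contents of the ith address control register, whose decoded value enables exactly one output gate in row i):

def crossbar_connect(inputs, addr):
    # Row i's address register selects column addr[i], so input i is gated onto
    # output addr[i]. Distinct addresses give a conflict-free connection pattern.
    outputs = [None] * len(inputs)
    for i, j in enumerate(addr):
        outputs[j] = inputs[i]
    return outputs

# Example on a 4 x 4 fragment: inputs a, b, c, d routed to outputs 2, 0, 3, 1.
print(crossbar_connect(['a', 'b', 'c', 'd'], [2, 0, 3, 1]))   # ['b', 'd', 'a', 'c']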
Figure 17. Block diagram of a versatile data manipulator.
... multiple instructions and multiple data streams and hence is called an MIMD processor. Examples of the MIMD architecture include HEP,60 the data flow processor,61 and the flow model processor.62 A configuration of MIMD architecture62 is shown in Figure 19. The N processing elements are connected to the M memory modules by an interconnection network. The activities are coordinated by the coordinator. Unlike the control unit in an array processor, the coordinator does not execute object code; it only implements the synchronization of processes and smooths out the execution sequence. Again, the compiler must be designed to partition a computation task and assign each piece to individual processing elements. Effective partitioning and assignment are essential for efficient multiprocessing. The criterion is to match memory bandwidth with the processor processing load, and the interconnection network is a critical factor in this matching. Below, we address some problems and results regarding the role of the interconnection network in concurrent processing.

Combinatorial capability. In array processing, data are often stored in parallel memory modules in skewed forms that allow a vector of data to be fetched without conflict.63-65 However, the fetched data must be realigned in prescribed order before they can be sent to individual PEs for processing. This alignment is implemented by permutation functions of the interconnection network, which also realigns data generated by individual PEs into skewed form for storage in the memory modules.

In a computer architecture project, one should question whether the interconnection network chosen can efficiently perform the alignment. The rearrangeable network and the nonblocking network can realize every permutation function, but using these networks for alignment requires considerable effort to calculate control settings. A recursive routing mechanism has been provided for a few families of permutations needed for parallel processing35; however, the problem remains for the realization of general permutations. Many articles45,51,66,67 concentrating on the permutation capabilities of single-stage networks and blocking multistage networks have shown that these networks cannot realize arbitrary permutations in a single pass. Recent results show that the baseline network can realize arbitrary permutations in just two passes,51 while other blocking multistage networks, such as the omega network, need at least three passes.66 As mentioned previously, the shuffle-exchange network can realize arbitrary permutations in 3(log2 N) - 1 passes, where N is the network size.48
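As a concrete illustration of the skewed storage mentioned above, here is a minimal sketch of one common skewing rule; the (i + j) mod M placement is my example choice, not necessarily the scheme of refs. 63-65. With M memory modules, element (i, j) of an M x M array is placed in module (i + j) mod M, so any row and any column touches each module exactly once and can be fetched without conflict.

def module_of(i, j, m):
    # Skewed storage: element (i, j) lives in memory module (i + j) mod m.
    return (i + j) % m

# For M = 4, both row 2 and column 1 hit the four modules once each.
M = 4
print([module_of(2, j, M) for j in range(M)])   # [2, 3, 0, 1]
print([module_of(i, 1, M) for i in range(M)])   # [1, 2, 3, 0]

The interconnection network's job is then to realign such a fetched vector into the order the PEs expect, which is exactly the alignment problem discussed above.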
Task assignments and reconfiguration. Consider a parallel program segment using M memory modules and N processing elements. During execution, data is usually transferred from memory modules to processing elements or vice versa. It is also necessary to transfer data among processing elements for data sharing and synchronization. Simultaneous data transfers through the interconnection network, which implements the transfers, may result in contention for communication links and switching elements. In case of conflict, some of the data transfers must be deferred; consequently, throughput decreases because the processing elements which need the deferred data cannot proceed as originally expected. To minimize delays caused by communication conflicts, program codes must be assigned to proper processing elements and data assigned to proper memory modules. The assignment of data to memory modules, called mapping,68 has recently been extended to include assignment of program modules to processing elements.69

A configuration concept has been proposed to better use the interconnection network.51 Under this concept, a network is just a configuration of another one in the same topologically equivalent class.25 To configure a permutation function as an interconnection network, we can assign input/output link names in a way that realizes the permutation function in one conflict-free pass. The problem of assigning logical names that realize various permutation functions without conflicts is called a reconfiguration problem. It has been shown that, through the reconfiguration process, the baseline network can realize every permutation in one pass without conflicts.69 This implies that concurrent processing throughput could be enhanced by proper assignment of tasks to processing elements and data to memory modules.

Partitioning. In partitioning - that is, dividing the network into independent subnetworks of different sizes - each subnetwork must have all the interconnection capabilities of a complete network of the same type and size. Hence, with a partitionable network, a system can support multiple SIMD machines. By dynamically reconfiguring the system into independent SIMD machines and properly assigning tasks to each partition, we can use resources more efficiently.

Several authors have noted the importance of partitioning.30 One recent study70 shows that single-stage networks, such as the shuffle-exchange and Illiac networks, cannot be partitioned into independent subnetworks, but blocking multistage networks, such as the baseline and data manipulator, can be partitioned.

Bandwidth of interconnection networks. The bandwidth can be defined as the expected number of requests accepted per unit time. Since the bus system cannot provide sufficient bandwidth for a large-scale multiprocessor system and the crossbar switch is too expensive, it is particularly interesting to know what kind of bandwidth various interconnection networks can provide.

The analytic method has been used to estimate bandwidth.31,71,72 However, one cannot obtain a closed-form solution, and the analytic model is sometimes too simplified. Just for example, one result showed that for a blocking multistage interconnection network of size 256 x 256, the bandwidth is 77 requests (per memory cycle), and for a crossbar switch of the same size, the bandwidth is 162. However, the crossbar costs about 20 times as much as the multistage network, and with buffering (packet switching), the performance of the multistage network is quite comparable to the crossbar switch.71

Numerical simulation, also used to estimate the bandwidth,71 can simulate actual PE connection requests by analyzing the program to be executed. The access conflicts in the network and memory modules can be detected as shown by Wu and Feng.25 Using the simulation method, Barnes73 concluded that the baseline network is more than adequate to support the connection needs of a proposed MIMD system which can execute one billion floating-point instructions per second.

Reliability. Reliable operation of interconnection networks is important to overall system performance. The reliability issue can be thought of as two problems: fault diagnosis and fault tolerance. The fault-diagnosis problem has been studied for a class of multistage interconnection networks constructed of switching elements with two valid states.74 The problem is approached by generating suitable fault-detection and fault-location test sets for every fault in the assumed fault model. The test sets are then trimmed to a minimal or nearly minimal set. Detecting a single fault (link fault or switching-element fault) requires only four tests, which are independent of network size. The numbers of tests for locating single faults and detecting multiple faults are also workable.

The second reliability problem mainly concerns the degree of fault tolerance.75 It is important to design a network that combines full connection capability with graceful degradation in spite of the existence of faults.

Acknowledgment

The author wishes to acknowledge the original contribution of Dr. C. Wu in preparing this article.

References

1. T. Feng, editor's introduction, special issue on parallel processors and processing, Computing Surveys, Vol. 9, No. 1, Mar. 1977, pp. 1-2.
2. C. V. Ramamoorthy, T. Krishnarao, and P. Jahanian, "Hardware Software Issues in Multi-Microprocessor Computer Architecture," Proc. First Annual Rocky Mountain Symp. Microcomputers, 1977, pp. 235-261.
3. K. J. Thurber, "Interconnection Networks - A Survey and Assessment," AFIPS Conf. Proc., Vol. 43, 1974 NCC, pp. 909-919.
4. K. J. Thurber, "Circuit Switching Technology: A State-of-the-Art Survey," Proc. Compcon Fall 1978, Sept. 1978, pp. 116-124.
5. K. J. Thurber and G. M. Masson, Distributed-Processor Communication Architecture, Lexington Books, Lexington, Mass., 1979, 252 pp.
6. H. J. Siegel, "Interconnection Networks for SIMD Machines," Computer, Vol. 12, No. 6, June 1979, pp. 57-66.
7. H. J. Siegel, R. J. McMillen, and P. T. Mueller, Jr., "A Survey of Interconnection Methods for Reconfigurable Parallel Processing Systems," AFIPS Conf. Proc., Vol. 48, 1979 NCC, pp. 387-400.
8. G. M. Masson, G. C. Gingher, and S. Nakamura, "A Sampler of Circuit Switching Networks," Computer, Vol. 12, No. 6, June 1979, pp. 32-48.
9. T. Feng and C. Wu, Interconnection Networks in Multiple-Processor Systems, Rome Air Development Center report, RADC-TR-79-304, Dec. 1979, 244 pp.
10. C. Wu and T. Feng, "A VLSI Interconnection Network for Multiprocessor Systems," Digest Compcon Spring 1981, pp. 294-298.
11. T. Feng, "Data Manipulating Functions in Parallel Processors and Their Implementations," IEEE Trans. Computers, Vol. C-23, No. 3, Mar. 1974, pp. 309-318.
12. V. Benes, Mathematical Theory of Connecting Networks, Academic Press, N.Y., 1965.
13. H. T. Kung, "The Structure of Parallel Algorithms," in Advances in Computers, Vol. 19, M. C. Yovits, ed., Academic Press, N.Y., 1980.
14. D. J. Farber and K. C. Larson, "The System Architecture of the Distributed Computer System - the Communications System," Proc. Symp. Computer Comm. Networks and Teletraffic, Brooklyn Polytechnic Press, Apr. 1972, pp. 21-27.
15. C. C. Reames and M. T. Liu, "A Loop Network for Simultaneous Transmission of Variable Length Messages," Proc. Second Symp. Computer Architecture, Jan. 1975, pp. 7-12.
16. S. I. Saffer et al., "NODAS - The Net Oriented Data Acquisition System for the Medical Environment," AFIPS Conf. Proc., Vol. 46, 1977 NCC, pp. 295-300.
17. J. A. Harris and D. R. Smith, "Hierarchical Multiprocessor Organization," Proc. Fourth Symp. Computer Architecture, Mar. 1977, pp. 41-48.
18. G. H. Barnes et al., "The Illiac IV Computer," IEEE Trans. Computers, Vol. C-17, No. 8, Aug. 1968, pp. 746-757.
19. E. M. Aupperle, "MERIT Computer Network: Hardware Considerations," in Computer Networks, R. Rustin, ed., Prentice-Hall, Englewood Cliffs, N.J., 1972, pp. 49-63.
20. B. W. Arden and H. Lee, "Analysis of Chordal Ring Network," IEEE Trans. Computers, Vol. C-30, No. 4, Apr. 1981, pp. 291-295.
21. H. Sullivan, T. R. Bashkow, and K. Klappholz, "A Large Scale Homogeneous, Fully Distributed Parallel Machine," Proc. Fourth Symp. Computer Architecture, Nov. 1977, pp. 105-125.
22. F. P. Preparata and J. Vuillemin, "The Cube-Connected Cycles: A Versatile Network for Parallel Computation," Comm. ACM, Vol. 24, No. 5, May 1981, pp. 300-309.
23. H. S. Stone, "Parallel Processing with the Perfect Shuffle," IEEE Trans. Computers, Vol. C-20, No. 2, Feb. 1971, pp. 153-161.
24. T. Feng, Parallel Processing Characteristics and Implementation of Data Manipulating Functions, Rome Air Development Center report, RADC-TR-73-189, July 1973.
25. C. Wu and T. Feng, "On a Class of Multistage Interconnection Networks," IEEE Trans. Computers, Vol. C-29, No. 8, Aug. 1980, pp. 694-702.
26. C. Wu and T. Feng, "On a Distributed-Processor Communication Architecture," Proc. Compcon Fall 1980, pp. 599-605.
27. L. R. Goke and G. J. Lipovski, "Banyan Networks for Partitioning Multiprocessing Systems," Proc. First Annual Computer Architecture Conf., Dec. 1973, pp. 21-28.
28. D. H. Lawrie, "Access and Alignment of Data in an Array Processor," IEEE Trans. Computers, Vol. C-24, No. 12, Dec. 1975, pp. 1145-1155.
29. K. E. Batcher, "The Flip Network in STARAN," Proc. 1976 Int'l Conf. Parallel Processing, Aug. 1976, pp. 65-71.
30. M. C. Pease, "The Indirect Binary n-Cube Microprocessor Array," IEEE Trans. Computers, Vol. C-26, No. 5, May 1977, pp. 548-573.
31. J. H. Patel, "Processor-Memory Interconnections for Multiprocessors," Proc. Sixth Annual Symp. Computer Architecture, Apr. 1979, pp. 168-177.
32. A. Waksman, "A Permutation Network," J. ACM, Vol. 15, No. 1, Jan. 1968, pp. 159-163.
33. A. E. Joel, Jr., "On Permutation Switching Networks," B.S.T.J., Vol. 47, 1968, pp. 813-822.
34. D. C. Opferman and N. T. Tsao-Wu, "On a Class of Rearrangeable Switching Networks - Part I: Control Algorithm; Part II: Enumeration Studies and Fault Diagnosis," B.S.T.J., 1971, pp. 1579-1618.
35. J. Lenfant, "Parallel Permutations of Data: A Benes Network Control Algorithm for Frequently Used Permutations," IEEE Trans. Computers, Vol. C-27, No. 7, July 1978, pp. 637-647.
36. T. Feng, C. Wu, and D. P. Agrawal, "A Microprocessor-Controlled Asynchronous Circuit Switching Network," Proc. Sixth Annual Symp. Computer Architecture, 1979, pp. 202-215.
37. Y-C. Chow, R. D. Dixon, T. Feng, and C. Wu, "Routing Techniques for Rearrangeable Interconnection Networks," Proc. Workshop on Interconnection Networks, Apr. 1980, pp. 64-69.
38. C. Clos, "A Study of Nonblocking Switching Networks," Bell System Tech. J., Vol. 32, 1953, pp. 406-424.
39. C. D. Thompson, "Generalized Connection Networks for Parallel Processor Intercommunication," IEEE Trans. Computers, Vol. C-27, No. 12, Dec. 1978, pp. 1119-1125.
40. J. Gecsei, "Interconnection Networks from Three-State Cells," IEEE Trans. Computers, Vol. C-26, No. 8, Aug. 1977, pp. 705-711.
41. Y-C. Chow, R. D. Dixon, and T. Feng, "An Interconnection Network for Processor Communication with Optimized Local Connections," Proc. 1980 Int'l Conf. Parallel Processing, Aug. 1980, pp. 65-74.
42. W. A. Wulf and C. G. Bell, "C.mmp - A Multiminiprocessor," AFIPS Conf. Proc., Vol. 41, 1972 FJCC, pp. 765-777.
43. T. Feng, The Design of a Versatile Line Manipulator, Rome Air Development Center report, RADC-TR-73-292, Sept. 1973.
44. W. W. Gaertner, Design, Construction, and Installation of Data Manipulator, Rome Air Development Center report, RADC-TR-77-166, May 1977, 80 pp.
45. S. E. Orcutt, "Implementation of Permutation Functions in an Illiac IV-Type Computer," IEEE Trans. Computers, Vol. C-25, No. 9, Sept. 1976, pp. 929-936.
46. C. D. Thompson and H. T. Kung, "Sorting on a Mesh-Connected Parallel Computer," Comm. ACM, Vol. 20, No. 4, Apr. 1977, pp. 263-271.
47. D. Nassimi and S. Sahni, "Bitonic Sort on a Mesh-Connected Parallel Computer," IEEE Trans. Computers, Vol. C-28, No. 1, Jan. 1979, pp. 2-7.
48. C. Wu and T. Feng, "Universality of the Shuffle-Exchange Network," IEEE Trans. Computers, Vol. C-30, No. 5, May 1981.
49. D. E. Knuth, The Art of Computer Programming, Vol. 3: Sorting and Searching, Addison-Wesley, Reading, Mass., 1973.
50. R. J. McMillen and H. J. Siegel, "MIMD Machine Communication Using the Augmented Data Manipulator Network," Proc. Seventh Symp. Computer Architecture, June 1980, pp. 51-58.
51. C. Wu and T. Feng, "The Reverse-Exchange Interconnection Network," IEEE Trans. Computers, Vol. C-29, No. 9, Sept. 1980, pp. 801-811; also Proc. 1979 Int'l Conf. Parallel Processing, pp. 160-174.
52. S. Anderson, "The Looping Algorithm Extended to Base 2^t Rearrangeable Switching Networks," IEEE Trans. Comm., Vol. COM-25, No. 10, Oct. 1977, pp. 1057-1063.
53. G. Lev, N. Pippenger, and L. G. Valiant, "A Fast Parallel Algorithm for Routing in Permutation Networks," IEEE Trans. Computers, Vol. C-30, No. 2, Feb. 1981, pp. 93-100.
54. D. H. Lawrie, Memory-Processor Connection Networks, UIUCDCS-R-73-557, University of Illinois, Urbana, Feb. 1973.
55. Numerical Aerodynamic Simulation Facility Feasibility Study, Burroughs Corporation, Mar. 1979.
56. U. V. Premkumar, R. Kapur, M. Malek, G. J. Lipovski, and P. Horne, "Design and Implementation of the Banyan Interconnection Network in TRAC," AFIPS Conf. Proc., Vol. 49, 1980 NCC, pp. 643-653.
57. M. A. Franklin, "VLSI Performance Comparison of Banyan and Crossbar Communication Networks," IEEE Trans. Computers, Vol. C-30, No. 4, Apr. 1981, pp. 283-290.
58. P. G. Jansen and J. L. W. Kessels, "The DIMOND: A Component for the Modular Construction of Switching Networks," IEEE Trans. Computers, Vol. C-29, No. 10, Oct. 1980, pp. 884-889.
59. D. J. Kuck, "A Survey of Parallel Machine Organization and Programming," Computing Surveys, Vol. 9, No. 1, Mar. 1977, pp. 29-59; also in Proc. 1975 Sagamore Computer Conf. Parallel Processing, pp. 15-39.
60. B. J. Smith, "A Pipelined, Shared Resource MIMD Computer," Proc. 1978 Int'l Conf. Parallel Processing, pp. 6-8.
61. J. B. Dennis, "Data Flow Supercomputers," Computer, Vol. 13, No. 11, Nov. 1980, pp. 48-56.
62. S. F. Lundstrom and G. Barnes, "A Controllable MIMD Architecture," Proc. 1980 Int'l Conf. Parallel Processing, pp. 19-27.
63. D. J. Kuck, "ILLIAC IV Software and Application Programming," IEEE Trans. Computers, Vol. C-17, No. 8, Aug. 1968, pp. 758-770.
64. K. E. Batcher, "The Multi-Dimensional Access Memory in STARAN," IEEE Trans. Computers, Vol. C-26, No. 2, Feb. 1977, pp. 174-177.
65. D. H. Lawrie and C. Vora, "The Prime Memory System for Array Access," Proc. 1980 Int'l Conf. Parallel Processing, pp. 81-87.
66. A. Shimer and S. Ruhman, "Toward a Generalization of Two- and Three-Pass Multistage, Blocking Interconnection Networks," Proc. 1980 Int'l Conf. Parallel Processing, pp. 337-346.
67. T. Lang and H. S. Stone, "A Shuffle-Exchange Network with Simplified Control," IEEE Trans. Computers, Vol. C-25, No. 1, Jan. 1976, pp. 55-65.
68. H. T. Kung and D. Stevenson, "A Software Technique for Reducing the Routing Time on a Parallel Computer with a Fixed Interconnection Network," in High Speed Computer and Algorithm Organization, Academic Press, N.Y., 1977, pp. 423-433.
69. C. Wu and T. Feng, "A Software Technique for Enhancing Performance of a Distributed Computer System," Proc. Compsac 80, Oct. 1980, pp. 274-280.
70. H. J. Siegel, "The Theory Underlying the Partitioning of Permutation Networks," IEEE Trans. Computers, Vol. C-29, No. 9, Sept. 1980, pp. 791-801.
71. D. M. Dias and J. R. Jump, "Analysis and Simulation of Buffered Delta Networks," IEEE Trans. Computers, Vol. C-30, No. 4, Apr. 1981, pp. 273-282.
72. D. A. Padua, D. J. Kuck, and D. H. Lawrie, "High-Speed Multiprocessors and Compilation Techniques," IEEE Trans. Computers, Vol. C-29, No. 9, Sept. 1980, pp. 763-776.
73. G. H. Barnes, "Design and Validation of a Connection Network for Many-Processor Multiprocessing Systems," Proc. 1980 Int'l Conf. Parallel Processing, pp. 79-80.
74. C. Wu and T. Feng, "Fault Diagnosis for a Class of Multistage Interconnection Networks," Proc. 1979 Int'l Conf. Parallel Processing, pp. 269-278.
75. J. P. Shen and J. P. Hayes, "Fault Tolerance of a Class of Connecting Networks," Proc. Seventh Symp. Computer Architecture, 1980, pp. 61-71.

Tse-yun Feng is a professor in the Department of Computer and Information Science, Ohio State University, Columbus. Previously, he was on the faculty at Wayne State University, Detroit, and Syracuse University, New York. He has extensive technical publications in the areas of associative processing, parallel and concurrent processors, computer architecture, switching theory, and logic design, and has received a number of awards for his technical contributions and scholarship.

A past president of the IEEE Computer Society (1979-80), Feng was a distinguished visitor (1973-78), and has served as a reviewer, panelist, or session chairman for various technical magazines and conferences. He also initiated the Sagamore Computer Conference on Parallel Processing and the International Conference on Parallel Processing.

He received the BS degree from the National Taiwan University, Taipei, the MS degree from Oklahoma State University, Stillwater, and the PhD degree from the University of Michigan, Ann Arbor, all in electrical engineering.