Router Isocc05

Design of a High-Performance Scalable CDMA Router for On-Chip Switched Networks


Daewook Kim, Manho Kim and Gerald E. Sobelman
Department of Electrical and Computer Engineering
University of Minnesota, Minneapolis, MN 55455, USA
{daewook,mhkim,sobelman}@umn.edu

Abstract – Performance results and synthesized area overhead for a code division multiple access (CDMA) router intended for network-on-chip (NoC) applications are presented. Specific architectural block diagrams of the main components of the router are given and synthesis results are provided for 0.18 micron and 0.25 micron structured ASIC libraries. Post-synthesis VHDL simulations verify the functionality of the router and provide values for packet transmission latency and throughput as functions of the payload size. The router can be used to construct star+star and star+mesh network architectures which can be scaled to meet the needs of high-performance applications.

Keywords: router, CDMA, network-on-chip

1 Introduction

Future system-on-chip (SoC) designs will have hundreds of intellectual property (IP) blocks on a single chip. According to the Semiconductor Industry Association (SIA) roadmap [1], by the end of the decade SoCs designed using 50nm technology will have up to 4 billion transistors running at 10GHz [2]. The on-chip communication requirements for these systems are very demanding because many or all of those IPs need to communicate in the Gbps range. In particular, concerns related to the interconnections and their delays [4] have given rise to router-based on-chip interconnects, also known as network-on-chip (NoC) or on-chip switched network (OCSN) architectures. The design of such scalable and modular high-performance router-based on-chip networks is crucial to the success of this NoC design paradigm.

NoCs provide a way to overcome the limitations inherent in traditional bus-based interconnection schemes [2, 3]. NoCs can have the following benefits: (a) throughput increase via high performance switching technology, (b) lower energy dissipation, (c) flexible scalability and (d) design reusability. On the other hand, the routers require a certain amount of overhead. Therefore, it is important to determine the area overhead that is required so that the cost and performance trade-off between bus-based and NoC-based designs can be understood.

CDMA techniques are widely used in wireless communications systems. Recently, some researchers have proposed various ways of applying CDMA to wired communications environments. References [5, 6] develop a bus interface which features multi-bit simultaneous data transmission and multi-valued CDMA techniques for higher bandwidth, but these did not consider a packet-based network approach. The paper of Bell et al. [7] proposed a method using pseudo-noise (PN) sequences to route packets between processors in a multi-processor network. However, it used only one large central switching element to perform all of the routing and did not consider issues such as buffering, packet contention or the on-chip environment in which some of the resources may not be generating traffic at certain times.

Our CDMA-based router is developed to address NoC applications. It can seamlessly handle situations in which some of the resources do not have any data to send during a given packet interval. The implementation uses traditional binary signaling and includes capabilities for packet buffering and contention resolution. This paper describes the detailed register-transfer level design of the components within the router and presents synthesis results to determine its performance and area overhead. Synthesis was performed using the Synplify ASIC 3.3 tool with the Chip Express CX5000 and CX4000 structured ASIC libraries for 0.18µm and 0.25µm technologies, respectively [11]. We show how the router is easily scalable and how it can be applied to other types of NoC topologies. Also, the CDMA router-based star+star and star+mesh on-chip network topologies have a better value for the ratio of the number of resources to the number of routers compared to the general crossbar switch topologies.

2 CDMA Router Architecture

The architecture of the CDMA-based router is presented in Figure 1. It is composed of seven functional modules: FIFO buffer, Walsh codeword storage, header decoder, scheduler, modulator, code adder and demodulator. The resources attached to the router can be any type of digital IP module found in an SoC, such as a processor, memory, DSP core, controller or other specialized logic block.

Data is transferred from one resource to another using a packet format, which is divided into three fields: a 3-bit source address, a 3-bit destination address and a payload of parameterizable size. Of the eight available 8-bit Walsh codewords, we assign the seven non-zero codewords to the seven attached IP resources, reserving the all-zero codeword for the case where no data is to be sent. Larger routers can be constructed using larger codeword sets. For example, if we use 16-bit Walsh codewords, then the number of IP resources attached to each router can be extended up to 15. The router can support a large aggregate data throughput because all resources can simultaneously send packets to their destinations due to the orthogonality of the Walsh codewords. Our CDMA router is modular and reusable, and this leads to a very regular and predictable communications infrastructure which can be used in a wide range of design applications.

Figure 2: Modulator block diagram. [Diagram labels: Scheduler (DST, Req, Gnt), Walsh Codewords Memory (eight 8-bit codewords), Header Decoder, 8 MUXes, packet BUFFER with SRC[3], DST[3] and PAYLOAD[16] fields, Code Adder; example selected codeword 01100110 and its inverse 10011001.]
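The orthogonality argument of Section 2 can be checked with a short sketch. The Python below is an illustration under stated assumptions, not the authors' register-transfer level design. It assumes the Walsh codewords follow the Sylvester (Hadamard) construction, an ordering the paper does not state but which reproduces the example codeword 01100110 of Figure 2 as codeword 3. It handles one packet bit per port, and it assumes the scheduler has already resolved contention so that each destination codeword is used by at most one sender. The function names are hypothetical. The sketch applies the modulation rule of Table 1, sums the chips as the code adder does, and recovers each bit with Equations (1) and (2) and the decision rule of Table 2.

```python
def walsh_codewords(length):
    """Return `length` Walsh codewords as lists of 0/1 chips.

    Assumption: Sylvester (Hadamard) ordering; row 0 is the all-zero codeword.
    """
    if length < 1 or length & (length - 1):
        raise ValueError("length must be a power of two")
    rows = [[0]]
    while len(rows) < length:
        # Doubling step: [H H] on top, [H not(H)] below.
        rows = [r + r for r in rows] + [r + [1 - c for c in r] for r in rows]
    return rows


def modulate(codeword, bit):
    """Table 1: 0 -> codeword itself, 1 -> inverted codeword, None -> all-zero."""
    if bit is None:
        return [0] * len(codeword)
    return list(codeword) if bit == 0 else [1 - c for c in codeword]


def code_add(chip_words):
    """Code adder: arithmetic sum S[i] of chip i over all ports."""
    return [sum(chips) for chips in zip(*chip_words)]


def demodulate(codeword, sums):
    """Equations (1)-(2) and Table 2; returns 1, 0, or None (no data sent)."""
    L = len(codeword)
    D = [(2 * s - L) if c == 0 else (-2 * s + L) for c, s in zip(codeword, sums)]
    total = sum(D)  # equals lambda * L, kept as an integer
    if total == L:
        return 1
    if total == -L:
        return 0
    if total == 0:
        return None
    raise ValueError("decision factor is not +1, -1 or 0")


codes = walsh_codewords(8)
# Destination port -> bit sent to it in this interval (None means no data).
traffic = {1: 1, 2: 0, 3: None, 4: 1, 5: None, 6: 0, 7: 1}
sums = code_add([modulate(codes[dst], bit) for dst, bit in traffic.items()])
received = {dst: demodulate(codes[dst], sums) for dst in traffic}
print(received == traffic)
```

Each demodulator sees only the shared sum, yet recovers its own bit because every other codeword, including the all-zero word sent by idle ports, correlates to zero against it. Calling `walsh_codewords(16)` gives the 16-chip set that supports up to 15 resources.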
Figure 1: CDMA router architecture. [Diagram: seven IP blocks (IP 1 to IP 7), each connected through its own buffer (BUFF), header decoder (HD), modulator (MOD) and demodulator (DEMOD) to a shared scheduler, Walsh code storage and code adder.]

2.1 Modulator

The grant signal from the header decoder causes a packet in the buffer to be forwarded to the Modulator (MOD), where the corresponding Walsh codeword is selected. The codeword is modulated with each bit of the packet in a parallel fashion via MUXes as shown in Figure 2. The modulation algorithm is described in Table 1, which illustrates how the assigned destination-oriented codeword is modulated in terms of the original data.

Table 1: Modulation algorithm

Data       Codeword Assignment
0          Codeword itself
1          Inverted codeword
No data    All-zero codeword

2.2 Demodulator

The demodulator (DEMOD) recovers the original data from the summation value produced by the code adder. It uses the decision factor λ, which is a modified form of the calculation in [7], adapted to make our algorithm work with Walsh codes. The mathematical equations and algorithms necessary for demodulation are summarized below and in Table 2.

\[ D[i] = \begin{cases} 2S[i] - L & \text{if codeword}[i] \text{ is } 0 \\ -2S[i] + L & \text{if codeword}[i] \text{ is } 1 \end{cases} \tag{1} \]

\[ \lambda = \sum_{i=0}^{L-1} \frac{D[i]}{L} \tag{2} \]

• L is the codeword length.
• S[i] is the i-th summation value produced by the code adder.
• D[i] is the decision variable.
• λ is the decision factor.

Table 2: Demodulation algorithm

Decision Factor (λ)    Demodulated Data [bit]
+1                     1
-1                     0
0                      No data sent

3 Scalable CDMA-based NoC

The router is easily scalable by simply using longer Walsh codewords. If we use 16-bit and 32-bit Walsh codewords, then the number of IP resources attached to each router can be extended up to 15 and 31, respectively. Moreover, as shown in Figure 3 and Figure 4, one CDMA router with its attached IP blocks can be hierarchically extended to construct larger NoC topologies. Figure 3 shows the hierarchical star+star on-chip topology that interconnects each group of local CDMA routers with IP blocks through a CDMA central router, using a specific packet format. In this case, a group field to distinguish each local switch group should be included in the packet format for the simultaneous packet transmission via the central router. Figure 4 shows
an alternative topology in which each router group is interconnected using a mesh structure. This hybrid topology has a better resource to router ratio compared to a pure mesh topology NoC [3]. Table 3 summarizes the overall comparison to other types of on-chip topologies in terms of hop count, resource to router ratio, routing overhead, wire length overhead and wiring complexity.

Figure 3: Scalable CDMA star+star topology. [Diagram: seven CDMA local routers (1 to 7), each with seven attached resources (R1 to R49), connected through one CDMA central router.]

Figure 4: Scalable CDMA star+mesh topology. [Diagram: nine CDMA routers (1 to 9), each with seven attached resources (R1 to R63), interconnected in a mesh.]

Table 3: Performance comparison of the different network-on-chip topologies

Topology                  Sum of      Sum of    Min. Hop  Max. Hop  RTR/RSC  Routing Complexity  Wire Length    Wiring Complexity
                          Resources   Routers   Count     Count     Ratio    Overhead [8]        Overhead [8]   Overhead [8]
                          (RSC)       (RTR)
Mesh [3]                  64          64        2         15        1        Medium              Low            Low
Fat-tree [9]              64          48        1         3         0.75     High                High           High
Butterfly Fat-tree [10]   64          28        1         5         0.44     High                High           High
CDMA Star+Star            49          8         1         3         0.16     Low                 Low            Low
CDMA Star+Mesh            63          9         1         5         0.14     Medium              Low            Low

The ratio column lists routers per resource (RTR/RSC), so a lower value means that each router serves more resources.

4 Synthesis Results and Analysis

In this section we give the implementation results for our CDMA router. We have used the Chip Express CX4000 and CX5000 structured ASIC libraries for 0.25µm and 0.18µm technologies, respectively, and synthesized with the CAD tool Synplify ASIC 3.3. The architecture overhead is determined in terms of gate count and area. The design was successfully simulated and verified with the ModelSim simulator using the post-synthesis netlist and randomly generated traffic patterns. The results provide values for the aggregate throughput and the packet transmission latency as a function of packet size, which are the performance metrics of interest.

The synthesis results for both 0.25µm and 0.18µm technologies are presented in Table 4 and Table 5. We list the optimal estimated frequency to avoid negative slack, the total number of gates and the area for payload sizes from 8 bits to 128 bits. The optimal estimated frequencies for the 0.25µm and 0.18µm technologies are around 50MHz and 94MHz, respectively.

Table 4: Synthesis area report for 0.25µm
(0.25 µm ChipExpress cx4001 structured ASIC library)

          Optimal     Optimal     Cell Usage
Payload   Estimated   Estimated   Gate      Area
[bits]    Frequency   Period      Count     [µm²]
8         50.0 MHz    20.005 ns   17838     39304.2
16        49.4 MHz    20.223 ns   27153     59780.5
32        50.0 MHz    20.005 ns   44314     99480.8
64        50.0 MHz    20.000 ns   81266     181060.0
128       50.0 MHz    19.997 ns   160906    354877.0

Table 5: Synthesis area report for 0.18µm
(0.18 µm ChipExpress cx5000 structured ASIC library)

          Optimal     Optimal     Cell Usage
Payload   Estimated   Estimated   Gate      Area
[bits]    Frequency   Period      Count     [µm²]
8         94.9 MHz    10.539 ns   19416     26390.0
16        94.7 MHz    10.560 ns   29113     39736.0
32        94.2 MHz    10.615 ns   47754     65746.0
64        93.8 MHz    10.660 ns   86912     119000.0
128       93.4 MHz    10.706 ns   167740    228448.0

Table 6 shows the detailed area overhead of each component within a CDMA NoC router for the case of an 8-bit payload in 0.25µm technology. Two points should be noted. First, the demodulator has the largest area overhead among all of the components. This is expected due to the computational complexity of that part of the algorithm. Second, the area in the 0.18µm technology is on average 34% smaller than the area in the 0.25µm technology, while the gate counts are almost the same.

While the area overhead of the modulator, buffer, code adder and demodulator increases as the packet size increases, the scheduler and the header decoder keep the same area overhead regardless of packet size, since these are only functions of the number of attached IP resources.

Table 6: Components area overhead
(8-bit payload case in 0.25 µm technology)

Components              Area [µm²]
Modulator (MOD)         3930.4
Buffer (BUFF)           6681.7
Header Decoder (HD)     432.4
Scheduler (SCHE)        786.1
Code Adder (CA)         2358.2
Demodulator (DEMOD)     24761.6
Others                  353.8
TOTAL                   39304.2

Figure 5 shows the maximum aggregate throughput values of a 7-port CDMA NoC router for different payload sizes in 0.25µm and 0.18µm technologies. The throughput values, in bits per second, can be computed for each payload size by forming the following product: (7 IP resources)*(number of payload bits)*(clock frequency). (Note that we assume that the interconnections between the IP resources and the router are parallel paths whose width is equal to the total packet length.) Therefore, we arrive at throughput values up to 44.8Gbps and 83.7Gbps for the case of 128-bit payload size in 0.25µm and 0.18µm technologies, respectively. In addition, simulation yields the same packet latency of 160 ns in each case, since each bit of the packet is transmitted in parallel.

Figure 5: Throughput performance [Gbps]. [Plot: throughput in Gbps versus payload size (8, 16, 32, 64, 128 bits). cx5000 0.18 µm: 5.314, 10.606, 21.1, 42.022, 83.686. cx4001 0.25 µm: 2.8, 5.532, 11.2, 22.4, 44.8.]

5 Conclusions

We have presented the detailed design and synthesis results of a CDMA-based NoC router architecture. The system has been modeled with VHDL and successfully synthesized using 0.18µm and 0.25µm structured CMOS ASIC technologies. The CAD tool Synplify ASIC 3.3 was used to perform the synthesis, and gate count and area results have been tabulated. Simulation results verify the function of the proposed CDMA algorithm and provide values for the aggregate throughput for various payload sizes as well as the packet transmission latency. In our future work, we plan to model an MPEG-4 system to determine its performance using this NoC approach. In addition, we intend to investigate the use of more sophisticated scheduling algorithms.

6 Acknowledgments

We thank Euiseok Kim, Sangwoo Rhim and Bumhak Lee of Samsung Advanced Institute of Technology (SAIT) for their help with this manuscript. This research work is supported by a grant from SAIT.

References

[1] “International Technology Roadmap for Semiconductors,” http://public.itrs.net/.
[2] L. Benini and G. D. Micheli, “Networks on chip: a new paradigm for systems on chip design,” Design, Automation and Test in Europe Conf., pp. 418–419, 2002.
[3] S. Kumar, A. Jantsch, J.-P. Soininen, M. Forsell, M. Millberg, J. Öberg, K. Tiensyrjä and A. Hemani, “A network on chip architecture and design methodology,” IEEE Computer Society Annual Symposium on VLSI (ISVLSI), pp. 105–112, 2002.
[4] R. Ho, K. W. Mai and M. A. Horowitz, “The future of wires,” in Proc. of the IEEE, pp. 490–504, 2001.
[5] R. Yoshimura, T. B. Keat, T. Ogawa, S. Hatanaka, T. Matsuoka and K. Taniguchi, “DS-CDMA wired bus with simple interconnection topology for parallel processing system LSIs,” IEEE International Solid-State Circuits Conference, pp. 370–371, Feb. 2000.
[6] Y. Yuminaka, O. Katoh, Y. Sasaki, T. Aoki and T. Higuchi, “An efficient data transmission technique for VLSI systems based on multiple-valued code-division multiple access,” 30th IEEE International Symposium on Multiple-Valued Logic (ISMVL 2000), pp. 430–437, May 2000.
[7] R. H. Bell, Jr., C. Y. Kang, L. John and E. E. Swartzlander, Jr., “CDMA as a multiprocessor interconnect strategy,” 35th IEEE Asilomar Conference on Signals, Systems and Computers, Volume 2, pp. 1246–1250, Nov. 2001.
[8] D. Wiklund and D. Liu, “Design of a system-on-chip switched network and its design support,” IEEE 2002 International Conference on Communications, Circuits and Systems and West Sino Expositions, Volume 2, pp. 1279–1283, July 2002.
[9] P. Guerrier and A. Greiner, “A generic architecture for on-chip packet-switched interconnections,” IEEE Proceedings of the Design, Automation and Test in Europe Conference and Exhibition, pp. 250–256, March 2000.
[10] C. Grecu, P. P. Pande, A. Ivanov and R. Saleh, “Structured interconnect architecture: a solution for the non-scalability of bus-based SoCs,” IEEE Proceedings of the 14th ACM Great Lakes Symposium on VLSI, pp. 192–195, April 2004.
[11] http://www.chipexpress.com/Pages/products.asp.
