Paper 09
Software-Defined Radio
(Figure labels: reference clock, PLL, clock generator, CPU (ARM946), SRAM, DMAC, memory controller, AHB bus, control bus, synchronization, sample timing, pilot symbol, FFT/IFFT, FIR filter, FEC, RSP, PSM, ADC/DAC interface, MAC interface, ADC, DAC, input, output)
Figure 1
Block diagram of SDR LSI.
controlled by a central processing unit (CPU). By changing the network configuration, programs of the reconfigurable processors, and accelerator parameters, the LSI can be reconfigured to operate with many wireless communication systems.
2.2 RSPs
The RSPs provide the core features of powerful processing and high flexibility. In this work, we used a coarse-grained RSP designed by Fujitsu. This RSP has the advantages of a short latency and an area-effective mapping architecture.5)
Figure 2 shows the structure of the RSP. To reduce the operation latency, the RSP is designed to minimize the physical data transfer delay between each processor element (PE) by dividing a large PE array into small reconfigurable logic cores called clusters. The RSP also includes macro elements that contain a divider, square-root calculator, and arctangent tables. These macro elements are more effective for reducing the latency.
The RSP cluster has a sequencer for configuring the PEs and the networks between PEs. This sequencer reduces the number of PEs that are needed for signal processing. Figures 3 (a), 3 (b), and 3 (c) show an example of cyclic reconfiguration using this sequencer. The data processing flow in Figure 3 (a) shows the processing flow for four inputs and outputs and eight processing steps using 30 PEs. Figures 3 (b) and 3 (c) show the data processing flow for cyclic reconfiguration. The sequencer alternates operation between the odd and even sequences. As shown in these figures, cyclic reconfiguration enables signal processing to be performed using half of the PE resources required for ordinary data processing. Therefore, this architecture makes the mapping more area-effective.
(Figure labels: Clusters 1 to 7, sequencer, PEs, macro elements, networks)
Figure 2
Structure of reconfigurable signal processor (RSP).
(a) Data processing flow (b) Flow mapping of PEs (c) PE behavior (sequencer alternating between odd state and even state)
Figure 3
Example of cyclic reconfiguration using sequencer.
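The effect of cyclic reconfiguration can be sketched in a few lines of Python. This is a simplified functional model, not the RSP's actual behavior: the eight step functions are hypothetical placeholders, each physical PE row is assumed to hold exactly two configurations (odd and even), and pipelining and network timing are ignored. It only illustrates that alternating two configurations on half the rows yields the same result as one row per step.

```python
from typing import Callable, List, Tuple

Step = Callable[[int], int]


def make_steps() -> List[Step]:
    # Eight hypothetical processing steps (placeholders, not real PE operations).
    return [lambda x, k=k: (x * 3 + k) % 251 for k in range(1, 9)]


def run_flat(steps: List[Step], lanes: List[int]) -> Tuple[List[int], int]:
    # Ordinary mapping: one physical PE row per processing step.
    data = list(lanes)
    for step in steps:
        data = [step(x) for x in data]
    return data, len(steps)  # result and number of physical rows used


def run_cyclic(steps: List[Step], lanes: List[int]) -> Tuple[List[int], int]:
    # Cyclic mapping: each physical row holds an odd and an even configuration,
    # and the sequencer alternates between the two states.
    rows = [(steps[i], steps[i + 1]) for i in range(0, len(steps), 2)]
    data = list(lanes)
    for odd_cfg, even_cfg in rows:
        data = [odd_cfg(x) for x in data]   # odd state
        data = [even_cfg(x) for x in data]  # even state
    return data, len(rows)


flat_out, flat_rows = run_flat(make_steps(), [1, 2, 3, 4])
cyclic_out, cyclic_rows = run_cyclic(make_steps(), [1, 2, 3, 4])
```

In this model `cyclic_rows` is half of `flat_rows` while both mappings produce the same outputs; the price, not modeled here, is that each row needs two passes per data item.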
2.3 Programmable state machine
The major wireless communication systems require a state transition unit to control their communications state transitions. To realize a software-defined state flow controller, we implemented a programmable, scalable state machine in the SDR LSI.
Figure 4 shows the structure of the programmable state machine (PSM). The PSM consists of 16 state memories, 27 input events, and 27 output events. The conditions and flow of state transitions are defined and programmed in the state memories for each wireless communication system. Figure 5 shows an example mapping of the state flow of the IEEE802.11a and IEEE802.11b standards. With this mapping, a radio system is realized using 9 states and 15 events.
The PSM has an extensible structure so the LSI can control the state transitions of multi-chip systems. This feature is described in Section 2.5.
parametric processing needed to cover the modern mobile-wireless communication systems. Table 1 summarizes the functions of the accelerators.
The FFT module executes 2^n-point FFT and inverse FFT, where n is 6 to 13. This covers the 64 points of IEEE802.11a and the 2048 and 8192 points of IEEE802.16x (WiMAX) and future digital broadcasting standards.
The Viterbi module decodes signals that are encoded with any set of the three generator-polynomial types (G0, G1, G2) shown in Table 1. The constraint lengths can be set to 7 and 9, and the coding rates are 1/2 and 1/3. These parameter ranges are suitable for the IEEE802.11a, 11b, W-CDMA, and WiMAX standards.
The programmable flip-flop-array module operates as a scrambler/descrambler, CRC circuit, or convolution encoder, depending on the array combination setting. The FIR module covers up to 32 filter taps.
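How a flip-flop array can act as a convolution encoder may be illustrated with a shift-register model. This is a minimal sketch under stated assumptions: the function name `conv_encode` is hypothetical, and the default generator polynomials 133 and 171 (octal, constraint length 7, rate 1/2) are the ones commonly used with IEEE802.11a, chosen here for illustration rather than taken from Table 1.

```python
def conv_encode(bits, polys=(0o133, 0o171), k=7):
    """Convolutional encoder modeled as a k-bit shift register.

    polys: generator polynomials (illustrative defaults, an assumption).
    k:     constraint length; the coding rate is 1/len(polys).
    """
    state = 0  # the k-1 most recent input bits
    out = []
    for b in bits:
        reg = ((b & 1) << (k - 1)) | state  # newest bit enters at the MSB
        for g in polys:
            # Each output bit is the XOR (parity) of the tapped flip-flops.
            out.append(bin(reg & g).count("1") % 2)
        state = reg >> 1  # shift: the oldest bit drops out
    return out


# Feeding a single 1 followed by zeros reads the taps back out.
impulse = conv_encode([1, 0, 0, 0, 0, 0, 0])
```

Passing three polynomials and `k=9` would give a rate-1/3, constraint-length-9 encoder in the same model, which is one way to picture the "array combination setting" as a choice of taps and register length.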
Figure 4
Programmable state machine (PSM).
Figure 5
Example mapping of state flow of IEEE802.11a and IEEE802.11b standards.
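A table-driven state machine gives a rough software picture of the PSM. The sketch below is an assumption-laden model, not the hardware design: the class and method names are hypothetical, the state and event numbers in the usage lines are invented for illustration, and only the limits of 16 states, 27 input events, and 27 output events come from the text.

```python
class ProgrammableStateMachine:
    MAX_STATES = 16          # state memories (from the text)
    NUM_INPUT_EVENTS = 27    # input events (from the text)
    NUM_OUTPUT_EVENTS = 27   # output events (from the text)

    def __init__(self, initial_state=0):
        self.state = initial_state
        # (state, input_event) -> (next_state, output_event or None)
        self.table = {}

    def program(self, state, input_event, next_state, output_event=None):
        """Store one transition, as software would write a state memory."""
        if not (0 <= state < self.MAX_STATES and 0 <= next_state < self.MAX_STATES):
            raise ValueError("state index out of range")
        if not 0 <= input_event < self.NUM_INPUT_EVENTS:
            raise ValueError("input event out of range")
        if output_event is not None and not 0 <= output_event < self.NUM_OUTPUT_EVENTS:
            raise ValueError("output event out of range")
        self.table[(state, input_event)] = (next_state, output_event)

    def step(self, input_event):
        """Apply one input event; unprogrammed events leave the state unchanged."""
        next_state, output_event = self.table.get(
            (self.state, input_event), (self.state, None)
        )
        self.state = next_state
        return output_event


# Hypothetical three-state receive flow: 0 = idle, 1 = synchronizing, 2 = receiving.
psm = ProgrammableStateMachine()
psm.program(0, input_event=1, next_state=1, output_event=4)
psm.program(1, input_event=2, next_state=2, output_event=5)
psm.program(2, input_event=3, next_state=0)
outputs = [psm.step(e) for e in (1, 2, 3)]
```

Switching to another wireless system then amounts to reprogramming the table, which mirrors the idea that the conditions and flow of transitions live in the state memories.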
Table 1
Function of accelerators.
Function Parameters
FFT/IFFT 2^n points. n is 6 to 13.
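The FFT size range can be checked with a short sketch. The function name and the dictionary of required sizes are illustrative; the sizes themselves (64, 2048, and 8192 points) are the ones named in the text.

```python
def fft_sizes(n_min=6, n_max=13):
    # All 2^n transform lengths the FFT/IFFT accelerator is stated to support.
    return [2 ** n for n in range(n_min, n_max + 1)]


sizes = fft_sizes()

# Transform lengths named in the text, keyed by an illustrative label.
required = {"IEEE802.11a": 64, "2048-point mode": 2048, "8192-point mode": 8192}
covered = {name: points in sizes for name, points in required.items()}
```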
feature enables multi-chip processing so the [...] power. The I/Os can be expanded to two pairs of three 16-bit data channels with a maximum transfer rate of 4.8 Gb/s. This high transfer rate will be sufficient for most wireless communication systems.
The signal networks between each module in the LSI must have a wide bandwidth. The data network in the LSI consists of four 16-bit crossbar channels with a maximum transfer rate of 6.4 Gb/s. Because the full-channel crossbar occupies a large area of the LSI, we divided it into three blocks based on an optimization calculation (Figure 6).
Figure 6
Calculated crossbar area versus number of crossbar blocks.
(Plot: the horizontal axis is the number of blocks, from 0 to 8; the optimum point is marked.)
3. Specifications and evaluation board
The SDR LSI integrates 774 PEs, which operate at a maximum clock speed of 160 MHz and a peak performance of 103 GOPS. The control CPU operates at 66 MHz, while the accelerators and other signal processing units operate at a maximum of 100 MHz. The PEs occupy 75% of the core area, while the other processing circuits, including the SRAM, occupy the remaining area. A photograph of the chip is shown in Figure 7. The chip is mounted on a 1156-pin flip chip ball grid array (FCBGA) package. Other specifications of the LSI are summarized in Table 2.
We constructed an evaluation board for this LSI (Figures 8 and 9). The board contains two SDR LSIs and three FPGAs. One of the FPGAs is used to interconnect the two SDR LSIs, and the other two perform media access control (MAC) and
(Figure labels: clusters, analog interface, external interface. Tx: Transmitter, Rx: Receiver, RSP: Reconfigurable Signal Processor)
Figure 7
SDR LSI chip.
Figure 8
Evaluation board.
Table 2
SDR chip specifications.
CPU ARM946
Internal memory 370 KB
External memory Flash (16 MB), SDRAM (256 MB)
I/O for control GPIO, UART, IRQ interface, control bus
Power supply 1.2 V (I/O: 2.5 V)
Clock speed ARM: 66 MHz; accelerator circuits: up to 100 MHz; reconfigurable signal processors: up to 160 MHz
Bit width of crossbar data networks 16-bit × 4 channels
Bit width of expansion I/O 16-bit × 3 channels
Package 1156-pin FCBGA
Performance of RSPs Up to 103 GOPS
Number of processing elements 774
Accelerators FFT/IFFT, Viterbi decoder, scrambler/descrambler, CRC, convolution encoder, FIR filter
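The bus widths in Table 2 can be related to the transfer rates quoted for the crossbar network (6.4 Gb/s) and the expansion I/O (4.8 Gb/s). The sketch below assumes that each channel moves one 16-bit word per cycle at 100 MHz; the source does not state the network clock, so this clock value is an assumption that happens to reproduce both figures.

```python
def aggregate_rate_bps(channels, bits_per_channel, clock_hz):
    # Aggregate rate if every channel transfers one word per clock cycle.
    return channels * bits_per_channel * clock_hz


CLOCK_HZ = 100_000_000  # assumed transfer clock (not stated in the source)

crossbar_bps = aggregate_rate_bps(4, 16, CLOCK_HZ)   # 16-bit x 4 channels
expansion_bps = aggregate_rate_bps(3, 16, CLOCK_HZ)  # 16-bit x 3 channels
```

Under that assumption the two results are 6.4 Gb/s and 4.8 Gb/s, matching the rates given in the text.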
(Figure labels: external controller, data signal, configuration programs, FPGA for MAC, analog signal, analog interface, DAC, ADC, SDR LSIs, FPGA for SDR LSI connection, control bus, flash memory, SDRAM)
Figure 9
Block diagram of evaluation board.
Table 3
Specifications of evaluation board.
Number of SDR LSIs 2
Peak performance 103 GOPS × 2
Number of FPGAs 3
External interfaces 16-bit parallel I/O, RS232C
Power supply 24 V
Board size Width: 35 cm, Length: 25 cm
Download time per wireless system 20 ms
Number of downloadable wireless systems Up to 7
Reconfiguration time 5 ms
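The timing figures in Table 3 can be combined into a rough switching budget. This is a sketch under stated assumptions: downloads are taken to be sequential, and switching to a system that is not yet resident is assumed to cost one download plus one reconfiguration. Neither assumption is confirmed by the source, and the function names are hypothetical.

```python
DOWNLOAD_MS_PER_SYSTEM = 20  # from Table 3
MAX_SYSTEMS = 7              # from Table 3
RECONFIG_MS = 5              # from Table 3


def preload_time_ms(n_systems):
    # Time to download n wireless systems one after another (assumed sequential).
    if not 0 <= n_systems <= MAX_SYSTEMS:
        raise ValueError("the board holds at most 7 downloadable systems")
    return n_systems * DOWNLOAD_MS_PER_SYSTEM


def switch_time_ms(already_downloaded):
    # Assumed model: a resident system needs only the reconfiguration step.
    if already_downloaded:
        return RECONFIG_MS
    return DOWNLOAD_MS_PER_SYSTEM + RECONFIG_MS


full_preload = preload_time_ms(MAX_SYSTEMS)
```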