802.11a Transmitter: A Case Study in Microarchitectural Exploration
return(final[63:0]);
endfunction
This new organization makes no change to the represented hardware. After the compiler unrolls the for-loops and does constant propagation, the result is the exact same gate structure as ifftA.

Here we can see another level of regularity. This time it lies across all of the stages. We can rewrite the rule as:

function ifftC(Vector#(64,Complex#(16)) x);
  //compute following constants at compile time
  twid[2:0][47:0] = ... ;
  permute[2:0][63:0] = ... ;

4. The Pipelined IFFT

At a high level, pipelining is simply partitioning a task into a sequence of smaller sub-tasks which can be done in parallel. We can start processing the next chunk of data before we finish processing the previous chunk. Generally, the stages of a pipeline operate in lockstep (see Figure 5).
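The lockstep behaviour of a pipeline can be illustrated with a small software model. The sketch below is written in Python rather than Bluespec and is not the paper's implementation: `run_pipeline`, the register list `regs`, and the three toy stage functions are assumptions made for illustration only. It shows that every stage advances once per cycle, so a new input can enter before the previous one has left.

```python
from typing import Callable, List, Optional

Stage = Callable[[int], int]


def run_pipeline(stages: List[Stage], inputs: List[int]) -> List[int]:
    """Cycle-by-cycle model of a lockstep pipeline.

    regs[i] models the pipeline register at the output of stage i.
    None models an empty (invalid) register.
    """
    regs: List[Optional[int]] = [None] * len(stages)
    outputs: List[int] = []
    # Feed every input, then enough empty cycles to drain the pipeline.
    feed: List[Optional[int]] = list(inputs) + [None] * len(stages)
    for token in feed:
        # The value in the last register leaves the pipeline this cycle.
        if regs[-1] is not None:
            outputs.append(regs[-1])
        # All stages advance together: new register values are computed
        # from the old ones, never from values produced in the same cycle.
        sources = [token] + regs[:-1]
        regs = [None if s is None else f(s) for f, s in zip(stages, sources)]
    return outputs


# Hypothetical three-stage pipeline standing in for the three IFFT stages.
toy_stages: List[Stage] = [lambda v: v + 1, lambda v: v * 2, lambda v: v - 3]
results = run_pipeline(toy_stages, [1, 2, 3])  # [1, 3, 5]
```

In this model each result appears three cycles after its input enters, but one result is produced every cycle once the pipeline is full.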
In a folded pipeline, since the same hardware is used for conceptually different stages, we often need some extra state elements and muxes to choose the appropriate combinational logic. For example, it is common to have a stage counter and associated control logic to remember where the data is in the pipeline. The code for an n-way folded pipeline such as shown in Figure 6 may be written as follows:

function f (stage,sx);
  let sxt = case (stage)
    0: return fu0(sx);
    1: return fu1(sx);
    2: return fu2(sx);
  endcase;
  return fs(sxt);
endfunction

where fs(sx) represents the shared logic among the three stages (Figure 8). A compiler may be able to do this level of
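The role of the stage counter and of the split between stage-specific and shared logic can be sketched with a behavioural model. This is a Python illustration, not Bluespec and not the paper's code: the names `fu0`, `fu1`, `fu2`, `fs` and `f` follow the text, but their bodies here are hypothetical placeholders, and `folded_model` is an assumed helper name.

```python
from typing import Callable, List

Vec = List[int]


# Placeholder bodies: the real fu0..fu2 and fs are IFFT logic not shown here.
def fu0(sx: Vec) -> Vec:
    return [v + 1 for v in sx]


def fu1(sx: Vec) -> Vec:
    return list(reversed(sx))


def fu2(sx: Vec) -> Vec:
    return [v * 2 for v in sx]


def fs(sx: Vec) -> Vec:
    # Shared logic used by every stage (here: rotate left by one element).
    return sx[1:] + sx[:1]


def f(stage: int, sx: Vec) -> Vec:
    # A mux selects the stage-specific logic; the shared logic exists once.
    unique = [fu0, fu1, fu2][stage]
    return fs(unique(sx))


def folded_model(step: Callable[[int, Vec], Vec], n_stages: int, x: Vec) -> Vec:
    sreg = list(x)  # data register holding the in-flight vector
    stage = 0       # stage counter: remembers where the data is
    while stage < n_stages:
        sreg = step(stage, sreg)  # one clock cycle through the single block
        stage += 1
    return sreg


folded = folded_model(f, 3, [1, 2, 3, 4])
unrolled = fs(fu2(fs(fu1(fs(fu0([1, 2, 3, 4]))))))
```

Under these assumptions the folded model and the fully unrolled composition compute the same result; the folded form spends three cycles on one block instead of one pass through three blocks.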
6. Further Hardware Reuse: Super-Folded Pipeline IFFT
Figure 10. Performance of 802.11a Transmitters for Various Implementations of the IFFT Block
a RAM-based implementation uses less area, it requires a significantly higher clock-speed, and thus is less power-efficient overall.

like to thank Hadar Agam of Bluespec Inc., for her invaluable help in getting power estimates using Sequence Design PowerTheatre.
9. Conclusions

In this paper we explored various microarchitectures of an IFFT block, the critical resource-intensive module in an 802.11a transmitter. We demonstrated how languages with powerful static elaboration capabilities can result in hardware descriptions which are both more concise and more general. We used such generalized descriptions to explore the physical properties of a wide variety of microarchitectures early in the design process. We argue that such high-level language capabilities are essential if future architectural decisions are to be based on empirical evidence rather than designer intuition.

All the folded designs were generated from the same source description, only varying the input parameters. Even the other versions share a lot of common structure, such as the bfly4 definition and the representation of complex numbers and operators. This has a big implication for verification, because instead of verifying seven designs, we had to verify only three, and even these three leveraged submodules which had been unit-tested independently.

The six Pareto optimal designs we generated during exploration provide some good intuition into the area-power tradeoff possible in our design. To reduce the area of our initial (combinational) design by 20% (the folded design), we increase our power usage by 75%. This tradeoff becomes less costly as we further reduce the design; if we wish to reduce the size by 70%, we increase the power usage by 760%.
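The notion of a Pareto optimal design can be made concrete with a short filter over (area, power) pairs. The sketch below is illustrative Python, not part of the authors' flow. The first three entries are derived from the relative figures quoted above, normalized to the combinational design (20% less area with 75% more power, 70% less area with 760% more power); the label `reduced_70pct` and the fourth, dominated entry are hypothetical and added only to show the filter discarding a point.

```python
from typing import Dict, List, Tuple


def pareto_front(designs: Dict[str, Tuple[float, float]]) -> List[str]:
    """Return the designs not dominated in (area, power); lower is better."""
    front: List[str] = []
    for name, (area, power) in designs.items():
        dominated = any(
            a <= area and p <= power and (a < area or p < power)
            for other, (a, p) in designs.items()
            if other != name
        )
        if not dominated:
            front.append(name)
    return front


# (relative area, relative power), normalized to the combinational design.
designs = {
    "combinational": (1.00, 1.00),
    "folded": (0.80, 1.75),                  # -20% area, +75% power
    "reduced_70pct": (0.30, 8.60),           # -70% area, +760% power
    "hypothetical_dominated": (0.90, 2.00),  # invented point, not measured
}

front = pareto_front(designs)
```

Here `front` keeps the three source-derived points and drops the invented one, since the folded design is both smaller and lower power than it.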
In the future we wish to apply this methodology to more
complex designs, such as the H.264 video decoder blocks.
Such designs would have many more critical design blocks,
and better emphasize the benefits of our methodology.