VLSI Architectures For Iterative Decoders in Magnetic Recording Channels
II. INTERLEAVER
The randomness of the interleaver output sequence makes it difficult to realize in-place storage. A direct interleaver implementation uses two banks of buffers alternating between read/write for consecutive sectors of data (Fig. 4). The latency through an interleaver is therefore equal to the block size.

The basic block interleaver design uses a minimal amount of control logic. Using static random-access memory (SRAM) for a high-speed implementation, the interleaver inputs are written row-wise into the memory array, while outputs are read column-wise. For a block interleaver of size $N$ arranged as an $n_1$ by $n_2$ matrix, such that $N = n_1 n_2$, this assures that bits located within a distance of $n_2$ before interleaving are separated by a minimum distance of $n_1 - 1$ after interleaving. The sequential write/read pattern along rows/columns allows the memory access operations of this interleaver to make use of cycle counters to activate both word (row) lines and bit (column) lines, thereby eliminating the necessity to perform memory-address decoding.

More sophisticated interleaver designs [8], [9] yield improved error rate performance, but result in increased implementation complexity. Therefore, the implementation of the basic interleaver described here provides a lower limit on complexity.
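As an illustration of this addressing scheme, the following minimal C sketch (ours, not the paper's) models one SRAM bank as a two-dimensional array and generates the write/read sequences purely with row/column counters; the 4 x 8 geometry is an illustrative assumption, and the double-buffered bank alternation for consecutive sectors is omitted.

```c
/* Counter-based block-interleaver addressing: write row-wise, read
 * column-wise.  Plain cycle counters select the word (row) and bit (column)
 * lines, so no memory-address decoding is required.  The 4 x 8 geometry and
 * the in-memory "SRAM" are illustrative stand-ins. */
#include <stdio.h>

#define N1 4                         /* rows    (n1) */
#define N2 8                         /* columns (n2) */

static unsigned char sram[N1][N2];   /* one SRAM bank */

static void interleaver_write(const unsigned char *in)
{
    for (int row = 0; row < N1; row++)        /* word-line counter */
        for (int col = 0; col < N2; col++)    /* bit-line counter  */
            sram[row][col] = in[row * N2 + col];
}

static void interleaver_read(unsigned char *out)
{
    for (int col = 0; col < N2; col++)        /* bit-line counter  */
        for (int row = 0; row < N1; row++)    /* word-line counter */
            *out++ = sram[row][col];
}

int main(void)
{
    unsigned char in[N1 * N2], out[N1 * N2];
    for (int i = 0; i < N1 * N2; i++)
        in[i] = (unsigned char)i;

    interleaver_write(in);
    interleaver_read(out);

    /* Bits adjacent before interleaving end up N1 positions apart. */
    for (int i = 0; i < N1 * N2; i++)
        printf("%2d ", out[i]);
    printf("\n");
    return 0;
}
```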
III. MAP DECODER

A MAP decoder implements the BCJR [2] algorithm. It is used to obtain the a posteriori information for partial response channel decoding, as well as for outer decoding when a convolutional code is employed as the outer code. Given the prior probabilities, $P(u_k)$, and channel likelihood estimates, $p(y_k \mid x_k)$, the log-domain computations of the BCJR algorithm are divided into three groups:

1) Branch metric computation for each branch between states $s'$ and $s$ (a sketch of this computation follows the list):

$\gamma_k(s', s) = \log P(u_k) + \log p(y_k \mid x_k)$   (1)

2) Forward/backward iteration for each state $s$, assuming a radix-2 trellis. Forward state metric; valid transitions are $(s'_0, s)$, $(s'_1, s)$:

$\alpha_k(s) = \log\bigl(e^{\alpha_{k-1}(s'_0) + \gamma_k(s'_0, s)} + e^{\alpha_{k-1}(s'_1) + \gamma_k(s'_1, s)}\bigr)$   (2)

Backward state metric; valid transitions are $(s, s_0)$, $(s, s_1)$:

$\beta_k(s) = \log\bigl(e^{\beta_{k+1}(s_0) + \gamma_{k+1}(s, s_0)} + e^{\beta_{k+1}(s_1) + \gamma_{k+1}(s, s_1)}\bigr)$   (3)

3) Depending on the position (inner/outer) of the decoder, the required a posteriori probability is either $P(u_k \mid \mathbf{y})$ or $P(x_k \mid \mathbf{y})$, respectively:

$\lambda_k = \log \dfrac{\sum_{(s',s):\,u_k=1} e^{\alpha_{k-1}(s') + \gamma_k(s',s) + \beta_k(s)}}{\sum_{(s',s):\,u_k=0} e^{\alpha_{k-1}(s') + \gamma_k(s',s) + \beta_k(s)}}$   (4)
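As a concrete instance of (1), the sketch below computes the branch metric under an AWGN model of the equalized channel, where $\log p(y_k \mid x_k)$ reduces to $-(y_k - x_k)^2 / (2\sigma^2)$ up to a branch-independent constant; the branch labels and the variance value are illustrative assumptions, not parameters from the paper.

```c
/* Branch metric of (1) for an AWGN model of the equalized channel:
 * gamma = log P(u_k) + log p(y_k | x_k), with the Gaussian log-likelihood
 * reduced to -(y - x)^2 / (2*sigma^2) (branch-independent constants
 * dropped).  The values used below are illustrative assumptions. */
#include <math.h>
#include <stdio.h>

static double branch_metric(double log_prior, /* log P(u_k), e.g. from the other decoder */
                            double y,         /* received (equalized) sample             */
                            double x,         /* noiseless output for this branch        */
                            double sigma2)    /* noise variance                          */
{
    return log_prior - (y - x) * (y - x) / (2.0 * sigma2);
}

int main(void)
{
    /* Equiprobable prior, sample y = 0.9, candidate branch outputs +1/-1. */
    printf("gamma(+1) = %f\n", branch_metric(log(0.5), 0.9,  1.0, 0.5));
    printf("gamma(-1) = %f\n", branch_metric(log(0.5), 0.9, -1.0, 0.5));
    return 0;
}
```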
The structures for both forward and backward iterations are identical, and similar to the Add–Compare–Select units used in Viterbi decoders. Thus only the forward iterator (Fig. 5) will be described. The current branch metrics ($\gamma_k$) are added to the corresponding state metrics ($\alpha_{k-1}$) from the previous iteration:

$\alpha_k(s) = \max{}^{*}\bigl(\alpha_{k-1}(s'_0) + \gamma_k(s'_0, s),\ \alpha_{k-1}(s'_1) + \gamma_k(s'_1, s)\bigr)$   (5)

The logarithm of the sum of exponentials is then evaluated with a new operator, $\max^{*}(\cdot)$. It uses a comparator, a lookup table, and a final adder (Fig. 5) to approximate the second term in the equation [10]:

$\max{}^{*}(a, b) \equiv \log(e^{a} + e^{b}) = \max(a, b) + \log\bigl(1 + e^{-|a-b|}\bigr)$   (6)

Fig. 5. Add–compare–select unit for an iterator (either forward or backward) using the $\max^{*}(\cdot)$ operator as indicated within the box.

The forward/backward iteration structures are therefore termed the Add–Compare–Select–Add (ACSA) units. A number of $\max^{*}$ operators are also used in the computation of the a posteriori values, using a tree structure shown in Fig. 8.
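The following floating-point sketch shows the behavior of the $\max^{*}$ operator of (6), one forward ACSA step of (5), and the tree-style $\max^{*}$ reduction used for the a posteriori values. It is a functional model only: the hardware LUT for the correction term is emulated by a direct log1p/exp evaluation, and the paper's fixed-point wordlengths are not reproduced.

```c
/* Functional model of the max* operator of (6), one forward ACSA step of
 * (5), and the max* reduction tree used for the a posteriori values
 * (Fig. 8).  In hardware the correction term log(1 + exp(-|a-b|)) comes
 * from a small LUT; here it is computed directly for clarity. */
#include <math.h>
#include <stdio.h>

/* max*(a,b) = max(a,b) + log(1 + e^-|a-b|): compare, LUT, final add. */
static double max_star(double a, double b)
{
    double mx = (a > b) ? a : b;           /* compare-select      */
    return mx + log1p(exp(-fabs(a - b)));  /* LUT correction term */
}

/* One forward ACSA step for a radix-2 trellis state: add the two branch
 * metrics to the two predecessor state metrics, then combine with max*. */
static double acsa_forward(double alpha_p0, double gamma0,
                           double alpha_p1, double gamma1)
{
    return max_star(alpha_p0 + gamma0, alpha_p1 + gamma1);
}

/* Sequential form of the max* tree: reduces n metrics to log-sum-exp. */
static double max_star_tree(const double *m, int n)
{
    double acc = m[0];
    for (int i = 1; i < n; i++)
        acc = max_star(acc, m[i]);
    return acc;
}

int main(void)
{
    double m[4] = { 0.5, 1.0, -0.3, 0.2 };
    printf("alpha_k(s) = %f\n", acsa_forward(0.9, -0.2, 0.1, 0.4));
    printf("logsumexp  = %f\n", max_star_tree(m, 4));
    return 0;
}
```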
To implement the original BCJR algorithm, the backward iteration can only begin after complete observation of the block of 4 kbits, resulting in large memory requirements and long latencies. Variations of the BCJR algorithm avoid these effects by windowing or limiting the number of backward iteration steps.

A. Backward Propagation of Windowed BCJR

An implementation of windowed BCJR with asymptotically equivalent performance can be achieved using two overlapping windows for the $\beta$-computation. Each window spans a width of $2L$, and overlaps with the other window in both trellis position and time by $L$ steps, as shown in Fig. 6. The initial $L$ outputs are always discarded, while the latter $L$ outputs, having satisfied a criterion for a minimum number of steps, $L$, through the trellis, are retained and eventually combined with the appropriate forward state metrics.
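The indexing below sketches one way to realize this two-window schedule: each window covers $2L$ trellis steps, runs backwards, discards its first $L$ outputs as warm-up, and retains its last $L$, so the retained halves tile the block. The specific index arithmetic, block length, and engine assignment are our assumptions for illustration, not the paper's schedule.

```c
/* Schedule sketch for the two overlapping beta windows: each window covers
 * 2*L trellis steps, discards its first L backward outputs (warm-up) and
 * retains its last L.  L, the block length K, and the engine assignment
 * are illustrative assumptions. */
#include <stdio.h>

#define L 4
#define K 24   /* block length in trellis steps (multiple of L) */

int main(void)
{
    for (int w = 0; w * L < K; w++) {
        int lo  = w * L;          /* retained half covers mid..lo (backwards) */
        int mid = lo + L - 1;
        int hi  = mid + L;        /* warm-up half covers hi..mid+1            */
        if (hi > K - 1)
            hi = K - 1;           /* final window: beta known at block end    */
        if (hi > mid)
            printf("window %d (engine %d): warm-up %2d..%2d, retain %2d..%2d\n",
                   w, w & 1, hi, mid + 1, mid, lo);
        else
            printf("window %d (engine %d): no warm-up,      retain %2d..%2d\n",
                   w, w & 1, mid, lo);
    }
    return 0;
}
```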
1) Check-to-bit messaging (parity check): the sign of each outgoing message is obtained as the XOR of the MSB of all inputs. The result is fed into the output LUT to direct an output with the appropriate sign. In addition, the final lookup table could be precoded to account for the deterministic term in (7).
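A sketch of the sign/magnitude organization this passage describes, using Gallager's log-domain check-node update [4]: the sign of each check-to-bit message is the XOR of the input sign bits (MSBs), while the magnitude passes through $\varphi$-style input/output lookup tables, with $\varphi(x) = -\log\tanh(x/2)$. This standard formulation is assumed here for illustration; the paper's exact equation (7) and fixed-point LUT contents are not reproduced.

```c
/* Log-domain check-node (check-to-bit) update in Gallager's form [4].
 * Sign: XOR of the sign bits (MSBs) of all inputs except the destination.
 * Magnitude: phi(sum of phi(|q|)) with phi(x) = -log(tanh(x/2)); hardware
 * realizes phi with input/output LUTs.  Min-sum decoders approximate the
 * phi-sum by the minimum input magnitude. */
#include <math.h>
#include <stdio.h>

static double phi(double x)        /* self-inverse for x > 0 */
{
    return -log(tanh(x / 2.0));
}

/* Message from one check node to bit `dest`, given dc incoming
 * bit-to-check LLRs q[0..dc-1]. */
static double check_to_bit(const double *q, int dc, int dest)
{
    int sign = 0;                  /* running XOR of sign bits */
    double mag = 0.0;              /* running sum of phi(|q|)  */
    for (int i = 0; i < dc; i++) {
        if (i == dest)
            continue;              /* exclude the destination bit */
        sign ^= (q[i] < 0.0);
        mag  += phi(fabs(q[i]));
    }
    double r = phi(mag);           /* output LUT */
    return sign ? -r : r;
}

int main(void)
{
    double q[4] = { 1.2, -0.8, 2.5, 0.3 };
    for (int j = 0; j < 4; j++)
        printf("r[%d] = %+f\n", j, check_to_bit(q, 4, j));
    return 0;
}
```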
TABLE I
COMPUTATIONAL UNITS AND MEMORY REQUIREMENTS FOR ITERATIVE DECODER MODULES
The proposed implementation uses a small number of computational units: 16 adders and 8 LUTs. However, the lack of any structural regularity in the parity check matrix results in memory requirements that are two orders of magnitude larger than those in the MAP or SOVA decoders. It was shown by example in Section V-C that a single LDPC iteration would have a memory requirement upwards of 73 000 words. To make an LDPC decoder implementation more feasible, it will be necessary to introduce regularity into the parity check matrix. Recent publications [13], [14] suggesting the construction of LDPC-like codes based on difference-set cyclic codes may provide the necessary foundation for building a practical LDPC decoder with reduced memory requirements.

The memory problem is not restricted to LDPC decoders. Interleavers, which are necessary between concatenated convolutional decoders, also require significant memory due to the randomness of the output sequences. Interleavers that allow some form of ordered permutation and compact representation will permit efficient implementations of Turbo decoders with no performance loss.

Finally, an iterative decoder implementation for magnetic storage applications requires timing recovery methods that can tolerate the increased latencies through multiple decoding iterations.

VII. CONCLUSION

We have proposed datapath-intensive architectures, as well as timing and data arrangement schedules, for each kind of SISO decoder in order to minimize the critical path delay and simplify the control logic.

Unrolling and pipelining of iterative decoders is necessary to sustain high throughputs, but leads to a linear increase in implementation complexity; however, it provides an excellent opportunity for reduced-complexity implementations. Since decisions become increasingly confident after each stage, decoders that are later in the pipeline can trade off some BER performance for reduced complexity. A number of choices are available, ranging from replacing MAP decoders with SOVA decoders or using shorter window lengths, to trellis pruning in the trellis-based decoders [15].

The immediate difficulty with LDPC decoders lies in the memory requirement, which should be addressed by designing structured LDPC codes. Without removing the memory bottleneck, further reduced-complexity LDPC decoding, such as approximating the summations in (7) and (9) with minimum and maximum functions, respectively, would have little effect on the overall decoder implementation.

REFERENCES

[1] T. Souvignier, M. Oberg, P. Siegel, R. Swanson, and J. Wolf, “Turbo decoding for partial response channels,” IEEE Trans. Commun., vol. 48, no. 8, Aug. 2000.
[2] L. R. Bahl, J. Cocke, F. Jelinek, and J. Raviv, “Optimal decoding of linear codes for minimizing symbol error rate,” IEEE Trans. Inform. Theory, vol. IT-20, pp. 284–287, Mar. 1974.
[3] J. Hagenauer and L. Papke, “Decoding turbo codes with the soft output Viterbi algorithm (SOVA),” in Proc. IEEE ISIT 1994, Trondheim, Norway, Jun. 1994, p. 164.
[4] R. G. Gallager, “Low density parity check codes,” IRE Trans. Inform. Theory, vol. IT-8, pp. 21–28, Jan. 1962.
[5] J. Fan and J. Cioffi, “Constrained coding techniques for soft iterative decoders,” in Proc. GLOBECOM ’99, vol. 16, Rio de Janeiro, Brazil, Dec. 1999, pp. 723–727.
[6] G. Masera, G. Piccinini, M. Roch, and M. Zamboni, “VLSI architectures for turbo codes,” IEEE Trans. VLSI Syst., vol. 7, no. 3, Sept. 1999.
[7] Y. Wu and B. Woerner, “The influence of quantization and fixed point arithmetic upon the BER performance of turbo codes,” in Proc. IEEE VTC 1999, Houston, TX, USA, May 1999, pp. 1683–1687.
[8] S. Dolinar and D. Divsalar, “Weight distributions for turbo codes using random and nonrandom permutations,” JPL, TDA Progress Rep., Aug. 1995.
[9] K. Andrews, C. Heegard, and D. Kozen, “Interleaver design methods for turbo codes,” in Proc. IEEE ISIT 1998, Cambridge, MA, USA, Aug. 1998, p. 420.
[10] P. Robertson, E. Villebrun, and P. Hoeher, “A comparison of optimal and sub-optimal MAP decoding algorithms operating in the log domain,” in Proc. IEEE ICC 1995, Seattle, WA, USA, Jun. 1995, pp. 1009–1013.
[11] A. Viterbi, “An intuitive justification and a simplified implementation of the MAP decoder for convolutional codes,” IEEE J. Select. Areas Commun., vol. 16, no. 2, pp. 260–264, Feb. 1998.
[12] C. Berrou, P. Adde, E. Angui, and S. Faudeil, “A low complexity soft-output Viterbi decoder architecture,” in Proc. IEEE ICC 1993, Geneva, Switzerland, May 1993, pp. 737–740.
[13] D. J. C. MacKay and M. C. Davey, “Evaluation of Gallager codes for short block length and high rate applications,” in Proc. IMA Workshop on Codes, Systems and Graphical Models, Minneapolis, MN, USA, Aug. 1999.
[14] Y. Kou, S. Lin, and M. P. C. Fossorier, “Low density parity check codes based on finite geometries: A rediscovery,” in Proc. IEEE ISIT 2000, Sorrento, Italy, Jun. 2000.
[15] B. Frey and F. Kschischang, “Early detection and trellis splicing: Reduced-complexity iterative decoding,” IEEE J. Select. Areas Commun., vol. 16, no. 2, pp. 153–159, Feb. 1998.