
504 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—I: REGULAR PAPERS, VOL. 51, NO. 3, MARCH 2004

An Improved Pipelined MSB-First Add-Compare Select Unit Structure for Viterbi Decoders
Keshab K. Parhi, Fellow, IEEE

Abstract—Convolutional codes are widely used in many communication systems due to their excellent error-control performance. High-speed Viterbi decoders for convolutional codes are of great interest for high-data-rate applications. In this paper, an improved most-significant-bit (MSB)-first bit-level pipelined add-compare select (ACS) unit structure is proposed. The ACS unit is the main bottleneck on the decoding speed of a Viterbi decoder. By balancing the settling time of different paths in the ACS unit, the length of the critical path is reduced as close as possible to the iteration bound in the ACS unit. With the proposed retimed structure, it is possible to decrease the critical path of the ACS unit by 12% to 15% compared with the conventional MSB-first structures. This reduction in critical path can reduce the level of parallelism (and area) required for a very high-speed Viterbi decoder.

Index Terms—Add compare select, MSB first, pipelining, redundant arithmetic, retiming, Viterbi decoder.

Manuscript received June 6, 2003; revised October 29, 2003. This paper was recommended by Associate Editor Y. Wang.
The author is with the Department of Electrical and Computer Engineering, University of Minnesota, Minneapolis, MN 55455 USA (e-mail: [email protected]).
Digital Object Identifier 10.1109/TCSI.2004.823657
1057-7122/04$20.00 © 2004 IEEE

I. INTRODUCTION

The Viterbi algorithm (VA) was first introduced as a method for convolutional decoding in 1967 [1]. A Viterbi decoder is composed of three basic computation units as shown in Fig. 1. The branch metric unit (BMU) calculates the branch metrics $\lambda_{ij}(n)$, where $\lambda_{ij}(n)$ is the metric of the branch from state $i$ to state $j$ at time instance $n$. The branch metrics are fed into the add-compare select (ACS) unit to calculate the state metrics $\gamma_j(n)$ for state $j$. At time instance $n+1$, the state metrics can be computed recursively using

$$\gamma_j(n+1) = \max_i \left[ \gamma_i(n) + \lambda_{ij}(n) \right]. \tag{1}$$

A state metric is the maximum of the intermediate state metrics (ISMs) $\gamma_i(n) + \lambda_{ij}(n)$, which are the sums of the branch metrics $\lambda_{ij}(n)$ and the state metrics $\gamma_i(n)$ at the previous time instance. The trellis representation of the computation is illustrated in Fig. 2(a). In this paper, the state metrics are the maximum ones among the intermediate state metrics. In some applications, they can be the minimum ones. The ACS units for maximum selection and minimum selection have similar structures. The improvements and conclusions on the maximum selection ACS unit in this paper can also be applied to the minimum selection ACS unit.

The survivor path memory unit (SMU) processes the decisions made in the BMU and ACS and outputs the decoded data. The BMU and the SMU are composed only of feedforward paths. It is relatively easy to shorten the critical path in these two units by utilizing pipelining and parallel processing techniques. However, the feedback loop in the ACS unit is the major bottleneck for the design of a high-speed Viterbi decoder. Although it is possible to implement a high-speed ACS unit with parallel processing techniques [2], it is still desirable to reduce the length of the critical path in the ACS unit. The reduction of the critical path can significantly lower the level of parallelism and the hardware complexity for a specified speed.

Several structures have been proposed to speed up the computation of the ACS unit. In [2] and [3], the look-ahead technique was utilized to speed up the ACS unit. With $M$-step look-ahead, one iteration in the ACS unit is equivalent to $M$ iterations in the non-look-ahead implementation. Thus, the speed requirements on the ACS units for a given decoding data rate are reduced by $M$ times. The trellis corresponding to 2-step look-ahead in a 4-state Viterbi decoder is shown in Fig. 2(b). Compared with Fig. 2(a), the number of branch metrics in the trellis increases exponentially as $M$ increases linearly. However, this exponential complexity can be reduced to linear hardware complexity.

Least significant bit (LSB)-first computation is useful for accumulation operations, but most significant bit (MSB)-first computation is more suitable for the compare and select operations in the ACS unit. An ACS structure combining MSB-first compare-select with carry-propagation-free addition was proposed based on redundant number representation in [4].

To reduce the complexity of the maximum selection operation, a code converter (CC) was introduced for removing the coding redundancy for the digit "1" in the carry-save digit [5]. The circuit of the CC and its truth table are summarized in Fig. 3. In Fig. 3, $s$ and $c$ are the sum and carry bits with the same weight, respectively. The digit "1" has two redundant representations, ($s=1$, $c=0$) and ($s=0$, $c=1$); one of them is removed at the output of the CC. After the CC, the maximum selection operation reduces to a bit-wise OR operation.

It was also discovered that comparing the local intermediate metric with the maximum one of all other ISMs is more area efficient than pairwise comparison [6], [7].

This paper is organized as follows. In Section II, more precise estimates of the critical path and iteration bound of the ACS unit are obtained. Based on the critical path analysis, it is feasible to achieve an even shorter critical path by moving the retiming cutset to balance the settling time of different paths. The improvement on the critical path by applying a novel retiming design is discussed in Section III. With the proposed approach, for a 4-state
Viterbi decoder, the critical path can be shortened by about 15%. For an 8-state Viterbi decoder, it is possible to increase the speed by about 13%.

Fig. 1. Basic computation units in Viterbi decoder.

Fig. 2. Trellis diagram of a 4-state Viterbi decoder. (a) One computation step in a 4-state Viterbi decoder. Each state has two branches connected to other states. (b) One computation step in a 4-state Viterbi decoder with 2-step look-ahead. There are four branches connected to each state. For simplicity, only those branches pointing to state 0 are shown in the figure.

Fig. 3. Circuit and function of a CC. (a) Circuit of a CC. (b) Truth table of a CC.

II. CRITICAL PATH OF BIT-LEVEL PARALLEL ACS UNIT

An MSB-first ACS unit can be further divided into three basic function blocks as shown in Fig. 4: adders, CCs, and maximum selection (MS) units. The adders compute the sums of the state metrics and the branch metrics, i.e., the intermediate state metrics. The CCs remove a redundant state in the sum-carry representation of the ISM, as illustrated in Fig. 3. In the MS units, the local ISM is compared with the maximum one of all other ISMs for the same state. This comparison scheme achieves higher locality than pairwise comparison [6], [7]. The results of the comparisons are fed back to the adders.

The bit-level circuits in an ACS unit are illustrated in Fig. 5. For simplicity, only the functional units of one ISM in one state are shown in this figure. Due to the similarity of the circuits, and without loss of generality, in this paper we only analyze the circuits for one intermediate state metric of state 0 in a 4-state Viterbi decoder.

The iteration bound $T_\infty$ can be obtained as the bound of the loop shown by the dashed line in Fig. 5 [8], i.e.,

$$T_\infty = T_{FA} + T_{CC} + 2T_{MS} \tag{2}$$

where $T_{FA}$, $T_{CC}$, and $T_{MS}$ are the computational latencies of a full adder, a CC, and a maximum selection (MS) unit, respectively. The critical path of the circuit is shown in Fig. 5 as a dotted line:

$$T_{c0} = T_{FA} + T_{CC} + B\,T_{MS} \tag{3}$$

where $B$ is the word length used in calculating the state metrics. In this example, the word length $B$ equals 8. The circuit of an MS unit is more complex than a full adder or a CC. To reduce the length of the critical path and increase the speed of the computation, the number of MS units in the critical path must be reduced.

After applying the retiming cutsets every 2 bits as shown in Fig. 6 [5], [6], the length of the critical path is reduced to

$$T_{c1} = 2T_{FA} + 2T_{CC} + 2T_{MS}. \tag{4}$$
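The effect of the 2-bit retiming on the critical path can be checked numerically. The following Python sketch evaluates the iteration bound and the two critical-path expressions of this section for several word lengths. It assumes that the loop of Fig. 5 passes through one full adder, one CC, and two MS units, and that the unretimed critical path ripples through all $B$ MS units. The settling times are hypothetical placeholders (they are not the values of Table I), and the function names are introduced here only for illustration.

```python
def iteration_bound(t_fa, t_cc, t_ms):
    """Loop bound: one full adder, one CC and two MS units (assumed loop)."""
    return t_fa + t_cc + 2 * t_ms


def critical_path_unretimed(t_fa, t_cc, t_ms, word_length):
    """Critical path before retiming: grows linearly with the word length B."""
    return t_fa + t_cc + word_length * t_ms


def critical_path_2bit_retimed(t_fa, t_cc, t_ms):
    """Critical path with cutsets every 2 bits: independent of B."""
    return 2 * t_fa + 2 * t_cc + 2 * t_ms


# Hypothetical settling times in ns, chosen only for illustration.
T_FA, T_CC, T_MS = 0.3, 0.25, 0.7

for B in (8, 12, 16):
    before = critical_path_unretimed(T_FA, T_CC, T_MS, B)
    after = critical_path_2bit_retimed(T_FA, T_CC, T_MS)
    print(f"B={B:2d}  unretimed={before:.2f} ns  2-bit retimed={after:.2f} ns")

print(f"iteration bound={iteration_bound(T_FA, T_CC, T_MS):.2f} ns")
```

With these placeholder values the retimed critical path stays constant while the unretimed one grows with $B$, which is the behavior the retiming is meant to achieve.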
Although the numbers of full adders and CCs in the critical path are doubled, the number of MS units is significantly reduced. At the same time, the length of the critical path becomes independent of the word length used to compute the state metrics.

Fig. 4. Function blocks in MSB-first ACS unit.

Fig. 5. Bit-wise structure of MSB-first ACS unit.

Fig. 6. Two-bit-level retimed version of the ACS structure. The critical path is significantly reduced and is no longer proportional to the word length $B$. The cutsets are shown with dashed lines and the critical path is shown with a dotted line.

To reduce the critical path by novel retiming, the detailed structures of the MS units in the bit-level ACS structure must be explored. The structure of the MS unit, which compares the local ISM with the maximum one among all other ISMs of the same state [6], is shown in Fig. 7. The MS unit can be further divided into three blocks, from left to right: the maximum value block (m-block), the decision block (d-block), and the partial maximum block (pm-block). The pm-block uses the signals from the other bit-level ACS structures for the same state with different ISMs. The output of the pm-block is the maximum one of all other ISMs. With the signals from the pm-block, the CC, and the d-block of the adjacent more significant bit, the m-block computes the maximum value among all of the intermediate state metrics and feeds it back to the full adder in the ACS unit as the state metric. The state metric at the output of the m-block is expressed in sum-and-carry form. The inputs of the d-block include the signals from the pm-block, the CC, and the d-block in the circuits of the adjacent more significant bit. The outputs of the d-block are sent to the MS unit of the next less significant bit.

The MS unit, according to Fig. 7, can be logically divided into two subunits, the m-subunit and the d-subunit, as shown
in Fig. 8. The m-subunit, which includes the pm-block and the m-block, computes the state metrics. The d-subunit, which includes the pm-block and the d-block, computes the preliminary decision information for the next less significant bit. Thus, the settling time of the MS unit is equal to the larger of the settling times of the two subunits. The circuits in the d-subunit are more complex than those in the m-subunit. Thus

$$T_{MS} = \max(T_d, T_m) = T_d \tag{5}$$

where $T_d$ and $T_m$ are the settling times of the d-subunit and the m-subunit, respectively.

Fig. 7. Circuit of an MS unit for a 2-step look-ahead 4-state Viterbi decoder.

Fig. 8. Block diagram of an MS unit. The m-subunit includes pm-block and m-block. The d-subunit includes pm-block and d-block. The pm-block, with inputs from other states, is shared by both m-subunit and d-subunit.

If we redraw the bit-level structure of the ACS unit, as shown in Fig. 9(a), it is clear that the critical path is actually shorter than the one determined by either (4) or (5). Although the critical path passes through the MS unit twice, it passes through the m-subunit and the d-subunit separately, i.e., the critical path can be more precisely computed as

$$T_{c2} = 2T_{FA} + 2T_{CC} + T_m + T_d \tag{6}$$

where $T_{FA}$ and $T_{CC}$ are the latencies of a full adder and a CC. Similarly, the iteration bound of the ACS unit can be calculated as

$$T_\infty = T_{FA} + T_{CC} + T_m + T_d. \tag{7}$$

III. NOVEL PIPELINED AND RETIMED ACS UNIT

A. Improvement of Retiming Cutsets

In Section II, a precise estimate of the critical path of the ACS unit was obtained by exploring the structure of the MS unit. The d-subunit is more complex than the m-subunit and requires a longer propagation delay. Therefore, if the register at the output of the d-subunit is moved backward by redesigning the retiming cutset so that the settling times of the m-subunit and the d-subunit are balanced, the critical path in the bit-level ACS structure can be further reduced. Fig. 9(b) illustrates a possible retiming cutset design. The possible candidates for the critical path are the dotted lines, denoted here as $P_1$, $P_2$, and $P_3$. Path $P_1$ is a shortened version of the critical path in Fig. 9(a), while paths $P_2$ and $P_3$ are new candidates due to the retiming cutset applied. The complexity and the settling time of the m-subunit and the d-subunit depend on the number of ISMs for a state in the Viterbi decoder. With a large number of states (such as a 3-step look-ahead 8-state Viterbi
decoder), the settling time of the d-subunit may be greater than the sum of the settling times of a full adder and a code converter. The critical path is the maximum one among paths $P_1$, $P_2$, and $P_3$, i.e.,

$$T_{c3} = \begin{cases} T_{P_1}, & \text{if } T_d \le T_{FA} + T_{CC} \\ T_{P_2}, & \text{if } T_{FA} + T_{CC} < T_d \le T_{FA} + T_{CC} + T_m \\ T_{P_3}, & \text{if } T_d > T_{FA} + T_{CC} + T_m \end{cases} \tag{8}$$

where $T_{P_1} = 2T_{FA} + 2T_{CC} + 2T_m$ is the settling time of the path $P_1$, which includes two full adders, two CCs, and two m-subunits; $T_{P_2} = T_{FA} + T_{CC} + 2T_m + T_d$ is the settling time of the path $P_2$, which includes one full adder, one CC, two m-subunits, and one d-subunit; and $T_{P_3} = 2T_d + T_m$ is the settling time of the path $P_3$, which includes two d-subunits and one m-subunit. Here $T_{FA}$, $T_{CC}$, $T_m$, and $T_d$ are the settling times of a full adder, a CC, the m-subunit, and the d-subunit, respectively.

Fig. 9. Bit-level ACS unit structure by considering the MS unit as two logically separated subunits: m-subunit and d-subunit. Due to the similarity of the ACS structure, only a portion of the bit-level structure is shown in the figure to illustrate the cutset and the candidates of the critical path. (a) Detailed circuits of the bit-wise ACS unit by exploring the structure of the MS block. Only the circuits for 2 bits are shown. (b) Retimed bit-level ACS unit by considering the MS unit as two separated subunits.

Noticing that the m-subunit and the d-subunit share the same pm-block and that the decision logic in the d-subunit is composed of three levels of multiplexers as shown in Fig. 10, we conclude that the condition $T_d \le T_{FA} + T_{CC} + T_m$ in (8) always holds true, i.e., $T_{P_3} \le T_{P_2}$. Thus, the critical path of the retimed structure shown in Fig. 9(b) is

$$T_{c3} = \begin{cases} 2T_{FA} + 2T_{CC} + 2T_m, & \text{if } T_d \le T_{FA} + T_{CC} \\ T_{FA} + T_{CC} + 2T_m + T_d, & \text{if } T_d > T_{FA} + T_{CC}. \end{cases} \tag{9}$$

The critical path $T_{c3}$ is smaller than the critical path $T_{c2} = 2T_{FA} + 2T_{CC} + T_m + T_d$ obtained in Section II if either of the following conditions holds:

$$T_m < T_d \le T_{FA} + T_{CC} \tag{10a}$$

$$T_d > T_{FA} + T_{CC} \ \text{ and } \ T_m < T_{FA} + T_{CC}. \tag{10b}$$
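A small numerical experiment makes the case analysis above easier to follow. The Python sketch below computes the settling times of the three candidate paths of Fig. 9(b), takes their maximum, and compares it with the Section II critical path (two full adders, two CCs, one m-subunit, and one d-subunit). The settling times are assumptions and are not taken from Table I: the adder and CC split (0.3 ns and 0.25 ns) is hypothetical, and the subunit delays were picked so that the Section II critical path evaluates to the 2.2 ns and 3.1 ns figures quoted in Section III-B. The path labels `p1`, `p2`, `p3` are illustrative names.

```python
def candidate_paths(t_fa, t_cc, t_m, t_d):
    """Settling times of the three candidate paths after the Fig. 9(b) retiming."""
    p1 = 2 * t_fa + 2 * t_cc + 2 * t_m   # two FAs, two CCs, two m-subunits
    p2 = t_fa + t_cc + 2 * t_m + t_d     # one FA, one CC, two m-subunits, one d-subunit
    p3 = 2 * t_d + t_m                   # two d-subunits, one m-subunit
    return p1, p2, p3


def retimed_critical_path(t_fa, t_cc, t_m, t_d):
    """Critical path of the retimed structure: the slowest candidate path."""
    return max(candidate_paths(t_fa, t_cc, t_m, t_d))


def section2_critical_path(t_fa, t_cc, t_m, t_d):
    """Critical path before the new cutset: m- and d-subunit traversed once each."""
    return 2 * t_fa + 2 * t_cc + t_m + t_d


# Assumed settling times in ns (illustrative, not the values of Table I).
cases = {
    "4-state": dict(t_fa=0.3, t_cc=0.25, t_m=0.4, t_d=0.7),
    "8-state": dict(t_fa=0.3, t_cc=0.25, t_m=0.8, t_d=1.2),
}

for name, t in cases.items():
    old = section2_critical_path(**t)
    new = retimed_critical_path(**t)
    verdict = "shorter" if new < old else "not shorter"
    print(f"{name}: Section II {old:.2f} ns, Fig. 9(b) retiming {new:.2f} ns ({verdict})")
```

Under these assumptions the second candidate path dominates in both cases, and the retiming of Fig. 9(b) helps only in the 4-state case, where the m-subunit is faster than a full adder plus a CC. This is the situation that condition (10b) describes.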
It is worth noticing that the d-subunit contains the same pm-block as the m-subunit. To achieve the critical path $T_{c3}$ in (9), the pm-block must be physically duplicated for both the m-subunit and the d-subunit. The complexity of the decoder will significantly increase and the efficiency of the hardware will decrease.

Fig. 10. Decision logic in a d-block.

Instead of using the retiming cutset which achieves the critical path $T_{c3}$, a more efficient retiming cutset can be applied inside the d-subunit, as illustrated by the dashed line in Fig. 11. With this retiming design, the pm-block can be physically shared by both the d-subunit and the m-subunit. The d-subunit is divided by the cutset into two parts. The path $P_2$ only passes through the d-block instead of the whole d-subunit, so the settling times of the candidate paths are more balanced. Without modifying the settling time on the path $P_1$, the critical path is

$$T_{c4} = \max\left(2T_{FA} + 2T_{CC} + 2T_m,\; T_{FA} + T_{CC} + 2T_m + T_{db}\right) \tag{11}$$

where $T_{db}$ is the settling time of the d-block, which is shorter than $T_d$, the settling time of the d-subunit.

By exploring the structure of the decision logic, the settling time of the path $P_2$ can be further reduced by moving the delays at the output of the d-block in Fig. 9(a) inside the block. Fig. 12 illustrates the decision logic after applying the new retiming cutset. The dashed line in Fig. 12 is the retiming cutset, which moves the registers from the output of the circuit to the inputs and some wires inside the logic. By applying the retiming cutset inside the decision logic, the settling time on the path $P_2$ is further reduced to

$$T_{P_2} = T_{FA} + T_{CC} + 2T_m + T_{mux}$$

where $T_{mux}$ is the delay of the multiplexers inside the retiming cutset in the decision logic. As shown in Fig. 12, $T_{mux}$ equals the sum of the settling times of two multiplexers. It is independent of the numbers of states and ISMs in the Viterbi decoder. Thus, $T_{mux}$ is always smaller than the sum of the settling times of a full adder and a code converter. It is worth noticing that the settling time of the remaining part of the d-subunit is just equal to the settling time of the pm-block, which is always smaller than the settling time of the m-subunit. With the cutset inside the decision logic, the critical path becomes

$$T_{c5} = 2T_{FA} + 2T_{CC} + 2T_m. \tag{12}$$

It is concluded that

$$T_{c5} \le T_{c2} = 2T_{FA} + 2T_{CC} + T_m + T_d. \tag{13}$$

The equality in (13) holds true if and only if $T_m = T_d$. If the number of ISMs in the Viterbi decoder for a single state is large, such as in a 3-step look-ahead 8-state Viterbi decoder, the complexity of the pm-block increases the settling times of both subunits. Even so, the critical path $T_{c5}$ is always smaller than the critical path $T_{c2}$ obtained in Section II, independent of the number of ISMs.

B. Critical Path Comparison

The analysis of the critical path and the improved retiming cutset are not limited to 4-state Viterbi decoders. They can be extended to Viterbi decoders with an arbitrary number of states. They can also be applied to decoders involving look-ahead [3] or sliding block [9].

The assumed settling times of the basic units on the critical paths are shown in Table I. If the number of ISMs for a single state in the trellis is large, the settling time of the pm-block increases due to the increased complexity of the logic. Thus, the settling times of the m-subunit and the d-subunit also increase. The critical paths for different pipelining cases are listed in Table II. The critical path of a 4-state Viterbi decoder is reduced from 2.2 ns in the conventional approach to 1.9 ns in the proposed approach. For an 8-state decoder, the critical path is reduced from 3.1 ns to 2.7 ns. Therefore, the critical path reduction is about 14% in a 4-state and about 13% in an 8-state Viterbi decoder. The values of $T_{c5}$ are only 15% and 5.9% larger than the iteration bound, $T_\infty$, for a 4-state Viterbi decoder and an 8-state Viterbi decoder, respectively.

With the improved retiming cutset, it is possible to implement a high-speed Viterbi decoder with relatively lower hardware complexity. For instance, to implement an 8-state 10 Gb/s Viterbi decoder, a parallel Viterbi decoder based on either look-ahead or sliding-block needs to be implemented. A design using the conventional retiming, with a critical path of 3.1 ns, requires a clock period of at least 3.4 ns if an additional 0.3-ns clock setup/hold time is included. However, for a 32-parallel design, the maximum allowed clock period is 3.2 ns to achieve a decoding speed of 10 Gb/s. Thus, either a 64-parallel design in the look-ahead Viterbi decoder¹ or a 48-parallel design in the sliding-block Viterbi decoder² is mandatory. With the improved retiming cutsets, the critical path, $T_{c5}$, is 2.7 ns. After considering the clock setup/hold time, the required clock period can be as small as 3.0 ns, which is shorter than 3.2 ns. Thus, it is feasible to implement a 10 Gb/s Viterbi decoder with only a 32-parallel design. The hardware complexity is significantly reduced compared with a 48-parallel or 64-parallel design.

¹ For a look-ahead Viterbi decoder, the level of the parallelism is constrained to be a power of 2.
² For a sliding-block Viterbi decoder, the level of parallelism is assumed to be a multiple of eight.

IV. CONCLUSION

In this paper, after exploring the detailed circuits in the bit-level ACS structure in the Viterbi decoder, a more precise estimate of the critical path of the bit-level parallel ACS
structure is obtained. Several improvements on the retiming cutset are proposed. Among them, the one with the cutset inside the decision logic achieves the smallest settling time, which leads to a reduction of the critical path by about 12%–15%. The shortening of the critical path makes it feasible to implement high-speed Viterbi decoders with a relatively low level of parallelism. The reduction of the level of parallelism significantly decreases the complexity of the hardware, independent of whether the parallel decoder is implemented using the look-ahead or the sliding-block technique.

Fig. 11. Bit-level ACS structure after applying retiming cutsets inside the d-subunit, which separate the d-subunit into the pm-block and the d-block. The inputs of the pm-blocks are represented with dotted lines, because these are from the ACS structures of other ISMs. The retiming cutsets are illustrated as dashed lines.

Fig. 12. Decision logic after retiming.

TABLE I
ASSUMPTION ON SETTLING TIME OF COMPUTATION COMPONENTS IN BIT-LEVEL ACS STRUCTURE
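The link between critical path and level of parallelism in the 10 Gb/s example of Section III-B can be reproduced with integer arithmetic. The Python sketch below is only a back-of-the-envelope check: it takes the 3.1 ns and 2.7 ns critical paths and the 0.3 ns setup/hold allowance quoted in the text, computes the smallest number of bits that must be decoded per clock period, and rounds it up to a power of 2 as required for a look-ahead decoder. The function names and the picosecond representation are choices made here, not part of the paper.

```python
def required_parallelism(rate_gbps, clock_period_ps):
    """Smallest integer P such that P bits per clock period meet the data rate.

    A rate of R Gb/s equals R bits per ns, i.e. R * period_ps / 1000 bits per period.
    """
    scaled = rate_gbps * clock_period_ps   # bits per period, scaled by 1000
    return -(-scaled // 1000)              # integer ceiling division


def next_power_of_two(n):
    """Round n up to a power of 2 (look-ahead parallelism constraint)."""
    p = 1
    while p < n:
        p *= 2
    return p


RATE_GBPS = 10        # target decoding speed from the text
SETUP_HOLD_PS = 300   # 0.3 ns setup/hold allowance from the text

for label, critical_path_ps in (("conventional", 3100), ("proposed", 2700)):
    period_ps = critical_path_ps + SETUP_HOLD_PS
    needed = required_parallelism(RATE_GBPS, period_ps)
    print(f"{label}: clock period {period_ps} ps, "
          f"at least {needed} bits per clock, "
          f"look-ahead level {next_power_of_two(needed)}")
```

The check covers only the power-of-2 constraint of the look-ahead decoder; the sliding-block case depends on assumptions about the block structure that are not modeled here.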
TABLE II
CRITICAL PATH

ACKNOWLEDGMENT

The author would like to thank A. Abnous for useful discussions during the course of this work and J. Tang for his help in preparation of the paper. This work was carried out while the author was at Broadcom Corporation, Irvine, CA, while on leave from the University of Minnesota.

REFERENCES

[1] A. J. Viterbi, “Error bounds for convolutional coding and an asymptotically optimum decoding algorithm,” IEEE Trans. Inform. Theory, vol. IT-13, pp. 260–269, Apr. 1967.
[2] G. Fettweis and H. Meyr, “Parallel Viterbi algorithm implementation: Breaking the ACS-bottleneck,” IEEE Trans. Commun., vol. 37, pp. 785–790, Aug. 1989.
[3] P. Black and T. Meng, “A 140 Mb/s 32-state radix-4 Viterbi decoder,” IEEE J. Solid-State Circuits, vol. 27, pp. 1877–1885, Dec. 1992.
[4] G. Fettweis and H. Meyr, “High rate Viterbi processor: A systolic array solution,” IEEE J. Select. Areas Commun., vol. 8, pp. 1520–1534, Oct. 1990.
[5] A. Yeung and J. Rabaey, “A 210 Mb/s radix-4 bit-level Viterbi decoder,” in Proc. IEEE Int. Solid-State Circuits Conf., Feb. 1995, pp. 88–89.
[6] V. S. Gierenz, O. Weiss, T. G. Noll, I. Carew, J. Ashley, and R. Karabed, “A 550 Mb/s radix-4 bit-level pipelined 16-state 0.25-μm CMOS Viterbi decoder,” in Proc. IEEE Int. Conf. Application-Specific Systems, Architectures, and Processors, 2000, pp. 195–201.
[7] T. Gemmeke, M. Gansen, and T. G. Noll, “Implementation of scalable power and area efficient high-throughput Viterbi decoders,” IEEE J. Solid-State Circuits, vol. 37, pp. 941–948, July 2002.
[8] K. K. Parhi, VLSI Digital Signal Processing Systems: Design and Implementation. New York: Wiley, 1999.
[9] P. J. Black and T. H.-Y. Meng, “A 1-Gb/s, four-state, sliding block Viterbi decoder,” IEEE J. Solid-State Circuits, vol. 32, pp. 797–805, June 1997.

Keshab K. Parhi (S’85–M’88–SM’91–F’96) received the B.Tech., M.S.E.E., and Ph.D. degrees from the Indian Institute of Technology, Kharagpur, India, the University of Pennsylvania, Philadelphia, and the University of California at Berkeley, in 1982, 1984, and 1988, respectively.

Since 1988, he has been with the University of Minnesota, Minneapolis, where he is currently a Distinguished McKnight University Professor in the Department of Electrical and Computer Engineering. His research addresses VLSI architecture design and implementation of physical layer aspects of broadband communications systems. He is currently working on error-control coders and cryptography architectures, high-speed transceivers, ultra wideband systems, and quantum error-control coders and quantum cryptography. He has published over 350 papers, has authored the text book VLSI Digital Signal Processing Systems (New York: Wiley, 1999) and coedited the reference book Digital Signal Processing for Multimedia Systems (New York: Marcel Dekker, 1999).

Dr. Parhi is the recipient of numerous awards including the 2003 IEEE Kiyo Tomiyasu Technical Field Award, the 2001 IEEE W.R.G. Baker prize paper award, and a Golden Jubilee award from the IEEE Circuits and Systems Society in 1999. He has served on Editorial Boards of IEEE TRANSACTIONS ON VLSI SYSTEMS, IEEE TRANSACTIONS ON SIGNAL PROCESSING, IEEE SIGNAL PROCESSING LETTERS, IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS, and IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—II, currently serves on Editorial Boards of the IEEE Signal Processing Magazine and Journal of VLSI Signal Processing Systems, and is the current Editor-in-Chief of the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—I: REGULAR PAPERS, for 2004-2005. He served as Technical Program Cochair of the 1995 IEEE VLSI Signal Processing Workshop and the 1996 ASAP Conference, and as the General Chair of the 2002 IEEE Workshop on Signal Processing Systems. He was a Distinguished Lecturer for the IEEE Circuits and Systems Society from 1997 to 1999.