An Improved Pipelined MSB-First Add-Compare Select Unit Structure For Viterbi Decoders
Abstract—Convolutional codes are widely used in many communication systems due to their excellent error-control performance. High-speed Viterbi decoders for convolutional codes are of great interest for high-data-rate applications. In this paper, an improved most-significant-bit (MSB)-first bit-level pipelined add-compare select (ACS) unit structure is proposed. The ACS unit is the main bottleneck on the decoding speed of a Viterbi decoder. By balancing the settling time of different paths in the ACS unit, the length of the critical path is reduced as close as possible to the iteration bound in the ACS unit. With the proposed retimed structure, it is possible to decrease the critical path of the ACS unit by 12% to 15% compared with the conventional MSB-first structures. This reduction in critical path can reduce the level of parallelism (and area) required for a very high-speed Viterbi decoder.

Index Terms—Add compare select, MSB first, pipelining, redundant arithmetic, retiming, Viterbi decoder.

I. INTRODUCTION

The survivor path memory unit (SMU) processes the decisions made in the BMU and ACS and outputs the decoded data. The BMU and the SMU are only composed of feedforward paths. It is relatively easy to shorten the critical path in these two units by utilizing pipelining and parallel processing techniques. However, the feedback loop in the ACS unit is the major bottleneck for the design of a high-speed Viterbi decoder. Although it is possible to implement the high-speed ACS unit with parallel processing techniques [2], it is still desirable to reduce the length of the critical path in the ACS unit. The reduction of the critical path can significantly lower the level of parallelism and the hardware complexity for a specified speed.

Several structures have been proposed to speed up the computation of the ACS unit. In [2] and [3], the look-ahead technique was utilized to speed up the ACS unit. With the M-step look-ahead, one iteration in the ACS unit is equivalent to M iterations in the non-look-ahead implementation. Thus, the speed requirements
Fig. 2. Trellis diagram of a 4-state Viterbi decoder. (a) One computation step in a 4-state Viterbi decoder. Each state has two branches connected to other states.
(b) One computation step in a 4-state Viterbi decoder with 2-step look-ahead. There are four branches connected to each state. For simplicity, only those branches
pointing to state 0 are shown in the figure.
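The equivalence between one look-ahead iteration and several ordinary ACS iterations can be illustrated with a small word-level sketch. It is not taken from the paper: the predecessor mapping (a shift-register trellis), the integer branch metrics, and all function names are assumptions chosen for illustration. Maximum selection is used, in line with the maximum selection units discussed in this paper.

```python
import random

NUM_STATES = 4  # 4-state trellis, as in Fig. 2


def predecessors(state):
    """Two predecessor states of `state` (assumed shift-register trellis)."""
    half = state >> 1
    return [half, half | (NUM_STATES >> 1)]


def random_branch_metrics(rng):
    """Hypothetical integer branch metrics, one per trellis branch."""
    return {(p, s): rng.randint(-8, 8)
            for s in range(NUM_STATES) for p in predecessors(s)}


def acs_step(metrics, branch):
    """One add-compare-select iteration: add, then keep the maximum."""
    return [max(metrics[p] + branch[(p, s)] for p in predecessors(s))
            for s in range(NUM_STATES)]


def lookahead_branches(first, second):
    """Merge the branch metrics of two consecutive steps into 2-step branches."""
    combined = {}
    for s in range(NUM_STATES):
        for mid in predecessors(s):
            for p in predecessors(mid):
                cand = first[(p, mid)] + second[(mid, s)]
                combined[(p, s)] = max(combined.get((p, s), cand), cand)
    return combined


def acs_lookahead_step(metrics, combined):
    """One ACS iteration over the merged (2-step) branches."""
    return [max(metrics[p] + combined[(p, s)]
                for p in range(NUM_STATES) if (p, s) in combined)
            for s in range(NUM_STATES)]


rng = random.Random(0)
metrics = [0, 3, 1, 2]
step_1, step_2 = random_branch_metrics(rng), random_branch_metrics(rng)
two_iterations = acs_step(acs_step(metrics, step_1), step_2)
one_lookahead = acs_lookahead_step(metrics, lookahead_branches(step_1, step_2))
print(two_iterations == one_lookahead)
```

In this sketch the merged trellis has four branches entering each state, matching Fig. 2(b). The branch merging involves no feedback, so only `acs_lookahead_step` stays inside the recursive loop.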
Fig. 3. Circuit and function of a CC. (a) Circuit of a CC. (b) Truth table of a CC.

Viterbi decoder, the critical path can be shortened by about 15%. For an 8-state Viterbi decoder, it is possible to increase the speed by about 13%.

II. CRITICAL PATH OF BIT-LEVEL PARALLEL ACS UNIT

An MSB-first ACS unit can be further divided into three basic function blocks, as shown in Fig. 4: adders, CCs, and maximum selection (MS) units. The adders compute the sums of the state metrics and the branch metrics, i.e., the intermediate state metrics (ISMs). The CCs remove a redundant state in the sum-carry representation of the ISM, as illustrated in Fig. 3. In the MS units, the local ISM is compared with the maximum of all other ISMs for the same state. The comparison scheme can achieve higher locality

(2)

where the three delay terms are the computational latencies of a full adder, a CC, and a maximum selection (MS) unit, respectively. The critical path of the circuit is shown in Fig. 5 as a dotted line

(3)

where B is the word length used in calculating the state metrics. In this example, the word length equals 8. The circuit of an MS unit is more complex than a full adder or a CC. To reduce the length of the critical path and increase the speed of the computation, the number of MS units in the critical path must be reduced.

After applying the retiming cutsets every 2 bits as shown in Fig. 6 [5], [6], the length of the critical path is reduced to

(4)
506 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—I: REGULAR PAPERS, VOL. 51, NO. 3, MARCH 2004
Fig. 6. Two-bit-level retimed version of the ACS structure. The critical path is significantly reduced and is no longer proportional to the word length B. The cutsets are shown with dashed lines and the critical path is shown with a dotted line.
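The effect of the 2-bit retiming on the critical path can be illustrated with a rough delay model. This is a sketch under stated assumptions rather than the paper's expressions. It assumes that the bit-level parallel path contains one full adder, one CC, and one MS unit per bit of the word length B. It also assumes that the 2-bit retimed path contains two full adders, two CCs, and two MS units. The delay values are hypothetical unit delays, not measured settling times.

```python
def bit_parallel_path(t_fa, t_cc, t_ms, word_length):
    """Assumed model: one full adder, one CC, then one MS unit per bit."""
    return t_fa + t_cc + word_length * t_ms


def two_bit_retimed_path(t_fa, t_cc, t_ms):
    """Assumed model: adders and CCs doubled, only two MS units remain."""
    return 2 * (t_fa + t_cc + t_ms)


# Hypothetical unit delays; the MS unit is taken to be the slowest block.
T_FA, T_CC, T_MS = 1, 1, 2

for word_length in (4, 8, 16):
    before = bit_parallel_path(T_FA, T_CC, T_MS, word_length)
    after = two_bit_retimed_path(T_FA, T_CC, T_MS)
    print(f"B={word_length:2d}: bit-parallel={before}, 2-bit retimed={after}")
```

Under these assumptions the retimed delay does not change with B, while the bit-parallel delay grows linearly with it.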
Although the numbers of full adders and CCs in the critical path are doubled, the number of MS units is significantly reduced. At the same time, the length of the critical path is independent of the word length used to compute the state metrics.

To reduce the critical path by novel retiming, the detailed structure of the MS units in the bit-level ACS structure must be explored. The structure of the MS unit, which compares the local ISM with the maximum among all other ISMs of the same state [6], is shown in Fig. 7. The MS unit can be further divided into three blocks, from left to right: the maximum value block (m-block), the decision block (d-block), and the partial maximum block (pm-block). The pm-block uses the signals from the other bit-level ACS structures for the same state with different ISMs. The output of the pm-block is the maximum of all other ISMs. With the signals from the pm-block, the CC, and the d-block of the preceding more significant bit, the m-block computes the maximum value among all of the intermediate state metrics and feeds it back to the full adder in the ACS unit as the state metric. The state metric at the output of the m-block is expressed in sum-and-carry form. The inputs of the d-block include the signals from the pm-block, the CC, and the d-block in the circuits of the preceding more significant bit. The outputs of the d-block are sent to the MS unit of the next less significant bit.

The MS unit, according to Fig. 7, can be logically divided into two subunits (the m-subunit and the d-subunit, as shown
PARHI: IMPROVED PIPELINED MSB-FIRST ACS UNIT STRUCTURE 507
(6)
(7)
Fig. 9. Bit-level ACS unit structure by considering the MS unit as two logically separated subunits: m-subunit and d-subunit. Due to the similarity of the ACS
structure, only a portion of the bit-level structure is shown in the figure to illustrate the cutset and the candidates of the critical path. (a) Detailed circuits of the
bit-wise ACS unit by exploring the structure of the MS block. Only the circuits for 2 bits are shown. (b) Retimed bit-level ACS unit by considering the MS unit
as two separated subunits.
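The division of labour between the decision logic and the maximum-value logic can be illustrated with a simplified MSB-first maximum selection. This sketch is not the circuit of Fig. 7. It assumes unsigned, non-redundant binary operands, whereas the structure in this paper operates on sum-carry digits. The function name and the `alive` flags are hypothetical; the flags stand in for the decision state that each bit position passes to the next less significant bit.

```python
import random


def msb_first_max(values, word_length):
    """Select the maximum of unsigned integers one bit at a time, MSB first.

    Returns the maximum and, per candidate, whether it equals the maximum.
    """
    alive = [True] * len(values)  # candidates that can still be the maximum
    result = 0
    for bit in range(word_length - 1, -1, -1):
        bits = [(v >> bit) & 1 for v in values]
        # Maximum-value logic: output 1 if any live candidate has a 1 here.
        max_bit = int(any(a and b for a, b in zip(alive, bits)))
        # Decision logic: a candidate survives only if it matches that bit.
        alive = [a and b == max_bit for a, b in zip(alive, bits)]
        result = (result << 1) | max_bit
    return result, alive


maximum, winners = msb_first_max([5, 12, 9, 12], word_length=4)
print(maximum, winners)

# Cross-check against Python's built-in max on random 8-bit operands.
rng = random.Random(1)
for _ in range(100):
    operands = [rng.randrange(256) for _ in range(4)]
    assert msb_first_max(operands, 8)[0] == max(operands)
```

Each output bit depends only on the current bit of every candidate and on the flags from the more significant bits. This dependence is why the decision signals form a chain from the MSB to the LSB.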
decoder), the settling time of the d-subunit may be greater than the sum of the settling times of a full adder and a code converter. The critical path is the longest of three candidate paths, i.e.,

(8)

where the first path includes two full adders, two CCs, and two m-subunits; the second path includes one full adder, one CC, two m-subunits, and one d-subunit; and the third path includes two d-subunits and one m-subunit. Noting that the m-subunit and the d-subunit share the same pm-block, and that the decision logic in the d-subunit is composed of three levels of multiplexers as shown in Fig. 10, we conclude that the condition in (8) always holds true. Thus, the critical path of the retimed structure shown in Fig. 9(b) is

(9)

This critical path is smaller than the critical path obtained in Section II if any one of the following conditions holds:

(10a)

and

(10b)
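The comparison of the three candidate paths described above can be sketched numerically. The composition of each path follows the text. The delay values, the function name, and the path labels are hypothetical, because the settling times assumed in the paper are not reproduced here.

```python
def retimed_critical_path(t_fa, t_cc, t_m, t_d):
    """Return (label, delay) of the longest of the three candidate paths."""
    candidates = {
        # two full adders, two CCs, two m-subunits
        "path_1": 2 * t_fa + 2 * t_cc + 2 * t_m,
        # one full adder, one CC, two m-subunits, one d-subunit
        "path_2": t_fa + t_cc + 2 * t_m + t_d,
        # two d-subunits, one m-subunit
        "path_3": 2 * t_d + t_m,
    }
    label = max(candidates, key=candidates.get)
    return label, candidates[label]


# Hypothetical delays in which the d-subunit is slower than an adder plus a CC.
print(retimed_critical_path(t_fa=1, t_cc=1, t_m=2, t_d=3))

# Hypothetical delays in which the d-subunit is faster than an adder plus a CC.
print(retimed_critical_path(t_fa=2, t_cc=2, t_m=2, t_d=3))
```

With these compositions, the second path exceeds the first by exactly the d-subunit delay minus the sum of the full adder and CC delays. The second path is therefore longer than the first whenever the d-subunit settles more slowly than a full adder followed by a CC.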
It is concluded that
(13)
Fig. 11. Bit-level ACS structure after applying retiming cutsets inside the d-subunit, which separate the d-subunit into the pm-block and the d-block. The inputs of the pm-blocks are represented with dotted lines because they come from the ACS structures of other ISMs. The retiming cutsets are illustrated as dashed lines.
TABLE I
ASSUMPTION ON SETTLING TIME OF COMPUTATION COMPONENTS IN BIT-LEVEL ACS STRUCTURE
structure is obtained. Several improvements on the retiming cutset are proposed. Among them, the one with the cutset inside the decision logic achieves the smallest settling time, which leads to a reduction of the critical path by about 12%–15%. The shortening of the critical path makes it feasible to implement the high-speed Viterbi decoders with a relatively low level of parallelism. The reduction of the level of parallelism significantly decreases the complexity of the hardware, independent of whether the parallel decoder is implemented using the look-ahead or the sliding-block technique.
TABLE II
CRITICAL PATH