
International Journal of Scientific & Engineering Research, Volume 3, Issue 9, September 2012
ISSN 2229-5518

Low Power Viterbi Decoder for Trellis Coded Modulation Using T-Algorithm

Md. Javeed, B. Sri Lakshmi

Abstract: This paper presents a low-power Viterbi decoder, together with the convolutional encoder, for trellis coded modulation. Convolutional encoding with Viterbi decoding is a good forward error correction technique for channels affected by noise degradation. The paper describes a Viterbi decoder architecture with a convolutional encoder and a proposed precomputation T-algorithm, which effectively reduces the power consumption with a negligible decrease in speed. Implementation results are given for a rate-3/4 convolutional code with constraint length 7, as used in trellis coded modulation. The architecture reduces the power consumption by up to 70% without any performance loss, while the degradation in clock speed is negligible.

Key words: Convolutional code, T-algorithm, trellis coded modulation (TCM), Viterbi decoder, VLSI.

Md. Javeed is pursuing his Master of Technology in VLSI Systems at Bomma Institute of Technology and Science, Jawaharlal Nehru Technological University, India (e-mail: [email protected]).

B. Sri Lakshmi is currently working as an Assistant Professor in Electronics and Communication Engineering at Jawaharlal Nehru Technological University, India (e-mail: [email protected]).

I. INTRODUCTION

The use of convolutional codes with probabilistic decoding can significantly improve the error performance of a communication system [1]. Trellis coded modulation (TCM) schemes are used in many bandwidth-efficient systems. Typically, a TCM system employs a high-rate convolutional code, which leads to a high-complexity Viterbi decoder (VD) in the TCM decoder even when the constraint length of the convolutional code is moderate. For example, the rate-3/4 convolutional code with a constraint length of 7 used in a trellis coded modulation system leads to a Viterbi decoder whose complexity is comparable to that for a rate-1/2 convolutional code with a constraint length of 9 [2], due to the large number of transitions in the trellis. So, in terms of power consumption, the Viterbi decoder is the dominant module in a TCM decoder. In order to reduce the computational complexity as well as the power consumption, low-power schemes should be exploited for the VD in a TCM decoder.

General solutions for low-power Viterbi decoder design are studied in our implementation work. Power reduction in VDs can be achieved by reducing the number of states (for example, reduced-state sequence decoding (RSSD) [3], the M-algorithm [4], and the T-algorithm [1],[5]) or by over-scaling the supply voltage [6]. Over-scaling of the supply voltage has the drawback that the whole system, not only the VD, has to be taken into consideration, which is beyond the focus of our work. In practical applications, RSSD [3] is generally not as efficient as the M-algorithm [4] or the T-algorithm. Basically, the M-algorithm requires a sorting process in a feedback loop, whereas the T-algorithm only searches for the optimal path metric Popt, that is, the maximum or the minimum value of the Ps.

The T-algorithm has been shown to be very efficient in reducing the power consumption [7],[8]. However, searching for the optimal path metric in the feedback loop still reduces the decoding speed. To overcome this drawback, the T-algorithm has been proposed in two variations: the relaxed adaptive VD [7], which suggests using an estimated optimal path metric instead of finding the real one in each cycle, and the limited-search parallel-state VD based on scarce state transition (SST) [8].

When applied to high-rate convolutional codes, the relaxed adaptive VD suffers a severe degradation of bit-error-rate (BER) performance due to the inherent drifting error between the estimated optimal path metric and the accurate one [9]. On the other hand, the SST-based scheme requires a predecoding and re-encoding process and is not suitable for TCM decoders. In TCM, the encoded data are always associated with a complex multi-level modulation scheme such as 8-ary phase shift keying (8PSK) or 16/64-ary quadrature amplitude modulation (16/64QAM) through a constellation point mapper. At the receiver, a soft-input VD should be employed to guarantee a good coding gain. So, the computational overhead and decoding latency due to predecoding and re-encoding of the TCM signal become high.
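To make the two state-reduction rules mentioned above concrete, the following Python sketch contrasts them at a purely algorithmic level; the state labels, metric values, and function names are illustrative assumptions and do not represent the hardware architecture discussed later. The M-algorithm keeps the best M survivors, which requires sorting inside the feedback loop, whereas the T-algorithm keeps every path whose metric is within a preset T of the optimal metric, so only the minimum has to be found.

    # Minimal sketch contrasting M-algorithm and T-algorithm pruning (toy values).
    def m_algorithm_prune(path_metrics, M):
        # Keep the M best (smallest) metrics: a sort is needed in the ACS loop.
        ranked = sorted(path_metrics.items(), key=lambda kv: kv[1])
        return dict(ranked[:M])

    def t_algorithm_prune(path_metrics, T):
        # Keep every state whose metric is within T of the optimal metric Popt:
        # only the minimum has to be located, no sorting.
        p_opt = min(path_metrics.values())
        return {s: p for s, p in path_metrics.items() if p <= p_opt + T}

    if __name__ == "__main__":
        ps = {0: 3.2, 1: 1.1, 2: 5.0, 3: 1.4, 4: 2.7}   # state -> path metric
        print(m_algorithm_prune(ps, M=2))    # {1: 1.1, 3: 1.4}
        print(t_algorithm_prune(ps, T=1.0))  # {1: 1.1, 3: 1.4}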

In [9], an add-compare-select unit (ACSU) architecture based on precomputation was proposed for VDs incorporating the T-algorithm; it efficiently improves the clock speed of a VD with the T-algorithm for a rate-3/4 code. In this paper, we further analyze the precomputation algorithm. A systematic way to determine the optimal number of precomputation steps is shown, where the minimum number of steps for the critical path to achieve the theoretical iteration bound is calculated and the computational complexity overhead due to precomputation is evaluated. Then, we discuss a complete low-power VD design for the rate-3/4 convolutional code [2]. Finally, ASIC implementation results of the VD with convolutional encoding are shown.

The remainder of this paper is organized as follows. Section II gives information about VDs. Section III presents the precomputation architecture with the T-algorithm. A design example with the modifications of the survivor path memory unit (SMU) is discussed in Section IV. Synthesis and power estimation results are shown in Section V.

II. VITERBI DECODER

A general diagram of a Viterbi decoder is shown in Fig. 1. First, branch metrics are calculated in the branch metric unit (BMU) from the received symbols. In a TCM decoder, this module is replaced by a transition metrics unit (TMU), which is more complex than the BMU. Then, the Bs are fed into the ACSU, which recursively computes the path metrics (Ps) and outputs decision bits for each possible state transition. After that, the decision bits are stored in and retrieved from the SMU in order to decode the source bits along the final survivor path. The Ps of the current iteration are stored in the path metric unit (PMU).

For calculating the optimal Ps and purging states, the T-algorithm requires extra computation in the ACSU loop. Therefore, a straightforward implementation of the T-algorithm dramatically reduces the decoding speed. The key to improving the clock speed of the T-algorithm is to find the optimal path metric quickly.

III. PRECOMPUTATION ARCHITECTURE

A. Precomputation Algorithm

The basic idea of the precomputation algorithm was presented in [9]. The branch metric can be calculated in two ways: Hamming distance and Euclidean distance [10]. Consider a VD for a convolutional code with constraint length k, where each state receives p candidate paths. First, we expand the Ps at the current time slot n, Ps(n), as a function of Ps(n-1) to form a look-ahead computation of the optimal P, Popt(n). If branch metrics are calculated based on the Euclidean distance, Popt(n) is the minimum value of Ps(n) and can be obtained as

    Popt(n) = min{P0(n), P1(n), ..., P_{2^(k-1)-1}(n)}
            = min{min[P0,0(n-1)+B0,0(n), P0,1(n-1)+B0,1(n), ..., P0,p(n-1)+B0,p(n)],
                  min[P1,0(n-1)+B1,0(n), P1,1(n-1)+B1,1(n), ..., P1,p(n-1)+B1,p(n)],
                  ...,
                  min[P_{2^(k-1)-1,0}(n-1)+B_{2^(k-1)-1,0}(n), ..., P_{2^(k-1)-1,p}(n-1)+B_{2^(k-1)-1,p}(n)]}
            = min{P0,0(n-1)+B0,0(n), P0,1(n-1)+B0,1(n), ..., P0,p(n-1)+B0,p(n),
                  P1,0(n-1)+B1,0(n), P1,1(n-1)+B1,1(n), ..., P1,p(n-1)+B1,p(n),
                  ...,
                  P_{2^(k-1)-1,0}(n-1)+B_{2^(k-1)-1,0}(n), ..., P_{2^(k-1)-1,p}(n-1)+B_{2^(k-1)-1,p}(n)}.   (1)

Now, we group the states into several clusters to reduce the computational overhead caused by the look-ahead computation. The trellis butterflies of a VD usually have a symmetric structure; in other words, the states can be grouped into m clusters, where all the clusters have the same number of states and all the states in the same cluster are extended by the same Bs. Thus, (1) can be rewritten as

    Popt(n) = min{min(Ps(n-1) in cluster 1) + min(Bs(n) for cluster 1),
                  min(Ps(n-1) in cluster 2) + min(Bs(n) for cluster 2),
                  ...,
                  min(Ps(n-1) in cluster m) + min(Bs(n) for cluster m)}.   (2)
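As a quick software-level sanity check of the look-ahead identity in (1), the following Python sketch compares Popt(n) obtained after a conventional ACS update with the value computed directly from Ps(n-1) and the branch metrics; the toy trellis (k = 3, two predecessors per state) and the random metric values are assumptions for illustration only, not the rate-3/4 code of the later design example.

    # Toy check of the look-ahead computation in (1); trellis and values are assumed.
    import random

    K = 3                                   # toy constraint length -> 4 states
    STATES = range(2 ** (K - 1))
    PRED = {s: [s >> 1, (s >> 1) | 2] for s in STATES}   # assumed two predecessors per state

    random.seed(1)
    ps_prev = {s: random.uniform(0, 5) for s in STATES}                     # Ps(n-1)
    bm = {(u, v): random.uniform(0, 2) for v in STATES for u in PRED[v]}    # B(n)

    # Conventional ACS recursion, then search for Popt(n) among the new Ps.
    ps_new = {v: min(ps_prev[u] + bm[(u, v)] for u in PRED[v]) for v in STATES}
    p_opt_acs = min(ps_new.values())

    # Look-ahead per (1): the same Popt(n) is the minimum over all Ps(n-1) + B(n)
    # sums, so it can be computed in parallel with the ACS update.
    p_opt_lookahead = min(ps_prev[u] + bm[(u, v)] for v in STATES for u in PRED[v])

    assert abs(p_opt_acs - p_opt_lookahead) < 1e-12
    print(p_opt_acs)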


The minimum of the Bs for each cluster can easily be obtained from the BMU or TMU, and min(Ps) at time n-1 in each cluster can be precalculated at the same time that the ACSU is updating the new Ps for time n. Theoretically, when we continuously decompose Ps(n-1), Ps(n-2), ..., the precomputation scheme can be extended to q steps, where q is any positive integer less than n. Hence, Popt(n) can be calculated directly from Ps(n-q) in q cycles.

B. Choosing Precomputation Steps

In [9] it was shown through a design example that q-step precomputation can be pipelined into q stages, where the logic delay of each stage is continuously reduced as q increases. As a result, the decoding speed of the low-power VD is greatly improved. However, after reaching a certain number of steps, qb, further precomputation does not bring additional benefits because of the inherent iteration bound of the ACSU loop. Therefore, it is worth discussing the optimal number of precomputation steps.

In a TCM system, the convolutional code usually has a coding rate of R/(R+1), R = 2, 3, 4, ..., so that in (1), p = 2^R and the logic delay of the ACSU is T_ACSU = T_adder + T_p-in_comp, where T_adder is the logic delay of the adder that computes the Ps of each candidate path reaching the same state and T_p-in_comp is the logic delay of the p-input comparator that determines the survivor path (the path with the minimum metric) for each state. If the T-algorithm is employed in the VD, the iteration bound is slightly longer than T_ACSU, because there is another two-input comparator in the loop to compare the new Ps against a threshold value obtained from the optimal path metric and the preset T, as shown in (3):

    T_bound = T_adder + T_p-in_comp + T_2-in_comp.   (3)

To achieve the iteration bound expressed in (3), for the precomputation in each pipelining stage we limit the comparison to be among only p or 2p metrics. To simplify our evaluation, we assume that each stage reduces the number of metrics to 1/p (or 2^-R) of its input metrics. Meeting the theoretical iteration bound then requires (2^R)^qb ≥ 2^(k-1). Therefore qb ≥ (k-1)/R, and qb is expressed as in (4) with a ceiling function:

    qb = ⌈(k-1)/R⌉.   (4)

In the design example shown in [9], with a coding rate of 3/4 and a constraint length of 7, the minimum number of precomputation steps for the VD to meet the iteration bound is 2 according to (4). It is the same value as obtained from the direct architecture design in [9]. In some cases, the number of remaining metrics may slightly expand during a certain pipeline stage after the addition of the Bs. Usually, the extra delay can be absorbed by an optimized architecture or circuit design. Even if the extra delay is hard to eliminate, the resultant clock speed is very close to the theoretical bound. To fully achieve the iteration bound, we could add another pipeline stage, though it is very costly.

The computational overhead (compared with the conventional T-algorithm) is an important factor that should be carefully evaluated. Most of the computational overhead comes from adding Bs to the metrics at each stage, as indicated in (2). In other words, if there are m remaining metrics after the comparison in a stage, the computational overhead from this stage is at least m addition operations. The exact overhead varies from case to case based on the convolutional code's trellis diagram. Again, to simplify the evaluation, we consider a code with constraint length k and q precomputation steps, and we assume that each remaining metric causes a computational overhead of one addition operation. In this case, the number of metrics is reduced by a factor of 2^((k-1)/q) at each stage, and the overall computational overhead (measured in addition operations) is

    N_overhead = 2^0 + 2^((k-1)/q) + 2^(2(k-1)/q) + ... + 2^((q-1)(k-1)/q)
               = (2^(q(k-1)/q) - 1) / (2^((k-1)/q) - 1)
               = (2^(k-1) - 1) / (2^((k-1)/q) - 1).   (5)
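A short numeric sketch of (4) and (5) for the code considered in the design example (k = 7, R = 3, i.e., a rate-3/4 code) is given below; the helper names are arbitrary and the figures are only the idealized estimates of the formulas, not measured overheads.

    # Evaluate (4) and (5) for an assumed k = 7, R = 3 (rate-3/4) code.
    from math import ceil

    def q_bound(k, R):
        # (4): minimum number of precomputation steps to reach the iteration bound.
        return ceil((k - 1) / R)

    def n_overhead(k, q):
        # (5): geometric-series estimate of the extra additions per trellis step.
        r = 2 ** ((k - 1) / q)
        return (2 ** (k - 1) - 1) / (r - 1)

    print(q_bound(7, 3))                       # -> 2, as stated for the design example
    for q in range(1, 7):
        print(q, round(n_overhead(7, q), 1))   # 63/(2^(6/q) - 1): 1.0, 9.0, 21.0, ...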

The estimated computational overhead according to (5) is 63/(2^(6/q) - 1) when k = 7 and q ≤ 6, which grows almost exponentially with q. In a real design the overhead increases even faster than what is given by (5) when other factors (such as comparisons or the expansion of metrics mentioned above) are taken into consideration. Therefore, a small number of precomputation steps is preferred even though the iteration bound may not be fully satisfied. In most cases, one- or two-step precomputation is a good choice.

The above analysis also reveals that precomputation is not a good option for low-rate convolutional codes (rate 1/Rc, Rc = 2, 3, ...), because it usually needs more than two steps to effectively reduce the critical path (in that case, R = 1 in (4) and qb is k-1). However, for TCM systems, where high-rate convolutional codes are always employed, two steps of precomputation can achieve the iteration bound or make a big difference in terms of clock speed. In addition, the computational overhead is small.

IV. LOW POWER VITERBI DECODER DESIGN

We use the 4-D 8PSK TCM system described in [2] as the example. The rate-3/4 convolutional code employed in the TCM system is shown in Fig. 3. Preliminary BER performance and the architecture design of the ACSU have been shown in [9]. In this section, we further address the SMU design issue. In the next section we report ASIC implementation results that have not been obtained earlier.

The BER performance of the VD employing the T-algorithm with different values of T over an additive white Gaussian noise channel is shown in Fig. 4. The simulation is based on a 4-D 8PSK TCM system employing the rate-3/4 code [11]. The overall coding rate is 11/12 due to the other uncoded bits in the TCM system. Compared with the ideal Viterbi algorithm, the threshold Tpm can be lowered to 0.3 with less than 0.1 dB of performance loss, while the computational complexity can ideally be reduced by up to 90% [9], since the performance is the same as that of the conventional T-algorithm.

A. One-Step Precomputation

For the convenience of our discussion, we define the leftmost register in Fig. 3 as the most significant bit (MSB) and the rightmost register as the least significant bit (LSB). The 64 states and path metrics are labeled from 0 to 63. A careful study reveals that the 64 states can be partitioned into two groups: odd-numbered Ps (when the LSB is '1') and even-numbered Ps (when the LSB is '0'). The odd PMs are all extended by odd Bs (when Z0 is '1') and the even PMs are all extended by even Bs (when Z0 is '0'). The minimum P becomes

    Popt(n) = min{min(even Ps(n-1)) + min(even Bs(n)),
                  min(odd Ps(n-1)) + min(odd Bs(n))}.
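The even/odd split can be modeled in a few lines of Python; this is a behavioral sketch of the formula above with assumed (random) metric values, relying on the stated property of the code in Fig. 3 that even states are extended only by even Bs and odd states only by odd Bs.

    # Behavioral sketch of the 1-step precomputation split (assumed toy metrics).
    import random

    random.seed(2)
    ps_prev = [random.uniform(0, 10) for _ in range(64)]   # Ps(n-1), states 0..63
    bs = [random.uniform(0, 3) for _ in range(16)]         # branch metrics B0..B15

    min_even_ps = min(ps_prev[0::2])   # even-numbered states
    min_odd_ps = min(ps_prev[1::2])    # odd-numbered states
    min_even_bs = min(bs[0::2])        # even-numbered branch metrics
    min_odd_bs = min(bs[1::2])         # odd-numbered branch metrics

    # Popt(n): even states are extended by even Bs only, odd states by odd Bs only.
    p_opt = min(min_even_ps + min_even_bs, min_odd_ps + min_odd_bs)
    print(p_opt)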

The functional diagram of the 1-step precomputation scheme is shown in Fig. 5. In general, the path metric purge algorithm unit (PPAU) has to wait for the new Ps from the ACSU before it can calculate the optimal path metric [12], while in Fig. 5 the optimal path metric is calculated directly from the Ps of the previous cycle at the same time that the ACSU is calculating the new Ps. The details of the PPAU are shown in Fig. 6.

The critical path of the 1-step precomputation scheme is

    T_1-step-pre-T = 2*T_adder + 2*T_4-in_comp + 3*T_2-in_comp.   (6)

The hardware overhead of the 1-step precomputation scheme is about 4 adders, which is negligible. Compared with the SEPC-T algorithm, however, the critical path of the 1-step precomputation scheme is still long [12]. In order to further shorten the critical path, we explore the 2-step precomputation design next.

B. Two-Step Precomputation

a. ACSU Design

We again need to analyze the trellis transition of the original code. In the 1-step precomputation architecture, we pointed out that for the particular code shown in Fig. 3, odd-numbered states are extended by odd Bs, while even-numbered states are extended by even Bs. Furthermore, in the trellis transition the even states all extend to states with higher indices (the MSB in Fig. 3 is '1'), while the odd states extend to states with lower indices (the MSB is '0' in Fig. 3). This information allows us to obtain the 2-step precomputation data path. The derivation is straightforward, although the mathematical details are tedious; for clarity, we only provide the main conclusion here.

The states are further grouped into 4 clusters as described by (7). The BMs are categorized in the same way and are described by (8):

    cluster3 = {Pm | 0 ≤ m ≤ 63, m mod 4 = 3}
    cluster2 = {Pm | 0 ≤ m ≤ 63, m mod 4 = 1}
    cluster1 = {Pm | 0 ≤ m ≤ 63, m mod 4 = 2}
    cluster0 = {Pm | 0 ≤ m ≤ 63, m mod 4 = 0}   (7)

    BMG3 = {Bm | 0 ≤ m ≤ 15, m mod 4 = 3}
    BMG2 = {Bm | 0 ≤ m ≤ 15, m mod 4 = 1}
    BMG1 = {Bm | 0 ≤ m ≤ 15, m mod 4 = 2}
    BMG0 = {Bm | 0 ≤ m ≤ 15, m mod 4 = 0}   (8)

The optimal PM at time n is calculated as

    Popt(n) = min[min{min(cluster0(n-2)) + min(BMG0(n-1)),
                      min(cluster1(n-2)) + min(BMG1(n-1)),
                      min(cluster2(n-2)) + min(BMG3(n-1)),
                      min(cluster3(n-2)) + min(BMG2(n-1))} + min(even Bs(n)),
                  min{min(cluster0(n-2)) + min(BMG1(n-1)),
                      min(cluster1(n-2)) + min(BMG0(n-1)),
                      min(cluster2(n-2)) + min(BMG2(n-1)),
                      min(cluster3(n-2)) + min(BMG3(n-1))} + min(odd Bs(n))].   (9)
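The following Python sketch is a behavioral transcription of (7)-(9) with assumed random metric values: the cluster minima from time n-2 and the BMG minima from time n-1 are combined exactly as in (9), one branch being completed with the minimum of the even Bs at time n and the other with the minimum of the odd Bs.

    # Behavioral sketch of the 2-step precomputation in (7)-(9); values are assumed.
    import random

    random.seed(3)
    ps_n2 = [random.uniform(0, 10) for _ in range(64)]   # Ps(n-2), states 0..63
    bm_n1 = [random.uniform(0, 3) for _ in range(16)]    # B0..B15 at time n-1
    bs_n = [random.uniform(0, 3) for _ in range(16)]     # B0..B15 at time n

    # (7): cluster0 -> m mod 4 = 0, cluster1 -> 2, cluster2 -> 1, cluster3 -> 3.
    cluster = [min(ps_n2[s] for s in range(64) if s % 4 == r) for r in (0, 2, 1, 3)]
    # (8): the BMGs use the same residue mapping as the clusters.
    bmg = [min(bm_n1[m] for m in range(16) if m % 4 == r) for r in (0, 2, 1, 3)]

    min_even_bs = min(bs_n[0::2])
    min_odd_bs = min(bs_n[1::2])

    # (9): two candidate sums, one finished with the even Bs, one with the odd Bs.
    p_opt = min(
        min(cluster[0] + bmg[0], cluster[1] + bmg[1],
            cluster[2] + bmg[3], cluster[3] + bmg[2]) + min_even_bs,
        min(cluster[0] + bmg[1], cluster[1] + bmg[0],
            cluster[2] + bmg[2], cluster[3] + bmg[3]) + min_odd_bs,
    )
    print(p_opt)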


The functional block diagram of the Viterbi decoder with the two-step precomputation T-algorithm is shown in Fig. 7. The minimum value of each branch metric group (BMG) can be calculated in the BMU or TMU and then passed to the threshold generator unit (TGU) to calculate (Popt + T), and the new Ps are compared with this threshold in the "purge unit". The architecture of the TGU, which implements the key functions of the two-step precomputation scheme, is shown in Fig. 8. In Fig. 8 the "MIN 16" unit for finding the minimum value in each cluster is constructed with two stages of four-input comparators. This architecture has been optimized to meet the iteration bound [9]. Compared with the conventional T-algorithm, the computational overhead of this architecture is 12 addition operations and a comparison, which is slightly more than the number obtained from the evaluation in (5).

b. SMU Design

In this section, we address an important issue regarding SMU design when the T-algorithm is employed. There are two different types of SMU in the literature: register exchange (RE) and trace back (TB) schemes. In the regular VD without any low-power schemes, the SMU always outputs the decoded data from a fixed state (arbitrarily selected in advance) if the RE scheme is used, or traces back the survivor path from the fixed state if the TB scheme is used, for low-complexity purposes. For a VD incorporating the T-algorithm, no state is guaranteed to be active at all clock cycles. Thus it is impossible to appoint a fixed state for either outputting the decoded bit (RE scheme) or starting the trace-back process (TB scheme). In the conventional implementation of the T-algorithm, the decoder can use the optimal state (the state with Popt), which is always enabled, to output or trace back data. The process of searching for Popt can find the index of the optimal state as a byproduct. However, when an estimated Popt is used [8], or, as in our case, when Popt is calculated from the PMs at the previous time slot, it is difficult to find the index of the optimal state.

A practical method is to find the index of an enabled state through a 2^(k-1)-to-(k-1) priority encoder. Suppose that we have labeled the states from 0 to 63. The output of the priority encoder would be the unpurged state with the lowest index. Assuming the purged states have the flag "0" and the other states are assigned the flag "1", the truth table of such a priority encoder is shown in Table I, where "flag" is the input and "index" is the output. Implementation of such a table is not trivial. In our design, we employ an efficient architecture for the 64-to-6 priority encoder based on three 4-to-2 priority encoders, as shown in Fig. 7. The 64 flags are first divided into 4 groups, each of which contains 16 flags. The priority encoder at level 1 detects which group contains at least one "1" and determines index[5:4]. Then MUX2 selects one group of flags based on index[5:4]. The input of the priority encoder at level 2 can be computed from the output of MUX2 by "OR" operations. We can also reuse the intermediate results by introducing another MUX (MUX1). The output of the priority encoder at level 2 is index[3:2]. Again, index[3:2] selects four flags (MUX3) as the input of the priority encoder at level 3. Finally, the last encoder determines index[1:0].
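The three-level structure just described can be modeled compactly in Python. The sketch below is a behavioral model of the composition (group flags, level-1 encoder, MUX selection, OR-reduction, level-2 and level-3 encoders), not gate-level RTL; the helper names are assumptions.

    # Behavioral model of the 64-to-6 priority encoder built from 4-to-2 encoders.
    def prio4(bits):
        # 4-to-2 priority encoder per Table II: the lowest-index '1' wins.
        for i, b in enumerate(bits):
            if b:
                return i
        return 0  # all-zero input is outside the truth table; 0 is an arbitrary choice

    def prio64(flags):
        # flags[0..63]: '1' marks an unpurged (enabled) state.
        groups = [flags[i * 16:(i + 1) * 16] for i in range(4)]
        hi = prio4([any(g) for g in groups])        # level 1 -> index[5:4]
        g16 = groups[hi]                            # MUX2: the selected group of 16 flags
        quads = [g16[i * 4:(i + 1) * 4] for i in range(4)]
        mid = prio4([any(q) for q in quads])        # level 2 (OR-reduced inputs) -> index[3:2]
        lo = prio4(quads[mid])                      # level 3 -> index[1:0]
        return (hi << 4) | (mid << 2) | lo

    flags = [0] * 64
    flags[37] = 1
    flags[52] = 1
    print(prio64(flags))   # -> 37, the enabled state with the lowest index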


Implementing the 4-to-2 priority encoder is much simpler than implementing the 64-to-6 priority encoder. Its truth table is shown in Table II and the corresponding logic is given in (10) and (11).
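Equations (10) and (11) are not reproduced in this copy; one possible two-output realization that is consistent with the four specified rows of Table II (the remaining input patterns are don't-cares) is O1 = I0'·I1' and O0 = I0'·(I1 + I2'·I3). The short check below verifies this assumed logic against the table.

    # Assumed logic for the 4-to-2 priority encoder, checked against Table II.
    def encode(i0, i1, i2, i3):
        o1 = (not i0) and (not i1)
        o0 = (not i0) and (i1 or ((not i2) and i3))
        return int(o1), int(o0)

    # The four specified rows of Table II (I[0] has the highest priority).
    assert encode(1, 0, 0, 0) == (0, 0)   # x x x 1 -> 00
    assert encode(0, 1, 0, 0) == (0, 1)   # x x 1 0 -> 01
    assert encode(0, 0, 1, 0) == (1, 0)   # x 1 0 0 -> 10
    assert encode(0, 0, 0, 1) == (1, 1)   # 1 0 0 0 -> 11
    print("Table II satisfied")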
Table I
Truth table of the 64-to-6 priority encoder

    Flag[63:0]              Index[5:0]
    x x ... x x x x 1       000000
    x x ... x x x 1 0       000001
    x x ... x x 1 0 0       000010
    x x ... x 1 0 0 0       000011
    x x ... 1 0 0 0 0       000100
    ...                     ...
    x 1 0 ... 0 0 0 0 0     111110
    1 0 0 ... 0 0 0 0 0     111111

Table II
Truth table of the 4-to-2 priority encoder

    Input I[3:0]    Output O[1:0]
    x x x 1         00
    x x 1 0         01
    x 1 0 0         10
    1 0 0 0         11

V. IMPLEMENTATION RESULTS

The full-trellis VD, the VD with the two-step precomputation architecture, and a VD with the conventional T-algorithm are modeled with Verilog HDL code. The soft inputs of all VDs are quantized with 7 bits. Each PM in all VDs is quantized as 12 bits. The RE scheme with a survival length of 42 is used for the SMU, and the register arrays associated with the purged states are clock-gated to reduce the power consumption in the SMU. For ASIC synthesis, we use a TSMC 90-nm CMOS standard cell library. The synthesis targets the maximum clock speed for each case, and the results are shown in Table III.

Table III
Synthesis results for maximum clock speed

    Design                           Max speed (MHz)    Cell area (mm^2)
    Full-trellis VD                  505                0.58
    VD with 2-step precomputation    446.4 (-11.6%)     0.68  (+17.2%)
    Conventional T-algorithm         232   (-54.1%)     0.685 (+18%)

Table III shows that the VD with the two-step precomputation architecture only decreases the clock speed by 11% compared with the full-trellis VD. Meanwhile, the increase of the hardware area is about 17%. The decrease of clock speed is inevitable, since the iteration bound of a VD with the T-algorithm is inherently longer than that of the full-trellis VD. Also, any kind of low-power scheme introduces extra hardware, like the purge unit shown in Fig. 5 or the clock-gating module in the SMU. Therefore, the hardware overhead of the proposed VD is expected. On the other hand, the VD with the conventional T-algorithm cannot achieve half of the clock speed of the full-trellis VD. Therefore, for high-speed applications, it should not be considered. It is worth mentioning that the conventional T-algorithm VD takes slightly more hardware than the proposed architecture, which is counterintuitive. This is because the former decoder has a much longer critical path and the synthesis tool took extra measures to improve the clock speed. It is clear that the conventional T-algorithm is not suitable for high-speed applications. If the target throughput is moderately high, the proposed architecture can operate at a lower supply voltage, which leads to a quadratic power reduction compared to the conventional scheme. Thus, we next focus on the power comparison between the full-trellis VD and the proposed scheme.

We estimate the power consumption of these two designs with Synopsys PrimePower under a clock speed of 200 Mb/s (power supply of 1.0 V, temperature of 300 K). A total of 1133 received symbols (12 000 bits) are simulated. The results are shown in Table IV. With the finite word-length implementation, the threshold can only be changed in steps of 0.125. Therefore, to maintain a good BER performance, the minimum threshold we chose is 0.375.

Table IV
Power estimation results

    Design                                           Power (mW)
    Full-trellis VD                                  21.473 (100%)
    VD with 2-step precomputation   Tpm = 0.75       20.069 (93.5%)
      architecture                  Tpm = 0.625      17.186 (80.0%)
                                    Tpm = 0.5        11.754 (54.7%)
                                    Tpm = 0.375      6.6127 (30.8%)

Table IV shows that, as the threshold decreases, the power consumption of the proposed VD is reduced accordingly. In order to achieve the same BER performance, the proposed VD only consumes 30.8% of the power of the full-trellis VD.
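The relative figures quoted from Tables III and IV follow directly from the absolute values; the short check below recomputes them (rounding explains the +18% entry, which evaluates to about +18.1%).

    # Recompute the relative figures quoted in Tables III and IV.
    full_speed, pre_speed, conv_speed = 505.0, 446.4, 232.0   # MHz
    full_area, pre_area, conv_area = 0.58, 0.68, 0.685        # mm^2
    print(round((pre_speed / full_speed - 1) * 100, 1))       # -> -11.6 %
    print(round((conv_speed / full_speed - 1) * 100, 1))      # -> -54.1 %
    print(round((pre_area / full_area - 1) * 100, 1))         # -> +17.2 %
    print(round((conv_area / full_area - 1) * 100, 1))        # -> about +18 %

    full_power = 21.473                                       # mW, full-trellis VD
    for tpm, mw in [(0.75, 20.069), (0.625, 17.186), (0.5, 11.754), (0.375, 6.6127)]:
        print(tpm, round(mw / full_power * 100, 1))           # 93.5, 80.0, 54.7, 30.8 %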

VI. CONCLUSION

We have proposed a low-power VD design for TCM systems. The precomputation architecture that incorporates the T-algorithm efficiently reduces the power consumption of VDs without appreciably reducing the decoding speed. We have also analyzed the precomputation algorithm, where the optimal number of precomputation steps is calculated and discussed. This algorithm is suitable for TCM systems, which always employ high-rate convolutional codes. Finally, we presented a design case. Both the ACSU and SMU are modified to correctly decode the signal. ASIC synthesis and power estimation results show that, compared with the full-trellis VD without a low-power scheme, the precomputation VD reduces the power consumption by 70% with only an 11% reduction of the maximum decoding speed.

VII. REFERENCES

[1] F. Chan and D. Haccoun, "Adaptive Viterbi decoding of convolutional codes over memoryless channels," IEEE Trans. Commun., vol. 45, no. 11, pp. 1389-1400, Nov. 1997.

[2] "Bandwidth-efficient modulations," Consultative Committee for Space Data Systems, Matera, Italy, CCSDS 401(3.3.6) Green Book, Issue 1, Apr. 2003.

[3] J. B. Anderson and E. Offer, "Reduced-state sequence detection with convolutional codes," IEEE Trans. Inf. Theory, vol. 40, no. 3, pp. 965-972, May 1994.

[4] C. F. Lin and J. B. Anderson, "T-algorithm decoding of channel convolutional codes," presented at the Princeton Conf. Info. Sci. Syst., Princeton, NJ, Mar. 1986.

[5] S. J. Simmons, "Breadth-first trellis decoding with adaptive effort," IEEE Trans. Commun., vol. 38, no. 1, pp. 3-12, Jan. 1990.

[6] R. A. Abdallah and N. R. Shanbhag, "Error-resilient low-power Viterbi decoder architectures," IEEE Trans. Signal Process., vol. 57, no. 12, pp. 4906-4917, Dec. 2009.

[7] J. Jin and C.-Y. Tsui, "Low-power limited-search parallel state Viterbi decoder implementation based on scarce state transition," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 15, no. 11, pp. 1172-1176, Oct. 2007.

[8] F. Sun and T. Zhang, "Low power state-parallel relaxed adaptive Viterbi decoder design and implementation," in Proc. IEEE ISCAS, May 2006, pp. 4811-4814.

[9] J. He, H. Liu, and Z. Wang, "A fast ACSU architecture for Viterbi decoder using T-algorithm," in Proc. 43rd IEEE Asilomar Conf. Signals, Syst., Comput., Nov. 2009, pp. 231-235.

[10] K. S. Arunlal and S. A. Hariprasad, "An efficient Viterbi decoder," International Journal of Advanced Information Technology (IJAIT), vol. 2, no. 1, Feb. 2012.

[11] J. He, Z. Wang, and H. Liu, "An efficient 4-D 8PSK TCM decoder architecture," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 18, no. 5, pp. 808-817, May 2010.

[12] A. A. Peshattiwar and T. G. Panse, "High speed ACSU architecture for Viterbi decoder using T-algorithm," International Journal of Electrical and Electronics Engineering (IJEEE), vol. 1, iss. 3, 2012.


