
International Journal of Scientific & Engineering Research, Volume 3, Issue 9, September 2012
ISSN 2229-5518

Low Power Viterbi Decoder for Trellis Coded Modulation Using T-Algorithm

Md. Javeed, B. Sri Lakshmi

Abstract: This paper presents a low-power Viterbi decoder, together with the convolutional encoder, for trellis coded modulation. Convolutional encoding with Viterbi decoding is a good forward error correction technique for channels affected by noise degradation. The paper describes a Viterbi decoder architecture with a convolutional encoder and a proposed precomputation T-algorithm, which effectively reduces the power consumption with a negligible decrease in speed. Implementation results are given for a rate-3/4 convolutional code with constraint length 7, as used in trellis coded modulation. The architecture reduces the power consumption by up to 70% without any performance loss, while the degradation in clock speed is negligible.

Key words: Convolutional code, T-algorithm, trellis coded modulation (TCM), Viterbi decoder, VLSI.

Md. Javeed is pursuing his Master of Technology in VLSI Systems at Bomma Institute of Technology and Science, Jawaharlal Nehru Technological University, India (e-mail: [email protected]).

B. Sri Lakshmi is currently working as an Assistant Professor in Electronics and Communication Engineering at Jawaharlal Nehru Technological University, India (e-mail: [email protected]).

I. INTRODUCTION

The use of convolutional codes with probabilistic decoding can significantly improve the error performance of a communication system [1]. Trellis coded modulation (TCM) schemes are used in many bandwidth-efficient systems. Typically, a TCM system employs a high-rate convolutional code, which leads to a high-complexity Viterbi decoder (VD) in the TCM decoder even when the constraint length of the convolutional code is moderate. For example, the rate-3/4 convolutional code with a constraint length of 7 used in a trellis coded modulation system leads to a Viterbi decoder whose complexity is comparable to that for a rate-1/2 convolutional code with a constraint length of 9 [2], due to the large number of transitions in the trellis. So, in terms of power consumption, the Viterbi decoder is the dominant module in a TCM decoder. In order to reduce the computational complexity as well as the power consumption, low-power schemes should be exploited for the VD in a TCM decoder.

General solutions for low-power Viterbi decoder design are studied in our implementation work. Power reduction in VDs can be achieved by reducing the number of states (for example, reduced-state sequence decoding (RSSD) [3], the M-algorithm [4], and the T-algorithm [1],[5]) or by over-scaling the supply voltage [6]. Over-scaling of the supply voltage has the drawback that the whole system, not only the VD, has to be taken into consideration, which is beyond the focus of our work. In practical applications, RSSD [3] is generally not as efficient as the M-algorithm [4] or the T-algorithm. Basically, the M-algorithm requires a sorting process in a feedback loop, whereas the T-algorithm only searches for the optimal path metric Popt, that is, the maximum or the minimum value of the Ps.

The T-algorithm has been shown to be very efficient in reducing the power consumption [7],[8]. However, searching for the optimal path metric in the feedback loop still reduces the decoding speed. To overcome this drawback, the T-algorithm has been proposed in two variations: the relaxed adaptive VD [7], which suggests using an estimated optimal path metric instead of finding the real one in each cycle, and the limited-search parallel-state VD based on scarce state transition (SST) [8].

When applied to high-rate convolutional codes, the relaxed adaptive VD suffers a severe degradation of bit-error-rate (BER) performance due to the inherent drifting error between the estimated optimal path metric and the accurate one [9]. On the other hand, the SST-based scheme requires a predecoding and re-encoding process and is not suitable for TCM decoders. In TCM, the encoded data are always associated with a complex multi-level modulation scheme such as 8-ary phase shift keying (8PSK) or 16/64-ary quadrature amplitude modulation (16/64QAM) through a constellation point mapper. At the receiver, a soft-input VD should be employed to guarantee a good coding gain. So, the computational overhead and decoding latency due to predecoding and re-encoding of the TCM signal become high.
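To make the two state-reduction rules mentioned above concrete, the following Python sketch contrasts them at a purely algorithmic level; the state labels, metric values, and function names are illustrative assumptions and do not represent the hardware architecture discussed later. The M-algorithm keeps the best M survivors, which requires sorting inside the feedback loop, whereas the T-algorithm keeps every path whose metric is within a preset T of the optimal metric, so only the minimum has to be found.

    # Minimal sketch contrasting M-algorithm and T-algorithm pruning (toy values).
    def m_algorithm_prune(path_metrics, M):
        # Keep the M best (smallest) metrics: a sort is needed in the ACS loop.
        ranked = sorted(path_metrics.items(), key=lambda kv: kv[1])
        return dict(ranked[:M])

    def t_algorithm_prune(path_metrics, T):
        # Keep every state whose metric is within T of the optimal metric Popt:
        # only the minimum has to be located, no sorting.
        p_opt = min(path_metrics.values())
        return {s: p for s, p in path_metrics.items() if p <= p_opt + T}

    if __name__ == "__main__":
        ps = {0: 3.2, 1: 1.1, 2: 5.0, 3: 1.4, 4: 2.7}   # state -> path metric
        print(m_algorithm_prune(ps, M=2))    # {1: 1.1, 3: 1.4}
        print(t_algorithm_prune(ps, T=1.0))  # {1: 1.1, 3: 1.4}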

In [9], an add-compare-select unit (ACSU) architecture based on precomputation was proposed for VDs incorporating the T-algorithm; it efficiently improves the clock speed of a VD with the T-algorithm for a rate-3/4 code. In this paper, we further analyze the precomputation algorithm. A systematic way to determine the optimal number of precomputation steps is shown, where the minimum number of steps for the critical path to achieve the theoretical iteration bound is calculated and the computational complexity overhead due to precomputation is evaluated. Then, we discuss a complete low-power VD design for the rate-3/4 convolutional code [2]. Finally, ASIC implementation results of the VD with convolutional encoding are shown.

The remainder of this paper is organized as follows. Section II gives information about VDs. Section III presents the precomputation architecture with the T-algorithm. A design example with the modifications of the survivor path memory unit (SMU) is discussed in Section IV. Synthesis and power estimation results are shown in Section V.

II. VITERBI DECODER

A general diagram of a Viterbi decoder is shown in Fig. 1. First, branch metrics are calculated in the branch metric unit (BMU) from the received symbols. In a TCM decoder, this module is replaced by a transition metrics unit (TMU), which is more complex than the BMU. Then, the Bs are fed into the ACSU, which recursively computes the path metrics (Ps) and outputs decision bits for each possible state transition. After that, the decision bits are stored in and retrieved from the SMU in order to decode the source bits along the final survivor path. The Ps of the current iteration are stored in the path metric unit (PMU).

For calculating the optimal Ps and purging states, the T-algorithm requires extra computation in the ACSU loop. Therefore, a straightforward implementation of the T-algorithm dramatically reduces the decoding speed. The key to improving the clock speed of the T-algorithm is to find the optimal path metric quickly.

III. PRECOMPUTATION ARCHITECTURE

A. Precomputation Algorithm

The basic idea of the precomputation algorithm was presented in [9]. The branch metric can be calculated in two ways: Hamming distance and Euclidean distance [10]. Consider a VD for a convolutional code with constraint length k, where each state receives p candidate paths. First, we expand the Ps at the current time slot n, Ps(n), as a function of Ps(n-1) to form a look-ahead computation of the optimal P, Popt(n). If branch metrics are calculated based on the Euclidean distance, Popt(n) is the minimum value of Ps(n) and can be obtained as

    Popt(n) = min{P0(n), P1(n), ..., P_{2^(k-1)-1}(n)}
            = min{min[P0,0(n-1)+B0,0(n), P0,1(n-1)+B0,1(n), ..., P0,p(n-1)+B0,p(n)],
                  min[P1,0(n-1)+B1,0(n), P1,1(n-1)+B1,1(n), ..., P1,p(n-1)+B1,p(n)],
                  ...,
                  min[P_{2^(k-1)-1,0}(n-1)+B_{2^(k-1)-1,0}(n), ..., P_{2^(k-1)-1,p}(n-1)+B_{2^(k-1)-1,p}(n)]}
            = min{P0,0(n-1)+B0,0(n), P0,1(n-1)+B0,1(n), ..., P0,p(n-1)+B0,p(n),
                  P1,0(n-1)+B1,0(n), P1,1(n-1)+B1,1(n), ..., P1,p(n-1)+B1,p(n),
                  ...,
                  P_{2^(k-1)-1,0}(n-1)+B_{2^(k-1)-1,0}(n), ..., P_{2^(k-1)-1,p}(n-1)+B_{2^(k-1)-1,p}(n)}.   (1)

Now, we group the states into several clusters to reduce the computational overhead caused by the look-ahead computation. The trellis butterflies of a VD usually have a symmetric structure; in other words, the states can be grouped into m clusters, where all the clusters have the same number of states and all the states in the same cluster are extended by the same Bs. Thus, (1) can be rewritten as

    Popt(n) = min{min(Ps(n-1) in cluster 1) + min(Bs(n) for cluster 1),
                  min(Ps(n-1) in cluster 2) + min(Bs(n) for cluster 2),
                  ...,
                  min(Ps(n-1) in cluster m) + min(Bs(n) for cluster m)}.   (2)
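As a quick software-level sanity check of the look-ahead identity in (1), the following Python sketch compares Popt(n) obtained after a conventional ACS update with the value computed directly from Ps(n-1) and the branch metrics; the toy trellis (k = 3, two predecessors per state) and the random metric values are assumptions for illustration only, not the rate-3/4 code of the later design example.

    # Toy check of the look-ahead computation in (1); trellis and values are assumed.
    import random

    K = 3                                   # toy constraint length -> 4 states
    STATES = range(2 ** (K - 1))
    PRED = {s: [s >> 1, (s >> 1) | 2] for s in STATES}   # assumed two predecessors per state

    random.seed(1)
    ps_prev = {s: random.uniform(0, 5) for s in STATES}                     # Ps(n-1)
    bm = {(u, v): random.uniform(0, 2) for v in STATES for u in PRED[v]}    # B(n)

    # Conventional ACS recursion, then search for Popt(n) among the new Ps.
    ps_new = {v: min(ps_prev[u] + bm[(u, v)] for u in PRED[v]) for v in STATES}
    p_opt_acs = min(ps_new.values())

    # Look-ahead per (1): the same Popt(n) is the minimum over all Ps(n-1) + B(n)
    # sums, so it can be computed in parallel with the ACS update.
    p_opt_lookahead = min(ps_prev[u] + bm[(u, v)] for v in STATES for u in PRED[v])

    assert abs(p_opt_acs - p_opt_lookahead) < 1e-12
    print(p_opt_acs)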


The minimum of the Bs for each cluster can easily be obtained from the BMU or TMU, and min(Ps) at time n-1 in each cluster can be precalculated at the same time that the ACSU is updating the new Ps for time n. Theoretically, when we continuously decompose Ps(n-1), Ps(n-2), ..., the precomputation scheme can be extended to q steps, where q is any positive integer less than n. Hence, Popt(n) can be calculated directly from Ps(n-q) in q cycles.

B. Choosing Precomputation Steps

In [9] it was shown through a design example that q-step precomputation can be pipelined into q stages, where the logic delay of each stage is continuously reduced as q increases. As a result, the decoding speed of the low-power VD is greatly improved. However, after reaching a certain number of steps, qb, further precomputation does not bring additional benefits because of the inherent iteration bound of the ACSU loop. Therefore, it is worth discussing the optimal number of precomputation steps.

In a TCM system, the convolutional code usually has a coding rate of R/(R+1), R = 2, 3, 4, ..., so that in (1), p = 2^R and the logic delay of the ACSU is T_ACSU = T_adder + T_p-in_comp, where T_adder is the logic delay of the adder that computes the Ps of each candidate path reaching the same state and T_p-in_comp is the logic delay of the p-input comparator that determines the survivor path (the path with the minimum metric) for each state. If the T-algorithm is employed in the VD, the iteration bound is slightly longer than T_ACSU, because there is another two-input comparator in the loop to compare the new Ps against a threshold value obtained from the optimal path metric and the preset T, as shown in (3):

    T_bound = T_adder + T_p-in_comp + T_2-in_comp.   (3)

To achieve the iteration bound expressed in (3), for the precomputation in each pipelining stage we limit the comparison to be among only p or 2p metrics. To simplify our evaluation, we assume that each stage reduces the number of metrics to 1/p (or 2^-R) of its input metrics. Meeting the theoretical iteration bound then requires (2^R)^qb ≥ 2^(k-1). Therefore qb ≥ (k-1)/R, and qb is expressed as in (4) with a ceiling function:

    qb = ⌈(k-1)/R⌉.   (4)

In the design example shown in [9], with a coding rate of 3/4 and a constraint length of 7, the minimum number of precomputation steps for the VD to meet the iteration bound is 2 according to (4). It is the same value as obtained from the direct architecture design in [9]. In some cases, the number of remaining metrics may slightly expand during a certain pipeline stage after the addition of the Bs. Usually, the extra delay can be absorbed by an optimized architecture or circuit design. Even if the extra delay is hard to eliminate, the resultant clock speed is very close to the theoretical bound. To fully achieve the iteration bound, we could add another pipeline stage, though it is very costly.

The computational overhead (compared with the conventional T-algorithm) is an important factor that should be carefully evaluated. Most of the computational overhead comes from adding Bs to the metrics at each stage, as indicated in (2). In other words, if there are m remaining metrics after the comparison in a stage, the computational overhead from this stage is at least m addition operations. The exact overhead varies from case to case based on the convolutional code's trellis diagram. Again, to simplify the evaluation, we consider a code with constraint length k and q precomputation steps, and we assume that each remaining metric causes a computational overhead of one addition operation. In this case, the number of metrics is reduced by a factor of 2^((k-1)/q) at each stage, and the overall computational overhead (measured in addition operations) is

    N_overhead = 2^0 + 2^((k-1)/q) + 2^(2(k-1)/q) + ... + 2^((q-1)(k-1)/q)
               = (2^(q(k-1)/q) - 1) / (2^((k-1)/q) - 1)
               = (2^(k-1) - 1) / (2^((k-1)/q) - 1).   (5)
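A short numeric sketch of (4) and (5) for the code considered in the design example (k = 7, R = 3, i.e., a rate-3/4 code) is given below; the helper names are arbitrary and the figures are only the idealized estimates of the formulas, not measured overheads.

    # Evaluate (4) and (5) for an assumed k = 7, R = 3 (rate-3/4) code.
    from math import ceil

    def q_bound(k, R):
        # (4): minimum number of precomputation steps to reach the iteration bound.
        return ceil((k - 1) / R)

    def n_overhead(k, q):
        # (5): geometric-series estimate of the extra additions per trellis step.
        r = 2 ** ((k - 1) / q)
        return (2 ** (k - 1) - 1) / (r - 1)

    print(q_bound(7, 3))                       # -> 2, as stated for the design example
    for q in range(1, 7):
        print(q, round(n_overhead(7, q), 1))   # 63/(2^(6/q) - 1): 1.0, 9.0, 21.0, ...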

The estimated computational overhead according to (5) is 63/(2^(6/q) - 1) when k = 7 and q ≤ 6, which grows almost exponentially with q. In a real design the overhead increases even faster than what is given by (5) when other factors (such as comparisons or the expansion of metrics mentioned above) are taken into consideration. Therefore, a small number of precomputation steps is preferred even though the iteration bound may not be fully satisfied. In most cases, one- or two-step precomputation is a good choice.

The above analysis also reveals that precomputation is not a good option for low-rate convolutional codes (rate 1/Rc, Rc = 2, 3, ...), because it usually needs more than two steps to effectively reduce the critical path (in that case, R = 1 in (4) and qb is k-1). However, for TCM systems, where high-rate convolutional codes are always employed, two steps of precomputation can achieve the iteration bound or make a big difference in terms of clock speed. In addition, the computational overhead is small.

IV. LOW POWER VITERBI DECODER DESIGN

We use the 4-D 8PSK TCM system described in [2] as the example. The rate-3/4 convolutional code employed in the TCM system is shown in Fig. 3. Preliminary BER performance and the architecture design of the ACSU have been shown in [9]. In this section, we further address the SMU design issue. In the next section we report ASIC implementation results that have not been obtained earlier.

The BER performance of the VD employing the T-algorithm with different values of T over an additive white Gaussian noise channel is shown in Fig. 4. The simulation is based on a 4-D 8PSK TCM system employing the rate-3/4 code [11]. The overall coding rate is 11/12 due to the other uncoded bits in the TCM system. Compared with the ideal Viterbi algorithm, the threshold Tpm can be lowered to 0.3 with less than 0.1 dB of performance loss, while the computational complexity can ideally be reduced by up to 90% [9], since the performance is the same as that of the conventional T-algorithm.

A. One-Step Precomputation

For the convenience of our discussion, we define the leftmost register in Fig. 3 as the most significant bit (MSB) and the rightmost register as the least significant bit (LSB). The 64 states and path metrics are labeled from 0 to 63. A careful study reveals that the 64 states can be partitioned into two groups: odd-numbered Ps (when the LSB is '1') and even-numbered Ps (when the LSB is '0'). The odd PMs are all extended by odd Bs (when Z0 is '1') and the even PMs are all extended by even Bs (when Z0 is '0'). The minimum P becomes

    Popt(n) = min{min(even Ps(n-1)) + min(even Bs(n)),
                  min(odd Ps(n-1)) + min(odd Bs(n))}.
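The even/odd split can be modeled in a few lines of Python; this is a behavioral sketch of the formula above with assumed (random) metric values, relying on the stated property of the code in Fig. 3 that even states are extended only by even Bs and odd states only by odd Bs.

    # Behavioral sketch of the 1-step precomputation split (assumed toy metrics).
    import random

    random.seed(2)
    ps_prev = [random.uniform(0, 10) for _ in range(64)]   # Ps(n-1), states 0..63
    bs = [random.uniform(0, 3) for _ in range(16)]         # branch metrics B0..B15

    min_even_ps = min(ps_prev[0::2])   # even-numbered states
    min_odd_ps = min(ps_prev[1::2])    # odd-numbered states
    min_even_bs = min(bs[0::2])        # even-numbered branch metrics
    min_odd_bs = min(bs[1::2])         # odd-numbered branch metrics

    # Popt(n): even states are extended by even Bs only, odd states by odd Bs only.
    p_opt = min(min_even_ps + min_even_bs, min_odd_ps + min_odd_bs)
    print(p_opt)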

The functional diagram of the 1-step precomputation scheme is shown in Fig. 5. In general, the path metric purge algorithm unit (PPAU) has to wait for the new Ps from the ACSU before it can calculate the optimal path metric [12], while in Fig. 5 the optimal path metric is calculated directly from the Ps of the previous cycle at the same time that the ACSU is calculating the new Ps. The details of the PPAU are shown in Fig. 6.

The critical path of the 1-step precomputation scheme is

    T_1-step-pre-T = 2*T_adder + 2*T_4-in_comp + 3*T_2-in_comp.   (6)

The hardware overhead of the 1-step precomputation scheme is about 4 adders, which is negligible. Compared with the SEPC-T algorithm, however, the critical path of the 1-step precomputation scheme is still long [12]. In order to further shorten the critical path, we explore the 2-step precomputation design next.

B. Two-Step Precomputation

a. ACSU Design

We again need to analyze the trellis transition of the original code. In the 1-step precomputation architecture, we pointed out that for the particular code shown in Fig. 3, odd-numbered states are extended by odd Bs, while even-numbered states are extended by even Bs. Furthermore, in the trellis transition the even states all extend to states with higher indices (the MSB in Fig. 3 is '1'), while the odd states extend to states with lower indices (the MSB is '0' in Fig. 3). This information allows us to obtain the 2-step precomputation data path. The derivation is straightforward, although the mathematical details are tedious; for clarity, we only provide the main conclusion here.

The states are further grouped into 4 clusters as described by (7). The BMs are categorized in the same way and are described by (8):

    cluster3 = {Pm | 0 ≤ m ≤ 63, m mod 4 = 3}
    cluster2 = {Pm | 0 ≤ m ≤ 63, m mod 4 = 1}
    cluster1 = {Pm | 0 ≤ m ≤ 63, m mod 4 = 2}
    cluster0 = {Pm | 0 ≤ m ≤ 63, m mod 4 = 0}   (7)

    BMG3 = {Bm | 0 ≤ m ≤ 15, m mod 4 = 3}
    BMG2 = {Bm | 0 ≤ m ≤ 15, m mod 4 = 1}
    BMG1 = {Bm | 0 ≤ m ≤ 15, m mod 4 = 2}
    BMG0 = {Bm | 0 ≤ m ≤ 15, m mod 4 = 0}   (8)

The optimal PM at time n is calculated as

    Popt(n) = min[min{min(cluster0(n-2)) + min(BMG0(n-1)),
                      min(cluster1(n-2)) + min(BMG1(n-1)),
                      min(cluster2(n-2)) + min(BMG3(n-1)),
                      min(cluster3(n-2)) + min(BMG2(n-1))} + min(even Bs(n)),
                  min{min(cluster0(n-2)) + min(BMG1(n-1)),
                      min(cluster1(n-2)) + min(BMG0(n-1)),
                      min(cluster2(n-2)) + min(BMG2(n-1)),
                      min(cluster3(n-2)) + min(BMG3(n-1))} + min(odd Bs(n))].   (9)
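The following Python sketch is a behavioral transcription of (7)-(9) with assumed random metric values: the cluster minima from time n-2 and the BMG minima from time n-1 are combined exactly as in (9), one branch being completed with the minimum of the even Bs at time n and the other with the minimum of the odd Bs.

    # Behavioral sketch of the 2-step precomputation in (7)-(9); values are assumed.
    import random

    random.seed(3)
    ps_n2 = [random.uniform(0, 10) for _ in range(64)]   # Ps(n-2), states 0..63
    bm_n1 = [random.uniform(0, 3) for _ in range(16)]    # B0..B15 at time n-1
    bs_n = [random.uniform(0, 3) for _ in range(16)]     # B0..B15 at time n

    # (7): cluster0 -> m mod 4 = 0, cluster1 -> 2, cluster2 -> 1, cluster3 -> 3.
    cluster = [min(ps_n2[s] for s in range(64) if s % 4 == r) for r in (0, 2, 1, 3)]
    # (8): the BMGs use the same residue mapping as the clusters.
    bmg = [min(bm_n1[m] for m in range(16) if m % 4 == r) for r in (0, 2, 1, 3)]

    min_even_bs = min(bs_n[0::2])
    min_odd_bs = min(bs_n[1::2])

    # (9): two candidate sums, one finished with the even Bs, one with the odd Bs.
    p_opt = min(
        min(cluster[0] + bmg[0], cluster[1] + bmg[1],
            cluster[2] + bmg[3], cluster[3] + bmg[2]) + min_even_bs,
        min(cluster[0] + bmg[1], cluster[1] + bmg[0],
            cluster[2] + bmg[2], cluster[3] + bmg[3]) + min_odd_bs,
    )
    print(p_opt)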


The functional block diagram of the Viterbi decoder with the two-step precomputation T-algorithm is shown in Fig. 7. The minimum value of each branch metric group (BMG) can be calculated in the BMU or TMU and then passed to the threshold generator unit (TGU) to calculate (Popt + T), and the new Ps are compared with this threshold in the "purge unit". The architecture of the TGU, which implements the key functions of the two-step precomputation scheme, is shown in Fig. 8. In Fig. 8 the "MIN 16" unit for finding the minimum value in each cluster is constructed with two stages of four-input comparators. This architecture has been optimized to meet the iteration bound [9]. Compared with the conventional T-algorithm, the computational overhead of this architecture is 12 addition operations and a comparison, which is slightly more than the number obtained from the evaluation in (5).

b. SMU Design

In this section, we address an important issue regarding SMU design when the T-algorithm is employed. There are two different types of SMU in the literature: register exchange (RE) and trace back (TB) schemes. In the regular VD without any low-power schemes, the SMU always outputs the decoded data from a fixed state (arbitrarily selected in advance) if the RE scheme is used, or traces back the survivor path from the fixed state if the TB scheme is used, for low-complexity purposes. For a VD incorporating the T-algorithm, no state is guaranteed to be active at all clock cycles. Thus it is impossible to appoint a fixed state for either outputting the decoded bit (RE scheme) or starting the trace-back process (TB scheme). In the conventional implementation of the T-algorithm, the decoder can use the optimal state (the state with Popt), which is always enabled, to output or trace back data. The process of searching for Popt can find the index of the optimal state as a byproduct. However, when an estimated Popt is used [8], or, as in our case, when Popt is calculated from the PMs at the previous time slot, it is difficult to find the index of the optimal state.

A practical method is to find the index of an enabled state through a 2^(k-1)-to-(k-1) priority encoder. Suppose that we have labeled the states from 0 to 63. The output of the priority encoder would be the unpurged state with the lowest index. Assuming the purged states have the flag "0" and the other states are assigned the flag "1", the truth table of such a priority encoder is shown in Table I, where "flag" is the input and "index" is the output. Implementation of such a table is not trivial. In our design, we employ an efficient architecture for the 64-to-6 priority encoder based on three 4-to-2 priority encoders, as shown in Fig. 7. The 64 flags are first divided into 4 groups, each of which contains 16 flags. The priority encoder at level 1 detects which group contains at least one "1" and determines index[5:4]. Then MUX2 selects one group of flags based on index[5:4]. The input of the priority encoder at level 2 can be computed from the output of MUX2 by "OR" operations. We can also reuse the intermediate results by introducing another MUX (MUX1). The output of the priority encoder at level 2 is index[3:2]. Again, index[3:2] selects four flags (MUX3) as the input of the priority encoder at level 3. Finally, the last encoder determines index[1:0].
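The three-level structure just described can be modeled compactly in Python. The sketch below is a behavioral model of the composition (group flags, level-1 encoder, MUX selection, OR-reduction, level-2 and level-3 encoders), not gate-level RTL; the helper names are assumptions.

    # Behavioral model of the 64-to-6 priority encoder built from 4-to-2 encoders.
    def prio4(bits):
        # 4-to-2 priority encoder per Table II: the lowest-index '1' wins.
        for i, b in enumerate(bits):
            if b:
                return i
        return 0  # all-zero input is outside the truth table; 0 is an arbitrary choice

    def prio64(flags):
        # flags[0..63]: '1' marks an unpurged (enabled) state.
        groups = [flags[i * 16:(i + 1) * 16] for i in range(4)]
        hi = prio4([any(g) for g in groups])        # level 1 -> index[5:4]
        g16 = groups[hi]                            # MUX2: the selected group of 16 flags
        quads = [g16[i * 4:(i + 1) * 4] for i in range(4)]
        mid = prio4([any(q) for q in quads])        # level 2 (OR-reduced inputs) -> index[3:2]
        lo = prio4(quads[mid])                      # level 3 -> index[1:0]
        return (hi << 4) | (mid << 2) | lo

    flags = [0] * 64
    flags[37] = 1
    flags[52] = 1
    print(prio64(flags))   # -> 37, the enabled state with the lowest index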


Implementing the 4-to-2 priority encoder is much simpler than implementing the 64-to-6 priority encoder. Its truth table is shown in Table II and the corresponding logic is given in (10) and (11).
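Equations (10) and (11) are not reproduced in this copy; one possible two-output realization that is consistent with the four specified rows of Table II (the remaining input patterns are don't-cares) is O1 = I0'·I1' and O0 = I0'·(I1 + I2'·I3). The short check below verifies this assumed logic against the table.

    # Assumed logic for the 4-to-2 priority encoder, checked against Table II.
    def encode(i0, i1, i2, i3):
        o1 = (not i0) and (not i1)
        o0 = (not i0) and (i1 or ((not i2) and i3))
        return int(o1), int(o0)

    # The four specified rows of Table II (I[0] has the highest priority).
    assert encode(1, 0, 0, 0) == (0, 0)   # x x x 1 -> 00
    assert encode(0, 1, 0, 0) == (0, 1)   # x x 1 0 -> 01
    assert encode(0, 0, 1, 0) == (1, 0)   # x 1 0 0 -> 10
    assert encode(0, 0, 0, 1) == (1, 1)   # 1 0 0 0 -> 11
    print("Table II satisfied")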
Table I
Truth table of the 64-to-6 priority encoder

    Flag[63:0]              Index[5:0]
    x x ... x x x x 1       000000
    x x ... x x x 1 0       000001
    x x ... x x 1 0 0       000010
    x x ... x 1 0 0 0       000011
    x x ... 1 0 0 0 0       000100
    ...                     ...
    x 1 0 ... 0 0 0 0 0     111110
    1 0 0 ... 0 0 0 0 0     111111

Table II
Truth table of the 4-to-2 priority encoder

    Input I[3:0]    Output O[1:0]
    x x x 1         00
    x x 1 0         01
    x 1 0 0         10
    1 0 0 0         11

V. IMPLEMENTATION RESULTS

The full-trellis VD, the VD with the two-step precomputation architecture, and a VD with the conventional T-algorithm are modeled with Verilog HDL code. The soft inputs of all VDs are quantized with 7 bits. Each PM in all VDs is quantized as 12 bits. The RE scheme with a survival length of 42 is used for the SMU, and the register arrays associated with the purged states are clock-gated to reduce the power consumption in the SMU. For ASIC synthesis, we use a TSMC 90-nm CMOS standard cell library. The synthesis targets the maximum clock speed for each case, and the results are shown in Table III.

Table III
Synthesis results for maximum clock speed

    Design                           Max speed (MHz)    Cell area (mm^2)
    Full-trellis VD                  505                0.58
    VD with 2-step precomputation    446.4 (-11.6%)     0.68  (+17.2%)
    Conventional T-algorithm         232   (-54.1%)     0.685 (+18%)

Table III shows that the VD with the two-step precomputation architecture only decreases the clock speed by 11% compared with the full-trellis VD. Meanwhile, the increase of the hardware area is about 17%. The decrease of clock speed is inevitable, since the iteration bound of a VD with the T-algorithm is inherently longer than that of the full-trellis VD. Also, any kind of low-power scheme introduces extra hardware, like the purge unit shown in Fig. 5 or the clock-gating module in the SMU. Therefore, the hardware overhead of the proposed VD is expected. On the other hand, the VD with the conventional T-algorithm cannot achieve half of the clock speed of the full-trellis VD. Therefore, for high-speed applications, it should not be considered. It is worth mentioning that the conventional T-algorithm VD takes slightly more hardware than the proposed architecture, which is counterintuitive. This is because the former decoder has a much longer critical path and the synthesis tool took extra measures to improve the clock speed. It is clear that the conventional T-algorithm is not suitable for high-speed applications. If the target throughput is moderately high, the proposed architecture can operate at a lower supply voltage, which leads to a quadratic power reduction compared to the conventional scheme. Thus, we next focus on the power comparison between the full-trellis VD and the proposed scheme.

We estimate the power consumption of these two designs with Synopsys PrimePower under a clock speed of 200 Mb/s (power supply of 1.0 V, temperature of 300 K). A total of 1133 received symbols (12 000 bits) are simulated. The results are shown in Table IV. With the finite word-length implementation, the threshold can only be changed in steps of 0.125. Therefore, to maintain a good BER performance, the minimum threshold we chose is 0.375.

Table IV
Power estimation results

    Design                                           Power (mW)
    Full-trellis VD                                  21.473 (100%)
    VD with 2-step precomputation   Tpm = 0.75       20.069 (93.5%)
      architecture                  Tpm = 0.625      17.186 (80.0%)
                                    Tpm = 0.5        11.754 (54.7%)
                                    Tpm = 0.375      6.6127 (30.8%)

Table IV shows that, as the threshold decreases, the power consumption of the proposed VD is reduced accordingly. In order to achieve the same BER performance, the proposed VD only consumes 30.8% of the power of the full-trellis VD.
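The relative figures quoted from Tables III and IV follow directly from the absolute values; the short check below recomputes them (rounding explains the +18% entry, which evaluates to about +18.1%).

    # Recompute the relative figures quoted in Tables III and IV.
    full_speed, pre_speed, conv_speed = 505.0, 446.4, 232.0   # MHz
    full_area, pre_area, conv_area = 0.58, 0.68, 0.685        # mm^2
    print(round((pre_speed / full_speed - 1) * 100, 1))       # -> -11.6 %
    print(round((conv_speed / full_speed - 1) * 100, 1))      # -> -54.1 %
    print(round((pre_area / full_area - 1) * 100, 1))         # -> +17.2 %
    print(round((conv_area / full_area - 1) * 100, 1))        # -> about +18 %

    full_power = 21.473                                       # mW, full-trellis VD
    for tpm, mw in [(0.75, 20.069), (0.625, 17.186), (0.5, 11.754), (0.375, 6.6127)]:
        print(tpm, round(mw / full_power * 100, 1))           # 93.5, 80.0, 54.7, 30.8 %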

VI. CONCLUSION

We have proposed a low-power VD design for TCM systems. The precomputation architecture that incorporates the T-algorithm efficiently reduces the power consumption of VDs without appreciably reducing the decoding speed. We have also analyzed the precomputation algorithm, where the optimal number of precomputation steps is calculated and discussed. This algorithm is suitable for TCM systems, which always employ high-rate convolutional codes. Finally, we presented a design case. Both the ACSU and SMU are modified to correctly decode the signal. ASIC synthesis and power estimation results show that, compared with the full-trellis VD without a low-power scheme, the precomputation VD reduces the power consumption by 70% with only an 11% reduction of the maximum decoding speed.

VII. REFERENCES

[1] F. Chan and D. Haccoun, "Adaptive Viterbi decoding of convolutional codes over memoryless channels," IEEE Trans. Commun., vol. 45, no. 11, pp. 1389-1400, Nov. 1997.

[2] "Bandwidth-efficient modulations," Consultative Committee for Space Data Systems, Matera, Italy, CCSDS 401(3.3.6) Green Book, Issue 1, Apr. 2003.

[3] J. B. Anderson and E. Offer, "Reduced-state sequence detection with convolutional codes," IEEE Trans. Inf. Theory, vol. 40, no. 3, pp. 965-972, May 1994.

[4] C. F. Lin and J. B. Anderson, "T-algorithm decoding of channel convolutional codes," presented at the Princeton Conf. Info. Sci. Syst., Princeton, NJ, Mar. 1986.

[5] S. J. Simmons, "Breadth-first trellis decoding with adaptive effort," IEEE Trans. Commun., vol. 38, no. 1, pp. 3-12, Jan. 1990.

[6] R. A. Abdallah and N. R. Shanbhag, "Error-resilient low-power Viterbi decoder architectures," IEEE Trans. Signal Process., vol. 57, no. 12, pp. 4906-4917, Dec. 2009.

[7] J. Jin and C.-Y. Tsui, "Low-power limited-search parallel state Viterbi decoder implementation based on scarce state transition," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 15, no. 11, pp. 1172-1176, Oct. 2007.

[8] F. Sun and T. Zhang, "Low power state-parallel relaxed adaptive Viterbi decoder design and implementation," in Proc. IEEE ISCAS, May 2006, pp. 4811-4814.

[9] J. He, H. Liu, and Z. Wang, "A fast ACSU architecture for Viterbi decoder using T-algorithm," in Proc. 43rd IEEE Asilomar Conf. Signals, Syst., Comput., Nov. 2009, pp. 231-235.

[10] K. S. Arunlal and S. A. Hariprasad, "An efficient Viterbi decoder," International Journal of Advanced Information Technology (IJAIT), vol. 2, no. 1, Feb. 2012.

[11] J. He, Z. Wang, and H. Liu, "An efficient 4-D 8PSK TCM decoder architecture," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 18, no. 5, pp. 808-817, May 2010.

[12] A. A. Peshattiwar and T. G. Panse, "High speed ACSU architecture for Viterbi decoder using T-algorithm," International Journal of Electrical and Electronics Engineering (IJEEE), vol. 1, iss. 3, 2012.


