Design of the BCJR Decoding Algorithm With Reduced Space Complexity
Dejan Spasov
Faculty of Computer Science and Engineering,
Skopje, Macedonia
[email protected]
Abstract - Given an M-state (recursive) convolutional encoder, we show that, in theory, computing the forward alpha probabilities of the BCJR decoding algorithm can be done with 2M memory elements. Building on this idea, we propose a new design with reduced space complexity for the original BCJR algorithm. Initial experiments with rate-1/2, 1025-bit-long Turbo Codes show the possibility of compressing the memory for the alpha probabilities by about 97%.
I. INTRODUCTION

In Section II we present a theoretical model of the BCJR algorithm in which the memory needed for the forward probabilities does not depend on the codeword length n, but only on the number of encoder states M. Reduced space complexity comes at the cost of increased time complexity; thus, additional (but parallelizable) computations are introduced into the backward stage of the algorithm. In Section III we propose a practically realizable BCJR decoder with reduced memory. However, this decoder cannot achieve the complexity of the theoretical model. In Section IV we use parallel concatenated Turbo Codes with RSC encoders to perform simulations with our modifications to the BCJR algorithm. We have observed a memory reduction of about 97% for the forward probabilities at the expense of an acceptable 0.1 dB loss in the error correction capability of the Turbo system.

II. THEORETICAL FOUNDATIONS

In the binary case, the Maximum A Posteriori (MAP) probability estimate at the time i, L_i, is given as the logarithm of the ratio of two probability functions, i.e., when the observed bit is 1 and when the observed bit is 0 [9], [10]:

    L_i = \log \frac{\sum_{(s,s'):\, u_i = 1} \alpha_i(s)\, \gamma_i(s,s')\, \beta_{i+1}(s')}{\sum_{(s,s'):\, u_i = 0} \alpha_i(s)\, \gamma_i(s,s')\, \beta_{i+1}(s')}    (1)

where α_i(s) are the forward state probabilities, β_i(s) are the backward state probabilities, and γ_i(s,s') are the branch probabilities of the trellis edges (s,s') labeled with the information bit u_i.

Figure 1. Trellis diagram of a 16-state convolutional code
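As a concrete illustration of (1), the following sketch computes L_i from the three probability arrays. The array names, shapes, and the edge lists grouped by input bit are our assumptions, not notation from the paper:

    import numpy as np

    def llr(alpha_i, gamma_i, beta_next, edges_u1, edges_u0):
        # eq. (1): log-ratio of total edge probabilities for u_i = 1 vs u_i = 0
        # alpha_i   : (M,) forward probabilities at the time i
        # gamma_i   : (M, M) branch probabilities, gamma_i[s, s']
        # beta_next : (M,) backward probabilities at the time i+1
        # edges_u1/edges_u0 : lists of (s, s') pairs labeled with bit 1 / bit 0
        p1 = sum(alpha_i[s] * gamma_i[s, t] * beta_next[t] for s, t in edges_u1)
        p0 = sum(alpha_i[s] * gamma_i[s, t] * beta_next[t] for s, t in edges_u0)
        return np.log(p1 / p0)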
The decoder produces the estimates (1) working with pre-stored branch probabilities γ_i(s,s').

The computation of the forward probabilities starts from the initial conditions at the time i = 0,

    \alpha_0(s) = 1 \text{ for } s = 0, \qquad \alpha_0(s) = 0 \text{ for } s \neq 0,    (2)

and, following the edges of the trellis with non-zero branch probabilities γ_i(s,s'), the decoder at each iteration stores all α_i(s) and computes α_{i+1}(s') according to

    \alpha_{i+1}(s') = \sum_s \alpha_i(s)\, \gamma_i(s,s').    (3)
The computation of the backward β probabilities starts from the initial conditions at the time i = N,

    \beta_N(s) = 1 \text{ for } s = 0, \qquad \beta_N(s) = 0 \text{ for } s \neq 0,    (4)

and, following the edges with non-zero branch probabilities γ_i(s,s'), the decoder at each time i computes β_i(s) according to

    \beta_i(s') = \sum_s \beta_{i+1}(s)\, \gamma_i(s',s), \qquad s' = 0, \ldots, M-1.    (5)
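For concreteness, the two recursions can be written in a few lines of NumPy. This is our illustrative sketch, assuming the branch probabilities are packed into an (N, M, M) array gamma with gamma[i][s][s'] = γ_i(s,s'); it is not the paper's implementation:

    import numpy as np

    def forward(gamma):
        # classic forward pass, eqs. (2)-(3); stores all N+1 alpha vectors
        N, M, _ = gamma.shape
        alpha = np.zeros((N + 1, M))
        alpha[0, 0] = 1.0                        # eq. (2): start in state 0
        for i in range(N):
            alpha[i + 1] = alpha[i] @ gamma[i]   # eq. (3)
        return alpha

    def backward(gamma):
        # classic backward pass, eqs. (4)-(5)
        N, M, _ = gamma.shape
        beta = np.zeros((N + 1, M))
        beta[N, 0] = 1.0                         # eq. (4): terminate in state 0
        for i in range(N - 1, -1, -1):
            beta[i] = gamma[i] @ beta[i + 1]     # eq. (5)
        return beta

Stored this way, the α history costs (N+1)·M cells, which is exactly what the rest of the paper works to avoid.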
During the backward stage our goal is to recompute α_i(s) from α_{i+1}(s) and thus eliminate the need to store all α_i(s) in memory. We will consider (3) as a system of M linear equations with M unknowns, namely the α_i(s); solving this system gives us the values α_i(s).

Let us assume that with the forward recursion we have arrived at the time i = N-1 and have computed all α_{N-1}(s). Next, we compute L_{N-1} and, in parallel with the β recursion, we recompute all α_{N-2}(s) from α_{N-1}(s). We then compute L_{N-2} and repeat this for all information bits. Thus, with the forward recursion we only need to find all α_{N-1}(s), without storing any intermediary α_i(s). During the backward computational stage we include an additional processor (fig. 2) that backward-recursively computes α_i(s) from α_{i+1}(s).

We should note that this description of the BCJR algorithm can be applied to an arbitrary block code.
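A sketch of this theoretical decoder, reusing the llr helper above: the forward pass keeps a single α vector (2M cells with double buffering), and the backward stage recovers each earlier α by a generic linear solve. The names are ours, and we assume every γ_i, viewed as an M×M matrix, is nonsingular:

    import numpy as np

    def decode_2M(gamma, edges_u1, edges_u0):
        N, M, _ = gamma.shape
        alpha = np.zeros(M); alpha[0] = 1.0          # eq. (2)
        for i in range(N - 1):
            alpha = alpha @ gamma[i]                 # eq. (3), ends at alpha_{N-1}
        beta = np.zeros(M); beta[0] = 1.0            # eq. (4)
        L = np.zeros(N)
        for i in range(N - 1, -1, -1):
            L[i] = llr(alpha, gamma[i], beta, edges_u1, edges_u0)   # eq. (1)
            beta = gamma[i] @ beta                   # eq. (5)
            if i > 0:
                # recover alpha_{i-1} from alpha_i = alpha_{i-1} @ gamma[i-1],
                # i.e. solve (3) as a system of M linear equations
                alpha = np.linalg.solve(gamma[i - 1].T, alpha)
        return L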
A. BCJR Decoding of Convolutional Codes

In theory, any algorithm that solves a system of linear equations can be used in the backward stage to solve (3). However, this approach may impose a quadratic slowdown in the backward stage. Convolutional codes have a certain regularity that helps in solving (3) faster: convolutional codes over an alphabet of q elements have regular trellises that are made of M/q subgraphs known as butterflies. Thus, in the case of convolutional codes, (3) can be split into M/q independent systems of linear equations with q unknowns.

An example of a butterfly with the corresponding channel and state probabilities for a convolutional code over the binary alphabet is given in figure 3. The system of linear equations that corresponds to the butterfly is

    \alpha_{i+1}(u) = \alpha_i(x)\, \gamma_i(x,u) + \alpha_i(y)\, \gamma_i(y,u)
    \alpha_{i+1}(v) = \alpha_i(x)\, \gamma_i(x,v) + \alpha_i(y)\, \gamma_i(y,v).    (6)

Solving (6) for the unknowns α_i(x) and α_i(y), we obtain the expression that has to be computed in the backward decoding stage. Due to the symmetry in the computations we use

    \alpha_i(x) = \frac{\alpha_{i+1}(u)\, \gamma_i(y,v) - \alpha_{i+1}(v)\, \gamma_i(y,u)}{\gamma_i(x,u)\, \gamma_i(y,v) - \gamma_i(x,v)\, \gamma_i(y,u)},    (7)

where α_i(x) denotes the forward probability of the state x at the time i; α_i(y) is obtained from the symmetric expression.
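Per butterfly, (7) is Cramer's rule applied to the 2×2 system (6). A minimal sketch, with our shorthand a_u = α_{i+1}(u), g_xu = γ_i(x,u), and so on:

    def butterfly_backsolve(a_u, a_v, g_xu, g_xv, g_yu, g_yv):
        # Cramer's rule for the 2x2 system (6): eq. (7) and its symmetric twin
        det = g_xu * g_yv - g_xv * g_yu      # assumed non-zero
        a_x = (a_u * g_yv - a_v * g_yu) / det
        a_y = (a_v * g_xu - a_u * g_xv) / det
        return a_x, a_y

Running M/2 such independent back-solves per trellis step replaces one generic M×M solve, which is where the speedup over a general linear solver comes from.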
Figure 2. Backward computation of the α_i(s) probabilities

Figure 3. Binary butterfly of a convolutional code with the corresponding metrics

III. DESIGN OF THE DECODER

In theory, the forward and backward computations can be designed with a memory size of just 2M. However, the finite precision of computer arithmetic introduces two types of errors that can make the decoder deviate from the ideal behavior.

The first type of errors are the round-off errors. These errors occur during the many exponential computations of
the branch metrics γ_i(s,s') in the forward and the backward stage. Round-off errors can cumulatively become so large that they make the decoder catastrophic. To avoid this from happening, at this point of the research we use higher-precision arithmetic, namely the IEEE 754 double precision format. In addition, we periodically store all α_i(s) at regular time intervals that are multiples of a constant T. Then we employ backward computations between two successive storage points α_{iT}(s) and α_{(i+1)T}(s) (fig. 4).

The second type of errors are the cancellation errors. For example, in a butterfly it may happen that α_i(y) ≫ α_i(x). Then, after the round-off stage, (6) will become

    \alpha_{i+1}(u) = \alpha_i(y)\, \gamma_i(y,u), \qquad \alpha_{i+1}(v) = \alpha_i(y)\, \gamma_i(y,v).    (8)

We see that the value of α_i(x) is lost during the forward recursion. In the case of the BCJR algorithm we do not recover from these errors because they do not affect the performance. In addition, it may happen that α_i(x) becomes negative. In this situation we can set α_i(x) to zero, or we can take the absolute value of (7).
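One way to realize this fix, reusing butterfly_backsolve from Section II (clamping to zero shown; taking the absolute value is the stated alternative):

    def guarded_backsolve(a_u, a_v, g_xu, g_xv, g_yu, g_yv):
        # back-solve the butterfly, then clamp values wiped out by cancellation (8)
        a_x, a_y = butterfly_backsolve(a_u, a_v, g_xu, g_xv, g_yu, g_yv)
        return max(a_x, 0.0), max(a_y, 0.0)      # or abs(a_x), abs(a_y)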
The last issue that we have to resolve is the periodic normalization [5]. During the forward computations, a periodic normalization with coefficients λ_{t,i} is introduced in order to avoid overflow. However, during the backward computation this normalization may create underflow. Hence we need to perform denormalization, i.e., to multiply α_i(s) with appropriate coefficients. Using the original normalization coefficients λ_{t,i} would require keeping them all in memory. To avoid this, in the denormalization stage we use the normalization coefficients μ_{t,i} produced by the backward (β) recursion. Thus we eliminate the need to store the original normalization coefficients λ_{t,i}.
Next we give pseudo code for the backward algorithm:

    for each information bit i = n-1 downto 0
        for each state s < M
            using (5) compute β_i(s)
        if (i mod T == 0)
            restore α_i(s) from memory
        else
            for each state s < M
                using (7) compute α_i(s)
                α_i(s) = μ_{t,i} · α_i(s)    // denormalization
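A Python rendering of this pseudo code, under the same assumptions as the earlier sketches: the names are ours, the γ matrices are taken to be nonsingular, and the per-butterfly back-solves are collapsed into a single np.linalg.solve call. The generator yields α_i together with β_{i+1}, which is enough for a caller to form each L_i via (1):

    import numpy as np

    def backward_stage(gamma, alpha_last, stored_alpha, T, mu, beta_N):
        # gamma        : (N, M, M) branch probabilities gamma_i[s, s']
        # alpha_last   : alpha_{N-1} left over from the forward recursion
        # stored_alpha : {i: alpha_i} checkpoints saved every T steps
        # mu           : denormalization coefficients from the beta recursion
        # beta_N       : initial backward vector, eq. (4)
        N = gamma.shape[0]
        alpha, beta = alpha_last.copy(), beta_N.copy()
        for i in range(N - 1, -1, -1):
            if i % T == 0:
                alpha = stored_alpha[i].copy()       # restore a storage point
            elif i < N - 1:
                # using (7), butterfly by butterfly (collapsed into one solve),
                # followed by the denormalization step
                alpha = mu[i] * np.linalg.solve(gamma[i].T, alpha)
            yield i, alpha, beta                     # alpha_i and beta_{i+1}
            beta = gamma[i] @ beta                   # using (5): advance to beta_i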
IV. PRACTICAL RESULTS

In testing our designs we have used parallel concatenated Turbo Codes (fig. 5). To be more precise, we use a rate-1/2, 16-state, 1025-bit-long Turbo code over the Gaussian channel. In addition, we use the IEEE 754 double precision format to store the state and branch metrics, although we are fully aware that this format will not be very useful in implementing this algorithm on a chip.

The comparison between our algorithm and the traditional BCJR algorithm is given in figure 6. We can notice a performance degradation when going from the regular BCJR decoder to the BCJR decoder with reduced memory size. For this simulation, we have chosen the periodical storage points to be at a distance of T = 35 information bits. At this distance we have clearly observed a performance loss of about 0.1 dB. However, careful research is needed to find the largest acceptable T; we should note that at some point the performance degradation becomes unacceptable. For example, we have noticed that this clearly happens around T = 50. From the figure it can be concluded that for T = 35 the performance loss is about 0.1 dB at a BER around 10^-5 – 10^-6. In our simulations, round-off errors were dominant with the BCJR algorithm; the number of cancellation errors per frame was always less than 20 in a matrix of size 1025×16 = 16400 32-bit words.

Figure 5. Parallel concatenated Turbo encoder: the input x feeds one recursive convolutional encoder (output y1) and, through an interleaver, a second recursive convolutional encoder (output y2)

Figure 6. Performance comparison between MAP and reduced-memory MAP with periodical storage points at distance T = 35
V. CONCLUSION

We have demonstrated a possibility to achieve considerable memory reduction for the BCJR algorithm. We believe that this reduction technique is powerful enough that other memory optimization techniques may become unnecessary. Moreover, with proper parallelization (like initiating the alpha and beta computations at the same time) we can build a decoder with space and time complexities that come closer to the complexity of the Viterbi algorithm. This will open new horizons for the convolutional codes, like using longer codewords or using convolutional encoders with 32 or even 64 states. On the hardware side, the benefit of memory saving is also important, because it will provide space reduction, less power consumption, and increased decoding speed.

A popular and frequently implemented variant of the BCJR decoder is the Log-MAP algorithm. A question that naturally arises is whether this reduction can be applied to the Log-MAP algorithm. While Log-MAP is often considered as MAX-Log-MAP with a correction term that is added to (7), in its nature this algorithm is the BCJR algorithm with fixed-point arithmetic for the exponentiations. Thus, in the case of the Log-MAP algorithm, any memory reduction technique will increase the round-off errors, thus decreasing the interval T between two storage points (fig. 4).
REFERENCES
[1] L. R. Bahl, J. Cocke, F. Jelinek, and J. Raviv, “Optimal decoding
of linear codes for minimizing symbol error rate,” IEEE Trans.
Inform. Theory, vol. 20, no. 3, pp. 284–287, Mar. 1974.
[2] E. Boutillon, W. J. Gross, and P. G. Gulak, "VLSI architectures for the MAP algorithm," IEEE Trans. Commun., vol. 51, no. 2, Feb. 2003.
[3] M. Zhan and L. Zhou, "A memory reduced decoding scheme for double binary convolutional Turbo code based on forward recalculation," in Proc. 7th Int. Symp. on Turbo Codes and Iterative Information Processing (ISTC), Gothenburg, Sweden, 2012.
[4] H.-M. Choi, J.-H. Kim, and I.-C. Park, "Low-power hybrid turbo decoding based on reverse calculation," in Proc. ISCAS 2006, pp. 2053–2056.
[5] E. Boutillon, C. Douillard, and G. Montorsi, "Iterative decoding of concatenated convolutional codes: implementation issues," Proc. IEEE, vol. 95, no. 6, pp. 1201–1227, June 2007.
[6] G. Colavolpe, G. Ferrari, and R. Raheli, "Reduced-state BCJR-type algorithms," IEEE J. Sel. Areas Commun., vol. 19, no. 5, May 2001.
[7] M. Sikora and D. J. Costello, Jr., "Heuristic survivor selection for reduced complexity BCJR-type algorithms," in Proc. ISIT 2008, Toronto, Canada, 2008.
[8] A. Ouardi, A. Djebbari, and B. Bouazza, "Optimal M-BCJR turbo decoding: the Z-MAP algorithm," Wireless Engineering and Technology, 2011, pp. 230–234.
[9] J. Hagenauer, E. Offer, and L. Papke, "Iterative decoding of binary block and convolutional codes," IEEE Trans. Inform. Theory, vol. 42, no. 2, pp. 429–445, Mar. 1996.
[10] C. Berrou, A. Glavieux, and P. Thitimajshima, "Near Shannon limit error-correcting coding and decoding: Turbo-codes," in Proc. ICC, Geneva, Switzerland, May 1993.