Efficient Stochastic Successive Cancellation List Decoder for Polar Codes
SCIENCE CHINA Information Sciences
RESEARCH PAPER. October 2020, Vol. 63 202303:1–202303:19
https://doi.org/10.1007/s11432-019-2924-6
Received 29 December 2019/Revised 13 March 2020/Accepted 22 May 2020/Published online 21 September 2020
Abstract Polar codes are one of the most favorable capacity-achieving codes owing to their simple structures and low decoding complexity. Successive cancellation list (SCL) decoders with large list sizes achieve performances very close to those of maximum-likelihood (ML) decoders. However, hardware cost is a severe problem because an SCL decoder with list size L consists of L copies of a successive cancellation (SC) decoder. To address this issue, a stochastic SCL (SSCL) polar decoder is proposed. Although stochastic computing can achieve a good hardware reduction compared with the deterministic one, its straightforward application to an SCL decoder is not well-suited owing to the precision loss and severe latency. Therefore, a doubling probability approach and adaptive distributed sorting (DS) are introduced. A corresponding hardware architecture is also developed. Field programmable gate array (FPGA) results demonstrate that the proposed stochastic SCL polar decoder can achieve a good performance and complexity tradeoff.
Keywords SCL polar decoder, stochastic computing, 2-bit decoding, distributed sorting, hardware
Citation Liang X, Wang H Z, Shen Y F, et al. Efficient stochastic successive cancellation list decoder for polar
codes. Sci China Inf Sci, 2020, 63(10): 202303, https://doi.org/10.1007/s11432-019-2924-6
1 Introduction
Polar codes, proposed by Arıkan’s breakthrough paper [1], are an exciting new class of channel codes
that can asymptotically achieve the capacity for symmetric binary-input discrete memoryless channels.
Because of its FFT-like structure and low O(N log N) complexity, where N denotes the code length, the successive cancellation (SC) decoding algorithm has become one of the most popular polar decoding algorithms. Nevertheless, compared with the maximum likelihood (ML) decoder [2], the decoding performance of the SC polar decoder still suffers from an evident degradation. To narrow the performance gap caused by the sub-optimality of the traditional successive cancellation decoder, the successive cancellation list (SCL) polar decoding algorithm was developed in [2, 3]. Simulation results have revealed that the SCL polar decoder can outperform low density parity check (LDPC) codes even within high error-rate regions. To realize the spectrum efficiency and low latency required by next-generation wireless systems, Refs. [4, 5] proposed two latency-reduced SCL decoders for polar codes.
The main drawback of the SCL polar decoder is that its complexity increases linearly with the list size L. To realize satisfactory performance, the SCL polar decoder suffers from a high complexity cost, especially when
* Corresponding author (email: [email protected])
† Xiao LIANG and Huizheng WANG have the same contribution to this work.
L is considerably large. According to simulations over the BPSK-AWGN channel [2], a list size L > 32 is required to achieve a frame error rate (FER) below 10^-5 when the cyclic redundancy check (CRC) length is 16 bits and the signal-to-noise ratio (SNR) is 2 dB. Owing to the O(LN log N) complexity of SCL decoding, a large L introduces huge complexity that obstructs the efficient implementation of the SCL decoder. To address this issue, a stochastic computing based SCL polar decoding algorithm is proposed in this paper.
Stochastic computing [6], as an approximate computing technique, has the potential to provide a significantly lower hardware footprint with high energy efficiency and scalability [7]. In the stochastic computing framework, a probability is represented by a bit-stream; hence, fundamental arithmetic operations such as additions and multiplications can be realized as simply as multiplexers (MUXs) and AND gates, respectively [8]. Moreover, even some complex arithmetic operations can be implemented with very elementary hardware logic [9, 10]. In [11], a fully parallel stochastic Markov chain Monte Carlo (MCMC) multiple-input multiple-output (MIMO) detector is developed, which achieves a higher throughput with a lower hardware cost than conventional MIMO detectors. Further, a corresponding design supporting iterative detection is proposed in [12]. Considering the low-complexity arithmetic and extensive computation in the SCL decoding algorithm, stochastic computing offers a colossal design space for SCL optimization owing to its advantages in area reduction and soft-error resiliency.
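As a brief illustration of these primitives (not taken from the paper; the stream length and probabilities are arbitrary), the following Python sketch emulates stochastic multiplication with a bitwise AND and scaled addition with a MUX driven by a fair select stream.

```python
import numpy as np

rng = np.random.default_rng(0)
l = 4096  # bitstream length (illustrative)

def to_stream(p, length=l):
    """Encode a probability p as a Bernoulli(p) bitstream."""
    return (rng.random(length) < p).astype(np.uint8)

a, b = to_stream(0.7), to_stream(0.4)

# Multiplication: an AND gate gives Pr(out = 1) = Pr(a = 1) * Pr(b = 1)
# for independent input streams.
prod = a & b

# Scaled addition: a MUX with a fair (p = 0.5) select stream gives
# Pr(out = 1) = (Pr(a = 1) + Pr(b = 1)) / 2.
sel = to_stream(0.5)
summed = np.where(sel == 1, a, b)

print(prod.mean(), summed.mean())  # roughly 0.28 and 0.55
```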
In [13], stochastic computing was incorporated into SCL decoders with L = 1, 2. Thanks to stochastic computing, the stochastic SCL (SSCL) decoder is expected to be an efficient paradigm that can realize complexity reduction, satisfactory performance, and fault tolerance. Nevertheless, applying stochastic computing to the SCL decoder directly leads to unacceptable performance degradation and latency. This paper is devoted to proposing solutions that reduce the hardware complexity of the SCL decoder. The major novelties of this paper are summarized as follows. First, channel message scaling and enlarging probability (EP) are proposed to mitigate the randomness loss of stochastic computing. Second, to reduce decoding latency, we propose a stochastic-based double-level (DL) decoding scheme that can estimate two bits at a time. Furthermore, a 2^p-level decoding method is developed to realize an arbitrary tradeoff between performance and latency. Third, based on distributed sorting (DS), we develop an adaptive DS (ADS) scheme that can compromise between different sorting scales in stochastic multiple-level decoding. Finally, the detailed hardware architecture for the proposed stochastic SCL decoder is developed. To the best of our knowledge, this is the first paper to implement a stochastic SCL decoder on a field programmable gate array (FPGA) platform. Compared with the deterministic decoding framework, it shows advantages in the balance between performance and complexity.
The remainder of this paper is organized as follows. Section 2 first reviews the SSCL decoding algorithm with L = 1 proposed in [13]. Section 3 presents the SSCL decoding with L = 2. Further, the SSCL decoding with L = 2^p is presented in Section 4. The implementation of the SSCL decoder is developed in Section 5. Comparison and conclusion are presented in Sections 6 and 7.
2 Stochastic SCL decoder for polar codes with single-level decoding method
We use the notation a_1^N as shorthand for the row vector (a_1, ..., a_N). Given a vector a_1^N, we write a_i^j, 1 ≤ i, j ≤ N, to denote the subvector (a_i, ..., a_j); if j < i, a_i^j is regarded as void. Given a_1^N and A ⊂ {1, ..., N}, we write a_A to denote the subvector (a_i : i ∈ A).
A polar code can be defined by the parameters (N, K, A, u_{A^c}) [1], where N, K, A, and u_{A^c} are the code length, the number of information bits, the set of information bits, and the frozen bits whose values are always zero, respectively. The transition probabilities of binary-input discrete memoryless channels (B-DMCs) are defined as W(y|x), and for simplicity, let (N, K) denote a polar code in this study.
For the polar encoding process, we denote n = log_2 N and $F \triangleq \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}$. Then the codeword x = [x_1, x_2, ..., x_N] can be generated from x = F^{⊗n} u. For SC decoding, the estimated bit sequence is defined as û = [û_1, û_2, ..., û_N]. If u_i is a frozen bit, we can simply assign û_i = 0. Otherwise, the decoded bit can
be determined by
$$ \hat{u}_i = \begin{cases} 0, & \text{if } W_N^{(i)}(y_1^N, \hat{u}_1^{i-1} \,|\, 0) > W_N^{(i)}(y_1^N, \hat{u}_1^{i-1} \,|\, 1), \\[2pt] 1, & \text{if } W_N^{(i)}(y_1^N, \hat{u}_1^{i-1} \,|\, 0) < W_N^{(i)}(y_1^N, \hat{u}_1^{i-1} \,|\, 1), \end{cases} \qquad (1) $$
where W_N^{(i)}(y_1^N, û_1^{i-1} | u_i = û_i) is defined as the bit channel transition probability, (y_1^N, û_1^{i-1}) denotes the output of W_N^{(i)}, and u_i is the input. The deterministic successive cancellation decoder calculates this decision metric by recursively utilizing two basic processing nodes, namely the f node and the g node, whose likelihood-ratio (LR) domain functions are
$$ f(a, b) = \frac{1 + ab}{a + b}, \qquad g(a, b, \hat{u}_{\mathrm{sum}}) = a^{\,1 - 2\hat{u}_{\mathrm{sum}}}\, b. \qquad (2) $$
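For reference, the node functions of (2) in the deterministic LR domain can be written down directly; this short sketch is only illustrative and is not part of the paper's hardware description.

```python
def f_node(a, b):
    """f node of (2) in the LR domain: f(a, b) = (1 + a*b) / (a + b)."""
    return (1.0 + a * b) / (a + b)

def g_node(a, b, u_sum):
    """g node of (2): g(a, b, u_sum) = a^(1 - 2*u_sum) * b, with u_sum in {0, 1}."""
    return (a ** (1 - 2 * u_sum)) * b

# Example with channel LRs a = 3.0 and b = 0.5 and partial-sum bit u_sum = 1.
print(f_node(3.0, 0.5), g_node(3.0, 0.5, 1))  # roughly 0.714 and 0.167
```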
Stochastic decoding utilizes bit-streams composed of '0's and '1's to represent a number, rather than the weighted binary representation used in conventional deterministic decoding. In stochastic computing, the proportion of 1's in the whole bitstream denotes the corresponding value. The first stochastic SC decoder for polar codes was proposed in [16], which showed that, when normalization is applied, the bit conditional probability denoted in (3) can be utilized as the decision metric for the SC polar decoder.
$$ p(u_i = \hat{u}_i \,|\, \hat{u}_1^{i-1}) = \begin{cases} \dfrac{W_N^{(i)}(y_1^N, \hat{u}_1^{i-1} \,|\, u_i = \hat{u}_i)}{\sum_{u_i \in \{0,1\}} W_N^{(i)}(y_1^N, \hat{u}_1^{i-1} \,|\, u_i)}, & \text{if } i \in A, \\[4pt] 1_{u_i = 0}, & \text{if } i \in A^c. \end{cases} \qquad (3) $$
According to the definition of the likelihood ratio (LR) [16], a = Pr(A=0)/Pr(A=1) and b = Pr(B=0)/Pr(B=1), so the equation of the f node can be written as
$$ f(a, b) = \frac{1 + ab}{a + b} = \frac{\Pr(A=0)\Pr(B=0) + \Pr(A=1)\Pr(B=1)}{\Pr(A=0)\Pr(B=1) + \Pr(A=1)\Pr(B=0)} = \frac{\Pr(F=0)}{\Pr(F=1)}, \qquad (4) $$
where Pr(F = 0) + Pr(F = 1) = 1. Therefore, we can get Pr_f = Pr(F = 1) = (1 − Pr(A = 1))Pr(B = 1) + Pr(A = 1)(1 − Pr(B = 1)).
Figure 1 SCL decoding steps with L = 2.
Figure 2 (Color online) Stochastic SCL decoding with L = 2.
To derive the stochastic version of the g node equation, we first set ûsum = 0; then
$$ g(a, b, 0) = ab = \frac{\Pr(A=0)\Pr(B=0)}{\Pr(A=1)\Pr(B=1)} = \frac{\dfrac{\Pr(A=0)\Pr(B=0)}{\Pr(A=1)\Pr(B=1)+\Pr(A=0)\Pr(B=0)}}{\dfrac{\Pr(A=1)\Pr(B=1)}{\Pr(A=1)\Pr(B=1)+\Pr(A=0)\Pr(B=0)}} = \frac{\Pr(G=0)}{\Pr(G=1)}, \qquad (5) $$
where both the numerator and the denominator are divided by Pr(A=1)Pr(B=1) + Pr(A=0)Pr(B=0) in order to meet the requirement that Pr(G=0) + Pr(G=1) = 1. Then we get Pr_g = Pr(G=1) = Pr_a Pr_b / ((1 − Pr_a)(1 − Pr_b) + Pr_a Pr_b). Secondly, we set ûsum = 1; similarly, Pr_g = Pr(G=1) = (1 − Pr_a) Pr_b / (Pr_a (1 − Pr_b) + (1 − Pr_a) Pr_b). In this instance, if we define b = 1_{ûsum=0}, the stochastic form of (2) can be rewritten as
$$ \Pr_f = (1-\Pr_a)\Pr_b + \Pr_a(1-\Pr_b), \qquad \Pr_g = \frac{\big[(1-\hat{u}_{\mathrm{sum}})\Pr_a + \hat{u}_{\mathrm{sum}}(1-\Pr_a)\big]\Pr_b}{\big[(1-\hat{u}_{\mathrm{sum}})(1-\Pr_a) + \hat{u}_{\mathrm{sum}}\Pr_a\big](1-\Pr_b) + \big[(1-\hat{u}_{\mathrm{sum}})\Pr_a + \hat{u}_{\mathrm{sum}}(1-\Pr_a)\big]\Pr_b}. \qquad (6) $$
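In bit-stream form, the f node reduces to an XOR gate and the g node to a JK flip-flop fed by AND gates, as in [16] and the mixed nodes described in Section 5. The following behavioural sketch (illustrative only; the stream length, seed, and probabilities are assumptions) checks that long streams reproduce Pr_f and Pr_g from (6).

```python
import numpy as np

rng = np.random.default_rng(1)
l = 1 << 14  # a long stream so the empirical ratios converge (illustrative)

def stream(p):
    return (rng.random(l) < p).astype(np.uint8)

def f_stream(a, b):
    """Stochastic f node: XOR, so Pr_f = (1 - Pa) * Pb + Pa * (1 - Pb)."""
    return a ^ b

def g_stream(a, b, u_sum):
    """Stochastic g node: JK flip-flop whose J/K inputs are AND gates of a and b
    (a is inverted when u_sum = 1), giving the regenerative ratio of (6)."""
    a_eff = a if u_sum == 0 else 1 - a
    q, out = 0, np.empty_like(a)
    for t in range(l):
        j = a_eff[t] & b[t]              # set condition
        k = (1 - a_eff[t]) & (1 - b[t])  # reset condition
        if j and not k:
            q = 1
        elif k and not j:
            q = 0
        out[t] = q                        # hold otherwise
    return out

pa, pb = 0.8, 0.6
a, b = stream(pa), stream(pb)
print(f_stream(a, b).mean())      # ~0.44 = 0.8*0.4 + 0.2*0.6
print(g_stream(a, b, 0).mean())   # ~0.857 = 0.48 / (0.48 + 0.08)
```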
From (6), the bit conditional probabilities in SC decoding can be updated by recursively applying
$$ \Pr_N^{(2i-1)}(y_1^N, \hat{u}_1^{2i-2}) = \Pr_{N/2}^{(i)}(y_1^{N/2}, \hat{u}_{1,o}^{2i-2} \oplus \hat{u}_{1,e}^{2i-2})\,\big(1 - \Pr_{N/2}^{(i)}(y_{N/2+1}^{N}, \hat{u}_{1,e}^{2i-2})\big) + \big(1 - \Pr_{N/2}^{(i)}(y_1^{N/2}, \hat{u}_{1,o}^{2i-2} \oplus \hat{u}_{1,e}^{2i-2})\big)\,\Pr_{N/2}^{(i)}(y_{N/2+1}^{N}, \hat{u}_{1,e}^{2i-2}), \qquad (7) $$
$$ \Pr_N^{(2i)}(y_1^N, \hat{u}_1^{2i-1}) = \frac{\Big((1-\hat{u}_{2i-1})\Pr_{N/2}^{(i)}(y_1^{N/2}, \hat{u}_{1,o}^{2i-2} \oplus \hat{u}_{1,e}^{2i-2}) + \hat{u}_{2i-1}\big(1 - \Pr_{N/2}^{(i)}(y_1^{N/2}, \hat{u}_{1,o}^{2i-2} \oplus \hat{u}_{1,e}^{2i-2})\big)\Big)\Pr_{N/2}^{(i)}(y_{N/2+1}^{N}, \hat{u}_{1,e}^{2i-2})}{\hat{u}_{2i-1}\Pr_N^{(2i-1)}(y_1^N, \hat{u}_1^{2i-2}) + (1-\hat{u}_{2i-1})\big(1 - \Pr_N^{(2i-1)}(y_1^N, \hat{u}_1^{2i-2})\big)}. \qquad (8) $$
Please note that these two formulas only calculate the bit conditional probability instead of the path probability P(û_1^i). After calculating the bit conditional probability as denoted in
$$ p(u_i = \hat{u}_i \,|\, \hat{u}_1^{i-1}) = \begin{cases} \Pr_N^{i}, & \text{if } i \in A \text{ and } \hat{u}_i = 1, \\ 1 - \Pr_N^{i}, & \text{if } i \in A \text{ and } \hat{u}_i = 0, \\ 1_{u_i = 0}, & \text{if } i \in A^c, \end{cases} \qquad (9) $$
in order to obtain the path probability as the path metric, all the bit conditional probabilities along one path, [p(u_1 = û_1), ..., p(u_N = û_N)], have to be multiplied as
$$ P(\hat{u}_1^i) = P(\hat{u}_1^{i-1})\, p(u_i = \hat{u}_i \,|\, \hat{u}_1^{i-1}) = \prod_{n=1}^{i} p(u_n = \hat{u}_n \,|\, \hat{u}_1^{n-1}). \qquad (10) $$
Take the SSCL decoding with L = 2 depicted in Figure 2 as an example. At each level, the SCL decoding tree expands the paths and updates the path probabilities, and then selects the paths with the largest L path probabilities instead of keeping only the best path. In general, at the i-th decoding level of the SSCL decoder, a total of 2L ordered probabilities P_{1,...,2L}(û_1^i) are calculated as the metric to select the L most probable candidates.
Figure 3 (Color online) The architecture for generating the path probability in stochastic SCL decoding.
Figure 4 (Color online) Comparison of different SC polar decoders.
Therefore, the path probabilities at the i-th decoding level can be calculated by multiplying the (i − 1)-th level's path probabilities with the corresponding ordered bit conditional probabilities. As revealed by (10), only one two-input AND gate is needed to implement this multiplication, with the corresponding architecture depicted in Figure 3. Moreover, the calculation of the conditional probability of each ordered bit entails the recursive operation between the f node and the g node. In this successive way, the most probable L candidates at each stage are always kept, and the best one at the last stage is selected.
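A behavioural sketch of this path-metric update (the names and toy probabilities are illustrative, not from the paper): the path stream is ANDed with the bit-conditional-probability stream as in Figure 3, a counter yields the PPM, and the L largest PPMs survive.

```python
import numpy as np

rng = np.random.default_rng(2)
l, L = 1024, 4  # bitstream length and list size (illustrative)

def stream(p):
    return (rng.random(l) < p).astype(np.uint8)

# Path probability update of (10): one two-input AND per path and per decoded bit.
old_path = stream(0.9)          # stream of P(u_1^{i-1})
bit_cond = stream(0.7)          # stream of p(u_i = u_hat_i | u_1^{i-1})
new_path = old_path & bit_cond  # stream of P(u_1^i), Pr ~ 0.63

# A counter converts the stream into a deterministic path probability metric (PPM).
ppm = int(new_path.sum())       # roughly 0.63 * l

# List management: among the 2L candidate PPMs, keep the L largest.
candidate_ppms = rng.integers(0, l, size=2 * L)
survivors = np.sort(candidate_ppms)[-L:]
print(ppm, survivors)
```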
As illustrated in Figure 4, compared with the deterministic design, the decoding performance of the stochastic SC decoder suffers from a sharp degradation. The reason is that the channel message represented by bitstreams loses randomness in high SNR regions. For stochastic LDPC decoders, an essential scaling method was proposed in [17], which provides a similar level of switching activity over different ranges of SNRs. This method can efficiently improve the bit error rate (BER) performance of stochastic decoders.
In the scaling method, the received channel reliabilities are scaled by a factor that is proportional to the noise in the channel. The scaled LLRs, where the LLR is calculated as
$$ \mathrm{LR}(y_i) = \log \frac{\Pr(x_i = +1 \,|\, y_i)}{\Pr(x_i = -1 \,|\, y_i)}, \qquad (11) $$
are independent of the channel noise; that is to say, the decoder does not need to estimate the noise in the channel. Furthermore, according to the work in [17], the scaled LLR for the i-th received symbol is LR′(y_i) = αN_0 LR(y_i), where the initial channel likelihood ratio is LR(y_i) = 4y_i/N_0 and N_0 is the single-sided noise power density. The channel message scaling approach thus introduces a scaling coefficient α, and the scaled channel likelihood ratio becomes LR′(y_i) = 4αy_i, which is now used for generating the input stochastic bitstreams for SC polar decoders. The input probability can therefore be rewritten as
$$ \Pr(y_i = 1) \approx \frac{1}{e^{-\mathrm{LR}'(y_i)} + 1} = \frac{1}{e^{-4\alpha y_i} + 1}. \qquad (12) $$
Considering that different choices of α result in different decoding performances, and that previous work [18] has shown that the best performance of the stochastic SC decoder is achieved with α = 0.5, the scaling factor α = 0.5 is employed in all numerical simulations in the remainder of this paper.
To verify the effectiveness of the channel message scaling method, simulation comparisons of the deterministic design, the stochastic design, and the stochastic design with scaling factor α = 0.5 are depicted in Figure 4. Simulation results for both the (64, 32) code and the (256, 128) code demonstrate that the stochastic SC decoders with the scaling approach achieve an improvement over the ones without.
The complexity of comparing path probabilities and of memory management grows drastically with L. Thus, the path selection in the SSCL decoder should be designed to reduce this heavy consumption. However, it is difficult to sort directly in the stochastic representation; in other words, the stochastic data streams of path probabilities should first be converted into deterministic data. Here, DS [20] is adopted for path selection. In order to convert stochastic data streams into deterministic values, a counter is applied after the path probability of each candidate is calculated. The counter counts the number of '1's in the stochastic stream, and this number is denoted as the path probability metric (PPM). Then the paths with the largest L PPM values are selected.
For clarity, the children from the same father node with the larger and smaller PPM are denoted as the first child (FC) and the next child (NC), respectively. In traditional direct sorting algorithms, such as insertion and bubble sorting, when the m-th maximum NC node and the m-th minimum FC node are found in the m-th round, the top m maximum NCs are all compared with the m-th minimum FC value. In the DS algorithm, this step is simplified to comparing only the m-th maximum NC and the m-th minimum FC, and the comparison result then decides whether the next round of comparison needs to be conducted. DS reduces the comparison complexity from O(L^2) to O(L), and an upper bound k on the number of rounds is set to reduce the latency from kL^2 to kL [20]. The selection of candidate paths with the DS algorithm is depicted in Figure 5. In our stochastic decoding system, the L FC PPM values are superior to the L NC PPM values at most levels of the binary tree; that is to say, even when the coefficient k is removed, the average number of comparison rounds required is still quite small, and the DS method has no performance gap compared with direct sorting.
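A minimal software model of the DS selection described above (the PPM values are made up for illustration, and the upper bound k of the hardware is not modelled):

```python
def distributed_sort(fc_ppms, nc_ppms):
    """DS sketch: keep the L largest PPMs out of L FCs and L NCs. The FCs start as
    the survivors; in each round the best remaining NC challenges the current worst
    survivor, and the first failed comparison ends the sorting."""
    survivors = list(fc_ppms)
    for cand in sorted(nc_ppms, reverse=True):   # at most L rounds
        worst = min(range(len(survivors)), key=lambda i: survivors[i])
        if cand <= survivors[worst]:
            break                                # every kept FC beats the remaining NCs
        survivors[worst] = cand                  # the NC replaces the worst FC
    return survivors

# L = 4 example: only one NC is good enough, so DS stops after two rounds.
print(sorted(distributed_sort([700, 650, 620, 300], [640, 500, 200, 100])))
# -> [620, 640, 650, 700]
```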
For the stochastic polar SCL decoder, the scaling factor is set to α = 0.5, the bitstream length is chosen as l = 1024, and the doubling probability approach is employed in the stochastic system. Figure 6 shows the numerical results for different decoders with a (256, 128) codeword. From Figure 6, it can be observed that the performance of DS-based SSCL decoding with list size 2L is similar to that of deterministic SCL decoding with list size L.
The latency of the DS algorithm is related to the average number of comparison rounds. For the (256, 128) polar decoder with L = 8, the average comparison rounds of DS are illustrated in Figure 7. When the SNR is in [1.5, 3.0] dB, the average number of rounds is no larger than 1.76; hence, the low latency of the DS algorithm is guaranteed.
Figure 5 (Color online) Candidate paths' selection in the polar SCL decoder (L = 4).
Figure 6 (Color online) Performance comparison of (256, 128) deterministic and stochastic decoders with single-level DS (l = 1024).
Figure 7 Comparisons in the (256, 128) stochastic SCL decoder with single-level DS decoding (L = 8, l = 1024).
3 Stochastic SCL decoder for polar codes with double-level decoding method
The biggest problem of the SSCL decoder is its decoding latency. With the single-level decoding described in Section 2, decoding every bit of a stochastic polar code consumes a number of clock periods that almost equals the length of the stochastic bitstream. To deal with this dilemma, a double-level decoding method is proposed in this section to reduce the decoding latency. Each time, two bits are estimated, no matter whether they are information bits or frozen bits, and the feedback signal is then utilized to calculate the next two bits.
This double-level method is very similar to the pre-computation scheme in deterministic SC decoding [21]. Ref. [21] proposed a pre-computation scheme for all Pr_j^{(2i)}(y_1^j, û_1^{2i-1}) calculations, j = 1, 2, ..., N/2, N, in order to save one clock period when estimating one bit value in deterministic decoding. In stochastic calculation, our proposed double-level scheme only employs pre-computation in Pr_N^{(2i)}(y_1^N, û_1^{2i-1}) to ensure that the two neighbouring bit conditional probabilities p(u_{2i-1} = û_{2i-1} | û_1^{2i-2}) and p(u_{2i} = û_{2i} | û_1^{2i-1}) are output simultaneously. Since decoding each bit conditional probability needs almost l clock periods, the double-level scheme can save Nl/2 clock periods. In stochastic decoding, pre-computation in the other stages of the Pr_j^{(2i)}(y_1^j, û_1^{2i-1}) calculation, j = 1, 2, ..., N/2, is meaningless.
Figure 8 (Color online) Double-level decoding scheme, with frozen and information bits marked on the decoding tree.
The double-level decoding scheme is illustrated in Figure 8. For each father node, there are two children at the first level (the (2i − 1)-th level) and four sub-children (the children of the children nodes) at the second level (the 2i-th level). According to the different distributions of information bits and frozen bits, there are three cases in the double-level scheme. As shown in Figure 8, the first and second cases have a frozen bit and an information bit in the two neighbouring levels; then DS is utilized to select the paths with the largest L PPM values from 2L children. The third case has information bits at both levels, and an adaptive DS is then employed to select L candidates from 4L children. Details of the adaptive DS are given in Subsection 3.2. In double-level SCL decoding, every two levels need one sorting process.
The ADS method is proposed to meet the sorting requirement of stochastic double-level SCL decoding. Employing the ADS algorithm, the selection of candidate paths from 4L down to L is illustrated in Figure 9. For each father node, the four sub-children are divided into one FC and three NCs. The ADS algorithm is composed of at most L − 1 rounds of comparison; hence, it is dynamic and approximates DS.
Figure 9 (Color online) Candidate paths' selection in the polar SCL decoder with double-level decoding (L = 4).
Firstly, all L FCs are picked out, and the remaining 3L nodes are NCs.
In the first round of comparison, the NC with the maximal PPM (NC1) and the FC with the minimal PPM (FC3) are picked out (requiring 4L − 2 comparisons), and then NC1 and FC3 are compared (requiring one comparison). If the PPM of FC3 is larger, the sorting is finished with the selected candidates FC1, FC2, FC3, and FC4. In this case, only one round of comparison is required, with 4L − 1 PPM comparisons. Otherwise, FC3 is replaced by NC1, and these two PPMs are not compared again in the following process. The second round then operates in a similar pattern: the NC with the sub-maximal PPM (NC10) and the FC with the sub-minimal PPM (FC2) are picked out (requiring 4L − 4 comparisons), and then NC10 and FC2 are compared (requiring one comparison). If the PPM of FC2 is larger, the sorting process finishes with the selected candidates FC1, FC2, NC1, and FC4. In this case, two rounds of comparison are required, with 8L − 4 PPM comparisons. Otherwise, FC2 is replaced by NC10, the final round of comparison repeats the above process, and the final candidates are obtained after 12L − 9 PPM comparisons.
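The selection just described can be modelled compactly in software; the sketch below is a behavioural equivalent of the ADS rounds (the PPM values are illustrative, and the per-round comparison counts of the hardware are not modelled).

```python
def ads_select(fc_ppms, nc_groups):
    """ADS sketch for double-level decoding: each father contributes one FC (its best
    sub-child) and three NCs. Starting from the L FCs, the globally best remaining NC
    replaces the current worst survivor until a comparison fails (at most L-1 rounds)."""
    survivors = list(fc_ppms)
    remaining = sorted((x for grp in nc_groups for x in grp), reverse=True)
    for cand in remaining:
        worst = min(range(len(survivors)), key=lambda i: survivors[i])
        if cand <= survivors[worst]:
            break                                 # sorting finished; the FCs win
        survivors[worst] = cand
    return survivors

# L = 4 example: father 0 dominates, so three of its NCs displace the weakest FCs.
fc  = [900, 400, 350, 300]
ncs = [[880, 860, 840], [120, 110, 100], [90, 80, 70], [60, 50, 40]]
print(sorted(ads_select(fc, ncs)))  # -> [840, 860, 880, 900]
```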
To reasonably evaluate the performance of the proposed stochastic SCL decoder, numerical simulations are carried out for the 1/2-rate (1024, 512) code, with scaling factor α = 0.5 and bit-stream length l = 1024. As shown in Figure 10, owing to the random fluctuations of stochastic decoding, a performance gap between deterministic SCL decoding and SSCL decoding does exist. Nevertheless, the performance of double-level stochastic SCL (DL-SSCL) decoding with list size 2L can transcend that of the single-level deterministic SCL decoder with list size L. Furthermore, the performance gap between the single-level and double-level stochastic decoding schemes results from multiplying two bit conditional probabilities into the path probability simultaneously: the enlarging probability approach in double-level decoding is applied half as often as in the single-level scheme. Accordingly, the randomness of the PPM in the double-level scheme is worse than that in the single-level scheme, but the resulting gap between the stochastic decoders with list size 2, depicted in Figure 10, is still acceptable.
Figure 10 (Color online) Performance comparison of the 1/2-rate (1024, 512) deterministic SCL decoder and stochastic SCL decoder with different decoding levels.
Figure 11 (Color online) The sum of rounds in the SCL decoder with double-level DS, 1/2-rate (1024, 512) decoding (L = 4).
In stochastic double-level SCL decoding, the ADS method needs to select the L best paths from either 2L or 4L children. These two kinds of sorting require different numbers of comparisons in each round. Therefore, the latency of ADS is related to the average number of comparison rounds. For the (1024, 512) stochastic polar decoder with L = 4, the sum of comparison rounds is shown in Figure 11: the red line presents the sum of comparison rounds resulting from selecting the L best paths from 4L children, and the black line denotes the sum of comparison rounds resulting from selecting the L best paths from 2L children.
According to Figures 5 and 9, the DS from 2L or 4L down to L approximately requires 2L or 4L comparisons per round, respectively. Using 2L comparisons per round as the evaluation standard, Figure 12 summarizes the average number of rounds for each information bit in double-level decoding. When the SNR is in [1.5, 3.0] dB, the average number of rounds is no larger than 1.47, so the low latency of the ADS algorithm is guaranteed. Hence, the ADS can compromise between different sorting scales in stochastic double-level decoding, as concluded in Algorithm 1. The FER performance comparison with other FPGA-based SCL decoders is given in Figure 13.
4 Stochastic SCL decoder for polar codes with 2^p-level decoding method
As analyzed in Section 3, a multiple-level scheme on the binary tree is efficient for the SSCL decoder, for it can estimate several bits simultaneously rather than bit by bit. However, the multiple-level scheme causes a larger sorting scale. Direct sorting cannot afford such large-scale complexity, and the dramatically increasing sorting complexity would cancel out the advantages brought by stochastic computing. To this end, the ADS is improved to keep the complexity of multiple-level decoding at O(L).
Figure 12 Average rounds in the SCL decoder with single-/double-level DS, 1/2-rate (1024, 512) decoding (L = 4).
Figure 13 (Color online) The FER performance of different SCL decoders with the 1/2-rate (1024, 512) codeword (L = 4, l = 1024).
The multiple-level scheme provides different depths of pre-computation in the stochastic decoding process. Denote by p the number of stages that perform pre-computation operations. In the single-level scheme, no pre-computation is employed in the decoding process, that is to say, p = 0 and 2^p = 1. Therefore, the decoding binary tree selects L candidates bit by bit. In the double-level scheme, one stage of pre-computation is applied, with [Pr_N^{(2i)}(y_1^N, 0), Pr_N^{(2i)}(y_1^N, 1)] pre-calculated without feedback information, so p = 1 and 2^p = 2. Therefore, the decoding binary tree selects L candidates from 2L or 4L nodes every two levels. And so on: when the number of pre-computation stages increases to two (p = 2, 2^p = 4), [Pr_{N/2}^{(i)}(y_1^{N/2}, 0), Pr_{N/2}^{(i)}(y_1^{N/2}, 1)] and the corresponding [Pr_N^{(2i)}(y_1^N, 0), Pr_N^{(2i)}(y_1^N, 1)] are pre-calculated without feedback information. The decoding binary tree then selects L candidates from 2L, 4L, 8L, or 16L nodes every four levels (2^p = 4).
In general, when the number of pre-computation stages is p, the 2^p-level scheme of the SSCL decoder is obtained. The last p stages, from [Pr_{N/2^{p-1}}^{(i/2^{p-2})}(y_1^{N/2^{p-1}}, 0), Pr_{N/2^{p-1}}^{(i/2^{p-2})}(y_1^{N/2^{p-1}}, 1)] (p ≥ 2) up to the corresponding [Pr_N^{(2i)}(y_1^N, 0), Pr_N^{(2i)}(y_1^N, 1)], need to be pre-calculated without feedback information. Because the information bits are dispersed in the N-bit code, the sorting scale is dynamic: the decoding binary tree selects L candidates from 2L, 4L, ..., 2^{2^p} L nodes every 2^p levels. The proposed 2^p-level scheme reduces the decoding latency from Nl to Nl/2^p, which really improves the efficiency of stochastic decoding. Because data are processed bit by bit in stochastic computation, the additional computational elements for pre-computation are negligible; more details of the architecture design are presented in Section 5.
In the 2^p-level decoding scheme, the sorting scale is dynamic and large. The worst situation is selecting the L best candidates from 2^{2^p} L nodes. The DS scheme picks the L FC nodes from the L paths, and then sequentially chooses the worst FC and the best NC and compares them until the candidate set no longer changes. However, when p is large, the number of NC nodes is (2^{2^p} − 1)L, and choosing the best NC by comparison becomes difficult. To this end, in the first round, the L best NCs are picked from the L paths in parallel, and these L NCs are then compared to select the best one; this round needs (2^{2^p} − 1)L NC comparisons. In the following rounds, only the path of the last selected NC needs to be compared, so these rounds need only (2^p + L) NC comparisons. Algorithm 2 shows that only the first round requires (2^{2^p} − 1)L NC comparisons (line 7), while the following rounds need (2^p + L) NC comparisons (lines 10 and 18).
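To make these comparison counts concrete, a few values under the stated formulas (purely a worked illustration of the expressions, not measured data):

```python
L = 4
for p in (1, 2, 3):
    children = 2 ** (2 ** p)              # sub-children per father after 2^p levels
    first_round = (children - 1) * L      # NC comparisons in the first round
    later_rounds = 2 ** p + L             # NC comparisons in each following round
    print(f"p={p}: {children} children/father, "
          f"{first_round} first-round and {later_rounds} later-round NC comparisons")
```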
In stochastic 2^p-level SCL polar decoding, when p increases, the decoding latency is reduced from Nl to Nl/2^p. However, increasing p also causes a performance loss to some extent. As shown in Figure 10, there is a slight performance gap between the double-level scheme and the single-level scheme. To update the path probability every 2^p levels, the 2^p bit probabilities are multiplied together and then multiplied with the old path probability. Meanwhile, to keep the probabilities for stochastic computing in a reasonable range, the enlarging probability approach is employed: every time the path probability is updated, we re-scale all the path probabilities with the same multiplicative factor so that the best PPM is kept in the range [0.5, 1]. However, the number of re-scaling operations decreases as p grows. Consequently, multiplying the 2^p bit probabilities at the same time with only one re-scaling step makes the stochastic randomness worse as p increases. To conclude, increasing p lowers the latency of stochastic 2^p-level SCL polar decoding at the expense of a slight performance loss. Therefore, the selection of p depends on the realistic situations and requirements of designers.
Figure 14 (Color online) The bit stream generation module.
Figure 15 (Color online) The architecture of the proposed mixed node I.
5 Hardware architecture of the proposed stochastic SCL decoder
Considering the tradeoff between latency and performance of the SCL decoder, we employ p = 1 in our proposed architecture design. This section focuses on the design details of every module in the proposed SSCL decoder architecture.
As illustrated in Figure 14, the deterministic value x is compared with a random number R, R ∈ [0, 1]. In reality, R is pseudo-random, generated by a linear feedback shift register (LFSR). The stochastic bitstream output of the comparator is denoted by x_s(t): if x > R, x_s(t) = 1; otherwise, x_s(t) = 0. Consequently, the deterministic value of a stochastic stream x_s is obtained by x = (1/l) \sum_{t=1}^{l} x_s(t), where l denotes the length of the stochastic bitstream. This module converts the deterministic value of the channel transition probability into stochastic streams. To decode an N-bit polar code, N parallel stochastic bit stream generation (SG) modules are required as the interface to the main stochastic decoder. Since the N parallel SG modules share the same pseudo-random sequence, the complexity of this interface is determined by that of the comparators, which depends on the bitstream length l. Hence, the complexity of this interface module is O(Nl).
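A software model of the SG interface (the LFSR polynomial, seed, and the use of (12) for the input probability are illustrative assumptions; the paper only specifies an LFSR of log2(l) bits and a comparator):

```python
import math

def lfsr_sequence(length, nbits=10, taps=(10, 7), seed=1):
    """Fibonacci LFSR producing pseudo-random integers in [0, 2^nbits - 1].
    Taps (10, 7) give a maximal-length 10-bit sequence (assumed for illustration)."""
    state, out = seed, []
    for _ in range(length):
        out.append(state)
        fb = 0
        for t in taps:
            fb ^= (state >> (t - 1)) & 1
        state = ((state << 1) | fb) & ((1 << nbits) - 1)
    return out

def sg_module(x, l=1024, nbits=10):
    """SG module: emit 1 whenever x exceeds the scaled pseudo-random number R,
    so the fraction of 1's in the stream approximates x."""
    return [1 if x > r / (1 << nbits) else 0 for r in lfsr_sequence(l, nbits)]

alpha, y = 0.5, 0.8                           # scaling factor and a received symbol
p_in = 1.0 / (math.exp(-4 * alpha * y) + 1)   # input probability from (12)
bits = sg_module(p_in)
print(p_in, sum(bits) / len(bits))            # the stream mean tracks p_in
```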
Through adopting a pre-computation similar to [21], a mixed node I is defined here to realize single-level decoding. More specifically, the f node and the g node are merged in mixed node I to make the whole decoder more succinct. On the basis of the elementary nodes presented in [16], mixed node I is depicted in Figure 15. This node can execute the functions of the f and g nodes based on stochastic computing. Table 1 summarizes the hardware complexity of each mixed node I. Although mixed node I can complete the general stochastic element processing, it is unsuitable for realizing the proposed double-level decoding.
Figure 16 (Color online) Mixed node II for the last stage, also mixed node III for the first stage in SCL decoding.
To realize double-level decoding, mixed node II, which can improve the efficiency of stochastic polar decoding, is proposed. To implement the proposed double-level decoding algorithm, the stochastic bitstream of p(u_{2i} = û_{2i} | û_1^{2i-1}) needs to be pre-computed so that it is output simultaneously with the stream of p(u_{2i-1} = û_{2i-1} | û_1^{2i-2}); hence, the node module in the last stage should be changed to mixed node II when performing double-level decoding.
As shown in Figure 16, mixed node II (the architecture in the blue box) is employed in the last stage of the SCL decoder. According to the property of probabilities, only an inverter is required to implement the complementary operation; the inverter in Figure 16 calculates the stochastic stream of p(û_{2i-1} = 0 | û_{sum}). Unlike mixed node I, which outputs only one bit stream, mixed node II simultaneously outputs four bit streams for the products of two adjacent bit probabilities, namely p(û_{2i} = 1, û_{2i-1} = 1 | û_{sum}), p(û_{2i} = 1, û_{2i-1} = 0 | û_{sum}), p(û_{2i} = 0, û_{2i-1} = 1 | û_{sum}), and p(û_{2i} = 0, û_{2i-1} = 0 | û_{sum}).
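Behaviourally, the four outputs can be modelled as AND combinations of the odd-bit stream, its inverted version, and the two pre-computed even-bit streams; the sketch below is an assumption-level model (independent streams, made-up probabilities), not the gate-level circuit of Figure 16.

```python
import numpy as np

rng = np.random.default_rng(3)
l = 1 << 14

def stream(p):
    return (rng.random(l) < p).astype(np.uint8)

s_odd   = stream(0.7)   # p(u_{2i-1} = 1 | feedback)
s_even0 = stream(0.2)   # pre-computed p(u_{2i} = 1 | u_{2i-1} = 0)
s_even1 = stream(0.9)   # pre-computed p(u_{2i} = 1 | u_{2i-1} = 1)

inv_odd = 1 - s_odd      # the single inverter: stream for p(u_{2i-1} = 0)

joint_11 = s_odd   & s_even1        # p(u_{2i} = 1, u_{2i-1} = 1)
joint_10 = inv_odd & s_even0        # p(u_{2i} = 1, u_{2i-1} = 0)
joint_01 = s_odd   & (1 - s_even1)  # p(u_{2i} = 0, u_{2i-1} = 1)
joint_00 = inv_odd & (1 - s_even0)  # p(u_{2i} = 0, u_{2i-1} = 0)

print([round(float(s.mean()), 3) for s in (joint_11, joint_10, joint_01, joint_00)])
# roughly [0.63, 0.06, 0.07, 0.24]; the four values sum to 1
```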
The SCL decoding architecture consists of L SC decoders in parallel. However, this parallel design extends the hardware complexity dramatically. For the first stage of SCL decoding, only three outputs are possible for each node module (one f node calculation result and two possible g node calculation results); that is to say, the first stage does not need the L-parallel architecture. Therefore, mixed node III is proposed to output all possible results for the second-stage calculation. As shown in Figure 16, mixed node III (the architecture in the green box) is applied in the first stage of the SCL decoder.
As illustrated in Figure 17, the architecture of the stochastic SC decoder with double-level decoding is composed of two blocks, the interface and the main decoder. The figure presents an example of an 8-bit stochastic decoder; an N-bit decoder obeys the same design. The interface includes N SG modules, which generate the stochastic bit streams of the channel bit probabilities as the inputs of the main decoder.
The main decoder consists of three parts: the node processing, estimation, and feedback modules. The node processing of an N-bit SSCL decoder consists of (N − 2) mixed nodes I and one mixed node II over a total of log2 N stages. For double-level decoding, mixed node II is applied at the last stage of the decoder.
Figure 17 (Color online) The 8-bit stochastic SC decoder with double-level decoding.
Figure 18 (Color online) The N -bit stochastic SCL decoder with double-level decoding.
Because stochastic bit streams are not suitable for direct comparison, four counters are employed to convert the stochastic streams into the corresponding deterministic data, and a maximum module then selects the best probability. The Output block in Figure 17 is an estimation module that outputs the path bits û_{2i-1} and û_{2i} corresponding to the selected probability.
The feedback part is utilized to produce the û_{sum} signal for the g node calculation. Considering that this architecture has been well researched [21], we omit its details here. It is worth mentioning that, for stochastic decoding, the clock of the feedback part is different from that of the stochastic node processing framework: the clock period of the feedback part is l times that of the stochastic calculation.
As shown in Figure 18, the architecture of the SSCL decoder with the double-level scheme is similar to that of the SC design. The interface block includes N SG modules, while the main decoder expands to L parallel basic frames. However, the node modules in the first stage can only output three possible probabilities, as mentioned in Subsection 5.4. To this end, N/2 mixed nodes III are employed in the first stage, instead of NL/2 expanded mixed nodes I. All nodes in stages 2 to n expand to L parallel modules. LC denotes the list core module, which uses the ADS method to select the L best paths from 4L candidates.
Figure 19 (Color online) The architecture of the list core module with the ADS method and enlarging probability approach.
The list core module implements the improved DS and the enlarging probability approach in the proposed SSCL decoder. To realize the ADS method, the PPMs are categorized into two groups, FCs and NCs. Subsequently, the L FCs are saved into memory block II, and the 3L NCs are stored in memory block I. As presented in the ADS method, the PPMs in memory block II have higher priority to be selected. As illustrated in Figure 19, two multiplexers respectively read data from memory blocks I and II in order in each sorting round, and select the best PPM NCi from block I and the worst PPM FCj from block II. If NCi is better, the control signal is activated to execute instructions ① and ②: instruction ① removes NCi from block I, and instruction ② replaces the value of FCj with that of NCi. The next round of sorting then remains active. If FCj is better, the sorting process is over, the data in memory block II are the selected L best paths, and instruction ③ is executed. Instruction ③ is an indicator of the end of sorting and the beginning of the enlarging probability step, and the corresponding L best paths are output.
Instruction ③ also starts up the enlarging probability approach. If the best selected path metric is smaller than l/2 (i.e., the corresponding probability is smaller than 0.5), instruction ④ executes left-shift operations on all PPMs in block II, and instruction ⑤ then employs L SG modules to regenerate the enlarged PPM stochastic streams. If the best selected path metric is larger than l/2, instruction ⑤ is executed directly.
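A behavioural sketch of this enlarging probability step (the function name is illustrative; the power-of-two doubling mirrors the left-shift operations of instruction ④):

```python
def enlarge_ppms(ppms, l):
    """Left-shift (double) all PPMs by the same amount until the best one corresponds
    to a probability of at least 0.5, i.e. a count of at least l/2."""
    shift = 0
    while ppms and (max(ppms) << shift) < l // 2:
        shift += 1
    return [p << shift for p in ppms]

# l = 1024: the best PPM (130 < 512) is doubled twice; every path uses the same
# factor, so the relative ordering of the surviving paths is preserved.
print(enlarge_ppms([130, 90, 40, 10], 1024))  # -> [520, 360, 160, 40]
```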
Figure 20 (Color online) Decoding schedule for the N-bit stochastic SCL decoder with double-level decoding.
As shown in Figure 20, the decoding schedule consists of three parts: node processing, DS, and the enlarging probability approach. When decoding every two code bits û_{2i} and û_{2i+1}, all stochastic streams flow through n stages (n = log2 N), and the latency is (n + l − 1) clock cycles. The latency of DS is dynamic, and more details are presented in Subsection 2.5. The enlarging probability approach includes an SG module, and the latency of its output is l clock cycles. When decoding an N-bit codeword, the latency L is
$$ L = (n + l - 1) \times \frac{N}{2} + L(\mathrm{DS}) \approx \frac{Nl}{2} + L(\mathrm{DS}). \qquad (14) $$
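Plugging representative numbers into (14) (purely a worked example of the formula, not a reported measurement):

```python
N, l = 1024, 1024
n = N.bit_length() - 1                 # n = log2(N) = 10
latency_exact = (n + l - 1) * N // 2   # node-processing part of (14)
latency_approx = N * l // 2            # the N*l/2 approximation
print(latency_exact, latency_approx)   # 528896 vs 524288 clock cycles, plus L(DS)
```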
6 Comparison
Table 2 compares the proposed SSCL decoder with other FPGA-based SCL decoders. Compared with the deterministic SCL designs [20, 22], the single-level SSCL only requires around 8% and 5.6% of the ALMs and 20.8% and 14.2% of the registers, respectively, at the expense of increased latency and reduced throughput. With double-level decoding, the SSCL can efficiently improve its throughput at a slight cost in hardware consumption. Moreover, the FER performances of these SCL decoders are shown in Figure 13. Compared with [20, 22], the single-level SSCL decoder has a loss of about 0.5 and 1 dB, respectively, and the loss of the double-level scheme is a little larger. Although double-level decoding expends a little more hardware resources than single-level decoding, it achieves much better throughput and latency. Therefore, our proposed SSCL decoder can successfully realize a good tradeoff between hardware cost and decoding performance.
7 Conclusion
Stochastic SCL polar decoders are proposed in this paper, and approaches that improve their decoding performance have been discussed. It is shown that the stochastic SCL decoder can achieve error-correcting performance similar to its deterministic counterpart with considerable hardware reduction, at the expense of increased latency. Hence, it achieves a suitable tradeoff between hardware cost and error performance. To our knowledge, this is the first stochastic SCL polar decoder, and it has potential applications in wearable and IoT devices that do not require high speed but have strict hardware constraints.
Acknowledgements This work was supported in part by National Key R&D Program of China (Grant No. 2020YFB220-
5503), National Natural Science Foundation of China (Grant Nos. 61871115, 61501116), Jiangsu Provincial NSF for
Excellent Young Scholars (Grant No. BK20180059), Six Talent Peak Program of Jiangsu Province (Grant No. 2018-
DZXX-001), Distinguished Perfection Professorship of Southeast University, Fundamental Research Funds for the Central
Universities, and Student Research Training Program of Southeast University.
References
1 Arıkan E. Channel polarization: a method for constructing capacity-achieving codes for symmetric binary-input memoryless channels. IEEE Trans Inform Theor, 2009, 55: 3051–3073
2 Tal I, Vardy A. List decoding of polar codes. In: Proceedings of the IEEE International Symposium on Information
Theory Proceedings, Petersburg, 2011. 1–5
3 Chen K, Niu K, Lin J R. List successive cancellation decoding of polar codes. Electron Lett, 2012, 48: 500–501
4 Zhou H Y, Zhang C, Song W Q, et al. Segmented CRC-aided SC list polar decoding. In: Proceedings of 2016 IEEE
83rd Vehicular Technology Conference (VTC Spring), 2016. 1–5
5 Shen Y F, Zhang C, Yang J M, et al. Low-latency software successive cancellation list polar decoder using stage-located
copy. In: Proceedings of IEEE International Conference on Digital Signal Processing (DSP), 2016. 84–88
6 Gaines B R. Stochastic computing systems. In: Advances in Information Systems Science. Boston: Springer, 1969.
37–172
7 Moons B, Verhelst M. Energy-efficiency and accuracy of stochastic computing circuits in emerging technologies. IEEE
J Emerg Sel Top Circuits Syst, 2014, 4: 475–486
8 Brown B D, Card H C. Stochastic neural computation. I. Computational elements. IEEE Trans Comput, 2001, 50:
891–905
9 Wang H Z, Zhang Z C, You X H, et al. Low-complexity Winograd convolution architecture based on stochastic
computing. In: Proceedings of the IEEE International Conference on Digital Signal Processing, Shanghai, 2018. 1–5
10 Ceroici C, Gaudet V C. FPGA implementation of a clockless stochastic LDPC decoder. In: Proceedings of the IEEE
Workshop on Signal Processing Systems, Belfast, 2014. 1–5
11 Chen J N, Hu J H, Sobelman G E. Stochastic MIMO detector based on the Markov chain Monte Carlo algorithm.
IEEE Trans Signal Process, 2014, 62: 1454–1463
12 Chen J N, Hu J H, Sobelman G E. Stochastic iterative MIMO detection system: algorithm and hardware design. IEEE
Trans Circ Syst I, 2015, 62: 1205–1214
13 Liang X, Zhang C, Xu M H, et al. Efficient stochastic list successive cancellation decoder for polar codes. In:
Proceedings of the IEEE International System-on-Chip Conference, Beijing, 2015. 421–426
14 Tal I, Vardy A. List decoding of polar codes. In: Proceedings of the IEEE International Symposium on Information
Theory Proceedings, Petersburg, 2011. 1–5
15 Bakulin M, Kreyndelin V, Rog A, et al. MMSE based K-best algorithm for efficient MIMO detection. In: Proceedings
of the International Congress on Ultra Modern Telecommunications and Control Systems and Workshops, Munich,
2017. 258–263
16 Yuan B, Parhi K K. Successive cancellation decoding of polar codes using stochastic computing. In: Proceedings of
the IEEE International Symposium on Circuits and Systems, Lisbon, 2015. 3040–3043
17 Tehrani S S, Mannor S, Gross W J. Fully parallel stochastic LDPC decoders. IEEE Trans Signal Process, 2008, 56:
5692–5703
18 Xu Z L, Niu K. Successive cancellation decoders of polar codes based on stochastic computation. In: Proceedings of
the IEEE 25th Annual International Symposium on Personal, Indoor, and Mobile Radio Communication, Washington,
2014. 908–912
19 Zhang Z Y, Zhang L, Wang X B, et al. A split-reduced successive cancellation list decoder for polar codes. IEEE J
Sel Areas Commun, 2016, 34: 292–302
20 Liang X, Yang J M, Zhang C, et al. Hardware efficient and low-latency CA-SCL decoder based on distributed sorting.
In: Proceedings of the IEEE Global Communications Conference, Washington, 2016. 1–6
21 Zhang C, Parhi K K. Low-latency sequential and overlapped architectures for successive cancellation polar decoder.
IEEE Trans Signal Process, 2013, 61: 2429–2441
22 Xiong C, Zhong Y, Zhang C, et al. An FPGA emulation platform for polar codes. In: Proceedings of the IEEE
Workshop on Signal Processing Systems, 2016. 148–153