
List Decoding of Polar Codes


Ido Tal and Alexander Vardy
University of California San Diego,
La Jolla, CA 92093, USA
Email: [email protected], [email protected]

Abstract—We describe a successive-cancellation list decoder for polar codes, which is a generalization of the classic successive-cancellation decoder of Arıkan. In the proposed list decoder, up to L decoding paths are considered concurrently at each decoding stage. Then, a single codeword is selected from the list as output. If the most likely codeword is selected, simulation results show that the resulting performance is very close to that of a maximum-likelihood decoder, even for moderate values of L. Alternatively, if a “genie” is allowed to pick the codeword from the list, the results are comparable to the current state of the art LDPC codes. Luckily, implementing such a helpful genie is easy.
Our list decoder doubles the number of decoding paths at each decoding step, and then uses a pruning procedure to discard all but the L “best” paths. Nevertheless, a straightforward implementation still requires Ω(L · n²) time, which is in stark contrast with the O(n log n) complexity of the original successive-cancellation decoder. We utilize the structure of polar codes to overcome this problem. Specifically, we devise an efficient, numerically stable, implementation taking only O(L · n log n) time and O(L · n) space.

arXiv:1206.0050v1 [cs.IT] 31 May 2012

Fig. 1. Word error rate of a length n = 2048 rate 1/2 polar code optimized for SNR = 2 dB under various list sizes. Code construction was carried out via the method proposed in [4]. The two dots represent upper and lower bounds [5] on the SNR needed to reach a word error rate of 10−5. (Curves: L = 1, 2, 4, 8, 16, 32; an ML bound; L = 32 with CRC-16; and the min(RCB, TSB) and max(ISP, SP59) bounds. Axes: word error rate vs. signal-to-noise ratio (Eb/N0) [dB].)
I. INTRODUCTION

Polar codes, recently discovered by Arıkan [1], are a major breakthrough in coding theory. They are the first and currently only family of codes known to have an explicit construction (no ensemble to pick from) and efficient encoding and decoding algorithms, while also being capacity achieving over binary-input symmetric memoryless channels. Their probability of error is known to approach O(2^−√n) [2], with generalizations giving even better asymptotic results [3].

Fig. 2. Comparison of our polar coding and decoding schemes to an implementation of the WiMax standard taken from [6]. All codes are rate 1/2. The length of the polar code is 2048 while the length of the WiMax code is 2304. The list size used was L = 32. The CRC used was 16 bits long. (Curves: successive cancellation; list decoding (L = 32); WiMax LDPC (n = 2304); list + CRC-16 (n = 2048); list + CRC + systematic. Axes: bit error rate vs. signal-to-noise ratio [dB].)
Of course, “capacity achieving” is an asymptotic property, and the main sticking point of polar codes to date is that their performance at short to moderate block lengths is disappointing. As we ponder why, we identify two possible culprits: either the codes themselves are inherently weak at these lengths, or the successive cancellation (SC) decoder employed to decode them is significantly degraded with respect to Maximum Likelihood (ML) decoding performance. More so, the two possible culprits are complementary, and so both may occur.
In this paper we show an improvement to the SC decoder, namely, a successive cancellation list (SCL) decoder. Our list decoder has a corresponding list size L, and setting L = 1 results in the classic SC decoder. It should be noted that the word “list” was chosen as part of the name of our decoder in order to highlight a key concept relating to its inner working. However, when our algorithm finishes, it returns a single codeword.
The solid lines in Figure 1 correspond to choosing the most likely codeword from the list as the decoder output. As can be seen, this choice of the most likely codeword results in a large range in which our algorithm has performance very close to that of the ML decoder, even for moderate values of L. Thus, the sub-optimality of the SC decoder indeed plays a role in the disappointing performance of polar codes.
Even with the above improvement, the performance of polar codes falls short. Thus, we conclude that polar codes themselves are weak. Luckily, we can do better. Suppose that instead of picking the most likely codeword from the list, a “genie” would aid us by telling us what codeword in the list was the transmitted codeword (if the transmitted codeword was indeed present in the list). Luckily, implementing such a genie turns out to be simple, and entails a slight modification of the polar code. With this modification, the performance of polar codes is comparable to state of the art LDPC codes, as can be seen in Figure 2.
In fairness, we refer to Figure 3 and note that there are LDPC codes of length 2048 and rate 1/2 with better performance than our polar codes. However, to the best of our

knowledge, for length 1024 and rate 1/2 it seems that our implementation is slightly better than previously known codes when considering a target error-probability of 10−4.

Fig. 3. Comparison of normalized rate [7] for a wide class of codes over the BIAWGN channel. The target word error rate is 10−4. The plot is courtesy of Dr. Yury Polyanskiy. (Code families shown: Turbo R = 1/2, 1/3, 1/4, 1/6; Voyager; Galileo HGA and LGA; Cassini/Pathfinder; Hermitian curve [64,32] (SDD); BCH (Koetter−Vardy); Polar+CRC R = 1/2 (List dec.); ME LDPC R = 1/2 (BP). Axes: normalized rate vs. blocklength n, from 10² to 10⁵.)

The structure of this paper is as follows. In Section II, we present Arıkan’s SC decoder in a notation that will be useful to us later on. In Section III, we show how the space complexity of the SC decoder can be brought down from O(n log n) to O(n). This observation will later help us in Section IV, where we present our successive cancellation list decoder with time complexity O(L · n log n). Section V introduces a modification of polar codes which, when decoded with the SCL decoder, results in a significant improvement in terms of error rate.
This paper contains a fair amount of algorithmic detail. Thus, on a first read, we advise the reader to skip to Section IV and read the first three paragraphs. Doing so will give a high-level understanding of the decoding method proposed and also show why a naive implementation is too costly. Then, we advise the reader to skim Section V, where the “list picking genie” is explained.

II. FORMALIZATION OF THE SUCCESSIVE CANCELLATION DECODER

The Successive Cancellation (SC) decoder is due to Arıkan [1]. In this section, we recast it using our notation, for future reference.
Let the polar code under consideration have length n = 2^m and dimension k. Thus, the number of frozen bits is n − k. We denote by u = (u_i)_{i=0}^{n−1} = u_0^{n−1} the information bits vector (including the frozen bits), and by c = c_0^{n−1} the corresponding codeword, which is sent over a binary-input channel W : X → Y, where X = {0, 1}. At the other end of the channel, we get the received word y = y_0^{n−1}. A decoding algorithm is then applied to y, resulting in a decoded codeword ĉ having corresponding information bits û.

A. An outline of Successive Cancellation

A high-level description of the SC decoding algorithm is given in Algorithm 1. In words, at each phase ϕ of the algorithm, we must first calculate the pair of probabilities W_m^(ϕ)(y_0^{n−1}, û_0^{ϕ−1} | 0) and W_m^(ϕ)(y_0^{n−1}, û_0^{ϕ−1} | 1), defined shortly. Then, we must make a decision as to the value of û_ϕ according to the pair of probabilities.

Algorithm 1: A high-level description of the SC decoder
  Input: the received vector y
  Output: a decoded codeword ĉ
   1  for ϕ = 0, 1, . . . , n − 1 do
   2      calculate W_m^(ϕ)(y_0^{n−1}, û_0^{ϕ−1} | 0) and W_m^(ϕ)(y_0^{n−1}, û_0^{ϕ−1} | 1)
   3      if u_ϕ is frozen then
   4          set û_ϕ to the frozen value of u_ϕ
   5      else
   6          if W_m^(ϕ)(y_0^{n−1}, û_0^{ϕ−1} | 0) > W_m^(ϕ)(y_0^{n−1}, û_0^{ϕ−1} | 1) then
   7              set û_ϕ ← 0
   8          else
   9              set û_ϕ ← 1
  10  return the codeword ĉ corresponding to û

We now show how the above probabilities are calculated. For layer 0 ≤ λ ≤ m, denote hereafter

    Λ = 2^λ .    (1)

Recall [1] that for

    0 ≤ ϕ < Λ ,    (2)

the bit channel W_λ^(ϕ) is a binary-input channel with output alphabet Y^Λ × X^ϕ, the conditional probability of which we generically denote as

    W_λ^(ϕ)(y_0^{Λ−1}, u_0^{ϕ−1} | u_ϕ) .    (3)

In our context, y_0^{Λ−1} is always a contiguous subvector of the received vector y. Next, for 1 ≤ λ ≤ m, recall the recursive definition of a bit channel [1, Equations (22) and (23)]: let 0 ≤ 2ψ < Λ, then

    W_λ^(2ψ)(y_0^{Λ−1}, u_0^{2ψ−1} | u_{2ψ})
        = Σ_{u_{2ψ+1}} (1/2) · W_{λ−1}^(ψ)(y_0^{Λ/2−1}, u_{0,even}^{2ψ−1} ⊕ u_{0,odd}^{2ψ−1} | u_{2ψ} ⊕ u_{2ψ+1})
              · W_{λ−1}^(ψ)(y_{Λ/2}^{Λ−1}, u_{0,odd}^{2ψ−1} | u_{2ψ+1})    (4)

and

    W_λ^(2ψ+1)(y_0^{Λ−1}, u_0^{2ψ} | u_{2ψ+1})
        = (1/2) · W_{λ−1}^(ψ)(y_0^{Λ/2−1}, u_{0,even}^{2ψ−1} ⊕ u_{0,odd}^{2ψ−1} | u_{2ψ} ⊕ u_{2ψ+1})
              · W_{λ−1}^(ψ)(y_{Λ/2}^{Λ−1}, u_{0,odd}^{2ψ−1} | u_{2ψ+1}) ,    (5)

with “stopping condition” W_0^(0)(y | u) = W(y | u). In both (4) and (5), the channel on the left-hand side corresponds to branch β, while the first and second factors on the right-hand side correspond to branches 2β and 2β + 1, respectively; branch numbers are defined in the next subsection.
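As a toy illustration of the recursion (4)–(5) and its stopping condition, the sketch below (ours, not the paper's implementation) assumes a binary symmetric channel with crossover probability p = 0.1 and a single recursion level, Λ = 2, and checks that the resulting bit channels are properly normalized:

```python
from itertools import product

p = 0.1  # assumed toy channel: BSC with crossover probability p

def W(y, u):
    # Stopping condition: W_0^(0)(y | u) = W(y | u).
    return 1 - p if y == u else p

def W_even(y, u0):
    # Eq. (4) with lambda = 1, psi = 0: marginalize over u1.
    y0, y1 = y
    return sum(0.5 * W(y0, u0 ^ u1) * W(y1, u1) for u1 in (0, 1))

def W_odd(y, u0, u1):
    # Eq. (5) with lambda = 1, psi = 0: u0 is now part of the output.
    y0, y1 = y
    return 0.5 * W(y0, u0 ^ u1) * W(y1, u1)

# Both bit channels are proper channels: for each fixed input,
# the probabilities of all outputs sum to one.
for u0 in (0, 1):
    assert abs(sum(W_even(y, u0) for y in product((0, 1), repeat=2)) - 1) < 1e-12
for u1 in (0, 1):
    total = sum(W_odd(y, u0, u1)
                for y in product((0, 1), repeat=2) for u0 in (0, 1))
    assert abs(total - 1) < 1e-12
```

The same normalization check extends, layer by layer, to the general recursion.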

B. Detailed description

For Algorithm 1 to become concrete, we must specify how the probability pair associated with W_m^(ϕ) is calculated, and how the set values of û, namely û_0^{ϕ−1}, are propagated into those calculations. We now show an implementation that is straightforward, yet somewhat wasteful in terms of space.
For λ > 0 and 0 ≤ ϕ < Λ, recall the recursive definition of W_λ^(ϕ)(y_0^{Λ−1}, u_0^{ϕ−1} | u_ϕ) given in either (4) or (5), depending on the parity of ϕ. For either ϕ = 2ψ or ϕ = 2ψ + 1, the channel W_{λ−1}^(ψ) is evaluated with output (y_0^{Λ/2−1}, u_{0,even}^{2ψ−1} ⊕ u_{0,odd}^{2ψ−1}), as well as with output (y_{Λ/2}^{Λ−1}, u_{0,odd}^{2ψ−1}). Since our algorithm will make use of these recursions, we need a simple way of defining which output we are referring to. We do this by specifying, apart from the layer λ and the phase ϕ which define the channel, the branch number

    0 ≤ β < 2^{m−λ} .    (6)

Since, during the run of the SC algorithm, the channel W_m^(ϕ) is only evaluated with a single output, (y_0^{n−1}, û_0^{ϕ−1}), we give a branch number of β = 0 to each such output. Next, we proceed recursively as follows. For λ > 0, consider a channel W_λ^(ϕ) with output (y_0^{Λ−1}, û_0^{ϕ−1}) and corresponding branch number β. Denote ψ = ⌊ϕ/2⌋. The output (y_0^{Λ/2−1}, û_{0,even}^{2ψ−1} ⊕ û_{0,odd}^{2ψ−1}) associated with W_{λ−1}^(ψ) will have a branch number of 2β, while the output (y_{Λ/2}^{Λ−1}, û_{0,odd}^{2ψ−1}) will have a branch number of 2β + 1. Finally, we mention that for the sake of brevity, we will talk about the output corresponding to branch β of a channel, although this is slightly inaccurate.
We now introduce our first data structure. For each layer 0 ≤ λ ≤ m, we will have a probabilities array, denoted by P_λ, indexed by an integer 0 ≤ i < 2^m and a bit b ∈ {0, 1}. For a given layer λ, an index i will correspond to a phase 0 ≤ ϕ < Λ and branch 0 ≤ β < 2^{m−λ} using the following quotient/remainder representation:

    i = ⟨ϕ, β⟩_λ = ϕ + 2^λ · β .    (7)

In order to avoid repetition, we use the following shorthand:

    P_λ[⟨ϕ, β⟩] = P_λ[⟨ϕ, β⟩_λ] .    (8)

The probabilities array data structure P_λ will be used as follows. Let a layer 0 ≤ λ ≤ m, phase 0 ≤ ϕ < Λ, and branch 0 ≤ β < 2^{m−λ} be given. Denote the output corresponding to branch β of W_λ^(ϕ) as (y_0^{Λ−1}, û_0^{ϕ−1}). Then, ultimately, we will have for both values of b that

    P_λ[⟨ϕ, β⟩][b] = W_λ^(ϕ)(y_0^{Λ−1}, û_0^{ϕ−1} | b) .    (9)

Analogously to defining the output corresponding to a branch β, we would now like to define the input corresponding to a branch. As in the “output” case, we start at layer m and continue recursively. Consider the channel W_m^(ϕ), and let û_ϕ be the corresponding input which Algorithm 1 assumes. We let this input have a branch number of β = 0. Next, we proceed recursively as follows. For layer λ > 0, consider the channels W_λ^(2ψ) and W_λ^(2ψ+1) having the same branch β with corresponding inputs u_{2ψ} and u_{2ψ+1}, respectively. In light of (5), we now consider W_{λ−1}^(ψ) and define the input corresponding to branch 2β as u_{2ψ} ⊕ u_{2ψ+1}. Likewise, we define the input corresponding to branch 2β + 1 as u_{2ψ+1}. Note that under this recursive definition, we have that for all 0 ≤ λ ≤ m, 0 ≤ ϕ < Λ, and 0 ≤ β < 2^{m−λ}, the input corresponding to branch β of W_λ^(ϕ) is well defined.
The following lemma points at the natural meaning that a branch number has at layer λ = 0. It is proved using a straightforward induction.
Lemma 1: Let y and ĉ be as in Algorithm 1, the received vector and the decoded codeword. Consider layer λ = 0, and thus set ϕ = 0. Next, fix a branch number 0 ≤ β < 2^m. Then, the input and output corresponding to branch β of W_0^(0) are y_β and ĉ_β, respectively.
We now introduce our second, and last, data structure for this section. For each layer 0 ≤ λ ≤ m, we will have a bit array, denoted by B_λ, and indexed by an integer 0 ≤ i < 2^m, as in (7). The data structure will be used as follows. Let layer 0 ≤ λ ≤ m, phase 0 ≤ ϕ < Λ, and branch 0 ≤ β < 2^{m−λ} be given. Denote the input corresponding to branch β of W_λ^(ϕ) as û(λ, ϕ, β). Then, ultimately,

    B_λ[⟨ϕ, β⟩] = û(λ, ϕ, β) ,    (10)

where we have used the same shorthand as in (8). Notice that the total memory consumed by our algorithm is O(n log n).
Our first implementation of the SC decoder is given as Algorithms 2–4. The main loop is given in Algorithm 2, and follows the high-level description given in Algorithm 1. Note that the elements of the probabilities arrays P_λ and bit arrays B_λ start out uninitialized, and become initialized as the algorithm runs its course. The code to initialize the array values is given in Algorithms 3 and 4.

Algorithm 2: First implementation of SC decoder
  Input: the received vector y
  Output: a decoded codeword ĉ
   1  for β = 0, 1, . . . , n − 1 do  // Initialization
   2      P_0[⟨0, β⟩][0] ← W(y_β | 0), P_0[⟨0, β⟩][1] ← W(y_β | 1)
   3  for ϕ = 0, 1, . . . , n − 1 do  // Main loop
   4      recursivelyCalcP(m, ϕ)
   5      if u_ϕ is frozen then
   6          set B_m[⟨ϕ, 0⟩] to the frozen value of u_ϕ
   7      else
   8          if P_m[⟨ϕ, 0⟩][0] > P_m[⟨ϕ, 0⟩][1] then
   9              set B_m[⟨ϕ, 0⟩] ← 0
  10          else
  11              set B_m[⟨ϕ, 0⟩] ← 1
  12      if ϕ mod 2 = 1 then
  13          recursivelyUpdateB(m, ϕ)
  14  return the decoded codeword: ĉ = (B_0[⟨0, β⟩])_{β=0}^{n−1}

Lemma 2: Algorithms 2–4 are a valid implementation of the SC decoder.
Proof: We first note that in addition to proving the claim explicitly stated in the lemma, we must also prove an implicit claim. Namely, we must prove that the actions taken by the algorithm are well defined. Specifically, we must prove that when an array element is read from, it was already written to (it is initialized).
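The quotient/remainder index mapping (7) is easy to exercise in code. The following sketch (ours, purely illustrative, for a toy m = 3) packs and unpacks (ϕ, β) pairs and checks that, at every layer, the mapping is a bijection onto [0, 2^m):

```python
m = 3  # toy code length n = 2**m = 8

def idx(phi, beta, lam):
    # Eq. (7): <phi, beta>_lambda = phi + 2**lambda * beta
    assert 0 <= phi < 2 ** lam and 0 <= beta < 2 ** (m - lam)
    return phi + (beta << lam)

def unpack(i, lam):
    # Inverse mapping: phi is the remainder, beta the quotient.
    return i % (1 << lam), i >> lam

# Every (phi, beta) pair at layer lam maps to a distinct index in [0, 2**m),
# and unpacking recovers the pair exactly.
for lam in range(m + 1):
    seen = {idx(phi, beta, lam)
            for beta in range(2 ** (m - lam))
            for phi in range(2 ** lam)}
    assert seen == set(range(2 ** m))
    assert all(unpack(idx(p, b, lam), lam) == (p, b)
               for p in range(2 ** lam) for b in range(2 ** (m - lam)))
```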

Algorithm 3: recursivelyCalcP(λ, ϕ) implementation I
  Input: layer λ and phase ϕ
   1  if λ = 0 then return  // Stopping condition
   2  set ψ ← ⌊ϕ/2⌋
      // Recurse first, if needed
   3  if ϕ mod 2 = 0 then recursivelyCalcP(λ − 1, ψ)
   4  for β = 0, 1, . . . , 2^{m−λ} − 1 do  // calculation
   5      if ϕ mod 2 = 0 then  // apply Equation (4)
   6          for u′ ∈ {0, 1} do
   7              P_λ[⟨ϕ, β⟩][u′] ← Σ_{u″} (1/2) · P_{λ−1}[⟨ψ, 2β⟩][u′ ⊕ u″] · P_{λ−1}[⟨ψ, 2β + 1⟩][u″]
   8      else  // apply Equation (5)
   9          set u′ ← B_λ[⟨ϕ − 1, β⟩]
  10          for u″ ∈ {0, 1} do
  11              P_λ[⟨ϕ, β⟩][u″] ← (1/2) · P_{λ−1}[⟨ψ, 2β⟩][u′ ⊕ u″] · P_{λ−1}[⟨ψ, 2β + 1⟩][u″]

Algorithm 4: recursivelyUpdateB(λ, ϕ) implementation I
  Require: ϕ is odd
   1  set ψ ← ⌊ϕ/2⌋
   2  for β = 0, 1, . . . , 2^{m−λ} − 1 do
   3      B_{λ−1}[⟨ψ, 2β⟩] ← B_λ[⟨ϕ − 1, β⟩] ⊕ B_λ[⟨ϕ, β⟩]
   4      B_{λ−1}[⟨ψ, 2β + 1⟩] ← B_λ[⟨ϕ, β⟩]
   5  if ψ mod 2 = 1 then
   6      recursivelyUpdateB(λ − 1, ψ)

Both the implicit and explicit claims are easily derived from the following observation. For a given 0 ≤ ϕ < n, consider iteration ϕ of the main loop in Algorithm 2. Fix a layer 0 ≤ λ ≤ m, and a branch 0 ≤ β < 2^{m−λ}. If we suspend the run of the algorithm just after the iteration ends, then (9) holds with ϕ′ instead of ϕ, for all

    0 ≤ ϕ′ ≤ ⌊ϕ / 2^{m−λ}⌋ .

Similarly, (10) holds with ϕ′ instead of ϕ, for all

    0 ≤ ϕ′ < ⌊(ϕ + 1) / 2^{m−λ}⌋ .

The above observation is proved by induction on ϕ.

III. SPACE-EFFICIENT SUCCESSIVE CANCELLATION DECODING

The running time of the SC decoder is O(n log n), and our implementation is no exception. As we have previously noted, the space complexity of our algorithm is O(n log n) as well. However, we will now show how to bring the space complexity down to O(n). The observation that one can reduce the space complexity to O(n) was noted, in the context of VLSI design, in [8].
As a first step towards this end, consider the probability pair array P_m. By examining the main loop in Algorithm 2, we quickly see that if we are currently at phase ϕ, then we will never again make use of P_m[⟨ϕ′, 0⟩] for all ϕ′ < ϕ. On the other hand, we see that P_m[⟨ϕ″, 0⟩] is uninitialized for all ϕ″ > ϕ. Thus, instead of reading and writing to P_m[⟨ϕ, 0⟩], we can essentially disregard the phase information, and use only the first element P_m[0] of the array, discarding all the rest. By the recursive nature of polar codes, this observation — disregarding the phase information — can be exploited for a general layer λ as well. Specifically, for all 0 ≤ λ ≤ m, let us now define the number of elements in P_λ to be 2^{m−λ}. Accordingly,

    P_λ[⟨ϕ, β⟩] is replaced by P_λ[β] .    (11)

Note that the total space needed to hold the P arrays has gone down from O(n log n) to O(n). We would now like to do the same for the B arrays. However, as things are currently stated, we cannot disregard the phase, as can be seen for example in line 3 of Algorithm 4. The solution is a simple renaming. As a first step, let us define for each 0 ≤ λ ≤ m an array C_λ consisting of bit pairs and having length n/2. Next, let a generic reference of the form B_λ[⟨ϕ, β⟩] be replaced by C_λ[ψ + β · 2^{λ−1}][ϕ mod 2], where ψ = ⌊ϕ/2⌋. Note that we have done nothing more than rename the elements of B_λ as elements of C_λ. However, we now see that as before we can disregard the value of ψ and take note only of the parity of ϕ. So, let us make one more substitution: replace every instance of C_λ[ψ + β · 2^{λ−1}][ϕ mod 2] by C_λ[β][ϕ mod 2], and resize each array C_λ to have 2^{m−λ} bit pairs. To sum up,

    B_λ[⟨ϕ, β⟩] is replaced by C_λ[β][ϕ mod 2] .    (12)

The alert reader will notice that a further reduction in space is possible: for λ = 0 we will always have that ϕ = 0, and thus the parity of ϕ is always even. However, this reduction does not affect the asymptotic space complexity, which is now indeed down to O(n). The revised algorithm is given as Algorithms 5–7.

Algorithm 5: Space efficient SC decoder, main loop
  Input: the received vector y
  Output: a decoded codeword ĉ
   1  for β = 0, 1, . . . , n − 1 do  // Initialization
   2      set P_0[β][0] ← W(y_β | 0), P_0[β][1] ← W(y_β | 1)
   3  for ϕ = 0, 1, . . . , n − 1 do  // Main loop
   4      recursivelyCalcP(m, ϕ)
   5      if u_ϕ is frozen then
   6          set C_m[0][ϕ mod 2] to the frozen value of u_ϕ
   7      else
   8          if P_m[0][0] > P_m[0][1] then
   9              set C_m[0][ϕ mod 2] ← 0
  10          else
  11              set C_m[0][ϕ mod 2] ← 1
  12      if ϕ mod 2 = 1 then
  13          recursivelyUpdateC(m, ϕ)
  14  return the decoded codeword: ĉ = (C_0[β][0])_{β=0}^{n−1}

We end this subsection by mentioning that although we were concerned here with reducing the space complexity of our SC decoder, the observations made with this goal in mind will be of great use in analyzing the time complexity of our list decoder.

IV. SUCCESSIVE CANCELLATION LIST DECODER

In this section we introduce and define our algorithm, the successive cancellation list (SCL) decoder. Our list decoder

Algorithm 6: recursivelyCalcP(λ, ϕ) space-efficient
  Input: layer λ and phase ϕ
   1  if λ = 0 then return  // Stopping condition
   2  set ψ ← ⌊ϕ/2⌋
      // Recurse first, if needed
   3  if ϕ mod 2 = 0 then recursivelyCalcP(λ − 1, ψ)
      // Perform the calculation
   4  for β = 0, 1, . . . , 2^{m−λ} − 1 do
   5      if ϕ mod 2 = 0 then  // apply Equation (4)
   6          for u′ ∈ {0, 1} do
   7              P_λ[β][u′] ← Σ_{u″} (1/2) · P_{λ−1}[2β][u′ ⊕ u″] · P_{λ−1}[2β + 1][u″]
   8      else  // apply Equation (5)
   9          set u′ ← C_λ[β][0]
  10          for u″ ∈ {0, 1} do
  11              P_λ[β][u″] ← (1/2) · P_{λ−1}[2β][u′ ⊕ u″] · P_{λ−1}[2β + 1][u″]

Algorithm 7: recursivelyUpdateC(λ, ϕ) space-efficient
  Input: layer λ and phase ϕ
  Require: ϕ is odd
   1  set ψ ← ⌊ϕ/2⌋
   2  for β = 0, 1, . . . , 2^{m−λ} − 1 do
   3      C_{λ−1}[2β][ψ mod 2] ← C_λ[β][0] ⊕ C_λ[β][1]
   4      C_{λ−1}[2β + 1][ψ mod 2] ← C_λ[β][1]
   5  if ψ mod 2 = 1 then
   6      recursivelyUpdateC(λ − 1, ψ)

has a parameter L, called the list size. Generally speaking, larger values of L mean lower error rates but longer running times. We note at this point that successive cancellation list decoding is not a new idea: it was applied in [9] to Reed-Muller codes¹.
Recall the main loop of an SC decoder, where at each phase we must decide on the value of û_ϕ. In an SCL decoder, instead of deciding to set the value of an unfrozen û_ϕ to either a 0 or a 1, we inspect both options. Namely, when decoding a non-frozen bit, we split the decoding path into two paths (see Figure 4). Since each split doubles the number of paths to be examined, we must prune them, and the maximum number of paths allowed is the specified list size, L. Naturally, we would like to keep the “best” paths at each stage, and thus require a pruning criterion. Our pruning criterion will be to keep the most likely paths.

¹ In a somewhat different version of successive cancellation than that of Arıkan’s, at least in exposition.

Fig. 4. Decoding paths of unfrozen bits for L = 4: each level has at most 4 nodes with paths that continue downward. Discontinued paths are colored gray. (The figure shows a binary tree of 0/1 decisions, not reproduced here.)

Consider the following outline for a naive implementation of an SCL decoder. Each time a decoding path is split into two forks, the data structures used by the “parent” path are duplicated, with one copy given to the first fork and the other to the second. Since the number of splits is Ω(L · n), and since the size of the data structures used by each path is Ω(n), the copying operation alone would take time Ω(L · n²). This running time is clearly impractical for all but the shortest of codes. However, all known (to us) implementations of successive cancellation list decoding have complexity at least Ω(L · n²). Our main contribution in this section is the following: we show how to implement SCL decoding with time complexity O(L · n log n) instead of Ω(L · n²).
The key observation is as follows. Consider the P arrays of the last section, and recall that the size of P_λ is proportional to 2^{m−λ}. Thus, the cost of copying P_λ grows exponentially small with λ. On the other hand, looking at the main loop of Algorithm 5 and unwinding the recursion, we see that P_λ is accessed only every 2^{m−λ} incrementations of ϕ. Put another way, the bigger P_λ is, the less frequently it is accessed. The same observation applies to the C arrays. This observation suggests the use of a “lazy-copy”. Namely, at each given stage, the same array may be flagged as belonging to more than one decoding path. However, when a given decoding path needs access to an array it is sharing with another path, a copy is made.

A. Low-level functions

We now discuss the low-level functions and data structures by which the “lazy-copy” methodology is realized. We note in advance that since our aim was to keep the exposition as simple as possible, we have avoided some obvious optimizations. The following data structures are defined and initialized in Algorithm 8.

Algorithm 8: initializeDataStructures()
   1  inactivePathIndices ← new stack with capacity L
   2  activePath ← new boolean array of size L
   3  arrayPointer_P ← new 2-D array of size (m + 1) × L, the elements of which are array pointers
   4  arrayPointer_C ← new 2-D array of size (m + 1) × L, the elements of which are array pointers
   5  pathIndexToArrayIndex ← new 2-D array of size (m + 1) × L
   6  inactiveArrayIndices ← new array of size m + 1, the elements of which are stacks with capacity L
   7  arrayReferenceCount ← new 2-D array of size (m + 1) × L
      // Initialization of data structures
   8  for λ = 0, 1, . . . , m do
   9      for s = 0, 1, . . . , L − 1 do
  10          arrayPointer_P[λ][s] ← new array of float pairs of size 2^{m−λ}
  11          arrayPointer_C[λ][s] ← new array of bit pairs of size 2^{m−λ}
  12          arrayReferenceCount[λ][s] ← 0
  13          push(inactiveArrayIndices[λ], s)
  14  for ℓ = 0, 1, . . . , L − 1 do
  15      activePath[ℓ] ← false
  16      push(inactivePathIndices, ℓ)
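The access-pattern claim behind the lazy copy (layer λ is touched only every 2^{m−λ} incrementations of ϕ) can be checked numerically. The sketch below (ours, illustrative, toy m = 5) mimics only the recursion skeleton of recursivelyCalcP and counts visits per layer:

```python
m = 5  # toy: n = 2**m = 32
visits = [0] * (m + 1)  # visits[lam] = number of calls reaching layer lam

def recursively_calc_p(lam, phi):
    if lam == 0:
        return  # stopping condition, no work at layer 0 in this sketch
    visits[lam] += 1
    if phi % 2 == 0:
        recursively_calc_p(lam - 1, phi // 2)
    # (the actual probability update over 2**(m - lam) branches goes here)

for phi in range(2 ** m):
    recursively_calc_p(m, phi)

# Layer lam is reached exactly 2**lam times; since each visit would update
# 2**(m - lam) entries, the per-layer work is 2**m, giving O(n log n) total.
assert visits[1:] == [2 ** lam for lam in range(1, m + 1)]
```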

Each path will have an index ℓ, where 0 ≤ ℓ < L. At first, only one path will be active. As the algorithm runs its course, paths will change states between “active” and “inactive”. The inactivePathIndices stack [10, Section 10.1] will hold the indices of the inactive paths. We assume the “array” implementation of a stack, in which both “push” and “pop” operations take O(1) time and a stack of capacity L takes O(L) space. The activePath array is a boolean array such that activePath[ℓ] is true iff path ℓ is active. Note that, essentially, both inactivePathIndices and activePath store the same information. The utility of this redundancy will be made clear shortly.
For every layer λ, we will have a “bank” of L probability-pair arrays for use by the active paths. At any given moment, some of these arrays might be used by several paths, while others might not be used by any path. Each such array is pointed to by an element of arrayPointer_P. Likewise, we will have a bank of bit-pair arrays, pointed to by elements of arrayPointer_C.
The pathIndexToArrayIndex array is used as follows. For a given layer λ and path index ℓ, the probability-pair array and bit-pair array corresponding to layer λ of path ℓ are pointed to by

    arrayPointer_P[λ][pathIndexToArrayIndex[λ][ℓ]]

and

    arrayPointer_C[λ][pathIndexToArrayIndex[λ][ℓ]] ,

respectively.
Recall that at any given moment, some probability-pair and bit-pair arrays from our bank might be used by multiple paths, while others may not be used by any. The value of arrayReferenceCount[λ][s] denotes the number of paths currently using the array pointed to by arrayPointer_P[λ][s]. Note that this is also the number of paths making use of arrayPointer_C[λ][s]. The index s is contained in the stack inactiveArrayIndices[λ] iff arrayReferenceCount[λ][s] is zero.
Now that we have discussed how the data structures are initialized, we continue and discuss the low-level functions by which paths are made active and inactive. We start by mentioning Algorithm 9, by which the initial path of the algorithm is assigned and allocated. In words, we choose a path index ℓ that is not currently in use (none of them are), and mark it as used. Then, for each layer λ, we mark (through pathIndexToArrayIndex) an index s such that both arrayPointer_P[λ][s] and arrayPointer_C[λ][s] are allocated to the current path.

Algorithm 9: assignInitialPath()
  Output: index ℓ of initial path
   1  ℓ ← pop(inactivePathIndices)
   2  activePath[ℓ] ← true
      // Associate arrays with path index
   3  for λ = 0, 1, . . . , m do
   4      s ← pop(inactiveArrayIndices[λ])
   5      pathIndexToArrayIndex[λ][ℓ] ← s
   6      arrayReferenceCount[λ][s] ← 1
   7  return ℓ

Algorithm 10 is used to clone a path — the final step before splitting that path in two. The logic is very similar to that of Algorithm 9, but now we make the two paths share bit-arrays and probability arrays.

Algorithm 10: clonePath(ℓ)
  Input: index ℓ of path to clone
  Output: index ℓ′ of copy
   1  ℓ′ ← pop(inactivePathIndices)
   2  activePath[ℓ′] ← true
      // Make ℓ′ reference same arrays as ℓ
   3  for λ = 0, 1, . . . , m do
   4      s ← pathIndexToArrayIndex[λ][ℓ]
   5      pathIndexToArrayIndex[λ][ℓ′] ← s
   6      arrayReferenceCount[λ][s]++
   7  return ℓ′

Algorithm 11 is used to terminate a path, which is achieved by marking it as inactive. After this is done, the arrays marked as associated with the path must be dealt with as follows. Since the path is inactive, we think of it as not having any associated arrays, and thus all the arrays that were previously associated with the path must have their reference count decreased by one.

Algorithm 11: killPath(ℓ)
  Input: index ℓ of path to kill
      // Mark the path index ℓ as inactive
   1  activePath[ℓ] ← false
   2  push(inactivePathIndices, ℓ)
      // Disassociate arrays with path index
   3  for λ = 0, 1, . . . , m do
   4      s ← pathIndexToArrayIndex[λ][ℓ]
   5      arrayReferenceCount[λ][s]−−
   6      if arrayReferenceCount[λ][s] = 0 then
   7          push(inactiveArrayIndices[λ], s)

The goal of all previously discussed low-level functions was essentially to enable the abstraction implemented by the functions getArrayPointer_P and getArrayPointer_C. The function getArrayPointer_P is called each time a higher-level function needs to access (either for reading or writing) the probability-pair array associated with a certain path ℓ and layer λ. The implementation of getArrayPointer_P is given in Algorithm 12. There are two cases to consider: either the array is associated with more than one path or it is not. If it is not, then nothing needs to be done, and we return a pointer to the array. On the other hand, if the array is shared, we make a private copy for path ℓ, and return a pointer to that copy. By doing so, we ensure that two paths will never write to the same array. The function getArrayPointer_C is used in the same manner for bit-pair arrays, and has exactly the same implementation, up to the obvious changes.
At this point, we remind the reader that we are deliberately sacrificing speed for simplicity. Namely, each such function is called either before reading or writing to an array, but the copy operation is really needed only before writing.
arrays, and thus all the arrays that were previously associated We have now finished defining almost all of our low-level

Algorithm 12: getArrayPointer_P(λ, ℓ)
Input: layer λ and path index ℓ
Output: pointer to corresponding probability-pair array
// getArrayPointer_C(λ, ℓ) is defined identically, up to the obvious changes in lines 6 and 10
1 s ← pathIndexToArrayIndex[λ][ℓ]
2 if arrayReferenceCount[λ][s] = 1 then
3   s′ ← s
4 else
5   s′ ← pop(inactiveArrayIndices[λ])
6   copy the contents of the array pointed to by arrayPointer_P[λ][s] into that pointed to by arrayPointer_P[λ][s′]
7   arrayReferenceCount[λ][s]−−
8 arrayReferenceCount[λ][s′] ← 1
9 pathIndexToArrayIndex[λ][ℓ] ← s′
10 return arrayPointer_P[λ][s′]

At this point, we should specify the constraints one should follow when using these functions, and what one can expect if these constraints are met. We start with the former.

Definition 1 (Valid calling sequence): Consider a sequence (f_t)_{t=0}^{T} of T + 1 calls to the low-level functions implemented in Algorithms 8–12. We say that the sequence is valid if the following traits hold.

Initialized: The one and only index t for which f_t is equal to initializeDataStructures is t = 0. The one and only index t for which f_t is equal to assignInitialPath is t = 1.

Balanced: For 1 ≤ t ≤ T, denote the number of times the function clonePath was called up to and including stage t as

#clonePath^(t) = |{1 ≤ i ≤ t : f_i is clonePath}| .

Define #killPath^(t) similarly. Then, for every 1 ≤ t ≤ T, we require that

1 ≤ 1 + #clonePath^(t) − #killPath^(t) ≤ L .    (13)

Active: We say that path ℓ is active at the end of stage 1 ≤ t ≤ T if the following two conditions hold. First, there exists an index 1 ≤ i ≤ t for which f_i is either clonePath with corresponding output ℓ or assignInitialPath with output ℓ. Second, there is no intermediate index i < j ≤ t for which f_j is killPath with input ℓ. For each 1 ≤ t < T we require that if f_{t+1} has input ℓ, then ℓ is active at the end of stage t.

We start by stating that the most basic thing one would expect to hold does indeed hold.

Lemma 3: Let (f_t)_{t=0}^{T} be a valid sequence of calls to the low-level functions implemented in Algorithms 8–12. Then, the run is well defined: i) a "pop" operation is never carried out on an empty stack, ii) a "push" operation never results in a stack with more than L elements, and iii) a "read" operation from any array defined in lines 2–7 of Algorithm 8 is always preceded by a "write" operation to the same location in the array.

Proof: The proof boils down to proving the following four statements concurrently for the end of each step 1 ≤ t ≤ T, by induction on t.

I. A path index ℓ is active by Definition 1 iff activePath[ℓ] is true iff inactivePathIndices does not contain the index ℓ.
II. The bracketed expression in (13) is the number of active paths at the end of stage t.
III. The value of arrayReferenceCount[λ][s] is positive iff the stack inactiveArrayIndices[λ] does not contain the index s, and is zero otherwise.
IV. The value of arrayReferenceCount[λ][s] is equal to the number of active paths ℓ for which pathIndexToArrayIndex[λ][ℓ] = s.

We are now close to formalizing the utility of our low-level functions. But first, we must formalize the concept of a descendant path. Let (f_t)_{t=0}^{T} be a valid sequence of calls. Next, let ℓ be an active path index at the end of stage 1 ≤ t < T. Henceforth, let us abbreviate the phrase "path index ℓ at the end of stage t" by "[ℓ, t]". We say that [ℓ′, t + 1] is a child of [ℓ, t] if i) ℓ′ is active at the end of stage t + 1, and ii) either ℓ′ = ℓ or f_{t+1} was the clonePath operation with input ℓ and output ℓ′. Likewise, we say that [ℓ′, t′] is a descendant of [ℓ, t] if 1 ≤ t ≤ t′ and there is a (possibly empty) hereditary chain.

We now broaden our definition of a valid function calling sequence by allowing reads and writes to arrays.

Fresh pointer: Consider the case where t > 1 and f_t is either the getArrayPointer_P or getArrayPointer_C function with input (λ, ℓ) and output p. Then, for valid indices i, we allow read and write operations to p[i] after stage t, but only before any stage t′ > t for which f_{t′} is either clonePath or killPath.

Informally, the following lemma states that each path effectively sees a private set of arrays.

Lemma 4: Let (f_t)_{t=0}^{T} be a valid sequence of calls to the low-level functions implemented in Algorithms 8–12. Assume that the read/write operations between stages satisfy the "fresh pointer" condition. Let the function f_t be getArrayPointer_P with input (λ, ℓ) and output p. Similarly, for stage t′ ≥ t, let f_{t′} be getArrayPointer_P with input (λ, ℓ′) and output p′. Assume that [ℓ′, t′] is a descendant of [ℓ, t]. Consider a "fresh pointer" write operation to p[i]. Similarly, consider a "fresh pointer" read operation from p′[i] carried out after the "write" operation. Then, assuming no intermediate "write" operations of the above nature, the value written is the value read. A similar claim holds for getArrayPointer_C.

Proof: With the observations made in the proof of Lemma 3 at hand, a simple induction on t is all that is needed.

We end this section by noting that the function pathIndexInactive given in Algorithm 13 is simply a shorthand, meant to help readability later on.

B. Mid-level functions

In this section we introduce Algorithms 14 and 15, our new implementation of Algorithms 6 and 7, respectively, for the list decoding setting.
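As a preview of what these mid-level functions compute for each path, the two probability recursions (Equations (4) and (5) of the paper) and the bit-pair update can be sketched in Python. The function names below are illustrative, not the paper's.

```python
# Sketch of the per-path, per-branch updates behind Algorithms 14 and 15.
# P_prev holds the layer lambda-1 probability pairs, indexed [branch][bit].

def p_even(P_prev, beta):
    # Even phase (Equation (4)): marginalize over the undecided odd bit u2.
    return [sum(0.5 * P_prev[2 * beta][u1 ^ u2] * P_prev[2 * beta + 1][u2]
                for u2 in (0, 1))
            for u1 in (0, 1)]

def p_odd(P_prev, beta, u1):
    # Odd phase (Equation (5)): the even bit u1 has already been decided.
    return [0.5 * P_prev[2 * beta][u1 ^ u2] * P_prev[2 * beta + 1][u2]
            for u2 in (0, 1)]

def update_c(C_lam, C_prev, beta, psi):
    # Algorithm 15's inner loop: push a decided bit-pair down one layer.
    C_prev[2 * beta][psi % 2] = C_lam[beta][0] ^ C_lam[beta][1]
    C_prev[2 * beta + 1][psi % 2] = C_lam[beta][1]
```

For instance, p_even([[0.8, 0.2], [0.6, 0.4]], 0) returns approximately [0.28, 0.22], the two candidate probabilities for the even bit at that branch.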
Algorithm 13: pathIndexInactive(ℓ)
Input: path index ℓ
Output: true if path ℓ is inactive, and false otherwise
1 if activePath[ℓ] = true then
2   return false
3 else
4   return true

Algorithm 14: recursivelyCalcP(λ, ϕ) (list version)
Input: layer λ and phase ϕ
1 if λ = 0 then return // stopping condition
2 set ψ ← ⌊ϕ/2⌋
// Recurse first, if needed
3 if ϕ mod 2 = 0 then recursivelyCalcP(λ − 1, ψ)
// Perform the calculation
4 σ ← 0
5 for ℓ = 0, 1, . . . , L − 1 do
6   if pathIndexInactive(ℓ) then
7     continue
8   P_λ ← getArrayPointer_P(λ, ℓ)
9   P_{λ−1} ← getArrayPointer_P(λ − 1, ℓ)
10  C_λ ← getArrayPointer_C(λ, ℓ)
11  for β = 0, 1, . . . , 2^{m−λ} − 1 do
12    if ϕ mod 2 = 0 then
      // apply Equation (4)
13      for u′ ∈ {0, 1} do
14        P_λ[β][u′] ← Σ_{u″} ½ · P_{λ−1}[2β][u′ ⊕ u″] · P_{λ−1}[2β + 1][u″]
15        σ ← max(σ, P_λ[β][u′])
16    else // apply Equation (5)
17      set u′ ← C_λ[β][0]
18      for u″ ∈ {0, 1} do
19        P_λ[β][u″] ← ½ · P_{λ−1}[2β][u′ ⊕ u″] · P_{λ−1}[2β + 1][u″]
20        σ ← max(σ, P_λ[β][u″])
// normalize probabilities
21 for ℓ = 0, 1, . . . , L − 1 do
22   if pathIndexInactive(ℓ) then
23     continue
24   P_λ ← getArrayPointer_P(λ, ℓ)
25   for β = 0, 1, . . . , 2^{m−λ} − 1 do
26     for u ∈ {0, 1} do
27       P_λ[β][u] ← P_λ[β][u]/σ

Algorithm 15: recursivelyUpdateC(λ, ϕ) (list version)
Input: layer λ and phase ϕ
Require: ϕ is odd
1 set ψ ← ⌊ϕ/2⌋
2 for ℓ = 0, 1, . . . , L − 1 do
3   if pathIndexInactive(ℓ) then
4     continue
5   set C_λ ← getArrayPointer_C(λ, ℓ)
6   set C_{λ−1} ← getArrayPointer_C(λ − 1, ℓ)
7   for β = 0, 1, . . . , 2^{m−λ} − 1 do
8     C_{λ−1}[2β][ψ mod 2] ← C_λ[β][0] ⊕ C_λ[β][1]
9     C_{λ−1}[2β + 1][ψ mod 2] ← C_λ[β][1]
10 if ψ mod 2 = 1 then
11   recursivelyUpdateC(λ − 1, ψ)

One first notes that our new implementations loop over all path indices ℓ. Thus, our new implementations make use of the functions getArrayPointer_P and getArrayPointer_C in order to assure that the consistency of calculations is preserved, despite multiple paths sharing information. In addition, Algorithm 14 contains code to normalize probabilities. The normalization is needed for a technical reason (to avoid floating-point underflow), and will be expanded on shortly.

We start out by noting that the "fresh pointer" condition we have imposed on ourselves indeed holds. To see this, consider first Algorithm 14. The key point to note is that neither the killPath nor the clonePath function is called from inside the algorithm. The same observation holds for Algorithm 15. Thus, the "fresh pointer" condition is met, and Lemma 4 holds.

We now consider the normalization step carried out in lines 21–27 of Algorithm 14. Recall that a floating-point variable cannot be used to hold arbitrarily small positive reals, and in a typical implementation, the result of a calculation that is "too small" will be rounded to 0. This scenario is called an "underflow".

We now confess that all our previous implementations of SC decoders were prone to "underflow". To see this, consider line 1 in the outline implementation given in Algorithm 2. Denote by Y and U the random vectors corresponding to y and u, respectively. For b ∈ {0, 1} we have that

W_m^{(ϕ)}(y_0^{n−1}, û_0^{ϕ−1} | b) = 2 · P(Y_0^{n−1} = y_0^{n−1}, U_0^{ϕ−1} = û_0^{ϕ−1}, U_ϕ = b) ≤ 2 · P(U_0^{ϕ−1} = û_0^{ϕ−1}, U_ϕ = b) = 2^{−ϕ} .

Recall that ϕ iterates from 0 to n − 1. Thus, for codes having length greater than some small constant, the comparison in line 1 of Algorithm 2 ultimately becomes meaningless, since both probabilities are rounded to 0. The same holds for all of our previous implementations.

Luckily, there is a simple fix to this problem. After the probabilities are calculated in lines 5–20 of Algorithm 14, we normalize² the highest probability to be 1 in lines 21–27.

² This correction does not assure us that underflows will not occur. However, now, the probability of a meaningless comparison due to underflow will be extremely low.

We claim that apart from avoiding underflows, normalization does not alter our algorithm. The following lemma formalizes this claim.

Lemma 5: Assume that we are working with "perfect" floating-point numbers. That is, our floating-point variables are infinitely accurate and do not suffer from underflow/overflow. Next, consider a variant of Algorithm 14, termed Algorithm 14′, in which just before line 21 is first executed, the variable σ is set to 1. That is, effectively, there is no normalization of probabilities in Algorithm 14′. Consider two runs, one of Algorithm 14 and one of Algorithm 14′. In both runs, the input parameters to both algorithms are the same. Moreover, assume that in both runs, the state
of the auxiliary data structures is the same, apart from the following.

Recall that our algorithm is recursive, and let λ₀ be the first value of the variable λ for which line 5 is executed. That is, λ₀ is the layer in which (both) algorithms do not perform preliminary recursive calculations. Assume that when we are at this base stage, λ = λ₀, the following holds: the values read from P_{λ−1} in lines 14 and 19 in the run of Algorithm 14 are a multiple by α_{λ−1} of the corresponding values read in the run of Algorithm 14′. Then, for every λ ≥ λ₀, there exists a constant α_λ such that the values written to P_λ in line 27 in the run of Algorithm 14 are a multiple by α_λ of the corresponding values written by Algorithm 14′.

Proof: For the base case λ = λ₀ we have by inspection that the constant α_λ is simply (α_{λ−1})², divided by the value of σ after the main loop has finished executing in Algorithm 14. The claim for a general λ follows by induction.

C. High-level functions

We now turn our attention to the high-level functions of our algorithm. Consider the topmost function, the main loop given in Algorithm 16. We start by noting that by lines 1 and 2, we have that condition "initialized" in Definition 1 is satisfied. Also, for the inductive basis, we have that condition "balanced" holds for t = 1 at the end of line 2. Next, notice that lines 3–5 are in line with our "fresh pointer" condition. The main loop, lines 6–13, is the analog of the main loop in Algorithm 5. After the main loop has finished, we pick (in lines 14–16) the most likely codeword from our list and return it.

Algorithm 16: SCL decoder, main loop
Input: the received vector y and a list size L as a global
Output: a decoded codeword ĉ
// Initialization
1 initializeDataStructures()
2 ℓ ← assignInitialPath()
3 P₀ ← getArrayPointer_P(0, ℓ)
4 for β = 0, 1, . . . , n − 1 do
5   set P₀[β][0] ← W(y_β | 0), P₀[β][1] ← W(y_β | 1)
// Main loop
6 for ϕ = 0, 1, . . . , n − 1 do
7   recursivelyCalcP(m, ϕ)
8   if u_ϕ is frozen then
9     continuePaths_FrozenBit(ϕ)
10  else
11    continuePaths_UnfrozenBit(ϕ)
12  if ϕ mod 2 = 1 then
13    recursivelyUpdateC(m, ϕ)
// Return the best codeword in the list
14 ℓ ← findMostProbablePath()
15 set C₀ ← getArrayPointer_C(0, ℓ)
16 return ĉ = (C₀[β][0])_{β=0}^{n−1}

Algorithm 17: continuePaths_FrozenBit(ϕ)
Input: phase ϕ
1 for ℓ = 0, 1, . . . , L − 1 do
2   if pathIndexInactive(ℓ) then continue
3   C_m ← getArrayPointer_C(m, ℓ)
4   set C_m[0][ϕ mod 2] to the frozen value of u_ϕ

Algorithm 18: continuePaths_UnfrozenBit(ϕ)
Input: phase ϕ
1 probForks ← new 2-D float array of size L × 2
2 i ← 0
// populate probForks
3 for ℓ = 0, 1, . . . , L − 1 do
4   if pathIndexInactive(ℓ) then
5     probForks[ℓ][0] ← −1
6     probForks[ℓ][1] ← −1
7   else
8     P_m ← getArrayPointer_P(m, ℓ)
9     probForks[ℓ][0] ← P_m[0][0]
10    probForks[ℓ][1] ← P_m[0][1]
11    i ← i + 1
12 ρ ← min(2i, L)
13 contForks ← new 2-D boolean array of size L × 2
// The following is possible in O(L) time
14 populate contForks such that contForks[ℓ][b] is true iff probForks[ℓ][b] is one of the ρ largest entries in probForks (and ties are broken arbitrarily)
// First, kill off non-continuing paths
15 for ℓ = 0, 1, . . . , L − 1 do
16   if pathIndexInactive(ℓ) then
17     continue
18   if contForks[ℓ][0] = false and contForks[ℓ][1] = false then
19     killPath(ℓ)
// Then, continue relevant paths, and duplicate if necessary
20 for ℓ = 0, 1, . . . , L − 1 do
21   if contForks[ℓ][0] = false and contForks[ℓ][1] = false then // both forks are bad, or invalid
22     continue
23   C_m ← getArrayPointer_C(m, ℓ)
24   if contForks[ℓ][0] = true and contForks[ℓ][1] = true then // both forks are good
25     set C_m[0][ϕ mod 2] ← 0
26     ℓ′ ← clonePath(ℓ)
27     C_m ← getArrayPointer_C(m, ℓ′)
28     set C_m[0][ϕ mod 2] ← 1
29   else // exactly one fork is good
30     if contForks[ℓ][0] = true then
31       set C_m[0][ϕ mod 2] ← 0
32     else
33       set C_m[0][ϕ mod 2] ← 1

We now expand on Algorithms 17 and 18. Algorithm 17 is straightforward: it is the analog of line 6 in Algorithm 5, applied to all active paths. Algorithm 18 is the analog of lines 8–11 in Algorithm 5. However, now, instead of choosing the most likely fork out of 2 possible forks, we must typically choose the L most likely forks out of 2L possible forks.
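This 2L-to-L pruning can be sketched in Python as follows. The function name and interface are illustrative; the paper achieves O(L) selection via [10], while this sketch uses heapq.nlargest, which costs O(L log L) but, as noted below, is ample for the small list sizes used in practice.

```python
import heapq

def select_forks(prob_forks, L):
    """Mark the rho most likely forks among the 2L candidates.

    prob_forks[l][b] is the probability of extending path l with bit b,
    or -1 if path l is inactive.  Returns (cont_forks, rho)."""
    num_active = sum(1 for row in prob_forks if row[0] >= 0)
    rho = min(2 * num_active, L)
    candidates = [(p, l, b)
                  for l, row in enumerate(prob_forks)
                  for b, p in enumerate(row) if p >= 0]
    best = heapq.nlargest(rho, candidates)     # ties broken arbitrarily
    cont_forks = [[False, False] for _ in prob_forks]
    for _, l, b in best:
        cont_forks[l][b] = True
    return cont_forks, rho
```

For example, with two active paths and L = 2, only the two most likely of the four candidate forks are marked for continuation.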
The most interesting line is 14, in which the best ρ forks are marked. Surprisingly³, this can be done in O(L) time [10, Section 9.3]. After the forks are marked, we first kill the paths for which both forks are discontinued, and then continue the paths for which one or both of the forks are marked. In case of the latter, the path is first split. Note that we must first kill paths and only then split paths in order for the "balanced" constraint (13) to hold. Namely, this way, we will not have more than L active paths at a time.

³ The O(L) time result is rather theoretical. Since L is typically a small number, the fastest way to achieve our selection goal would be through simple sorting.

The point of Algorithm 18 is to prune our list and leave only the L "best" paths. This is indeed achieved, in the following sense. At stage ϕ we would like to rank each path according to the probability

W_m^{(ϕ)}(y_0^{n−1}, û_0^{ϕ−1} | û_ϕ) .

By (9) and (11), this would indeed be the case if our floating-point variables were "perfect", and the normalization step in lines 21–27 of Algorithm 14 were not carried out. By Lemma 5, we see that this is still the case if normalization is carried out.

The last algorithm we consider in this section is Algorithm 19. In it, the most probable path is selected from the final list. As before, by (9)–(12) and Lemma 5, the value of P_m[0][C_m[0][1]] is simply

W_m^{(n−1)}(y_0^{n−1}, û_0^{n−2} | û_{n−1}) = (1/2^{n−1}) · P(y_0^{n−1} | û_0^{n−1}) ,

up to a normalization constant.

Algorithm 19: findMostProbablePath()
Output: the index ℓ′ of the most probable path
1 ℓ′ ← 0, p′ ← 0
2 for ℓ = 0, 1, . . . , L − 1 do
3   if pathIndexInactive(ℓ) then
4     continue
5   C_m ← getArrayPointer_C(m, ℓ)
6   P_m ← getArrayPointer_P(m, ℓ)
7   if p′ < P_m[0][C_m[0][1]] then
8     ℓ′ ← ℓ, p′ ← P_m[0][C_m[0][1]]
9 return ℓ′

We now prove our two main results.

Theorem 6: The space complexity of the SCL decoder is O(L · n).

Proof: All the data structures of our list decoder are allocated in Algorithm 8, and it can be checked that the total space used by them is O(L · n). Apart from these, the space complexity needed in order to perform the selection operation in line 14 of Algorithm 18 is O(L). Lastly, the various local variables needed by the algorithm take O(1) space, and the stack needed in order to implement the recursion takes O(log n) space.

Theorem 7: The running time of the SCL decoder is O(L · n log n).

Proof: Recall that by our notation m = log n. The following bottom-to-top table summarizes the running time of each function. The notation O_Σ will be explained shortly.

function — running time
initializeDataStructures() — O(L · m)
assignInitialPath() — O(m)
clonePath(ℓ) — O(m)
killPath(ℓ) — O(m)
getArrayPointer_P(λ, ℓ) — O(2^{m−λ})
getArrayPointer_C(λ, ℓ) — O(2^{m−λ})
pathIndexInactive(ℓ) — O(1)
recursivelyCalcP(m, ·) — O_Σ(L · m · n)
recursivelyUpdateC(m, ·) — O_Σ(L · m · n)
continuePaths_FrozenBit(ϕ) — O(L)
continuePaths_UnfrozenBit(ϕ) — O(L · m)
findMostProbablePath() — O(L)
SCL decoder — O(L · m · n)

The first 7 functions in the table, the low-level functions, are easily checked to have the stated running time. Note that the running time of getArrayPointer_P and getArrayPointer_C is due to the copy operation in line 6 of Algorithm 12 applied to an array of size O(2^{m−λ}). Thus, as was previously mentioned, reducing the size of our arrays has helped us reduce the running time of our list decoding algorithm.

Next, let us consider the two mid-level functions, namely, recursivelyCalcP and recursivelyUpdateC. The notation

recursivelyCalcP(m, ·) ∈ O_Σ(L · m · n)

means that the total running time of the n function calls

recursivelyCalcP(m, ϕ) , 0 ≤ ϕ < 2^m

is O(L · m · n). To see this, denote by f(λ) the total running time of the above with m replaced by λ. By splitting the running time of Algorithm 14 into a non-recursive part and a recursive part, we have that for λ > 0

f(λ) = 2^λ · O(L · 2^{m−λ}) + f(λ − 1) .

Thus, it easily follows that

f(m) ∈ O(L · m · 2^m) = O(L · m · n) .

In essentially the same way, we can prove that the total running time of recursivelyUpdateC(m, ϕ) over all 2^{m−1} valid (odd) values of ϕ is O(L · m · n). Note that the two mid-level functions are invoked in lines 7 and 13 of Algorithm 16, on all valid inputs.

The running time of the high-level functions is easily checked to agree with the table.
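The recurrence above is easy to check numerically. This is a sketch in which the constant hidden in the O(·) term is taken to be 1, so each recursion level contributes exactly L · n operations and the total unrolls to L · m · n, i.e. O(L · n log n).

```python
# Unroll the recurrence f(lambda) = 2^lambda * (L * 2^(m - lambda)) + f(lambda - 1),
# taking the O(.) constant to be 1.  Each level contributes L * n, so f(m) = L*m*n.
def f(lam, m, L):
    if lam == 0:
        return 0
    return (2 ** lam) * (L * 2 ** (m - lam)) + f(lam - 1, m, L)

m, L = 11, 32            # n = 2^11 = 2048, as in the simulations below
n = 2 ** m
assert f(m, m, L) == L * m * n   # L * m * n = 32 * 11 * 2048
```

The same unrolling applies to recursivelyUpdateC, whose total work over the odd phases is bounded the same way.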
V. MODIFIED POLAR CODES

The plots in Figure 5 were obtained by simulation. The performance of our decoder for various list sizes is given by the solid lines in the figure. As expected, we see that as the list size L increases, the performance of our decoder improves.

[Figure 5: word error rate versus signal-to-noise ratio (E_b/N_0) [dB], for n = 2048 (top plot) and n = 8192 (bottom plot), with solid curves for list sizes L = 1, 2, 4, 8, 16, 32 and a dashed "ML bound" curve.]

Fig. 5. Word error rate of a length n = 2048 (top) and n = 8192 (bottom) rate 1/2 polar code optimized for SNR = 2 dB under various list sizes. Code construction was carried out via the method proposed in [4].

We also notice a diminishing-returns phenomenon in terms of increasing the list size. The reason for this turns out to be simple.

The dashed line, termed the "ML bound", was obtained as follows. During our simulations for L = 32, each time a decoding failure occurred, we checked whether the decoded codeword was more likely than the transmitted codeword. That is, whether W(y|ĉ) > W(y|c). If so, then the optimal ML decoder would surely misdecode y as well. The dashed line records the frequency of the above event, and is thus a lower bound on the error probability of the ML decoder. Thus, for an SNR value greater than about 1.5 dB, Figure 1 suggests that we have an essentially optimal decoder when L = 32.

Can we do even better? At first, the answer seems to be an obvious "no", at least for the region in which our decoder is essentially optimal. However, it turns out that if we are willing to accept a small change in our definition of a polar code, we can dramatically improve performance.

During simulations we noticed that often, when a decoding error occurred, the path corresponding to the transmitted codeword was a member of the final list. However, since there was a more likely path in the list, the codeword corresponding to that path was returned, which resulted in a decoding error. Thus, if only we had a "genie" to tell us at the final stage which path to pick from our list, we could improve the performance of our decoder.

Luckily, such a genie is easy to implement. Recall that we have k unfrozen bits that we are free to set. Instead of setting all of them to information bits we wish to transmit, we employ the following simple concatenation scheme. For some small constant r, we set the first k − r unfrozen bits to information bits. The last r unfrozen bits will hold the r-bit CRC [11, Section 8.8] value⁴ of the first k − r unfrozen bits. Note this new encoding is a slight variation of our polar coding scheme. Also, note that we incur a penalty in rate, since the rate of our code is now (k − r)/n instead of the previous k/n.

⁴ A binary linear code having a corresponding k × r parity-check matrix constructed as follows will do just as well. Let the first k − r columns be chosen at random and the last r columns be equal to the identity matrix.

What we have gained is an approximation to a genie: at the final stage of decoding, instead of calling the function findMostProbablePath in Algorithm 19, we can do the following. A path for which the CRC is invalid cannot correspond to the transmitted codeword. Thus, we refine our selection as follows. If at least one path has a correct CRC, then we remove from our list all paths having incorrect CRC and then choose the most likely path. Otherwise, we select the most likely path, in the hope of reducing the number of bits in error, but with the knowledge that we have at least one bit in error.

Figures 1 and 2 contain a comparison of decoding performance between the original polar codes and the slightly tweaked version presented in this section. A further improvement in bit-error-rate (but not in block-error-rate) is attained when the decoding is performed systematically [12]. The application of systematic polar coding to a list decoding setting is attributed to [13].

REFERENCES

[1] E. Arıkan, "Channel polarization: A method for constructing capacity-achieving codes for symmetric binary-input memoryless channels," IEEE Trans. Inform. Theory, vol. 55, pp. 3051–3073, 2009.
[2] E. Arıkan and E. Telatar, "On the rate of channel polarization," in Proc. IEEE Int'l Symp. Inform. Theory (ISIT'2009), Seoul, South Korea, 2009, pp. 1493–1495.
[3] S. B. Korada, E. Şaşoğlu, and R. Urbanke, "Polar codes: Characterization of exponent, bounds, and constructions," IEEE Trans. Inform. Theory, vol. 56, pp. 6253–6264, 2010.
[4] I. Tal and A. Vardy, "How to construct polar codes," submitted to IEEE Trans. Inform. Theory, available online as arXiv:1105.6164v2, 2011.
[5] G. Wiechman and I. Sason, "An improved sphere-packing bound for finite-length codes over symmetric memoryless channels," IEEE Trans. Inform. Theory, vol. 54, pp. 1962–1990, 2008.
[6] TurboBest, "IEEE 802.16e LDPC Encoder/Decoder Core." [Online]. Available: http://www.turbobest.com/tb_ldpc80216e.htm
[7] Y. Polyanskiy, H. V. Poor, and S. Verdú, "Channel coding rate in the finite blocklength regime," IEEE Trans. Inform. Theory, vol. 56, pp. 2307–2359, 2010.
[8] C. Leroux, I. Tal, A. Vardy, and W. J. Gross, "Hardware architectures for successive cancellation decoding of polar codes," arXiv:1011.2919v1, 2010.
[9] I. Dumer and K. Shabunov, "Soft-decision decoding of Reed–Muller codes: recursive lists," IEEE Trans. Inform. Theory, vol. 52, pp. 1260–1266, 2006.
[10] T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein, Introduction to Algorithms, 2nd ed. Cambridge, Massachusetts: The MIT Press, 2001.
[11] W. W. Peterson and E. J. Weldon, Error-Correcting Codes, 2nd ed. Cambridge, Massachusetts: The MIT Press, 1972.
[12] E. Arıkan, "Systematic polar coding," IEEE Commun. Lett., vol. 15, pp. 860–862, 2011.
[13] G. Sarkis and W. J. Gross, "Systematic encoding of polar codes for list decoding," 2011, private communication.