List Decoding of Polar Codes

If the most likely codeword is selected, simulation results show that the resulting performance is very close to that of a maximum-likelihood decoder, even for moderate values of L. Alternatively, if a "genie" is allowed to pick the codeword from the list, the results are comparable to the current state-of-the-art LDPC codes. Luckily, implementing such a helpful genie is easy.

Our list decoder doubles the number of decoding paths at each decoding step, and then uses a pruning procedure to discard all but the L "best" paths. Nevertheless, a straightforward implementation still requires Ω(L · n²) time, which is in stark contrast with the O(n log n) complexity of the original successive-cancellation decoder. We utilize the structure of polar codes to overcome this problem. Specifically, we devise an efficient, numerically stable, implementation taking only O(L · n log n) time and O(L · n) space.

[Figure 1: word-error-rate curves (10⁻⁴ to 10⁻⁶) versus signal-to-noise ratio (Eb/N0) [dB], 1.50 to 3.00 dB, for n = 2048 with L = 32; n = 2048, ML bound; n = 2048, L = 32 with CRC-16; min(RCB, TSB); and max(ISP, SP59).]

Fig. 1. Word error rate of a length n = 2048 rate 1/2 polar code optimized for SNR = 2 dB under various list sizes. Code construction was carried out via the method proposed in [4]. The two dots represent upper and lower bounds [5] on the SNR needed to reach a word error rate of 10⁻⁵.
[Figure 3: normalized rate (0.5 to 0.75) versus blocklength n (10² to 10⁵); legend: Voyager, Galileo HGA, Turbo R=1/2, Cassini/Pathfinder, Galileo LGA, Hermitian curve [64,32] (SDD), BCH (Koetter–Vardy), Polar+CRC R=1/2 (List dec.), ME LDPC R=1/2 (BP).]

Fig. 3. Comparison of normalized rate [7] for a wide class of codes. The target word error rate is 10⁻⁴. The plot is courtesy of Dr. Yury Polyanskiy.

To the best of our knowledge, for length 1024 and rate 1/2 it seems that our implementation is slightly better than previously known codes when considering a target error-probability of 10⁻⁴.

The structure of this paper is as follows. In Section II, we present Arıkan's SC decoder in a notation that will be useful to us later on. In Section III, we show how the space complexity of the SC decoder can be brought down from O(n log n) to O(n). This observation will later help us in Section IV, where we present our successive cancellation list decoder with time complexity O(L · n log n). Section V introduces a modification of polar codes which, when decoded with the SCL decoder, results in a significant improvement in terms of error rate.

This paper contains a fair amount of algorithmic detail. Thus, on a first read, we advise the reader to skip to Section IV and read the first three paragraphs. Doing so will give a high-level understanding of the decoding method proposed and also show why a naive implementation is too costly. Then, we advise the reader to skim Section V, where the "list picking genie" is explained.
II. FORMALIZATION OF THE SUCCESSIVE CANCELLATION DECODER

The Successive Cancellation (SC) decoder is due to Arıkan [1]. In this section, we recast it using our notation, for future reference.

Let the polar code under consideration have length n = 2^m and dimension k. Thus, the number of frozen bits is n − k. We denote by u = (u_i)_{i=0}^{n−1} = u_0^{n−1} the information bits vector (including the frozen bits), and by c = c_0^{n−1} the corresponding codeword, which is sent over a binary-input channel W : X → Y, where X = {0, 1}. At the other end of the channel, we get the received word y = y_0^{n−1}. A decoding algorithm is then applied to y, resulting in a decoded codeword ĉ having corresponding information bits û.

A. An outline of Successive Cancellation

A high-level description of the SC decoding algorithm is given in Algorithm 1. In words, at each phase ϕ of the algorithm we must decide on the value of û_ϕ.

Algorithm 1: A high-level description of the SC decoder
    Input: the received vector y
    Output: a decoded codeword ĉ
    1  for ϕ = 0, 1, . . . , n − 1 do
    2      calculate W_m^{(ϕ)}(y_0^{n−1}, û_0^{ϕ−1} | 0) and W_m^{(ϕ)}(y_0^{n−1}, û_0^{ϕ−1} | 1)
    3      if u_ϕ is frozen then
    4          set û_ϕ to the frozen value of u_ϕ
    5      else
    6          if W_m^{(ϕ)}(y_0^{n−1}, û_0^{ϕ−1} | 0) > W_m^{(ϕ)}(y_0^{n−1}, û_0^{ϕ−1} | 1) then
    7              set û_ϕ ← 0
    8          else
    9              set û_ϕ ← 1
    10 return the codeword ĉ corresponding to û

We now show how the above probabilities are calculated. For layer 0 ≤ λ ≤ m, denote hereafter

    Λ = 2^λ .    (1)

Recall [1] that for

    0 ≤ ϕ < Λ ,    (2)

the bit channel W_λ^{(ϕ)} is a binary-input channel with output alphabet Y^Λ × X^ϕ, the conditional probability of which we generically denote as

    W_λ^{(ϕ)}(y_0^{Λ−1}, u_0^{ϕ−1} | u_ϕ) .    (3)

In our context, y_0^{Λ−1} is always a contiguous subvector of the received vector y. Next, for 1 ≤ λ ≤ m, recall the recursive definition of a bit channel [1, Equations (22) and (23)]: let 0 ≤ 2ψ < Λ, then

    W_λ^{(2ψ)}(y_0^{Λ−1}, u_0^{2ψ−1} | u_{2ψ})
        = Σ_{u_{2ψ+1}} ½ · W_{λ−1}^{(ψ)}(y_0^{Λ/2−1}, u_{0,even}^{2ψ−1} ⊕ u_{0,odd}^{2ψ−1} | u_{2ψ} ⊕ u_{2ψ+1}) · W_{λ−1}^{(ψ)}(y_{Λ/2}^{Λ−1}, u_{0,odd}^{2ψ−1} | u_{2ψ+1})    (4)

and

    W_λ^{(2ψ+1)}(y_0^{Λ−1}, u_0^{2ψ} | u_{2ψ+1})
        = ½ · W_{λ−1}^{(ψ)}(y_0^{Λ/2−1}, u_{0,even}^{2ψ−1} ⊕ u_{0,odd}^{2ψ−1} | u_{2ψ} ⊕ u_{2ψ+1}) · W_{λ−1}^{(ψ)}(y_{Λ/2}^{Λ−1}, u_{0,odd}^{2ψ−1} | u_{2ψ+1})    (5)

with "stopping condition" W_0^{(0)}(y | u) = W(y | u). In both (4) and (5), the bit channel on the left-hand side corresponds to branch β, while the first and second factors on the right-hand side correspond to branches 2β and 2β + 1, respectively.
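To make the recursion concrete, here is a small self-contained Python sketch (ours, not the paper's) that evaluates the bit-channel probabilities directly from (4) and (5) and runs the outline of Algorithm 1 on top of them. The per-symbol likelihood chan(y, u) = W(y|u) and the frozen map are assumed interfaces of the sketch; this naive version recomputes probabilities from scratch at every phase, so it is far slower than the O(n log n) implementation developed below and is meant only for illustration.

def W_bit(y, u_prev, u, chan):
    # Probability W_lambda^{(phi)}(y, u_prev | u), with lambda fixed by
    # len(y) = 2**lambda and phi = len(u_prev); chan(y, u) = W(y|u).
    if len(y) == 1:                      # stopping condition: W_0^{(0)} = W
        return chan(y[0], u)
    half = len(y) // 2
    if len(u_prev) % 2 == 0:             # equation (4): marginalize u_{2psi+1}
        u_even, u_odd = u_prev[0::2], u_prev[1::2]
        u_xor = [a ^ b for a, b in zip(u_even, u_odd)]
        return sum(0.5 * W_bit(y[:half], u_xor, u ^ v, chan)
                       * W_bit(y[half:], u_odd, v, chan)
                   for v in (0, 1))
    else:                                # equation (5): u_{2psi} is already known
        head, u_2psi = u_prev[:-1], u_prev[-1]
        u_even, u_odd = head[0::2], head[1::2]
        u_xor = [a ^ b for a, b in zip(u_even, u_odd)]
        return (0.5 * W_bit(y[:half], u_xor, u_2psi ^ u, chan)
                    * W_bit(y[half:], u_odd, u, chan))

def sc_outline(y, frozen, chan):
    # Algorithm 1: frozen[phi] is None for information bits,
    # otherwise the frozen value of u_phi.
    u_hat = []
    for phi in range(len(y)):
        if frozen[phi] is not None:
            u_hat.append(frozen[phi])
        else:
            p0 = W_bit(y, u_hat, 0, chan)
            p1 = W_bit(y, u_hat, 1, chan)
            u_hat.append(0 if p0 > p1 else 1)
    return u_hat  # re-encode u_hat to obtain the codeword c_hat

# Example usage with a BSC of crossover probability 0.1:
# bsc = lambda y, u: 0.9 if y == u else 0.1
# u_hat = sc_outline([0, 1, 0, 0], [0, 0, None, None], bsc)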
In order to avoid repetition, we use the following shorthand

    P_λ[⟨ϕ, β⟩] = P_λ[⟨ϕ, β⟩_λ] .    (8)

The probabilities array data structure P_λ will be used as follows. Let a layer 0 ≤ λ ≤ m, phase 0 ≤ ϕ < Λ, and branch 0 ≤ β < 2^{m−λ} be given. Denote the output corresponding to branch β of W_λ^{(ϕ)} as (y_0^{Λ−1}, û_0^{ϕ−1}). Then, ultimately, we will have for both values of b that

    P_λ[⟨ϕ, β⟩][b] = W_λ^{(ϕ)}(y_0^{Λ−1}, û_0^{ϕ−1} | b) .    (9)

Analogously to defining the output corresponding to a branch β, we would now like to define the input corresponding to a branch. As in the "output" case, we start at layer m and continue recursively. Consider the channel W_m^{(ϕ)}, and let û_ϕ be the corresponding input which Algorithm 1 assumes. We let this input have a branch number of β = 0. Next, we proceed recursively as follows. For layer λ > 0, consider the channels W_λ^{(2ψ)} and W_λ^{(2ψ+1)} having the same branch β with corresponding inputs u_{2ψ} and u_{2ψ+1}, respectively. In light of (5), we now consider W_{λ−1}^{(ψ)} and define the input corresponding

Algorithm 2: First implementation of the SC decoder, main loop
    Input: the received vector y
    Output: a decoded codeword ĉ
    1  for β = 0, 1, . . . , n − 1 do                 // Initialization
    2      set P_0[⟨0, β⟩][0] ← W(y_β | 0), P_0[⟨0, β⟩][1] ← W(y_β | 1)
    3  for ϕ = 0, 1, . . . , n − 1 do                 // Main loop
    4      recursivelyCalcP(m, ϕ)
    5      if u_ϕ is frozen then
    6          set B_m[⟨ϕ, 0⟩] to the frozen value of u_ϕ
    7      else
    8          if P_m[⟨ϕ, 0⟩][0] > P_m[⟨ϕ, 0⟩][1] then
    9              set B_m[⟨ϕ, 0⟩] ← 0
    10         else
    11             set B_m[⟨ϕ, 0⟩] ← 1
    12     if ϕ mod 2 = 1 then
    13         recursivelyUpdateB(m, ϕ)
    14 return the decoded codeword: ĉ = (B_0[⟨0, β⟩])_{β=0}^{n−1}

Lemma 2: Algorithms 2–4 are a valid implementation of the SC decoder.

Proof: We first note that in addition to proving the claim explicitly stated in the lemma, we must also prove an implicit claim. Namely, we must prove that the actions taken by the algorithm are well defined. Specifically, we must prove that when an array element is read from, it was already written to (it is initialized).
Algorithm 3: recursivelyCalcP(λ, ϕ) implementation I
    Input: layer λ and phase ϕ
    1  if λ = 0 then return                          // Stopping condition
    2  set ψ ← ⌊ϕ/2⌋
       // Recurse first, if needed
    3  if ϕ mod 2 = 0 then recursivelyCalcP(λ − 1, ψ)
    4  for β = 0, 1, . . . , 2^{m−λ} − 1 do          // calculation
    5      if ϕ mod 2 = 0 then                       // apply Equation (4)
    6          for u′ ∈ {0, 1} do
    7              P_λ[⟨ϕ, β⟩][u′] ← Σ_{u″} ½ P_{λ−1}[⟨ψ, 2β⟩][u′ ⊕ u″] · P_{λ−1}[⟨ψ, 2β + 1⟩][u″]
    8      else                                      // apply Equation (5)
    9          set u′ ← B_λ[⟨ϕ − 1, β⟩]
    10         for u″ ∈ {0, 1} do
    11             P_λ[⟨ϕ, β⟩][u″] ← ½ P_{λ−1}[⟨ψ, 2β⟩][u′ ⊕ u″] · P_{λ−1}[⟨ψ, 2β + 1⟩][u″]

Algorithm 4: recursivelyUpdateB(λ, ϕ) implementation I
    Require: ϕ is odd
    1  set ψ ← ⌊ϕ/2⌋
    2  for β = 0, 1, . . . , 2^{m−λ} − 1 do
    3      B_{λ−1}[⟨ψ, 2β⟩] ← B_λ[⟨ϕ − 1, β⟩] ⊕ B_λ[⟨ϕ, β⟩]
    4      B_{λ−1}[⟨ψ, 2β + 1⟩] ← B_λ[⟨ϕ, β⟩]
    5  if ψ mod 2 = 1 then
    6      recursivelyUpdateB(λ − 1, ψ)

Both the implicit and explicit claims are easily derived from the following observation. For a given 0 ≤ ϕ < n, consider iteration ϕ of the main loop in Algorithm 2. Fix a layer 0 ≤ λ ≤ m, and a branch 0 ≤ β < 2^{m−λ}. If we suspend the run of the algorithm just after the iteration ends, then (9) holds with ϕ′ instead of ϕ, for all

    0 ≤ ϕ′ ≤ ⌊ϕ / 2^{m−λ}⌋ .

Similarly, (10) holds with ϕ′ instead of ϕ, for all

    0 ≤ ϕ′ < (ϕ + 1) / 2^{m−λ} .

The above observation is proved by induction on ϕ.

III. SPACE-EFFICIENT SUCCESSIVE CANCELLATION DECODING

The running time of the SC decoder is O(n log n), and our implementation is no exception. As we have previously noted, the space complexity of our algorithm is O(n log n) as well. However, we will now show how to bring the space complexity down to O(n). The observation that one can reduce the space complexity to O(n) was noted, in the context of VLSI design, in [8].

As a first step towards this end, consider the probability pair array P_m. By examining the main loop in Algorithm 2, we quickly see that if we are currently at phase ϕ, then we will never again make use of P_m[⟨ϕ′, 0⟩] for all ϕ′ < ϕ. On the other hand, we see that P_m[⟨ϕ″, 0⟩] is uninitialized for all ϕ″ > ϕ. Thus, instead of reading and writing to P_m[⟨ϕ, 0⟩], we can essentially disregard the phase information, and use only the first element P_m[0] of the array, discarding all the rest. By the recursive nature of polar codes, this observation — disregarding the phase information — can be exploited for a general layer λ as well. Specifically, for all 0 ≤ λ ≤ m, let us now define the number of elements in P_λ to be 2^{m−λ}. Accordingly,

    P_λ[⟨ϕ, β⟩] is replaced by P_λ[β] .    (11)

Note that the total space needed to hold the P arrays has gone down from O(n log n) to O(n). We would now like to do the same for the B arrays. However, as things are currently stated, we cannot disregard the phase, as can be seen for example in line 3 of Algorithm 4. The solution is a simple renaming. As a first step, let us define for each 0 ≤ λ ≤ m an array C_λ consisting of bit pairs and having length n/2. Next, let a generic reference of the form B_λ[⟨ϕ, β⟩] be replaced by C_λ[ψ + β · 2^{λ−1}][ϕ mod 2], where ψ = ⌊ϕ/2⌋. Note that we have done nothing more than rename the elements of B_λ as elements of C_λ. However, we now see that, as before, we can disregard the value of ψ and take note only of the parity of ϕ. So, let us make one more substitution: replace every instance of C_λ[ψ + β · 2^{λ−1}][ϕ mod 2] by C_λ[β][ϕ mod 2], and resize each array C_λ to have 2^{m−λ} bit pairs. To sum up,

    B_λ[⟨ϕ, β⟩] is replaced by C_λ[β][ϕ mod 2] .    (12)

The alert reader will notice that a further reduction in space is possible: for λ = 0 we will always have that ϕ = 0, and thus ϕ mod 2 is always 0. However, this reduction does not affect the asymptotic space complexity, which is now indeed down to O(n). The revised algorithm is given as Algorithms 5–7.

Algorithm 5: Space-efficient SC decoder, main loop
    Input: the received vector y
    Output: a decoded codeword ĉ
    1  for β = 0, 1, . . . , n − 1 do                 // Initialization
    2      set P_0[β][0] ← W(y_β | 0), P_0[β][1] ← W(y_β | 1)
    3  for ϕ = 0, 1, . . . , n − 1 do                 // Main loop
    4      recursivelyCalcP(m, ϕ)
    5      if u_ϕ is frozen then
    6          set C_m[0][ϕ mod 2] to the frozen value of u_ϕ
    7      else
    8          if P_m[0][0] > P_m[0][1] then
    9              set C_m[0][ϕ mod 2] ← 0
    10         else
    11             set C_m[0][ϕ mod 2] ← 1
    12     if ϕ mod 2 = 1 then
    13         recursivelyUpdateC(m, ϕ)
    14 return the decoded codeword: ĉ = (C_0[β][0])_{β=0}^{n−1}
Algorithm 6: recursivelyCalcP(λ, ϕ) space-efficient
    Input: layer λ and phase ϕ
    1  if λ = 0 then return                          // Stopping condition
    2  set ψ ← ⌊ϕ/2⌋
       // Recurse first, if needed
    3  if ϕ mod 2 = 0 then recursivelyCalcP(λ − 1, ψ)
       // Perform the calculation
    4  for β = 0, 1, . . . , 2^{m−λ} − 1 do
    5      if ϕ mod 2 = 0 then                       // apply Equation (4)
    6          for u′ ∈ {0, 1} do
    7              P_λ[β][u′] ← Σ_{u″} ½ P_{λ−1}[2β][u′ ⊕ u″] · P_{λ−1}[2β + 1][u″]
    8      else                                      // apply Equation (5)
    9          set u′ ← C_λ[β][0]
    10         for u″ ∈ {0, 1} do
    11             P_λ[β][u″] ← ½ P_{λ−1}[2β][u′ ⊕ u″] · P_{λ−1}[2β + 1][u″]

Algorithm 7: recursivelyUpdateC(λ, ϕ) space-efficient
    Input: layer λ and phase ϕ
    Require: ϕ is odd
    1  set ψ ← ⌊ϕ/2⌋
    2  for β = 0, 1, . . . , 2^{m−λ} − 1 do
    3      C_{λ−1}[2β][ψ mod 2] ← C_λ[β][0] ⊕ C_λ[β][1]
    4      C_{λ−1}[2β + 1][ψ mod 2] ← C_λ[β][1]
    5  if ψ mod 2 = 1 then
    6      recursivelyUpdateC(λ − 1, ψ)

We end this subsection by mentioning that although we were concerned here with reducing the space complexity of our SC decoder, the observations made with this goal in mind will be of great use in analyzing the time complexity of our list decoder.
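For concreteness, the following Python sketch transcribes Algorithms 5–7 directly (our code, not the paper's; the per-symbol likelihood chan(y, u) = W(y|u) and the frozen map are assumed interfaces).

import numpy as np

def sc_decode_space_efficient(y, frozen, chan):
    # Direct transcription of Algorithms 5-7 (a sketch).
    # chan(y, u) returns the channel likelihood W(y|u);
    # frozen[phi] is None for information bits, else the frozen value.
    n = len(y)
    m = n.bit_length() - 1                          # n = 2**m
    # P[lam] holds 2**(m-lam) probability pairs, C[lam] as many bit pairs.
    P = [np.zeros((2 ** (m - lam), 2)) for lam in range(m + 1)]
    C = [np.zeros((2 ** (m - lam), 2), dtype=int) for lam in range(m + 1)]

    def recursively_calc_p(lam, phi):               # Algorithm 6
        if lam == 0:                                # stopping condition
            return
        psi = phi // 2
        if phi % 2 == 0:                            # recurse first, if needed
            recursively_calc_p(lam - 1, psi)
        for beta in range(2 ** (m - lam)):
            if phi % 2 == 0:                        # apply equation (4)
                for u1 in (0, 1):
                    P[lam][beta][u1] = sum(
                        0.5 * P[lam - 1][2 * beta][u1 ^ u2]
                            * P[lam - 1][2 * beta + 1][u2]
                        for u2 in (0, 1))
            else:                                   # apply equation (5)
                u1 = int(C[lam][beta][0])
                for u2 in (0, 1):
                    P[lam][beta][u2] = (0.5
                        * P[lam - 1][2 * beta][u1 ^ u2]
                        * P[lam - 1][2 * beta + 1][u2])

    def recursively_update_c(lam, phi):             # Algorithm 7; phi is odd
        psi = phi // 2
        for beta in range(2 ** (m - lam)):
            C[lam - 1][2 * beta][psi % 2] = C[lam][beta][0] ^ C[lam][beta][1]
            C[lam - 1][2 * beta + 1][psi % 2] = C[lam][beta][1]
        if psi % 2 == 1:
            recursively_update_c(lam - 1, psi)

    for beta in range(n):                           # initialization
        P[0][beta][0] = chan(y[beta], 0)
        P[0][beta][1] = chan(y[beta], 1)
    for phi in range(n):                            # main loop
        recursively_calc_p(m, phi)
        if frozen[phi] is not None:
            C[m][0][phi % 2] = frozen[phi]
        elif P[m][0][0] > P[m][0][1]:
            C[m][0][phi % 2] = 0
        else:
            C[m][0][phi % 2] = 1
        if phi % 2 == 1:
            recursively_update_c(m, phi)
    return [int(C[0][beta][0]) for beta in range(n)]  # the codeword c_hat

# Example usage with a BSC of crossover probability 0.1:
# bsc = lambda y, u: 0.9 if y == u else 0.1
# c_hat = sc_decode_space_efficient([0, 1, 1, 0], [0, 0, None, None], bsc)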
IV. SUCCESSIVE CANCELLATION LIST DECODER

In this section we introduce and define our algorithm, the successive cancellation list (SCL) decoder. Our list decoder has a parameter L, called the list size. Generally speaking, larger values of L mean lower error rates but longer running times. We note at this point that successive cancellation list decoding is not a new idea: it was applied in [9] to Reed–Muller codes¹.

Recall the main loop of an SC decoder, where at each phase we must decide on the value of û_ϕ. In an SCL decoder, instead of deciding to set the value of an unfrozen û_ϕ to either a 0 or a 1, we inspect both options. Namely, when decoding a non-frozen bit, we split the decoding path into two paths (see Figure 4). Since each split doubles the number of paths to be examined, we must prune them, and the maximum number of paths allowed is the specified list size, L. Naturally, we would like to keep the "best" paths at each stage, and thus require a pruning criterion. Our pruning criterion will be to keep the most likely paths.

¹ In a somewhat different version of successive cancellation than that of Arıkan's, at least in exposition.

[Figure 4: a binary tree of depth four; each node branches on the bit values 0 and 1, and discontinued branches are grayed out.]

Fig. 4. Decoding paths of unfrozen bits for L = 4: each level has at most 4 nodes with paths that continue downward. Discontinued paths are colored gray.

Consider the following outline for a naive implementation of an SCL decoder. Each time a decoding path is split into two forks, the data structures used by the "parent" path are duplicated, with one copy given to the first fork and the other to the second. Since the number of splits is Ω(L · n), and since the size of the data structures used by each path is Ω(n), the copying operation alone would take time Ω(L · n²). This running time is clearly impractical for all but the shortest of codes. However, all known (to us) implementations of successive cancellation list decoding have complexity at least Ω(L · n²). Our main contribution in this section is the following: we show how to implement SCL decoding with time complexity O(L · n log n) instead of Ω(L · n²).

The key observation is as follows. Consider the P arrays of the last section, and recall that the size of P_λ is proportional to 2^{m−λ}. Thus, the cost of copying P_λ grows exponentially small with λ. On the other hand, looking at the main loop of Algorithm 5 and unwinding the recursion, we see that P_λ is accessed only every 2^{m−λ} incrementations of ϕ. Put another way, the bigger P_λ is, the less frequently it is accessed. The same observation applies to the C arrays. This observation suggests the use of a "lazy-copy". Namely, at each given stage, the same array may be flagged as belonging to more than one decoding path. However, when a given decoding path needs access to an array it is sharing with another path, a copy is made.
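One way to realize such a lazy-copy in Python is a minimal copy-on-write pool with reference counts (a sketch under our own naming, not the paper's exact data structure):

import numpy as np

class LazyArrays:
    # A pool of num_slots arrays of a common shape. A path holds a slot
    # index; arrays are duplicated only when a sharing path needs to write.
    def __init__(self, num_slots, shape):
        self.data = [np.zeros(shape) for _ in range(num_slots)]
        self.refcount = [0] * num_slots
        self.free = list(range(num_slots))

    def acquire(self):                  # fresh array for a new path
        s = self.free.pop()
        self.refcount[s] = 1
        return s

    def share(self, s):                 # path split: share, do not copy
        self.refcount[s] += 1
        return s

    def release(self, s):               # path killed: drop one reference
        self.refcount[s] -= 1
        if self.refcount[s] == 0:
            self.free.append(s)

    def writable(self, s):              # copy-on-write access for one path
        if self.refcount[s] == 1:
            return s                    # sole owner: write in place
        self.refcount[s] -= 1
        t = self.free.pop()
        self.data[t][...] = self.data[s]  # private copy for this path
        self.refcount[t] = 1
        return t                        # caller must record its new slot

The point of the sketch is that splitting a path costs O(1) per layer (share), and the actual O(2^{m−λ}) copy in writable is paid only on first write access, which is exactly the infrequent-access pattern described above.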
A. Low-level functions

We now discuss the low-level functions and data structures by which the "lazy-copy" methodology is realized. We note in advance that since our aim was to keep the exposition as simple as possible, we have avoided some obvious optimizations. The following data structures are defined and initialized in Algorithm 8.

Algorithm 8: initializeDataStructures()
    1  inactivePathIndices ← new stack with capacity L
    2  activePath ← new boolean array of size L
    3  arrayPointer_P ← new 2-D array of size (m + 1) × L, the elements of which are array pointers
    4  arrayPointer_C ← new 2-D array of size (m + 1) × L, the elements of which are array pointers
    5  pathIndexToArrayIndex ← new 2-D array of size (m + 1) × L
    6  inactiveArrayIndices ← new array of size m + 1, the elements of which are stacks with capacity L
    7  arrayReferenceCount ← new 2-D array of size (m + 1) × L
       // Initialization of data structures
    8  for λ = 0, 1, . . . , m do
    9      for s = 0, 1, . . . , L − 1 do
    10         arrayPointer_P[λ][s] ← new array of float pairs of size 2^{m−λ}
    11         arrayPointer_C[λ][s] ← new array of bit pairs of size 2^{m−λ}
    12         arrayReferenceCount[λ][s] ← 0
    13         push(inactiveArrayIndices[λ], s)
    14 for ℓ = 0, 1, . . . , L − 1 do
    15     activePath[ℓ] ← false
    16     push(inactivePathIndices, ℓ)
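In terms of the copy-on-write sketch above, the allocations of Algorithm 8 amount to one L-slot pool per layer, holding 2^{m−λ} probability pairs or bit pairs (again illustrative only, using the hypothetical LazyArrays helper):

# One probability pool and one bit-pair pool per layer, L slots each.
m, L = 10, 4
P_pools = [LazyArrays(L, (2 ** (m - lam), 2)) for lam in range(m + 1)]
C_pools = [LazyArrays(L, (2 ** (m - lam), 2)) for lam in range(m + 1)]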
One first notes that our new implementations loop over all path indices ℓ. Thus, our new implementations make use of the functions getArrayPointer_P and getArrayPointer_C in order to assure that the consistency of calculations is preserved, despite multiple paths sharing information. In addition, Algorithm 14 contains code to normalize probabilities. The normalization is needed for a technical reason (to avoid floating-point underflow), and will be expanded on shortly.

We start out by noting that the "fresh pointer" condition we have imposed on ourselves indeed holds. To see this, consider first Algorithm 14. The key point to note is that neither the killPath nor the clonePath function is called from inside the algorithm. The same observation holds for Algorithm 15. Thus, the "fresh pointer" condition is met, and

Recall that ϕ iterates from 0 to n − 1. Thus, for codes having length greater than some small constant, the comparison in line 8 of Algorithm 2 ultimately becomes meaningless, since both probabilities are rounded to 0. The same holds for all of our previous implementations.

Luckily, there is a simple fix to this problem. After the probabilities are calculated in lines 5–20 of Algorithm 14, we normalize² the highest probability to be 1 in lines 21–27:

    // normalize probabilities (lines 21–27 of Algorithm 14)
    21 for ℓ = 0, 1, . . . , L − 1 do
    22     if pathIndexInactive(ℓ) then
    23         continue
    24     P_λ ← getArrayPointer_P(λ, ℓ)
    25     for β = 0, 1, . . . , 2^{m−λ} − 1 do
    26         for u ∈ {0, 1} do
    27             P_λ[β][u] ← P_λ[β][u]/σ

² This correction does not assure us that underflows will not occur. However, now, the probability of a meaningless comparison due to underflow will be extremely low.

We claim that, apart from avoiding underflows, normalization does not alter our algorithm. The following lemma formalizes this claim.

Lemma 5: Assume that we are working with "perfect" floating-point numbers. That is, our floating-point variables are infinitely accurate and do not suffer from underflow/overflow. Next, consider a variant of Algorithm 14, termed Algorithm 14′, in which just before line 21 is first executed, the variable σ is set to 1. That is, effectively, there is no normalization of probabilities in Algorithm 14′.

Consider two runs, one of Algorithm 14 and one of Algorithm 14′. In both runs, the input parameters to both algorithms are the same. Moreover, assume that in both runs, the state
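A minimal Python sketch of this normalization step (our illustration; it assumes the per-path probability arrays for the current layer are collected in a list, and that σ — whose computation falls in the omitted lines 5–20 — is the largest entry over all active paths, so that probabilities of different paths remain mutually comparable after scaling):

import numpy as np

def normalize_layer(P_paths):
    # P_paths: one probability array per active path, current layer.
    sigma = max(P.max() for P in P_paths)  # assumed computed in lines 5-20
    for P in P_paths:
        P /= sigma                         # highest probability becomes 1
    return P_paths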
2 possible forks, we must typically choose the L most likely forks out of 2L possible forks. The most interesting line is 14, in which the best ρ forks are marked. Surprisingly³, this can be done in O(L) time [10, Section 9.3]. After the forks are marked, we first kill the paths for which both forks are discontinued, and then continue paths for which one or both of the forks are marked. In the case of the latter, the path is first split. Note that we must first kill paths and only then split paths in order for the "balanced" constraint (13) to hold. Namely, this way, we will not have more than L active paths at a time.

³ The O(L) time result is rather theoretical. Since L is typically a small number, the fastest way to achieve our selection goal would be through simple sorting.

The point of Algorithm 18 is to prune our list and leave only the L "best" paths. This is indeed achieved, in the following sense. At stage ϕ we would like to rank each path according to the probability

    W_m^{(ϕ)}(y_0^{n−1}, û_0^{ϕ−1} | û_ϕ) .

By (9) and (11), this would indeed be the case if our floating-point variables were "perfect", and the normalization step in lines 21–27 of Algorithm 14 were not carried out. By Lemma 5, we see that this is still the case if normalization is carried out.
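The marking step can be illustrated in a few lines of Python (a sketch; following footnote 3, we simply sort the at most 2L forks instead of using the O(L) selection of [10]):

def mark_best_forks(fork_probs, L):
    # fork_probs maps (path, bit) -> probability of that continuation.
    ranked = sorted(fork_probs, key=fork_probs.get, reverse=True)
    kept = set(ranked[:L])                 # the best <= L forks
    paths = {p for (p, _) in fork_probs}
    kill = {p for p in paths               # both forks discontinued
            if (p, 0) not in kept and (p, 1) not in kept}
    split = {p for p in paths              # both forks survive
             if (p, 0) in kept and (p, 1) in kept}
    return kept, kill, split               # kill first, then split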
The last algorithm we consider in this section is Algorithm 19. In it, the most probable path is selected from the final list. As before, by (9)–(12) and Lemma 5, the value of P_m[0][C_m[0][1]] is simply

    W_m^{(n−1)}(y_0^{n−1}, û_0^{n−2} | û_{n−1}) = (1 / 2^{n−1}) · P(y_0^{n−1} | û_0^{n−1}) ,

up to a normalization constant.

Algorithm 19: findMostProbablePath()
    Output: the index ℓ₀ of the most probable path
    1  ℓ₀ ← 0, p₀ ← 0
    2  for ℓ = 0, 1, . . . , L − 1 do
    3      if pathIndexInactive(ℓ) then
    4          continue
    5      C_m ← getArrayPointer_C(m, ℓ)
    6      P_m ← getArrayPointer_P(m, ℓ)
    7      if p₀ < P_m[0][C_m[0][1]] then
    8          ℓ₀ ← ℓ, p₀ ← P_m[0][C_m[0][1]]
    9  return ℓ₀
We now prove our two main results.

Theorem 6: The space complexity of the SCL decoder is O(L · n).

Proof: All the data structures of our list decoder are allocated in Algorithm 8, and it can be checked that the total space used by them is O(L · n). Apart from these, the space complexity needed in order to perform the selection operation in line 14 of Algorithm 18 is O(L). Lastly, the various local variables needed by the algorithm take O(1) space, and the stack needed in order to implement the recursion takes O(log n) space.

Theorem 7: The running time of the SCL decoder is O(L · n log n).

Proof: Recall that by our notation m = log n. The following bottom-to-top table summarizes the running time of each function. The notation O_Σ will be explained shortly.

    function                        running time
    initializeDataStructures()      O(L · m)
    assignInitialPath()             O(m)
    clonePath(ℓ)                    O(m)
    killPath(ℓ)                     O(m)
    getArrayPointer_P(λ, ℓ)         O(2^{m−λ})
    getArrayPointer_C(λ, ℓ)         O(2^{m−λ})
    pathIndexInactive(ℓ)            O(1)
    recursivelyCalcP(m, ·)          O_Σ(L · m · n)
    recursivelyUpdateC(m, ·)        O_Σ(L · m · n)
    continuePaths_FrozenBit(ϕ)      O(L)
    continuePaths_UnfrozenBit(ϕ)    O(L · m)
    findMostProbablePath()          O(L)
    SCL decoder                     O(L · m · n)

The first 7 functions in the table, the low-level functions, are easily checked to have the stated running time. Note that the running time of getArrayPointer_P and getArrayPointer_C is due to the copy operation in line 6 of Algorithm 6 applied to an array of size O(2^{m−λ}). Thus, as was previously mentioned, reducing the size of our arrays has helped us reduce the running time of our list decoding algorithm.

Next, let us consider the two mid-level functions, namely recursivelyCalcP and recursivelyUpdateC. The notation

    recursivelyCalcP(m, ·) ∈ O_Σ(L · m · n)

means that the total running time of the n function calls

    recursivelyCalcP(m, ϕ) ,    0 ≤ ϕ < 2^m

is O(L · m · n). To see this, denote by f(λ) the total running time of the above with m replaced by λ. By splitting the running time of Algorithm 14 into a non-recursive part and a recursive part, we have that for λ > 0

    f(λ) = 2^λ · O(L · 2^{m−λ}) + f(λ − 1) .

Thus, it easily follows that

    f(m) ∈ O(L · m · 2^m) = O(L · m · n) .

In essentially the same way, we can prove that the total running time of recursivelyUpdateC(m, ϕ) over all 2^{m−1} valid (odd) values of ϕ is O(L · m · n). Note that the two mid-level functions are invoked in lines 7 and 13 of Algorithm 16, on all valid inputs.

The running time of the high-level functions is easily checked to agree with the table.
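To spell out the "easily follows" step, the recurrence unrolls as (our derivation, assuming f(0) ∈ O(L) for the stopping condition):

    f(m) = \sum_{\lambda=1}^{m} 2^{\lambda} \, O(L \cdot 2^{m-\lambda}) + f(0)
         = \sum_{\lambda=1}^{m} O(L \cdot 2^{m}) + O(L)
         \in O(L \cdot m \cdot 2^{m}) = O(L \cdot m \cdot n) .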
V. MODIFIED POLAR CODES

The plots in Figure 5 were obtained by simulation. The performance of our decoder for various list sizes is given by the solid lines in the figure. As expected, we see that as the list size L increases, the performance of our decoder improves.

[Figure 5: two word-error-rate plots versus signal-to-noise ratio (Eb/N0) [dB], each with curves for L = 1, 2, 4, 8, 16, 32 and an ML bound; top panel n = 2048 (10⁻¹ to 10⁻⁴, 1.0 to 3.0 dB), bottom panel n = 8192 (10⁻¹ to 10⁻⁶, 1.0 to 2.5 dB).]

Fig. 5. Word error rate of a length n = 2048 (top) and n = 8192 (bottom) rate 1/2 polar code optimized for SNR = 2 dB under various list sizes. Code construction was carried out via the method proposed in [4].

We also notice a diminishing-returns phenomenon in terms of increasing the list size. The reason for this turns out to be simple.

The dashed line, termed the "ML bound", was obtained as follows. During our simulations for L = 32, each time a decoding failure occurred, we checked whether the decoded codeword was more likely than the transmitted codeword. That is, whether W(y|ĉ) > W(y|c). If so, then the optimal ML decoder would surely misdecode y as well. The dashed line records the frequency of the above event, and is thus a lower bound on the error probability of the ML decoder. Thus, for an SNR value greater than about 1.5 dB, Figure 1 suggests that we have an essentially optimal decoder when L = 32.

Can we do even better? At first, the answer seems to be an obvious "no", at least for the region in which our decoder is essentially optimal. However, it turns out that if we are willing to accept a small change in our definition of a polar code, we can dramatically improve performance.

During simulations we noticed that often, when a decoding error occurred, the path corresponding to the transmitted codeword was a member of the final list. However, since there was a more likely path in the list, the codeword corresponding to that path was returned, which resulted in a decoding error. Thus, if only we had a "genie" to tell us at the final stage which path to pick from our list, we could improve the performance of our decoder.

Luckily, such a genie is easy to implement. Recall that we have k unfrozen bits that we are free to set. Instead of setting all of them to information bits we wish to transmit, we employ the following simple concatenation scheme. For some small constant r, we set the first k − r unfrozen bits to information bits. The last r unfrozen bits will hold the r-bit CRC [11, Section 8.8] value⁴ of the first k − r unfrozen bits. Note this new encoding is a slight variation of our polar coding scheme. Also, note that we incur a penalty in rate, since the rate of our code is now (k − r)/n instead of the previous k/n.

⁴ A binary linear code having a corresponding k × r parity-check matrix constructed as follows will do just as well. Let the first k − r columns be chosen at random and the last r columns be equal to the identity matrix.

What we have gained is an approximation to a genie: at the final stage of decoding, instead of calling the function findMostProbablePath in Algorithm 19, we can do the following. A path for which the CRC is invalid cannot correspond to the transmitted codeword. Thus, we refine our selection as follows. If at least one path has a correct CRC, then we remove from our list all paths having incorrect CRC and then choose the most likely path. Otherwise, we select the most likely path in the hope of reducing the number of bits in error, but with the knowledge that we have at least one bit in error.
10−4
n = 8192, L = 32 Figures 1 and 2 contain a comparison of decoding per-
10−5 n = 8192, ML bound formance between the original polar codes and the slightly
10−6 tweaked version presented in this section. A further im-
1.0 1.5 2.0 2.5
provement in bit-error-rate (but not in block-error-rate) is
Signal-to-noise ratio (Eb /N0 ) [dB]
attained when the decoding is performed systematically [12].
Fig. 5. Word error rate of a length n = 2048 (top) and n = 8192 (bottom) The application of systematic polar-coding to a list decoding
rate 1/2 polar code optimized for SNR=2 dB under various list sizes. Code setting is attributed to [13].
construction was carried out via the method proposed in [4].
R EFERENCES
We also notice a diminishing-returns phenomenon in terms of [1] E. Arıkan, “Channel polarization: A method for constructing capacity-
increasing the list size. The reason for this turns out to be achieving codes for symmetric binary-input memoryless channels,” IEEE
Trans. Inform. Theory, vol. 55, pp. 3051–3073, 2009.
simple. [2] E. Arıkan and E. Telatar, “On the rate of channel polarization,” in Proc.
IEEE Int’l Symp. Inform. Theory (ISIT’2009), Seoul, South Korea, 2009,
The dashed line, termed the “ML bound” was obtained as pp. 1493–1495.
follows. During our simulations for L = 32, each time a [3] S. B. Korada, E. Şaşoğlu, and R. Urbanke, “Polar codes: Character-
decoding failure occurred, we checked whether the decoded ization of exponent, bounds, and constructions,” IEEE Trans. Inform.
Theory, vol. 56, pp. 6253–6264, 2010.
codeword was more likely than the transmitted codeword. That [4] I. Tal and A. Vardy, “How to construct polar codes,” submitted to IEEE
is, whether W (y|ĉ) > W (y|c). If so, then the optimal ML Trans. Inform. Theory, available online as arXiv:1105.6164v2,
decoder would surely misdecode y as well. The dashed line 2011.
[5] G. Wiechman and I. Sason, “An improved sphere-packing bound for
records the frequency of the above event, and is thus a lower- finite-length codes over symmetric memoryless channels,” IEEE Trans.
bound on the error probability of the ML decoder. Thus, for Inform. Theory, vol. 54, pp. 1962–1990, 2008.
an SNR value greater than about 1.5 dB, Figure 1 suggests [6] TurboBest, “IEEE 802.16e LDPC Encoder/Decoder Core.” [Online].
Available: http://www.turbobest.com/tb ldpc80216e.htm
that we have an essentially optimal decoder when L = 32. [7] Y. Polyanskiy, H. V. Poor, and S. Verdú, “Channel coding rate in the
finite blocklength regime,” IEEE Trans. Inform. Theory, vol. 56, pp.
Can we do even better? At first, the answer seems to be an 2307–2359, 2010.
obvious “no”, at least for the region in which our decoder is [8] C. Leroux, I. Tal, A. Vardy, and W. J. Gross, “Hardware ar-
essentially optimal. However, it turns out that if we are willing chitectures for successive cancellation decoding of polar codes,”
arXiv:1011.2919v1, 2010.
to accept a small change in our definition of a polar code, we [9] I. Dumer and K. Shabunov, “Soft-decision decoding of Reed-Muller
can dramatically improve performance. codes: recursive lists,” IEEE Trans. Inform. Theory, vol. 52, pp. 1260–
1266, 2006.
During simulations we noticed that often, when a decoding [10] T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein, Introduction
error occurred, the path corresponding to the transmitted to Algorithms, 2nd ed. Cambridge, Massachusetts: The MIT Press,
codeword was a member of the final list. However, since there 2001.
[11] W. W. Peterson and E. J. Weldon, Error-Correcting Codes, 2nd ed.
was a more likely path in the list, the codeword corresponding Cambridge, Massachusetts: The MIT Press, 1972.
to that path was returned, which resulted in a decoding error. [12] E. Arıkan, “Systematic polar coding,” IEEE Commmun. Lett., vol. 15,
Thus, if only we had a “genie” to tell as at the final stage which pp. 860–862, 2011.
[13] G. Sarkis and W. J. Gross, “Systematic encoding of polar codes for list
path to pick from our list, we could improve the performance decoding,” 2011, private communication.
of our decoder.
Luckily, such a genie is easy to implement. Recall that we
have k unfrozen bits that we are free to set. Instead of setting
all of them to information bits we wish to transmit, we employ
the following simple concatenation scheme. For some small 4 A binary linear code having a corresponding k × r parity-check matrix
constant r, we set the first k − r unfrozen bits to information constructed as follows will do just as well. Let the the first k − r columns
bits. The last r unfrozen bits will hold the r-bit CRC [11, be chosen at random and the last r columns be equal to the identity matrix.