Tutorial GFqDecoding Part1
D. Declercq 1
1 ETIS - UMR8051 ENSEA/Cergy-University/CNRS France
1 / 59
Outline
Introduction
2 / 59
- Gallager 1962: regular LDPC codes, proof of convergence (MLD), algorithm A (bit flipping), algorithm B,
- Tanner 1981: composite codes on graphs, link with product codes and LDPC codes,
- MacKay 1995: Belief Propagation (BP) decoding, link with iterative turbo-decoding, irregular LDPC codes,
- Richardson and Urbanke 2001: proof of convergence (BP), optimization of the irregularity, codes approaching capacity (BEC, BI-AWGN),
- Since then: optimization for other types of channels (frequency selective, multilevel, multi-user, turbo-equalization, joint source-channel coding), finding good matrices for small sizes, lowering the error floor. Golden age of LDPC codes, application in many standards.
4 / 59
- Gallager 1963: LDPC codes in Galois Fields, iterative hard decoding (algorithm B) for dv = 3,
- MacKay 1998: advantages for small blocks/high rates, ultra-sparse dv = 2 LDPC codes in high-order Fields,
- 2003-2006: development of practical decoders for Non-Binary LDPC codes,
- 2006-2010: attempts to find applications where NB-LDPC codes outperform binary LDPC codes,
- 2010-xxx: golden age of Non-Binary LDPC codes?
[Davey 98] M. Davey and D.J.C. MacKay, "Low density parity check codes over GF(q)", IEEE Communications Letters, vol. 2, pp. 165-167, June 1998.
[MacKay 99] D.J.C. MacKay and M. Davey, "Evaluation of Gallager codes for short block length and high rate applications", Proc. of IMA Workshop on Codes, Systems and Graphical Models, 1999.
5 / 59
LDPC Code:

C = { c : H.c = 0 in GF(q) },   equivalently   C_G = { c = G.u, u in GF(q)^K }

with H the (M x N) parity-check matrix and G the (N x K) generator matrix. N is the codeword length, K the number of information symbols, and M the number of parity checks.

LDPC: H is sparse (low density of nonzero entries), and the minimum distance d_H grows with the codeword length N (as N → +∞).
6 / 59
[Figure: example of a binary LDPC code. A sparse parity-check matrix H and its Tanner graph, with the codeword bits c0 ... c7 as variable nodes connected through an interleaver (edge permutation) to the parity checks.]
7 / 59
[Figure: example of a non-binary LDPC code. A sparse parity-check matrix H with nonzero entries in GF(q) and its Tanner graph, with the codeword symbols c0 ... c7 as variable nodes connected through an interleaver to the parity checks.]
8 / 59
Edge proportions:
λ_i : proportion of nonzero values {H_kl} in degree-i columns,
ρ_j : proportion of nonzero values {H_kl} in degree-j rows,

λ(x) = Σ_{i=2}^{dv,max} λ_i x^(i−1),   ρ(x) = Σ_{j=2}^{dc,max} ρ_j x^(j−1)

Node proportions are defined analogously from the fractions of degree-i variable nodes and degree-j checknodes.
9 / 59
Example:

λ(x) = (4/24) x + (3/24) x^2 + (8/24) x^3 + (9/24) x^8,   ρ(x) = x^5
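As an illustration, the edge-perspective proportions can be computed directly from a parity-check matrix. The sketch below is mine (function name and the toy matrix are not from the slides): each degree-d column carries d edges, so it contributes d/E to λ_d, with E the total number of edges.

```python
import numpy as np
from collections import Counter

def edge_degree_distributions(H):
    """Edge-perspective proportions: lambda_i (resp. rho_j) is the
    fraction of nonzero entries of H lying in degree-i columns
    (resp. degree-j rows)."""
    H = np.asarray(H) != 0
    n_edges = H.sum()
    lam, rho = Counter(), Counter()
    for d in H.sum(axis=0):          # variable (column) degrees
        lam[int(d)] += d / n_edges   # a degree-d column carries d edges
    for d in H.sum(axis=1):          # check (row) degrees
        rho[int(d)] += d / n_edges
    return dict(lam), dict(rho)

# toy matrix (dense, for illustration only -- not a real LDPC matrix)
H = [[1, 1, 0, 1],
     [0, 1, 1, 1]]
lam, rho = edge_degree_distributions(H)
# lam == {1: 1/3, 2: 2/3}, rho == {3: 1.0}
```

By construction the λ_i (and the ρ_j) sum to 1, matching the normalization of the example polynomial λ(x) above (coefficients 4+3+8+9 = 24).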
10 / 59
The general concept of LDPC decoding is message passing between the nodes of the Tanner graph of the code: iterative updates of the messages lead to a stable state of the messages, i.e. convergence to a fixed point. Messages represent probability density functions of the random variables. For a discrete random variable taking values in a set of q elements, a message μ satisfies:

Σ_{k=0}^{q−1} μ[k] = 1

The decoding result is the a posteriori probability of one random variable x_n: Prob(x_n | y_0, y_1, ..., y_{N−1}). A particular scheduling of the computation of the messages defines a decoding iteration.
12 / 59
Artificial Intelligence: statistical learning with Pearl's Belief Propagation (1981-86), neural networks with the sum-product algorithm (1985-86).

Information Theory: Gallager's iterative decoders for LDPC codes (1963), Viterbi (1967), BCJR (1974), all of which can be analysed as BP on factor graphs.

Statistical Physics: BP is the Bethe approximation of the global free energy of complex systems (1935); generalized BP is the Kikuchi approximation of the free energy (1951).
13 / 59
Example: the Tanner graph (for parity-check codes) is a special case of factor graph.

Let A, B, C, D be dependent random variables, with a noisy observation of each.

[Figure: factor graph with variable nodes A, B, C, D, their noisy observations, and a function node p(B|A,C,D).]

With Belief Propagation on a tree, we get the a posteriori density: the optimal solution.
14 / 59
[Figure: factor graph notation. A variable node x_n and a function node F(.) exchange the messages μ_{x→f} and μ_{f→x} along their common edge; the messages are p.d.f.s.]

The graph is not oriented: messages are needed in both directions. Two types of nodes means two types of local updates: the variable (data) node update and the function node update.
15 / 59
[Figure: computation tree of the decoder. Two function nodes F1 and F2 are fed by LLR messages coming from the channel.]
16 / 59
[Figure: the same computation tree, showing the "past" of F1 (the set S_F1 of channel LLRs feeding F1) and the past of F2 (the set S_F2).]
17 / 59
[Figure: when the sets S_F1 and S_F2 are disjoint, the messages entering F1 and F2 are independent.]
18 / 59
[Figure: a variable node x connected to the function nodes f1, f2, f3, f4.]

Variable node update (output towards f1):

μ_{x→f1}[k] = Π_{i=2}^{4} μ_{fi→x}[k],   k = 0 ... q−1

ASSUMPTION: the input messages μ_{fi→x} are independent.
ASSUMPTION: the sets of noisy symbols leading to the μ_{fi→x} are disjoint.
19 / 59
[Figure: a function node F(x1, x2, x3, x4) connected to the variable nodes x1, x2, x3, x4.]

Function node update (output towards x1):

μ_{f→x1}[k1] = Σ_{k2,k3,k4} F(x1 = k1, x2 = k2, x3 = k3, x4 = k4) Π_{i=2}^{4} μ_{xi→f}[ki],   k1 = 0 ... q−1

ASSUMPTION: the input messages μ_{xi→f} are independent.
ASSUMPTION: the sets of noisy symbols leading to the μ_{xi→f} are disjoint.
20 / 59
μ_{f→x1}[k1] = Σ_{k2,k3,k4} F(x1 = k1, x2 = k2, x3 = k3, x4 = k4) Π_{i=2}^{4} μ_{xi→f}[ki]

Let β_k ∈ GF(q) = {0, 1, α, α^2, ..., α^(q−2)}. In the parity-check case, the function node reduces to an indicator function:

F(x1 = β1, x2 = β2, x3 = β3, x4 = β4) = 1   if β1 + β2 + β3 + β4 = 0
F(x1 = β1, x2 = β2, x3 = β3, x4 = β4) = 0   if β1 + β2 + β3 + β4 ≠ 0
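For GF(q) with q = 2^p, the sum of symbols is a bitwise XOR of their binary representations, so the parity-check function node above can be sketched by brute force (names are mine, not from the slides; the cost is O(q^(dc−1)) per output vector, which is what motivates the Fourier-domain method discussed later):

```python
import itertools
import numpy as np

def checknode_update_gf2p(inputs, q):
    """Parity-check function node over GF(q), q = 2^p.  Output towards x1:
    mu_out[k1] = sum over (k2, ..., kd) of prod_i mu_i[ki], restricted to
    k1 + k2 + ... + kd = 0, where '+' in GF(2^p) is XOR on symbol indices."""
    out = np.zeros(q)
    for ks in itertools.product(range(q), repeat=len(inputs)):
        k1 = 0
        for k in ks:
            k1 ^= k              # k1 is forced by the parity-check constraint
        p = 1.0
        for mu, k in zip(inputs, ks):
            p *= mu[k]
        out[k1] += p
    return out

# two input messages over GF(4): one uniform, one arbitrary (normalized)
mu_a = np.full(4, 0.25)
mu_b = np.array([0.4, 0.3, 0.2, 0.1])
mu_out = checknode_update_gf2p([mu_a, mu_b], q=4)
# a uniform input makes the output uniform, as expected for a parity check
```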
21 / 59
This ordering of the messages (all variable node updates, then all checknode updates) is called the flooding schedule; one decoding iteration ends with the computation of the APP.
22 / 59
[Figure: computation tree unfolded over L iterations, with the channel LLRs at the leaves.]

Computational span of L iterations: in L iterations, a maximum of dv (dv − 1)^(L−1) (dc − 1)^L nodes are seen from the top of the tree. As a consequence, a usual assumption is that the BP decoder needs at least L = log(N) iterations to converge (to see all the channel LLRs), while the independence assumption breaks after at most L = log(N) iterations.
23 / 59
[Figure: a cycle in the Tanner graph makes the same channel LLR enter a node twice: a wrong (correlated) update.]

A crucial parameter of the graph is its girth g, i.e. the size of its smallest closed path (cycle).
24 / 59
25 / 59
26 / 59
27 / 59
Advantage: for bitnodes with degree dv ≥ 3, messages are computed several times during ONE iteration: faster convergence.
28 / 59
Outline
Introduction
29 / 59
Belief Propagation for binary LDPC codes, with LLR messages u_k = log( μ_{f→x}[0] / μ_{f→x}[1] ) (checknode to bitnode) and v_m = log( μ_{x→f}[0] / μ_{x→f}[1] ) (bitnode to checknode), u_0 being the channel LLR:

Bitnode update:
v_m = u_0 + Σ_{k=1, k≠m}^{dv} u_k

Checknode update:
tanh( u_k / 2 ) = Π_{m=1, m≠k}^{dc} tanh( v_m / 2 )
30 / 59
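The two updates above can be sketched directly (a minimal sketch; function names are mine, and messages are plain floats holding LLRs):

```python
import numpy as np

def bitnode_update(u0, u_in, m):
    """v_m = u0 + sum of incoming check-to-bit LLRs u_k, k != m."""
    return u0 + sum(u for k, u in enumerate(u_in) if k != m)

def checknode_update(v_in, k):
    """tanh rule: u_k = 2 * atanh( prod_{m != k} tanh(v_m / 2) )."""
    t = 1.0
    for m, v in enumerate(v_in):
        if m != k:
            t *= np.tanh(v / 2.0)
    return 2.0 * np.arctanh(t)

v = bitnode_update(0.5, [1.0, -2.0, 3.0], m=1)   # 0.5 + 1.0 + 3.0 = 4.5
u = checknode_update([1.5, 2.0, 3.0], k=0)        # uses v_2 and v_3 only
```

Note that the checknode output magnitude never exceeds the smallest input magnitude, which is the observation exploited later by the Min-Sum approximation.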
Bitnode update in the probability domain (output towards f_dv):

μ_{x→f_dv}[k] = Π_{i=1}^{dv−1} μ_{fi→x}[k],   k = 0, 1

For a dv = 3 bitnode, in the LLR domain:

v_3 = log( μ_{x→f3}[0] / μ_{x→f3}[1] )
    = log( μ_{f1→x}[0] μ_{f2→x}[0] / ( μ_{f1→x}[1] μ_{f2→x}[1] ) )
    = u_1 + u_2
31 / 59
Checknode update in the probability domain (output towards x_dc):

μ_{f→x_dc}[β_dc] = Σ_{β_1,...,β_{dc−1} : Σ_k β_k = 0} Π_{i=1}^{dc−1} μ_{xi→f}[β_i],   β_k = 0, 1

Let us consider a dc = 3 checknode with u_3 as output message:

μ_{f→x3}[0] = μ_{x1→f}[0] μ_{x2→f}[0] + μ_{x1→f}[1] μ_{x2→f}[1]
μ_{f→x3}[1] = μ_{x1→f}[0] μ_{x2→f}[1] + μ_{x1→f}[1] μ_{x2→f}[0]
32 / 59
Now compute the factorization of the sum ...

μ_{f→x3}[0] + μ_{f→x3}[1] = ( μ_{x1→f}[0] + μ_{x1→f}[1] ) ( μ_{x2→f}[0] + μ_{x2→f}[1] )

... and the factorization of the difference:

μ_{f→x3}[0] − μ_{f→x3}[1] = ( μ_{x1→f}[0] − μ_{x1→f}[1] ) ( μ_{x2→f}[0] − μ_{x2→f}[1] )

These sums and differences are obtained by applying the matrix [1 1; 1 −1] to the vectors [ μ_{x1→f}[0] ; μ_{x1→f}[1] ] and [ μ_{x2→f}[0] ; μ_{x2→f}[1] ].
33 / 59
With the Fourier matrix F = [1 1; 1 −1] (for which F^(−1) = F/2), we obtain the checknode update in the Fourier Domain:

[ μ_{f→x3}[0] ; μ_{f→x3}[1] ] = F^(−1) ( F [ μ_{x1→f}[0] ; μ_{x1→f}[1] ] ⊙ F [ μ_{x2→f}[0] ; μ_{x2→f}[1] ] )

where ⊙ denotes the componentwise product.
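This identity is easy to check numerically. The sketch below (function names mine) computes the dc = 3 checknode output both by direct convolution and through the Fourier matrix, for normalized probability pairs:

```python
import numpy as np

F = np.array([[1.0,  1.0],
              [1.0, -1.0]])   # 2-point Fourier (Hadamard) matrix
F_inv = 0.5 * F               # F^{-1} = F / 2

def checknode_direct(mu1, mu2):
    """dc = 3 checknode output by direct binary convolution."""
    return np.array([mu1[0] * mu2[0] + mu1[1] * mu2[1],
                     mu1[0] * mu2[1] + mu1[1] * mu2[0]])

def checknode_fourier(mu1, mu2):
    """Same output as a componentwise product in the Fourier domain."""
    return F_inv @ ((F @ mu1) * (F @ mu2))

mu1 = np.array([0.9, 0.1])
mu2 = np.array([0.6, 0.4])
# both routes give [0.58, 0.42]
```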
34 / 59
From the previous equations, we have:

μ_{f→x3}[0] − μ_{f→x3}[1] = ( μ_{x1→f}[0] − μ_{x1→f}[1] ) ( μ_{x2→f}[0] − μ_{x2→f}[1] )

Writing each difference in terms of the corresponding LLR,

μ[0] − μ[1] = (e^u − 1) / (e^u + 1) = (e^(u/2) − e^(−u/2)) / (e^(u/2) + e^(−u/2)) = tanh(u/2),

this becomes:

tanh( u_3 / 2 ) = tanh( v_1 / 2 ) tanh( v_2 / 2 )
35 / 59
Let us compute the BP checknode update in the Log-Domain:

log tanh( |u_3| / 2 ) = log tanh( |v_1| / 2 ) + log tanh( |v_2| / 2 )

The sign of the message is computed in a parallel stream:

sign(u_3) = sign( tanh(u_3 / 2) ) = sign( tanh(v_1 / 2) ) sign( tanh(v_2 / 2) ) = sign(v_1) sign(v_2)
36 / 59
Log-Domain BP for binary LDPC codes, with LLR messages u_k and v_m as before:

Bitnode update:
v_m = u_0 + Σ_{k=1, k≠m}^{dv} u_k

Checknode update:
log tanh( |u_k| / 2 ) = Σ_{m=1, m≠k}^{dc} log tanh( |v_m| / 2 )
sign(u_k) = Π_{m=1, m≠k}^{dc} sign(v_m)
37 / 59
For a dc = 3 checknode, using μ_{xi→f}[0] = e^(v_i) / (e^(v_i) + 1) and μ_{xi→f}[1] = 1 / (e^(v_i) + 1):

u_3 = log( μ_{f→x3}[0] / μ_{f→x3}[1] )
    = log( (e^(v_1) e^(v_2) + 1) / (e^(v_1) + e^(v_2)) )
    = log( e^(v_1 + v_2) + 1 ) − log( e^(v_1) + e^(v_2) )
    ≈ max(v_1 + v_2, 0) − max(v_1, v_2)
38 / 59
After some transformations:

u_3 = max(v_1 + v_2, 0) − max(v_1, v_2) + log( (1 + e^(−|v_1 + v_2|)) / (1 + e^(−|v_1 − v_2|)) )
    = sign(v_1) sign(v_2) min(|v_1|, |v_2|) + log( (1 + e^(−|v_1 + v_2|)) / (1 + e^(−|v_1 − v_2|)) )

Noting that the correction term is negative when v_1 and v_2 have the same sign, and positive when they have different signs, dropping it yields the Min-Sum approximation.
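The decomposition into a Min-Sum term plus a correction term can be checked numerically. In the sketch below (function names mine), checknode_exact is the tanh-rule update, and it equals the Min-Sum term plus the correction for any pair of LLRs:

```python
import numpy as np

def checknode_exact(v1, v2):
    """Exact dc = 3 checknode update (tanh rule)."""
    return 2.0 * np.arctanh(np.tanh(v1 / 2.0) * np.tanh(v2 / 2.0))

def minsum_term(v1, v2):
    """Min-Sum approximation: sign(v1) sign(v2) min(|v1|, |v2|)."""
    return np.sign(v1) * np.sign(v2) * min(abs(v1), abs(v2))

def correction_term(v1, v2):
    """Term dropped by Min-Sum: log((1 + e^-|v1+v2|) / (1 + e^-|v1-v2|))."""
    return np.log1p(np.exp(-abs(v1 + v2))) - np.log1p(np.exp(-abs(v1 - v2)))

v1, v2 = 1.5, -0.8
# exact == minsum + correction; here the signs differ, so the correction
# is positive (it pulls the Min-Sum output back towards zero)
```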
39 / 59
Min-Sum decoder:

Bitnode update:
v_m = u_0 + Σ_{k=1, k≠m}^{dv} u_k

Checknode update:
u_k = ( Π_{m=1, m≠k}^{dc} sign(v_m) ) min_{m=1, m≠k}^{dc} |v_m|
40 / 59
Shuffled scheduling can be parallelized if the LDPC code is properly designed: increased throughput. Shuffled scheduling converges approximately 2 to 3 times faster than the flooding schedule: reduced latency.

Bit-flipping, Gallager-A and Gallager-B: easier to obtain theorems on theoretical performance. Min-Sum with a proper offset correction approaches BP for regular or slightly irregular LDPC codes. In some particular cases, the Min-Sum decoder can even surpass the BP decoder in the error floor region.
41 / 59
Σ_{j=1}^{dc} h_ij . c_j = 0   in GF(q),   with GF(q) = {0, α^0, α^1, ..., α^(q−2)}
[Figure: Tanner graph of a non-binary parity check h1.c1 + h2.c2 + h3.c3 = 0. The variable nodes c1, c2, c3 are connected to the checknode through permutation nodes, which apply the multiplications by h1, h2, h3 to the messages.]
43 / 59
Now the code is defined from non-binary parity-check equations:

Σ_{j=1}^{dc} h_ij . c_j = 0   in GF(q),   with GF(q) = {0, α^0, α^1, ..., α^(q−2)}

The messages are now probability vectors of size q, μ_{p→v}[k], k = 0, ..., q−1. In vector form, the variable node update is a componentwise product of the incoming vectors:

μ_{v→p_dv} = μ_v^0 ⊙ μ_{p→v}^1 ⊙ ... ⊙ μ_{p→v}^(dv−1)

where μ_v^0 is the channel likelihood vector.
44 / 59
The multiplicative group of GF(q) is cyclic; as such, multiplication by h_ij acts on the symbols as a cyclic permutation of the nonzero Field elements:

μ_{p→c}[k'] = μ_{v→p}[k],   with k' = h_ij . k,   k = 0, ..., q−1

Example in GF(8): multiplication by h = α^2 shifts the exponents cyclically, α^0 = α^2 . α^5, α^1 = α^2 . α^6, α^2 = α^2 . α^0, ..., α^6 = α^2 . α^4, while 0 is left unchanged. The permutation node thus maps each entry c_i of the message to the entry indexed by h_ji . c_i.
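A small sketch of this permutation in GF(8). The primitive polynomial x^3 + x + 1 is my choice (the slides do not fix one; any primitive polynomial gives the same structure): multiplication by h = α^2 shifts the exponents cyclically and fixes 0.

```python
def gf8_mul(a, b):
    """Multiplication in GF(8) = GF(2)[x] / (x^3 + x + 1); elements are
    3-bit integers.  (Assumed primitive polynomial, not from the slides.)"""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0b1000:
            a ^= 0b1011       # reduce modulo x^3 + x + 1
    return r

# powers of the primitive element alpha = x (integer 2)
alpha_pow = [1]
for _ in range(6):
    alpha_pow.append(gf8_mul(alpha_pow[-1], 2))

# multiplying by h = alpha^2 maps alpha^i to alpha^((i + 2) mod 7)
h = alpha_pow[2]
perm = [alpha_pow.index(gf8_mul(h, alpha_pow[i])) for i in range(7)]
# perm == [2, 3, 4, 5, 6, 0, 1]: a cyclic shift of the exponents
```

The permutation node simply reindexes the entries of a probability vector according to this shift, so it costs no arithmetic at all.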
45 / 59
After the permutation nodes, the checknode operates on c = h1.c1 + h2.c2 + h3.c3 using the addition in GF(q). Addition table of GF(4):

c1 \ c2 | 0 1 2 3
   0    | 0 1 2 3
   1    | 1 0 3 2
   2    | 2 3 0 1
   3    | 3 2 1 0
46 / 59
[Figure: the same checknode, c = h1.c1 + h2.c2 + h3.c3, computed through the GF(q) addition. Can this update be computed with a Fourier transform, as in the binary case?]
48 / 59
Each symbol of GF(q = 2^p) can be decomposed on a binary basis:

c = Σ_{i=1}^{p} c_i x^(i−1),   with c_i ∈ {0, 1}

Let us put the probability weights Prob(c = k) in a size-2, p-dimensional tensor indexed by the binary values {c_1, ..., c_p}.

[Figure: for GF(4) (p = 2), C is a 2 x 2 tensor with entries C[i, j] = Prob(c_1 = i, c_2 = j); e.g. C[0, 0] = Prob(c(x) = 0).]
49 / 59
[Figure: for GF(8) (p = 3), C is a 2 x 2 x 2 tensor with entries C[i, j, k] = Prob(c_1 = i, c_2 = j, c_3 = k).]
50 / 59
The Fourier transform of the tensor C is obtained by applying

F = [ 1 1 ; 1 −1 ]

in each of its p dimensions: F(C) = F ⊗_1 F ⊗_2 ... ⊗_p C, where ⊗_k denotes the tensor product in the k-th dimension of the tensor C(i_1, ..., i_p).
For the Fourier transform in one dimension, we perform 2^p = q operations; the total number of operations for F(.) is then p . 2^p = q . log2(q): a Fast Fourier Transform.
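The p-dimensional transform can be sketched as p passes of the 2-point butterfly over a 2 x ... x 2 tensor (my implementation; indexing convention: symbol k is split into its p bits). Used on the checknode, the componentwise product in the Fourier domain reproduces the GF(2^p) (XOR) convolution of the probability tensors:

```python
import numpy as np

def fourier_gf(C):
    """Apply F = [[1, 1], [1, -1]] along each of the p dimensions of a
    2 x ... x 2 probability tensor (q = 2^p entries): the fast transform
    costs p * 2^p = q * log2(q) butterfly operations in total."""
    C = np.array(C, dtype=float)
    for axis in range(C.ndim):
        a = np.take(C, 0, axis=axis)
        b = np.take(C, 1, axis=axis)
        C = np.stack([a + b, a - b], axis=axis)
    return C

q = 8                                   # GF(8): p = 3 dimensions
mu1 = np.full((2, 2, 2), 1.0 / q)       # uniform message
mu2 = np.arange(1.0, 9.0).reshape(2, 2, 2)
mu2 /= mu2.sum()
# checknode update: componentwise product in the Fourier domain, then the
# inverse transform (the same transform divided by 2 in each dimension)
mu_out = fourier_gf(fourier_gf(mu1) * fourier_gf(mu2)) / q
# a uniform input keeps the checknode output uniform
```

Applying the same transform twice and dividing by q gives the inverse, since F^(−1) = F/2 in each of the p dimensions.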
54 / 59
[Figure: the Fourier matrices built from tensor products of F: a 4 x 4 matrix for GF(4) (2 dimensions) and an 8 x 8 matrix for GF(8) (3 dimensions).]
55 / 59
[Figure: architecture of the Fourier-domain NB-LDPC decoder. The messages V_{v→p}, V_{p→c} and U_{c→p}, U_{p→v} circulate between variable nodes, permutation nodes and checknodes through the interleaver; the Fourier transform F and its inverse surround the checknode update.]
56 / 59
Quantization has a very strong impact on the performance in the Probability Domain. In the Log-Domain, the messages become LLR vectors:

u(k) = log( μ_{c→p}[k] / μ_{c→p}[0] ),   k = 0, ..., q−1
v(k) = log( μ_{p→c}[k] / μ_{p→c}[0] ),   k = 0, ..., q−1
57 / 59
After some manipulations (dc = 3 checknode over GF(4), max-log approximation):

u_3(1) = max( v_1(1), v_2(1), v_1(2) + v_2(3), v_1(3) + v_2(2) ) − K
u_3(2) = max( v_1(2), v_2(2), v_1(1) + v_2(3), v_1(3) + v_2(1) ) − K
u_3(3) = max( v_1(3), v_2(3), v_1(1) + v_2(2), v_1(2) + v_2(1) ) − K
K = max( 0, v_1(1) + v_2(1), v_1(2) + v_2(2), v_1(3) + v_2(3) )
The number of max operators grows in O(q^2). It is a recursive implementation: approximations (e.g. use of max instead of max*, small LUTs) rapidly become catastrophic. The Log-Domain implementation and the FFT complexity reduction O(q^2) → O(q log(q)) are not compatible.
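The GF(4) expressions above can be checked against a direct max-log computation over all input pairs (sketch and names are mine; messages are LLR vectors normalized so that v(0) = 0):

```python
import itertools

# GF(4) addition table (bitwise XOR of the symbol indices)
GF4_ADD = [[0, 1, 2, 3],
           [1, 0, 3, 2],
           [2, 3, 0, 1],
           [3, 2, 1, 0]]

def maxlog_checknode(v1, v2):
    """Max-log dc = 3 checknode over GF(4).  For each output symbol b,
    take the max over input pairs (b1, b2) with b1 + b2 = b in GF(4),
    then subtract K = u(0) so that the output satisfies u(0) = 0."""
    u = [max(v1[b1] + v2[b2]
             for b1, b2 in itertools.product(range(4), repeat=2)
             if GF4_ADD[b1][b2] == b)
         for b in range(4)]
    K = u[0]
    return [x - K for x in u]

v1 = [0.0, 1.0, -2.0, 0.5]
v2 = [0.0, -0.3, 0.7, 2.0]
u3 = maxlog_checknode(v1, v2)
```

Expanding the max over pairs for each output symbol reproduces exactly the four expressions of the previous slide, including the normalizing constant K taken from the b = 0 pairs.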
58 / 59
59 / 59