SEMINAR ON LDPC CODE FOR CCSDS STANDARD
Chapter: 1
INTRODUCTION
In 1948, Shannon [15] proved that every noisy channel has a maximum rate at which
information may be transferred through it and that it is possible to design error-correcting codes
that approach this capacity, or Shannon limit, provided that the codes may be unbounded in
length. For the last six decades, coding theorists have been looking for practical codes capable
of closely approaching the Shannon limit.
For more than four decades, NASA and the Jet Propulsion Laboratory (JPL) have been sending
deep space probes to explore the far reaches of our solar system. Because of the extreme
dilution of signal power over interplanetary distances, JPL has always taken more than an
academic interest in searching for codes that approach Shannon’s limit as closely as possible.
The saga of error-correcting codes for deep space missions is summarized in Table 1.1.
Initial missions in the late 1950s and early 1960s sent their data uncoded. By the late 1960s
and early 1970s, missions were using codes of that era, such as Reed–Muller and long
constraint-length convolutional codes (the latter decoded with a suboptimal sequential
decoder). Voyager launched in 1977 with the state-of-the-art optimized (7, 1/2) convolutional
code, to be decoded with a maximum-likelihood Viterbi decoder developed partly at JPL years
earlier. Voyager’s engineers also anticipated needing an even stronger code for compressed
data and for extended-mission visits to Uranus and Neptune, so they included an encoder for a
(255, 223) Reed–Solomon code to be concatenated with the convolutional code when needed.
Concatenations of two relatively simple codes had been proposed by Forney in his thesis [16]
a decade earlier as a way to create a powerful code with a decoding complexity equivalent to
the complexity of the individual component decoders. This venerable Reed–Solomon and
convolutional concatenated coding system has remained a standard in deep space and many
other communication systems for the past three decades.
Despite this concatenated code’s success in deep space, JPL continued to look for codes that
approached Shannon’s limit even more closely. The popular wisdom in the coding community
was that near-optimal codes could only be obtained with increased minimum distance and
exponentially increasing decoding complexity. Deep-space applications, with their extremely
expensive probes in space and a willingness to invest in complex ground systems, were one
area where highly complex codes could prove worthwhile if they returned even a smidgen of
extra coding gain. To this end, JPL conducted a search for an additional 2 dB of coding gain in
the 1980s. This search culminated in the selection of a (15, 1/6) convolutional code that was
eventually launched with the Mars Pathfinder and Cassini missions in the mid-1990s, which
spawned a round of research into efficient parallel architectures for the extremely complex
Viterbi decoder required to support this code.
Galileo became an early test bed for the new coding schemes in early 1991 when its main
antenna failed to unfurl on its way to Jupiter and the project faced a massive reduction in
planned data rate. The coding group at JPL proposed and implemented software encoding and
decoding of a concatenated code with a variable-redundancy version of the standard Reed–
Solomon outer code and a complex (14, 1/4) inner convolutional code. Galileo ultimately
implemented for its mission at Jupiter a feedback concatenated decoder that iteratively decoded
the concatenated code using four stages of feedback between inner and outer decoders,
improving on a single-pass decoder by more than 0.5 dB.
In 1993, turbo codes were announced: a parallel concatenation of two simple convolutional
codes, decoded iteratively by exchanging soft information between two
simple maximum a posteriori probability decoders. The claimed performance was so good that
the initial reaction of the coding establishment was deeply skeptical.
Researchers at JPL were among the first to analyze and verify the turbo code claims, and to
extend the concept of turbo codes from two constituent codes to multiple codes, to more general
trellises, and from a parallel concatenation to a serial concatenation. Before Cassini
launched in 1996 with its high-complexity convolutional code, JPL had already begun
standardization of turbo codes for future space missions.
In 1998, at about the time deep-space missions were signing up to use turbo codes, MacKay
visited JPL to present a talk entitled “Making Gallager Codes that Beat Turbo Codes.” He
showed that low-density parity-check (LDPC) codes, originally introduced in Gallager’s thesis
in 1960, can be designed to perform as well as, or better than, turbo codes.
The rediscovery of LDPC codes led to more coding research at JPL and around the world.
LDPC codes had more degrees of design freedom compared to turbo codes, which enabled
designers to more effectively trade off threshold and error floor performance or other attributes.
On the other hand, this increased flexibility meant that many of the early LDPC designs were
unstructured, leading to impractical decoding (and even encoding) complexities. A key insight
enabled a structured analysis and design of LDPC codes based on protographs, which eventually
led to JPL’s high-speed decoders.
The CCSDS LDPC code supports code rate 223/255 with a coded block size of 8160 bits, which
allows for a simple replacement of legacy Reed-Solomon decoders. It was designed particularly
for near-Earth and deep-space missions, but its excellent error correction
performance also makes it a good fit for additional high-throughput applications such as
microwave or optical links.
This CCSDS Experimental specification [3] has been contributed to CCSDS by NASA. It
describes a set of Low Density Parity Check (LDPC) codes including definition of their code
structures, encoder implementations and experimental performance results. This CCSDS
Experimental specification is composed of two parts. The single rate 7/8 code described in
chapter 3 was designed and optimized for the characteristics and requirements typical of many
Near-Earth missions, whereas the set of codes described in chapter 4 was designed and
optimized for the characteristics and requirements typical of many Deep-Space missions. Users
are cautioned that selection of the most appropriate code for a particular mission environment
is a complex problem that involves evaluation of several characteristics, including:
– error rate performance, including error floors;
– encoder and decoder complexity, including the number of decoder iterations required to achieve desired performance;
– familial relationships between multiple codes;
– maturity and availability of implementations and implementation support;
– intellectual property issues;
– etc.
Chapter: 2
LDPC CODES
As their name suggests, LDPC codes are block codes with parity-check matrices that contain
only a very small number of non-zero entries. It is the sparseness of H which guarantees both
a decoding complexity which increases only linearly with the code length and a minimum
distance which also increases linearly with the code length. Aside from the requirement that H
be sparse, an LDPC code itself is no different from any other block code. Indeed, existing block
codes can be successfully used with the LDPC iterative decoding algorithms if they can be
represented by a sparse parity-check matrix. Generally, however, finding a sparse parity-check
matrix for an existing code is not practical. Instead LDPC codes are designed by constructing
a sparse parity-check matrix first and then determining a generator matrix for the code
afterwards. The biggest difference between LDPC codes and classical block codes is how they
are decoded. Classical block codes are generally decoded with maximum-likelihood (ML)-like decoding algorithms
and so are usually short and designed algebraically to make this task less complex. LDPC codes
however are decoded iteratively using a graphical representation of their parity-check matrix
and so are designed with the properties of H as a focus.
An LDPC code parity-check matrix is called (wc,wr)-regular if each code bit is contained in a
fixed number, wc, of parity checks and each parity-check equation contains a fixed number,
wr, of code bits.
An LDPC code can also be represented graphically by the Tanner graph (or bipartite graph),
introduced by R. M. Tanner in 1981. In a bipartite graph every edge connects two different
kinds of nodes: variable nodes (or bit nodes) and check nodes (or constraint nodes). Each row
of the parity check matrix H is represented by a check node, and each column of the parity
check matrix is represented by a variable node. The iterative message-passing decoding
algorithm is easily understood using the graphical representation given by Tanner. The
Tanner graph can also be drawn vertically, with the variable nodes on the left side and the
check nodes on the right side. The entry of H in row j and column i is 1 if check node Fj is
connected to variable node Ci, as shown in figure 2.1.
Figure 2.1: Representation of a parity check matrix as a Tanner graph [8]
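As an illustration, the following Python sketch (not part of the CCSDS specification) builds the Tanner-graph adjacency lists directly from a parity check matrix; the small matrix used here is a hypothetical example, chosen only to show the row/check-node and column/variable-node correspondence.

    import numpy as np

    # Hypothetical 3 x 6 parity check matrix (toy example, not a CCSDS code).
    H = np.array([[1, 1, 0, 1, 0, 0],
                  [0, 1, 1, 0, 1, 0],
                  [1, 0, 0, 0, 1, 1]])

    # Each row of H is a check node; the variable nodes it connects to
    # are the columns holding a 1 in that row.
    check_to_var = [np.nonzero(row)[0].tolist() for row in H]

    # Each column of H is a variable node; the check nodes it connects to
    # are the rows holding a 1 in that column.
    var_to_check = [np.nonzero(H[:, i])[0].tolist() for i in range(H.shape[1])]

    print(check_to_var)  # e.g., check node 0 connects to variable nodes [0, 1, 3]
    print(var_to_check)  # e.g., variable node 0 connects to check nodes [0, 2]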
Code construction plays an important role in an error correcting code because it is largely
responsible for the performance of the code; that is, it determines up to which level of noise
we can correct the errors. While constructing a code we consider several properties, such as
the girth of the Tanner graph, the rank of the parity check matrix, and the rate of the code.
The construction of binary LDPC codes involves assigning a small number of the values in an
all-zero matrix to be 1 so that the rows and columns have the required degree distribution.
An LDPC code can be constructed in two different ways: random construction, and structured
(or algebraic) construction. In random construction we initially take a zero matrix and assign
1s to it in such a way that the row-column constraint is fulfilled, i.e., no two rows (or two
columns) have more than one position in common containing a 1. In structured construction
the parity check matrix contains an array of circulant, permutation, or zero submatrices, in
which each row is circularly shifted either left or right with respect to the row above it, and
the first row is the corresponding shift of the last row. An LDPC code is represented by a
sparse m × n parity check matrix H. For an (n, k) LDPC code, the k message bits are encoded
into an n-bit codeword containing the k message bits and n − k parity bits. The parity check
matrix H is of size (n − k) × n, where n − k is the number of rows and n is the number of
columns of H. The rate of an LDPC code is calculated as
R = k/n
For example, the shortened CCSDS code of chapter 3 has k = 7136 and n = 8160, giving
R = 7136/8160 = 223/255 ≈ 0.875.
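A minimal sketch of the random construction just described, using a Gallager-style banded layout in Python; the parameters are illustrative, and the sketch does not enforce the row-column constraint, which a practical construction would additionally check.

    import numpy as np

    rng = np.random.default_rng(0)

    def random_regular_H(n, wc, wr):
        """Random (wc, wr)-regular parity check matrix (Gallager-style bands)."""
        assert n % wr == 0
        rows_per_band = n // wr
        # Base band: row i has 1s in columns i*wr .. i*wr + wr - 1.
        base = np.zeros((rows_per_band, n), dtype=int)
        for i in range(rows_per_band):
            base[i, i * wr:(i + 1) * wr] = 1
        # Stack wc bands, each with its columns randomly permuted.
        bands = [base[:, rng.permutation(n)] for _ in range(wc)]
        return np.vstack(bands)

    H = random_regular_H(n=12, wc=2, wr=3)
    print(H.sum(axis=1))  # every row weight is wr = 3
    print(H.sum(axis=0))  # every column weight is wc = 2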
Chapter: 3
LOW DENSITY PARITY CHECK CODE OPTIMIZED FOR
NEAR EARTH APPLICATIONS
3.1 OVERVIEW
3.1.1 BACKGROUND
The mid-1990s were highlighted by the rediscovery of Low Density
Parity Check codes (LDPCC) [4] in the field of channel coding. Originally invented by R.
Gallager in his PhD thesis in 1961 [1], this coding technique was largely forgotten for more
than 30 years. The primary advance in LDPCC is the discovery of an iterative decoding
algorithm, now called Belief Propagation (BP) decoding, which offers near-optimum
performance for large linear LDPCC at a manageable complexity. LDPCC performance gains
were difficult to realize technologically in the early 1960s. Several decades of VLSI
development have finally made the implementation of these codes practical. The original
construction, now called Gallager LDPCC, has come to be regarded as a special class of
LDPCC. Recent advances in LDPC code construction have resulted in the development of new
codes with (arguably) improved performance over Gallager LDPCC. One class of these codes,
irregular LDPCC, demonstrates improved performance in the waterfall region. Disadvantages
of irregular codes, however, include an increase, in general, in the number of iterations required
for decoding convergence and an unequal error protection between code bits resulting from the
irregular structure. Another class of LDPCC developed using algebraic construction based on
finite geometries has been shown to provide very low error floors and very fast iterative
convergence. These qualities make these codes a good fit for near Earth applications where
very high data rates and high reliability are the driving requirements.
A linear block code is designated in this Experimental Specification by (n, k) where n is the
length of the codeword (or block) and k is the length of the information sequence. LDPC codes
are linear block codes in which the ratio of the total number of ‘1’s to the total number of
elements in the parity check matrix is << 0.5. The distribution of the ‘1’s determines the
structure and performance of the decoder. An LDPC code is defined by its parity check matrix.
The k × n generator matrix which is used to encode a linear block code can be derived from
the parity check matrix through linear operations. The LDPC code considered in this
specification is a member of a class of codes called quasi-cyclic codes. The construction of
these codes involves juxtaposing smaller circulants (or cyclic submatrices) to form a larger
parity check or base matrix. An example of a circulant is shown in figure 3.1. Notice that every
row is a one-bit right cyclic shift (where the end bit is wrapped around to the beginning bit) of
the previous row. The entire circulant is uniquely determined and specified by its first row. For
this example the first row has four ‘1’s or a row weight of four.
An example of a quasi-cyclic parity check matrix is shown in figure 3.2. In this case, a quasi-
cyclic 10 × 25 matrix is formed by an array of 2 × 5 circulant submatrices of size 5 × 5. To
unambiguously describe this matrix, only the position of the ‘1’s in the first row of every
circulant submatrix and the location of each submatrix within the base matrix is needed.
Figure 3.2: Example of a Quasi-Cyclic Matrix [3]
Constructing parity check matrices in this manner produces two positive features:
a) The encoding complexity can be made linear in the code length (or in the number of parity bits) by using shift registers.
b) Encoder and decoder routing complexity in the interconnections of integrated circuits is reduced.
3.2 CODE DESCRIPTION
This section describes the base (8176, 7156) LDPC code. For reasons outlined below, implementations
should shorten the base code according to the format described in section 3.4. The parity check
matrix for the (8176, 7156) LDPC code is formed by using a 2 × 16 array of 511 × 511 square circulants.
This creates a parity check matrix of dimension 1022 × 8176. The structure of the parity check base
matrix is shown in figure 3.3.
Figure 3.3: Base Parity Check Matrix of the (8176, 7156) LDPC Code [3] [12]
Each Ai,j is a 511 × 511 circulant. The row weight of each of the 32 circulants is two; i.e., there are
two ‘1’s in each row. The total row weight of each row in the parity check matrix is 2 × 16, or 32. The
column weight of each circulant is also two; i.e., there are two ‘1’s in each column. The total weight
of each column in the parity check matrix is 2 × 2, or four. The position of the ‘1’s in each circulant is
defined in table 3.1. A scatter chart of the parity check matrix is shown in figure 3.4 where every ‘1’
bit in the matrix is represented by a point.
Figure 3.4: Scatter Chart of Parity Check Matrix [3]
Table 3.1: Position of the ‘1’s in the First Row of Each Circulant

Circulant | ‘1’s position in 1st row of circulant | Absolute ‘1’s position in 1st row of parity check matrix
A1,1 0, 176 0, 176
A1,2 12, 239 523, 750
A1,3 0, 352 1022, 1374
A1,4 24, 431 1557, 1964
A1,5 0, 392 2044, 2436
A1,6 151, 409 2706, 2964
A1,7 0, 351 3066, 3417
A1,8 9, 359 3586, 3936
A1,9 0, 307 4088, 4395
A1,10 53, 329 4652, 4928
A1,11 0, 207 5110, 5317
A1,12 18, 281 5639, 5902
A1,13 0, 399 6132, 6531
A1,14 202, 457 6845, 7100
A1,15 0, 247 7154, 7401
A1,16 36, 261 7701, 7926
A2,1 99, 471 99, 471
A2,2 130, 473 641, 984
A2,3 198, 435 1220, 1457
A2,4 260, 478 1793, 2011
A2,5 215, 420 2259, 2464
A2,6 282, 481 2837, 3036
A2,7 48, 396 3114, 3462
A2,8 193, 445 3770, 4022
A2,9 273, 430 4361, 4518
A2,10 302, 451 4901, 5050
A2,11 96, 379 5206, 5489
A2,12 191, 386 5812, 6007
A2,13 244, 467 6376, 6599
A2,14 364, 470 7007, 7113
A2,15 51, 382 7205, 7536
A2,16 192, 414 7857, 8079
The numbers in the second column represent the relative column positions of the ‘1’s in the first
row of each circulant. Since there are only 511 possible positions, these numbers range
from 0 to 510. The third column gives the absolute positions of the ‘1’s in the first row of the
parity check matrix. There are exactly 8176 possible positions, so these numbers range from 0
to 8175.
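To make the circulant description concrete, the following Python sketch expands first-row positions such as those in table 3.1 into the full quasi-cyclic matrix; only circulants A1,1 and A2,1 are filled in here as a sample, so the resulting H is not the complete (8176, 7156) matrix.

    import numpy as np

    M = 511  # circulant size for the (8176, 7156) code

    def circulant(first_row_ones, size):
        """Build a size x size circulant from the column positions of the
        1s in its first row; each later row is a right cyclic shift."""
        C = np.zeros((size, size), dtype=int)
        for r in range(size):
            for c in first_row_ones:
                C[r, (c + r) % size] = 1
        return C

    # Sample of table 3.1 (relative positions); the full code uses a
    # 2 x 16 array of circulants Ai,j.
    first_rows = {(0, 0): [0, 176],   # A1,1
                  (1, 0): [99, 471]}  # A2,1

    blocks = [[np.zeros((M, M), dtype=int) for _ in range(16)] for _ in range(2)]
    for (i, j), ones in first_rows.items():
        blocks[i][j] = circulant(ones, M)
    H = np.block(blocks)   # 1022 x 8176 once all 32 circulants are filled in
    print(H.shape)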
3.3 ENCODING
The encoder can be designed using the method given in the reference. The generator matrix of the
(8176, 7156) code consists of two parts. The first part is a 7154 × 8176 submatrix in
systematic-circulant form as shown in figure 3.5. It consists of a 7154 × 7154 identity matrix and two
columns of 511 × 511 circulants Bi,j, each column consisting of 14 circulants. The Is are the
511 × 511 identity submatrices and the 0s are the all-zero 511 × 511 submatrices. The second
part consists of two independent rows. The first part generates a (8176, 7154) LDPC subcode
of the (8176, 7156) code. Each codeword in the subcode consists of 7154 information bits and
1022 parity check bits. For reasons given in section 3.4, there are advantages in using the
subcode implementation. There are many ways to design the encoder based on the generator
matrix in figure 3.5. These schemes have complexities that are proportional to the length of the
codeword or to the number of parity check bits.
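As a rough illustration of why the quasi-cyclic structure gives linear-complexity encoding, the sketch below computes one 511-bit parity block as an accumulation of cyclic shifts of the message blocks, in the style of a shift-register encoder; the generator first rows are random placeholders, since the actual Bi,j circulants are derived from the parity check matrix and are not reproduced here.

    import numpy as np

    M = 511

    def circulant_multiply(msg_block, first_row):
        """Multiply a 1 x M message block by an M x M circulant over GF(2),
        implemented as an accumulation of cyclic shifts (shift-register style)."""
        acc = np.zeros(M, dtype=int)
        row = np.array(first_row, dtype=int)
        for bit in msg_block:
            if bit:
                acc ^= row
            row = np.roll(row, 1)  # next row of the circulant
        return acc

    # Placeholder generator circulants: 14 message blocks feed each parity column.
    rng = np.random.default_rng(1)
    G_first_rows = [rng.integers(0, 2, M) for _ in range(14)]  # hypothetical values

    msg = rng.integers(0, 2, 14 * M)  # 7154 information bits
    parity = np.zeros(M, dtype=int)
    for i in range(14):
        parity ^= circulant_multiply(msg[i * M:(i + 1) * M], G_first_rows[i])
    # The systematic codeword is [msg | parity column 1 | parity column 2].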
3.4 SHORTENED (8160, 7136) CODE
Using the generator matrix given by figure 3.5, an encoder can be implemented. This encoder
generates a (8176, 7154) LDPC subcode of the (8176, 7156) code. Current spacecraft and
ground systems manipulate and process data at a 32-bit computer word size. Neither (8176,
7154) nor (8176, 7156) is a multiple of 32. It is beneficial to shorten the codeword to the
dimensions of (8160, 7136). In other words, by shortening the information sequence to 7136
through the use of 18 bits of virtual fill, the (8176, 7154) subcode encoder can be used. This is
accomplished by setting the virtual fill bits to zero during encoding but not transmitting them; thus the
total codeword length becomes 8158. Note that it is not necessary to add two independent rows
to the generator matrix to encode the full (8176, 7156) code because these bits would be
shortened anyway, and so the subcode is sufficient and less complicated for this application.
Since the code length of 8158 is two bits shy of 8160, an exact multiple of 32, two bits of actual
transmitted zero fill are appended to the end of the codeword to achieve a shortened code
dimension of (8160, 7136) bits, or (1020, 892) octets, or (255, 223) 32-bit words. The shortened
codeword is shown in figure 3.6. The received shortened codeword requires the removal
of the two zero fill bits prior to decoding. The decoder then reproduces the 18 virtual fill
zeros after processing but, in general, does not pass these 18 zeros on to the ground equipment.
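A small sketch of this shortening procedure in Python; encode_8176_7154() is a placeholder for the subcode encoder described above, with zero parity used only so the shortening logic runs.

    import numpy as np

    def encode_8176_7154(info_7154):
        """Placeholder for the subcode encoder of figure 3.5: systematic
        codeword = [7154 info bits | 1022 parity bits]. Zero parity is used
        here only so the shortening logic below runs; a real encoder derives
        the parity from the Bi,j circulants."""
        return np.concatenate([info_7154, np.zeros(1022, dtype=int)])

    def encode_shortened_8160_7136(info_7136):
        virtual_fill = np.zeros(18, dtype=int)        # 18 bits of virtual fill
        cw = encode_8176_7154(np.concatenate([virtual_fill, info_7136]))
        cw = cw[18:]                                  # virtual fill not transmitted: 8158 bits
        return np.concatenate([cw, np.zeros(2, dtype=int)])  # 2 bits of zero fill: 8160 bits

    cw = encode_shortened_8160_7136(np.zeros(7136, dtype=int))
    print(cw.size)   # 8160 bits = 1020 octets = 255 32-bit words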
Chapter: 4
LOW DENSITY PARITY CHECK CODE FAMILY
OPTIMIZED FOR DEEP SPACE APPLICATIONS
4.1 OVERVIEW
4.1.1 BACKGROUND
The Low-Density Parity-Check (LDPC) codes presented in this document are intended to
complement the current codes in the CCSDS Recommended Standard, TM Synchronization
and Channel Coding, and were designed according to a list of requirements and evaluation
criteria that reflect the needs of spacecraft applications.
– Requirements
• Code rates: The family shall include codes of rate ≈ 0.5 and ≈ 0.8
• Block lengths: The family shall cover k≈1000 to k≈16000 information bits
spaced by multiples of ≈ 4
– Desired Properties
• Code rates: One or two intermediate rates from {1/2,2/3,3/4,4/5} are desired
– Evaluation Criteria
• Encoder computation: Preferred encoders require fewer logic gates for a given
speed
• Description complexity: The code description in a standards document should
be short
The selected code rates are 1/2, 2/3, and 4/5, three values which are about uniformly spaced
by 1 dB on the rate-dependent capacity curve for the binary-input Additive White Gaussian
Noise (AWGN) channel. Near rate 1/2, a 1% improvement in bandwidth efficiency costs
about 0.02 dB in power efficiency; near rate 7/8, a 1% improvement in bandwidth
efficiency costs 0.1 dB in power efficiency. Hence, the use of a higher order modulation
may be a more practical means for saving bandwidth than the use of a code with rate much
above 0.8. The code rates are exact ratios of small integers to simplify implementation.
The selected block lengths are k=1024, k=4096, and k=16384. The three values
k={1024, 4096, ∞} are about uniformly spaced by 0.6 dB on the sphere-packing bound at
WER=10^-8, and reducing the last value from ∞ to 16384 makes the largest block size
practical at a cost of about 0.3 dB. By choosing to keep k constant among family members,
rather than n, the spacecraft’s command and data handling system can generate data frames
without knowledge of the code rate. Choosing powers of 2 may simplify implementation.
The selected codes are systematic. A low-complexity encoding method is described in
section 4.4. The parity check matrices have plenty of structure to facilitate decoder
implementation. The codes have irregular degree distributions, because this improves
performance by about 0.5 dB at rate 1/2, compared to a regular (3,6) code.
LDPC codes may be used to obtain greater coding gain than those provided by concatenated
coding systems. LDPC codes offer the prospect of much higher decoding speeds via highly
parallelized decoder structures.
4.2 SPECIFICATION
An LDPC code is specified indirectly by a v-by-w parity-check matrix H consisting of v
linearly independent rows. A coded sequence of w bits must satisfy all v parity-check
equations corresponding to the v rows of H. Parity-check matrices may include additional
linearly dependent rows without changing the code. An encoder maps an input frame of
k ≤ w−v information bits uniquely into a codeblock of n ≤ w bits. If n < w, the remaining w−n
code symbols are punctured and are not transmitted. If k < w−v, the remaining dimensions of
the code remain unused.
The codeblock lengths n and information block lengths k, and the corresponding rates
r = k/n, are shown in table 4.1 for the suite of LDPC codes. The LDPC code rates r are exactly
as indicated in table 4.1, unlike the case of turbo codes for which the precise code rates are
slightly lower than the corresponding nominal rates due to termination bits.
Table 4.1: Codeblock Lengths for Supported Code Rates (Measured in Bits)

Information block length k | rate 1/2 (n) | rate 2/3 (n) | rate 4/5 (n)
1024 | 2048 | 1536 | 1280
4096 | 8192 | 6144 | 5120
16384 | 32768 | 24576 | 20480

Table 4.2: Submatrix Size M

Information block length k | rate 1/2 | rate 2/3 | rate 4/5
1024 | 512 | 256 | 128
4096 | 2048 | 1024 | 512
16384 | 8192 | 4096 | 2048
4.3 PARITY CHECK MATRICES
The H matrices are constructed from M×M submatrices, where the submatrix size M is listed in
table 4.2. The H matrices for the rate-1/2 codes are specified as follows:

         [ 0M   0M       IM   0M       IM ⊕ Π1       ]
H(1/2) = [ IM   IM       0M   IM       Π2 ⊕ Π3 ⊕ Π4  ]
         [ IM   Π5 ⊕ Π6  0M   Π7 ⊕ Π8  IM            ]

where IM and 0M are the M×M identity and zero matrices, respectively, ⊕ denotes modulo-2
addition of matrices, and Π1 through Π8 are permutation matrices. The H matrices for the
rate-2/3 and rate-4/5 codes are specified with additional columns and permutation matrices in
the same manner. An H matrix for rate-3/4 is also specified, since this rate naturally occurs
via the column extension required to achieve rate 4/5.
Permutation matrix Πk has non-zero entries in row i and column πk(i) for i ∈ {0, ..., M−1}, where

πk(i) = (M/4)·((θk + ⌊4i/M⌋) mod 4) + ((φk(⌊4i/M⌋, M) + i) mod (M/4))

and the functions θk and φk are defined in tables 3-3 and 3-4 of the CCSDS specification [3].
Values defined in these tables describe the φk’s using 7-tuples, where consecutive positions in
a tuple correspond to submatrix sizes from the set M = {128, 256, 512, 1024, 2048, 4096, 8192}.
The parity check matrix descriptions in conjunction with these tables describe 28 codes, one
for each rate r = {1/2, 2/3, 3/4, 4/5} and each M in the set above. Of these 28 codes, 9
are selected based on criteria provided earlier in this document (for instance, spacing of about
1 dB between rates and 0.6 dB between lengths). For any of the H matrices constructed per
this description, the last M code symbols are to be punctured (not transmitted). For example,
the parity check matrix for the (n = 1280, k = 1024) rate-4/5 code is shown in figure 4.1, with
blue lines representing each non-zero circulant entry and structure indicated by gridlines.
Minor gridlines are spaced at intervals of m, and major gridlines (not shown) at M = 4m.
Figure 4.1: An H Matrix for the (n = 1280, k = 1024) Rate 4/5 Code [3]
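A sketch of the permutation construction in Python; the θ and φ values below are placeholders (the real ones come from tables 3-3 and 3-4 of [3]), so this illustrates only the mechanics of the formula.

    import numpy as np

    def pi_k(i, M, theta_k, phi_k):
        """Column index pi_k(i) for permutation matrix Pi_k."""
        t = (4 * i) // M  # which quarter of the rows i falls in: 0..3
        return (M // 4) * ((theta_k + t) % 4) + (phi_k[t] + i) % (M // 4)

    def permutation_matrix(M, theta_k, phi_k):
        P = np.zeros((M, M), dtype=int)
        for i in range(M):
            P[i, pi_k(i, M, theta_k, phi_k)] = 1
        return P

    # Placeholder parameters (NOT the values from the CCSDS tables):
    M, theta, phi = 128, 3, [1, 22, 0, 26]   # phi[t] stands in for phi_k(t, M)
    P = permutation_matrix(M, theta, phi)
    # Exactly one 1 per row and per column confirms P is a permutation matrix.
    assert (P.sum(axis=0) == 1).all() and (P.sum(axis=1) == 1).all()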
4.4 ENCODING
The recommended method for producing codeblocks consistent with AR4JA parity-check
matrices is to perform matrix multiplication by block-circulant generator matrices. Note that
the family of AR4JA codes supports rates K/(K+2), where K=2 for a rate 1/2 code, K=4 for
rate 2/3, and K=8 for rate 4/5. AR4JA generator matrices, G, have size MK ×(MK + 3) if
punctured columns are described in the encoding, or MK ×(MK + 2) if punctured columns are
omitted. These matrices may be constructed as follows.
1) Let P be the 3M × 3M submatrix of H consisting of the last 3M columns. Let Q be the 3M
× MK submatrix of H consisting of the first MK columns.
2) Compute W = (P⁻¹Q)ᵀ, with all arithmetic performed over GF(2).
3) Construct the matrix G = [IMK W], where IMK is the MK × MK identity matrix and W is a
dense matrix of circulants of size MK × M(N−K). Here N can be defined such that the
unpunctured codeblock size is nunpunc = MN; for AR4JA codes, nunpunc = M(K+3), so N = K+3.
The dimension of W is therefore MK × 3M, or MK × 2M in the case where the punctured
variables are omitted from the generator matrix.
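A sketch of this construction over GF(2), using Gauss-Jordan elimination for the inverse so the example stays self-contained in plain NumPy; the matrix H is assumed to be given as a dense 0/1 array.

    import numpy as np

    def gf2_inv(A):
        """Invert a square 0/1 matrix over GF(2) by Gauss-Jordan elimination."""
        n = A.shape[0]
        aug = np.concatenate([A.copy() % 2, np.eye(n, dtype=int)], axis=1)
        for col in range(n):
            pivot = next(r for r in range(col, n) if aug[r, col])  # assumes invertible
            aug[[col, pivot]] = aug[[pivot, col]]
            for r in range(n):
                if r != col and aug[r, col]:
                    aug[r] ^= aug[col]  # row additions over GF(2) are XORs
        return aug[:, n:]

    def ar4ja_generator(H, M, K):
        """G = [I | W] with W = (P^-1 Q)^T over GF(2), per steps 1-3 above."""
        Q = H[:, :M * K]          # 3M x MK, first MK columns
        P = H[:, M * K:]          # 3M x 3M, last 3M columns
        W = (gf2_inv(P) @ Q % 2).T
        return np.concatenate([np.eye(M * K, dtype=int), W], axis=1)

    # Encoding is then a GF(2) vector-matrix product: c = (u @ G) % 2.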
Chapter: 5
DECODING ALGORITHMS FOR LDPC CODE
The class of decoding algorithms used to decode LDPC codes are collectively termed message-
passing algorithms, since their operation can be explained by the passing of messages along the
edges of a Tanner graph. Each Tanner graph node works in isolation, only having access to the
information contained in the messages on the edges connected to it. The message-passing
algorithms are also known as iterative decoding algorithms, as the messages pass back and
forth between the bit and check nodes iteratively until a result is achieved (or the process is
halted). Different message-passing algorithms are named for the type of messages passed or
for the type of operation performed at the nodes.
5.1 Bit-Flipping Decoding
As an example, consider the parity check matrix with wc = 2, wr = 3 shown below, which is
used to encode the codeword c = [0 0 1 0 1 1]:

H = [ 1 1 0 1 0 0 ]
    [ 0 1 1 0 1 0 ]
    [ 1 0 0 0 1 1 ]
    [ 0 0 1 1 0 1 ]
Initially, in step 1, the received binary vector y = [1 0 1 0 1 1] is fed directly to the respective
bit nodes, and the bit nodes then propagate these binary values to the check nodes via the
edges connected to them. The 1st check node is joined to the 1st, 2nd, and 4th bit nodes, so the
message from the 1st check to the 1st bit node is the modulo-2 sum of the 2nd and 4th bits
(i.e., E1,1 = M2 ⊕ M4 = 0 ⊕ 0 = 0). A similar process happens at each check node, which sends
its modulo-2 sum back to the respective variable nodes. In step 2, the 1st bit node has messages
from the 1st and 3rd checks, both of which are zero here. Thus the majority of the messages
into the 1st bit node indicate a value different from the received value, and so the 1st bit node
flips its value. The 2nd bit node has messages of 1 and 0 from the 1st and 2nd checks,
respectively. Thus there is no majority in favor of flipping, so the 2nd bit keeps its value. The
updated bits are then tested against the parity checks; if there are no unsatisfied checks, the
algorithm halts and returns the current bits as the decoded codeword.
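A minimal bit-flipping decoder for this example, in Python; the flip rule used here (flip when a strict majority of incoming check messages disagree with the current value) is one common variant.

    import numpy as np

    H = np.array([[1, 1, 0, 1, 0, 0],
                  [0, 1, 1, 0, 1, 0],
                  [1, 0, 0, 0, 1, 1],
                  [0, 0, 1, 1, 0, 1]])

    def bit_flip_decode(y, H, max_iters=10):
        x = y.copy()
        for _ in range(max_iters):
            if not ((H @ x) % 2).any():        # all parity checks satisfied
                return x
            flip = np.zeros_like(x)
            for i in range(H.shape[1]):
                checks = np.nonzero(H[:, i])[0]
                # Message from check j to bit i: mod-2 sum of the *other* bits in check j.
                msgs = [(H[j] @ x - x[i]) % 2 for j in checks]
                # Flip if a majority of messages disagree with the current value.
                if sum(m != x[i] for m in msgs) > len(msgs) / 2:
                    flip[i] = 1
            x ^= flip
        return x

    y = np.array([1, 0, 1, 0, 1, 1])  # codeword [0 0 1 0 1 1] with first bit flipped
    print(bit_flip_decode(y, H))      # -> [0 0 1 0 1 1]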
5.2 Sum-Product Decoding
The bit probabilities known before running the LDPC decoder are called the a priori
probabilities; the bit probabilities returned by the decoder are called the a posteriori
probabilities. In the case of sum-product decoding these probabilities are expressed as
log-likelihood ratios (LLRs):

L(x) = log_e [ p(x = 0) / p(x = 1) ]
Log likelihood ratios represent the two metrics for a binary variable by a single value.
If p(x = 0) > p(x = 1) then L(x) is positive, and the greater the difference between p(x = 0) and
p(x = 1), the larger the positive value of L(x), i.e., the more certain we are that x = 0. Similarly,
if p(x = 1) > p(x = 0) then L(x) is negative, and the greater the difference between p(x = 0) and
p(x = 1), the larger the negative value of L(x). Thus the sign of L(x) provides the hard decision
on x, and the magnitude |L(x)| is the reliability of this decision.
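A tiny numeric illustration in Python of this sign/magnitude interpretation (the probabilities are arbitrary example values):

    import math

    def llr(p0):
        """Log-likelihood ratio of a binary variable with p(x=0) = p0."""
        return math.log(p0 / (1.0 - p0))

    for p0 in (0.9, 0.6, 0.5, 0.2):
        L = llr(p0)
        hard = 0 if L >= 0 else 1
        print(f"p(x=0)={p0:.1f}  L(x)={L:+.2f}  hard decision={hard}  reliability={abs(L):.2f}")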
The aim of SP decoding is to compute the maximum a posteriori probability (MAP) for each
codeword bit, Pi = P{ci = 1|N}, which is the probability that the i-th codeword bit is a 1
conditional on the event N that all parity-check constraints are satisfied. The extra information
about bit i which is received from the parity-checks is called extrinsic information for bit i.
In SP decoding the extrinsic message from check node j to bit node i, Ej,i, is the LLR
of the probability that bit i causes parity-check j to be satisfied.
However, the messages sent from the variable nodes to the check nodes, Mj,i, are not the full
LLR value for each bit. To avoid sending back to each check node information which it already
has, the message from the i-th bit node to the j-th check node is the sum of the intrinsic
(channel) LLR ri and all incoming extrinsic messages, without the component Ej,i which was
just received from the j-th check node:

Mj,i = ri + Σ j′∈Ai, j′≠j Ej′,i

where Ai is the set of check nodes connected to bit i.
To test, the intrinsic and extrinsic probabilities for each bit are combined. The hard
decision on the received bits is given by the sign of the LLRs.
5.3 Min-Sum Decoding
The Min-Sum Algorithm (MSA) is a simplified version of the SPA that has reduced
implementation complexity with a slight degradation in performance. It performs simple
arithmetic and logical operations, which makes it suitable for hardware implementation, but
the performance of the algorithm is significantly affected by the quantization of the soft input
messages. In MS decoding, as in the SP algorithm, the extrinsic messages passed between
check and variable nodes take the form of log-likelihood ratios (LLRs). The min-sum algorithm
simplifies the check-node update of the sum-product algorithm by recognizing that the term
corresponding to the smallest Mj,i′ dominates the product term, and so the product can be
approximated by a minimum.
Let Mj,i represent the LLR value sent from variable node i to check node j. Suppose W = (w1,
w2, . . . , wN) ∈ C and Y = (y1, y2, . . . , yN) are the transmitted codeword and the received
sequence, respectively. The MS decoding algorithm consists of the following steps:
(1) Initialize the iteration counter to 1, let IM be the maximum number of iterations allowed,
and initialize each message Mj,i with the channel LLR for bit i.
(2) For each check node j, send to each connected variable node i the extrinsic message

Ej,i = ( ∏ i′∈Bj, i′≠i sign(Mj,i′) ) · min i′∈Bj, i′≠i |Mj,i′|

where Bj is the set of bits in check j, and the product of signs can be calculated by using
modulo-2 addition of hard decisions on each Mj,i′.
(3) For each variable node i, send to each connected check node j the sum of the channel LLR
for bit i and all incoming extrinsic messages Ej′,i except the one from check node j.
(4) If all parity checks are satisfied, or the iteration counter has reached IM, stop; otherwise
increment the counter and return to step (2).
(5) Apply a hard decision, i.e., compute W’ = (w1’, w2’, ..., wN’), where element wn’ is
calculated as wn’ = 0 if the a posteriori LLR of bit n is non-negative, and wn’ = 1 otherwise.
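A compact Python sketch of these steps; the mapping from received samples to channel LLRs (e.g., 2y/σ² for BPSK over AWGN) is an assumption noted in the final comment, not part of the algorithm itself.

    import numpy as np

    def min_sum_decode(llr_in, H, max_iters=50):
        """Min-sum decoding per steps (1)-(5). llr_in: channel LLRs; H: 0/1 matrix."""
        m, n = H.shape
        rows = [np.nonzero(H[j])[0] for j in range(m)]    # Bj: bits in check j
        cols = [np.nonzero(H[:, i])[0] for i in range(n)]  # Ai: checks on bit i
        M = H * llr_in                     # (1) initialize variable-to-check messages
        for _ in range(max_iters):
            E = np.zeros((m, n))
            for j in range(m):             # (2) check-node update: sign product, min magnitude
                for i in rows[j]:
                    others = [i2 for i2 in rows[j] if i2 != i]
                    sign = np.prod(np.sign(M[j, others]))
                    E[j, i] = sign * np.min(np.abs(M[j, others]))
            L = llr_in + E.sum(axis=0)     # a posteriori LLRs
            w = (L < 0).astype(int)        # (5) hard decision on the sign of L
            if not ((H @ w) % 2).any():    # (4) stop if all checks are satisfied
                return w
            for i in range(n):             # (3) variable-node update, excluding E[j, i]
                for j in cols[i]:
                    M[j, i] = L[i] - E[j, i]
        return w

    # Example channel LLR computation (assumption): llr_in = 2 * y_received / sigma**2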
Chapter: 6
DECODING ARCHITECTURES
The first approximation of an LDPC decoder architecture is shown in figure 6.1, which gives
the basic idea of the decoder's main blocks. It has two types of processors: the check-node
units (CNU) and the variable-node (bit-node) units (VNU), which are used to calculate the
equations of the min-sum algorithm described in chapter 5. The rest of the decoder is formed
by memories to store:
1) the Rmn(l) messages from the CNU to the VNU,
2) the Qmn(l) messages from the VNU to the CNU, and
3) the estimated values Pn.
The bipartite graph representation of an LDPC code shows that the computational dependencies
for any node depend only on nodes of the opposing type. This allows all VNs or all CNs to be
updated in a block-parallel manner, which enables very high throughput.
Figure 6.2: Example of bipartite graph representation for a LDPC code and information
flow in the message passing algorithm.
6.1 Fully Parallel LDPC Decoder
A fully parallel decoder consists of:
1. Processing nodes: a fully parallel decoder contains a number of VNUs equal to the number
of columns in the parity check matrix and a number of CNUs equal to the number of rows in
the H matrix.
2. Routing network: the routing network is represented by wires which connect the VNUs
with the CNUs according to the parity check matrix. While the parallel architecture is
straightforward, the problem arises from the routing network. For LDPC codes that have
thousands of rows and columns in the parity check matrix, the routing network involves tens
of thousands of connections between the variable node units and check node units.
Furthermore, the H matrix presents an irregular structure, which makes the interconnection
component highly irregular. This further contributes to the increase in cost, as well as a
reduction in the maximum operating frequency due to the routing delay across the routing
components of the FPGA. Another disadvantage of a fully parallel LDPC decoder is its low
flexibility: the decoder is specific to one LDPC code, and a slight modification in the code
leads to an entire decoder redesign. Furthermore, this type of architecture cannot easily
accommodate features such as multi-rate decoding, which is desired because each
communication and storage standard uses multiple LDPC codes with different rates. The main
advantage of this architecture is its high throughput, due to the low number of clock cycles
required for an iteration.
6.2 Serial LDPC Decoder
In order to reduce the complexity of these decoders, one approach relies on the reduction of
the wires between the CNUs and VNUs. The CN messages and the VN messages are sent bit
by bit to their corresponding processing unit [9]. Thus, the connection between a variable node
unit and a check node unit consists of only two wires. Figure 6.3 shows the difference between
the conventional bit-parallel scheme and a bit-serial scheme for the simple case of transferring
an n-bit number, bn ⋯ b2 b1. In figure 6.3(a) all n bits are sent over n parallel lines in one
clock cycle. In contrast, in a bit-serial scheme as in figure 6.3(b), the message is sent over a
single line in n clock cycles.
Figure 6.3: Two alternatives for synchronous transmission of an n-bit number (a) bit parallel:
n bits sent in one clock cycle over n wires. (b) bit-serial: n bits sent in n clock cycles over one
wire
In a serial LDPC decoder the check-node and the variable-node operations are done sequentially.
The decoder consists of two memory spaces for storing the parity check matrix: one contains
the locations of the ones in each row and the other the locations of the ones in each column,
named the Row-to-Col map memory and the Col-to-Row map memory, respectively. At the start of each
iteration, the indices of columns in a row are read and the corresponding LLR values are read
from the LLR memory. These values are sent to the Check-node processor (CNP) for the check-
node updating.
Figure 6.4: Block diagram of serial LDPC decoder and its bus interface to the PLB (Processor Local
Bus)
The CN updating is not done immediately after reading the LLR values and requires its own
processing time. In order to save on the memory consumed by the Row-to-Col map on the
FPGA, and to take advantage of a free time slot, the Row-to-Col memory consists of two
identical banks. While one bank is being used for mapping data, the other is filled with the data
read from the external DDR memory. In addition to simplifying the node-to-node
interconnection, the bit-serial approach has several other advantages for fully-parallel LDPC
decoders. In a bit serial scheme, the wordlength of computations can be increased simply by
increasing the number of clock cycles allocated for transmitting the messages. Using this
property, the precision of the decoder can be made programmable just by re-timing the node-
to-node message transfers without the need for extra routing channels. Programmability of the
decoder wordlength allows one to efficiently trade off complexity for error correction
performance. Bit-serial decoding, however, imposes some challenges. The immediate effect is
that it reduces the decoder throughput compared with fully-parallel implementations, as
multiple clock cycles are required for transmitting a single message. Also, some common check
and variable node update functions cannot be efficiently implemented bit-serially.
6.3 Partially-Parallel LDPC Decoder
The other approach to reduce the complexity and the cost of the LDPC decoder relies on the
serialization of the CN and VN operations at different levels. Thus, partially-parallel
architectures are employed. These partially-parallel decoders exploit the regular structure of
the QC-LDPC codes in order to obtain regular, low complexity architectures. Because
serialization is employed at different levels, messages have to be stored in dedicated memory
units. Stored messages have to be routed from the memory blocks to the processing units
according to the LDPC matrix. In order to provide a flexible way for message routing, barrel
shifters are employed. The read/write addresses for the memories, as well as the shift amounts
employed in routing, are generated from a dedicated control unit. The main components for a
partial parallel decoder are as follows:
1. Processing nodes: the number of VNUs and CNUs depends on the parallelism degrees
chosen at different levels. Furthermore, the number of inputs and outputs of such units can
also vary, depending on how many messages can be processed each clock cycle.
2. Routing network: the routing is implemented using barrel shifters (see the sketch after this
list). The number and size of the barrel shifters may vary with the message quantization, the
circulant size of the base matrix, the parallelization degrees at different levels, etc.
High-frequency pipelined barrel shifters may be implemented without additional cost in
modern FPGA devices, because the LUT and D flip-flop pair composes the basic component
of the CLB.
3. Memory blocks: memory blocks are used to store both the input LLRs and the CN and VN
messages. Usually, high degrees of parallelism (and hence increased throughput) require wide
memory words and multi-port memories. In many implementations, the multi-port memories
are replaced by independent memory banks, which can be easily mapped on the FPGA BRAM
blocks.
4. Control unit: The control unit is used to generate the shift amounts, the read/write memory
addresses, as well as the control signals for the processing units. The shift amounts and the
memory addresses are code dependent; this kind of information is usually stored in dedicated
ROM memories.
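In a QC-LDPC decoder the routing reduces to cyclic rotations of message blocks by each circulant's shift offset, which is what a barrel shifter implements in hardware. A minimal software model (the block size and offset are illustrative):

    import numpy as np

    def barrel_shift(block, offset):
        """Cyclically rotate a block of M messages; models a barrel shifter."""
        return np.roll(block, -offset)

    # Hypothetical example: M = 8 messages per circulant block, shift offset 3.
    M = 8
    llr_block = np.arange(M, dtype=float)   # messages read from a memory bank
    routed = barrel_shift(llr_block, 3)     # aligned for the CNU inputs
    restored = barrel_shift(routed, -3)     # inverse rotation on write-back
    assert (restored == llr_block).all()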
6.4 A Generic Architecture for an LDPC Decoder
The base architecture of the decoder consists of a controller, input/output memories, multiblock
message memories (in the BP-based algorithm, messages are stored in message memories), and
a processing block containing many instances of the CN and BN processing units.
The challenge of decoding CCSDS C2 LDPC codes comes from the high number of messages
to deal with. In the low cost solution of [14], this architecture processes 16 BNs (or 2 CNs)
concurrently, thanks to the regularity and the parallelism of the QC LDPC code. The generic
design consists of the use of several processing blocks and memory blocks with a larger word
size (the messages corresponding to the different input frames are stored in the same memory
word and are accessed concurrently).
Figure 6.5: Base Parallel Architecture
Chapter: 7
SUMMARY AND FUTURE WORK
I have discussed the CCSDS Experimental specification, which is composed of two parts. The
single rate 7/8 code described in chapter 3 was designed and optimized for the characteristics
and requirements typical of many Near-Earth missions, whereas the set of codes described in
chapter 4 was designed and optimized for the characteristics and requirements typical of many
Deep-Space missions. I have also discussed decoding algorithms and decoder architectures
which can be used to implement a decoder for the above standard.
The complexity of LDPC codes has been an area of research and discussion. For a Field
Programmable Gate Array (FPGA) or Application Specific Integrated Circuit (ASIC)
implementation, the encoder’s complexity is dominated by two factors: 1) the total number
of required logic gates and 2) the routing complexity. For the code presented in this
Experimental Specification, the quasi-cyclic property allows for the use of shift registers whose
required number of logic gates is proportional to n−k. With regard to the routing complexity,
there is currently no way to predict this figure: it depends on a number of factors, such
as the choice of the FPGA or ASIC, the routing algorithm, and the layout of the device. The
decoder’s complexity is larger than the encoder’s and even more difficult to predict. The
primary complexity factors (the total number of required logic gates and the routing
complexity) are a function of the choice of BP decoding algorithm (there are many) as well as
the architectural decisions (i.e., parallel or serial processing, number of bits of finite precision,
fixed number of iterations or stopping rule, use of look-up tables, etc.). These choices also
determine the decoder’s Bit Error Rate (BER) performance. My next step is to implement an
LDPC decoder on an FPGA for the CCSDS standard.
Chapter: 8
REFERENCES
[1] R. G. Gallager. Low-Density Parity-Check Codes. Cambridge, MA: MIT Press, 1963.
[3] Low Density Parity Check Codes for Use in Near-Earth and Deep Space Applications.
Experimental Specification, CCSDS 131.1-O-2. Orange Book. Issue 2. Washington, D.C.:
CCSDS, September 2007.
[4] D. J. C. MacKay and R. M. Neal. “Near Shannon Limit Performance of Low Density Parity
Check Codes.” Electronics Letters 32 (August 1996): 1645-1646.
[6] K. S. Andrews, D. Divsalar, S. Dolinar, J. Hamkins, C. R. Jones, and F. Pollara. “The
Development of Turbo and LDPC Codes for Deep-Space Applications.” Proceedings of the
IEEE 95, no. 11 (November 2007).
[9] S. J. Johnson. Iterative Error Correction: Turbo, Low-Density Parity-Check and
Repeat-Accumulate Codes. Cambridge University Press, 2009.
[11] K. Gunnam, J. M. C. Perez, and F. Garcia-Herrero. “Algorithms and VLSI Architectures
for Low-Density Parity-Check Codes: Part 2 - Efficient Coding Architectures.” IEEE
Solid-State Circuits Magazine 9, no. 1 (Winter 2017): 23-28.
[12] TM Space Data Link Protocol. Recommendation for Space Data System Standards,
CCSDS 132.0-B-1. Blue Book. Issue 1. Washington, D.C.: CCSDS, September 2003.
[13] AOS Space Data Link Protocol. Recommendation for Space Data System Standards,
CCSDS 732.0-B-2. Blue Book. Issue 2. Washington, D.C.: CCSDS, July 2006.
[14] “A Generic Architecture of CCSDS Low Density Parity Check Decoder for Near-Earth
Applications.” 2009 Design, Automation & Test in Europe Conference & Exhibition, 2009.
[15] C. Shannon. “A Mathematical Theory of Communication.” Bell System Technical
Journal 27 (July/October 1948): 379-423, 623-656.
[16] G. D. Forney. Concatenated Codes. Cambridge, MA: MIT Press, 1966.