A_low-complexity_implementation_of_QC-LD
RECONFIGURABLE LOGIC
ABSTRACT

Low Density Parity Check (LDPC) codes are a special class of error correction codes widely used in communication and disk storage systems, owing to their Shannon-limit-approaching performance and their favorable structure. In this paper, a methodology for optimized hardware multiplication by constant matrices in GF(2) is introduced and then applied to the Quasi-Cyclic LDPC encoding algorithm. Taking advantage of the fact that the parity check matrix rarely changes, the signals in many cases are hard-wired into the LUTs, and thus the cyclic shifters and block memories conventionally used are eliminated. Therefore, the proposed framework leads to less complex designs mapped to reconfigurable logic, while combining the performance of hard-wired solutions (high throughput, low latency) with the flexibility of software and its hardware counterparts. These advantages in terms of hardware savings and throughput indicate that the proposed encoder scheme is suitable for high-speed applications, such as long-haul optical transmission, where speed and resource utilization are major issues.

The research leading to these results is partially supported by the ASTRON project (Adaptive Software-defined Terabit Transceiver for flexible Optical Networks) with funding from the European Community’s Seventh Framework Programme [FP7/2007-2013] under grant agreement n. 318714.

I. INTRODUCTION

In computer science, telecommunications and information theory, forward error correction (FEC) is a technique with great practical significance in data transmission over unreliable, noisy communication channels. It finds application in mobile, satellite and optical communication, and in disk storage systems. Generally speaking, the main idea of error control coding (ECC) is to provide the transmitter with ways to encode data signals redundantly, following certain relations, in order to enable automatic error detection and correction in the received signals. The publication of Claude Elwood Shannon’s landmark paper in 1948 [1], describing the maximum rate at which data can be transmitted over a noisy communications channel of a specified bandwidth, was the starting point of the study of error control coding. Since then, much research has been devoted to the optimization of encoding and decoding methods for error control in noisy environments.

Low Density Parity Check (LDPC) codes are linear error correcting codes, defined by a sparse parity-check matrix. These codes were originally developed by Gallager in the 1960s [2] and have demonstrated bit error rate (BER) performance close to the Shannon limit. However, they went unnoticed for a long time, because their computational complexity was very high for the hardware technology of the time. The appearance of Turbo codes in the early 1990s, though, led to their rediscovery a few years later by MacKay and Neal [3]. Compared with Turbo codes, LDPC codes in many cases demonstrate better characteristics, such as parallelism in decoding and simple computation operations, while having high performance. Moreover, a special subclass of LDPC codes, called Quasi-Cyclic LDPC (QC-LDPC) codes, allows an even more efficient implementation while maintaining great performance. Quasi-cyclic codes are codes in which cyclically shifting a codeword by a fixed number of positions yields another codeword. Due to their advantageous structure they require less memory than conventional LDPC codes, and their encoding complexity has been shown to be linear in the code length.

In this paper, we focus on the development of an optimized hardware architecture for multiplication by constant matrices in GF(2) and the presentation of a novel design methodology for QC-LDPC encoders. To prove the effectiveness of the proposed technique we apply it to the encoding algorithm of block-type LDPC (B-LDPC) codes. The B-LDPC codes are a distinguished class of QC-LDPC codes, characterized by an encoding algorithm suitable for efficient hardware implementation, a good noise threshold, and a low error floor [4].

Overall, the main contributions of this paper are the following:
• presentation of an optimized hardware architecture for multiplication by constant matrices in GF(2),
• introduction of a design framework for QC-LDPC encoders, based on a look-up table (LUT) method and advanced mapping to the FPGA resources,
• a low-complexity, high-throughput implementation of a QC-LDPC encoder in reconfigurable logic, based on the proposed framework.

The remainder of the paper is organized as follows. In Section 2, after a brief introduction to QC-LDPC codes, their use as error correction codes in communication standards is presented. Moreover, a suitable encoding algorithm is reviewed. In Section 3 the proposed design methodology is first described and then applied to B-LDPC codes. The differences between our solution and existing ones are also highlighted. In Section 4 experimental results and comparisons with other solutions, in terms of throughput and area, indicate the gain achieved by the proposed scheme. Finally, we give concluding remarks in Section 5.

II. BACKGROUND

II-A. QC-LDPCs in Communication Standards

Quasi-cyclic low-density parity-check (QC-LDPC) codes have received much attention as a family of forward error correction codes due to their favorable structure and excellent error correction performance. Generally, a binary QC-LDPC code is specified by a parity-check matrix that consists of square sub-matrices of the same size over the Galois field GF(2), each of which is either the zero matrix or a circulant permutation matrix. Equation 1 gives the base parity-check matrix of a QC-LDPC code in this form:

\[
H_b = \begin{bmatrix}
P^{a_{11}} & \cdots & P^{a_{1 N_b}} \\
\vdots & \ddots & \vdots \\
P^{a_{M_b 1}} & \cdots & P^{a_{M_b N_b}}
\end{bmatrix} \tag{1}
\]

where $P$ is a $z \times z$ permutation matrix given by

\[
P^1 = \begin{bmatrix}
0 & 1 & 0 & 0 & \cdots & 0 \\
0 & 0 & 1 & 0 & \cdots & 0 \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & 0 & \cdots & 1 \\
1 & 0 & 0 & 0 & \cdots & 0
\end{bmatrix} \tag{2}
\]

Note that $P^i$ is the circulant permutation matrix obtained by shifting the identity matrix $I$ to the right $i$ times ($0 \le i \le z$).

In particular, some communication standards, such as IEEE 802.11 and IEEE 802.16, have adopted QC-LDPC codes as error correction codes for their channel coding scheme [5], [6] and support various code rates and block lengths. It should be noted that the QC-LDPC codes employed by these IEEE standards share the special property of the parity check matrices of B-LDPC codes, and thus their encoding can be done efficiently, without reference to the code generator matrix, as described in [4].

II-B. Encoding Algorithm

Assuming that the parity check matrix H has full rank, it is divided into the following form according to the Richardson-Urbanke encoding method [7]:

\[
H = \begin{bmatrix} A & B & T \\ C & D & E \end{bmatrix}
\]

where A is $(M_b - 1)z \times K_b z$, B is $(M_b - 1)z \times z$, T is $(M_b - 1)z \times (M_b - 1)z$, C is $z \times K_b z$, D is $z \times z$, E is $z \times (M_b - 1)z$, and $N_b = M_b + K_b$.

Let c be a codeword of the code specified by H. Then the following equation (syndrome check) has to be satisfied:

\[
H c^T = 0 \tag{3}
\]

Here $c = [s \; p_1 \; p_2]$, where s denotes the $K_b z$ information bits and $p_1$ and $p_2$ denote the $z$ and $(M_b - 1)z$ parity bits, respectively. From the above equation, $p_1$ can be obtained by

\[
p_1^T = \phi^{-1} \left( E T^{-1} A s^T + C s^T \right) \tag{4}
\]

but the inverse of the matrix $\phi = E T^{-1} B + D$ is not sparse, and thus the computational complexity is high. In general, while the LDPC decoder can operate in linear time, it may be hard to perform low-complexity encoding of these codes [8].

However, in their paper “Quasi-Cyclic LDPC Codes for Fast Encoding” [4], S. Myung et al. proposed an encoding algorithm for these codes with linearly scaled complexity, based on the idea of choosing the matrix $\phi$ to be the identity matrix or, more generally, a circulant permutation matrix (B-LDPC codes). In that way, the encoding procedure of B-LDPC codes is simplified and can be summarized in the following steps.

• Step 1: Compute $A s^T$ and $C s^T$
• Step 2: Compute $E T^{-1} A s^T$
• Step 3: Compute $p_1^T = E T^{-1} A s^T + C s^T$
• Step 4: Compute $p_2$ by $T p_2^T = A s^T + B p_1^T$

III. HARDWARE IMPLEMENTATION

In this section we present a novel hard-wire oriented methodology for implementing efficient QC-LDPC encoder architectures with high throughput and low resource consumption, based on an optimized hardware architecture for multiplication by constant matrices in GF(2). The overall design framework is based on a look-up table (LUT) method and advanced mapping to the FPGA resources.

Since the parity check matrix of QC-LDPC codes is composed of circularly shifted identity matrices and zero matrices, the key problem in designing the encoder is the efficient multiplication of the input data by a number of circularly shifted identity matrices. In contrast with other proposed designs, in the presented scheme there is no need for cyclic shifters
in order to carry out the required arithmetic operations. Noting that in many cases the code rate does not change regularly, the parity check matrix $H_b$ is treated as constant, and thus we proceed with an implementation approaching the efficiency of hard-wired solutions. To maintain the flexibility of its counterparts and support different code rates and block lengths, the reconfigurable nature of FPGAs should be exploited.

To make things clear, we provide a simple example. Assume we want to execute the operation

\[
q = L x^T \tag{5}
\]

where L is a $z \times 3z$ matrix consisting of $z \times z$ sub-matrices, $x^T$ is a $3z \times 1$ vector and q is a $z \times 1$ vector:

\[
\begin{bmatrix} q_{z-1} \\ q_{z-2} \\ \vdots \\ q_1 \\ q_0 \end{bmatrix}
=
\begin{bmatrix} P^{a_2} & P^{a_1} & P^{a_0} \end{bmatrix}
\begin{bmatrix} x_{3z-1} \\ x_{3z-2} \\ \vdots \\ x_1 \\ x_0 \end{bmatrix} \tag{6}
\]

\[
\begin{bmatrix} q_{z-1} \\ q_{z-2} \\ \vdots \\ q_0 \end{bmatrix}
=
P^{a_2} \begin{bmatrix} x_{3z-1} \\ x_{3z-2} \\ \vdots \\ x_{2z} \end{bmatrix}
\oplus
P^{a_1} \begin{bmatrix} x_{2z-1} \\ x_{2z-2} \\ \vdots \\ x_{z} \end{bmatrix}
\oplus
P^{a_0} \begin{bmatrix} x_{z-1} \\ x_{z-2} \\ \vdots \\ x_{0} \end{bmatrix} \tag{7}
\]

where $\oplus$ denotes bitwise modulo-2 addition (XOR).

Fig. 1. Straightforward implementation of Eq. 7. Colored components can be removed with the proposed method.

According to conventional approaches, this operation is implemented with z-bit cyclic shifters (for the matrix multiplication), as shown in Figure 1. However, in the proposed solution, as the matrix L is regarded as constant, we skip the shift operation and connect the signals directly to the right LUTs (used as XOR operators, because all calculations are in GF(2)). With the proposed technique there is also no need for block memories (ROM), and the number of required components is further reduced. Therefore, the derived architecture exhibits lower hardware complexity (i.e. a smaller number of gates). Additionally, the maximum number of required resources is fully predictable.

To prove the effectiveness of the proposed technique we apply it to the encoding algorithm described above. Figure 2 depicts an architecture overview of the QC-LDPC encoder defined by the parity-check matrix given by the IEEE 802.11n standard (rate = 5/6 and subblock size z = 81).

Fig. 2. Architecture overview for the LDPC code with R = 5/6 and z = 81 bits, given by the IEEE 802.11n standard

The hardware verification of the LDPC encoding circuit can be realized by checking the outcome of Eq. 3.

IV. IMPLEMENTATION RESULTS

This section presents implementation results for the IEEE 802.11 LDPC encoder defined by the parity-check matrix given by the standard for codeword length n = 1944 bits, rate 5/6 and block size z = 81 bits. The target reconfigurable medium is a Xilinx Virtex5 (xc5vlx20t).

In more detail, Table I provides a comparison between existing QC-LDPC encoders and the proposed one in terms of throughput, latency and resource utilization. All the included designs were applied to LDPC codes with the same structure and rate. Compared with its counterparts, the proposed implementation has a higher maximum frequency and requires fewer clock cycles, and therefore it achieves significantly higher throughput. As the results demonstrate, skipping cyclic shifters and proceeding with a hard-wire oriented implementation leads to important gains in terms of encoding speed. Further performance enhancement could be accomplished by the use of pipelining techniques, although these would increase the number of occupied slices (more registers would be required).

As far as area is concerned, the presented solution is more advantageous than its counterparts. As can be seen in Table I, it requires fewer LUTs/LEs (Logic Elements) than
[12], [13], and although it utilizes more FFs than [9], [10], [11], the fact that no barrel shifters are employed makes it area efficient. For comparison, we should mention that in a state-of-the-art Xilinx Virtex6 device a 32-bit barrel shifter uses 96 LUTs. Hence, avoiding the z-bit barrel shifters (in our case z = 81 bits) brings important resource savings. Finally, we should point out that in the presented scheme, in contrast with the others, no block memories were used.

Table I. Throughput and latency comparison of QC-LDPC encoders

                            [9]    [10]    [11]   [12]            [13]           [14]        Proposed design
Code rate                   5/6    5/6     5/6    5/6             5/6            5/6         5/6
Codeword length (bits)      1944   1944    1944   2304            2304           3072        1944
Frequency (MHz)             -      -       -      60              150.69         100         290
Clock cycles                24     73-83   24     -               51             -           4
Throughput (Gbps)           -      -       -      0.36            5.67           19.2        117.45
LUTs/LEs                    -      -       -      11,430          12,306         -           1,782
Block memories              Yes    Yes     Yes    Yes             -              Yes         No
FFs                         1701   162     1053   -               -              -           2187
XOR gates                   1620   243     1053   -               -              688         -
z-bit barrel shifters       20     1       1      -               -              -           0
2nd-stage cyclic shifters   0      0       11     -               -              -           0
Target medium               -      -       -      STRATIX         STRATIX        Virtex-II   Virtex5 (xc5vlx20t)
                                                  EPIS80F1506c6   EP1S25F672C6

Note: "-" denotes that the associated implementation result of this design is not available.

V. CONCLUSION

In this work, a novel design methodology for QC-LDPC encoders has been presented. To demonstrate the low complexity and high throughput of implementations within the proposed framework, the technique has been applied to the encoding algorithm of block-type LDPC codes. The selected algorithm is based on the special structure of the parity check matrices of these codes and can be applied to IEEE 802.11, IEEE 802.16 and other LDPC codes with similar properties. The proposed hard-wire oriented design methodology comes with advanced mapping to the FPGA resources and leads to significant gains in encoding speed, while keeping resource utilization low. Moreover, the required flexibility is not sacrificed, thanks to the reconfigurable nature of FPGAs. Due to these advantages, the proposed encoding scheme is shown to be suitable even for high-speed applications, such as long-haul optical transmission.

VI. REFERENCES

[1] C. E. Shannon, “A mathematical theory of communication,” The Bell System Technical Journal, vol. 27, pp. 379–423, 623–656, July, October 1948.
[2] R. G. Gallager, Low-Density Parity-Check Codes. MIT Press, 1963.
[3] D. J. MacKay and R. M. Neal, “Near Shannon Limit Performance of Low Density Parity Check Codes,” Electronics Letters, vol. 32, pp. 1645–1646, 1996.
[4] S. Myung, K. Yang, and J. Kim, “Quasi-cyclic LDPC codes for fast encoding,” Information Theory, IEEE Transactions on, vol. 51, no. 8, pp. 2894–2901, 2005.
[5] “IEEE Standard for Information technology–Telecommunications and information exchange between systems Local and metropolitan area networks–Specific requirements Part 11: Wireless LAN Medium Access Control (MAC) and Physical Layer (PHY) Specifications Ame,” IEEE 802.11, pp. –.
[6] “IEEE Standard for Local and metropolitan area networks Part 16: Air Interface for Broadband Wireless Access Systems,” IEEE Std 802.16-2009 (Revision of IEEE Std 802.16-2004), pp. 1–2080, 2009.
[7] T. Richardson and R. Urbanke, “Efficient encoding of low-density parity-check codes,” Information Theory, IEEE Transactions on, vol. 47, no. 2, pp. 638–656, 2001.
[8] A. Mahdi, N. Kanistras, and V. Paliouras, “An encoding scheme and encoder architecture for rate-compatible QC-LDPC codes,” in Signal Processing Systems (SiPS), 2011 IEEE Workshop on, 2011, pp. 328–333.
[9] Z. Cai, J. Hao, P. Tan, S. Sun, and P. S. Chin, “Efficient encoding of IEEE 802.11n LDPC codes,” Electronics Letters, vol. 42, no. 25, pp. 1471–1472, 2006.
[10] J. Perez and V. Fernandez, “Low-cost encoding of IEEE 802.11n,” Electronics Letters, vol. 44, no. 4, pp. 307–308, 2008.
[11] Y. Jung, Y. Jung, and J. Kim, “Memory-efficient and high-speed LDPC encoder,” Electronics Letters, vol. 46, no. 14, pp. 1035–1036, 2010.
[12] H. Yasotharan and A. Carusone, “A flexible hardware encoder for systematic low-density parity-check codes,” in Circuits and Systems, 2009. MWSCAS ’09. 52nd IEEE International Midwest Symposium on, 2009, pp. 54–57.
[13] S. Kopparthi and D. Gruenbacher, “Implementation of a Flexible Encoder for Structured Low-Density Parity-Check Codes,” in Communications, Computers and Signal Processing, 2007. PacRim 2007. IEEE Pacific Rim Conference on, 2007, pp. 438–441.
[14] Z. He, S. Roy, and P. Fortier, “Encoder architecture with throughput over 10 Gbit/sec for quasi-cyclic LDPC codes,” in Circuits and Systems, 2006. ISCAS 2006. Proceedings. 2006 IEEE International Symposium on, 2006, pp. 4 pp.–.
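
As an illustration of the constant-matrix multiplication of Eq. 7 in Section III, the following Python sketch models the idea in software; it is not the authors' hardware description. It assumes that index 0 is the first entry of each vector and that row r of P^a has its single 1 in column (r + a) mod z, which matches the shape of Eq. 2. The function names (`circulant`, `wiring`, `apply_wiring`, `reference_multiply`) and the shift exponents 13, 48 and 80 are hypothetical and chosen only for the demonstration. Because the exponents are constants, each output bit reduces to an XOR of a fixed set of input bits, so a precomputed wiring table can stand in for the cyclic shifters.

```python
import random


def circulant(z, a):
    """Dense z x z matrix P^a: row r has its single 1 in column (r + a) % z."""
    return [[1 if c == (r + a) % z else 0 for c in range(z)] for r in range(z)]


def matvec_gf2(matrix, vec):
    """Reference matrix-vector product over GF(2): AND as multiply, XOR as add."""
    out = []
    for row in matrix:
        acc = 0
        for m, v in zip(row, vec):
            acc ^= m & v
        out.append(acc)
    return out


def reference_multiply(z, shifts, x):
    """Compute q = [P^a ...] * x block by block with explicit dense matrices."""
    q = [0] * z
    for blk, a in enumerate(shifts):
        part = matvec_gf2(circulant(z, a), x[blk * z:(blk + 1) * z])
        q = [qi ^ pi for qi, pi in zip(q, part)]
    return q


def wiring(z, shifts):
    """Fixed wiring table: for each output bit, the input indices feeding its XOR."""
    return [[blk * z + (i + a) % z for blk, a in enumerate(shifts)]
            for i in range(z)]


def apply_wiring(taps, x):
    """Evaluate the hard-wired network: each output is the XOR of its tapped inputs."""
    out = []
    for tap in taps:
        acc = 0
        for idx in tap:
            acc ^= x[idx]
        out.append(acc)
    return out


if __name__ == "__main__":
    z = 81                  # subblock size used in the paper's results
    shifts = [13, 48, 80]   # hypothetical exponents, left-to-right block order
    rng = random.Random(0)
    x = [rng.randint(0, 1) for _ in range(3 * z)]
    taps = wiring(z, shifts)
    # The wiring table must reproduce the dense GF(2) product exactly.
    assert apply_wiring(taps, x) == reference_multiply(z, shifts, x)
    print("inputs per output bit:", len(taps[0]))
```

In this model the table returned by `wiring` plays the role of the fixed signal routing: it is computed once from the constant exponents, and evaluating it needs only XORs, one per output bit over as many inputs as there are non-zero blocks in the row.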