An Automated FPGA-based Framework for Rapid Prototyping of Nonbinary LDPC Codes


Yaoyu Tao, Qi Wu
Abstract: Nonbinary LDPC codes have shown superior performance close to the Shannon limit. Compared to binary LDPC codes of similar lengths, they can reach error rates that are orders of magnitude lower. However, the multitude of design freedoms of nonbinary LDPC codes complicates the practical code and decoder design process. Fast simulations are critically important to evaluate the pros and cons. Rapid prototyping on FPGA is attractive but takes significant design effort due to its high design complexity. We propose a high-throughput reconfigurable hardware emulation architecture with decoder and peripheral co-design. The architecture enables a library and script-based framework that automates the construction of FPGA emulations. Code and decoder design parameters are programmed either during run time or by script at design time. We demonstrate the capability of the framework in evaluating practical code and decoder designs by experimenting with two popular nonbinary LDPC codes, regular (2, dc) codes and quasi-cyclic codes: each emulation model can be auto-constructed within hours, and the decoder delivers excellent error-correcting performance on a Xilinx Virtex-5 FPGA with throughput of up to hundreds of Mbps.

I. INTRODUCTION

Nonbinary low-density parity-check (NB-LDPC) codes designed in high-order Galois fields have shown great potential for approaching the Shannon limit [1], [2]. However, the practical performance of nonbinary LDPC codes can be far from their theoretical performance, sometimes even worse than that of binary LDPC codes, for the following two reasons: 1) practical decoding algorithms, like extended min-sum (EMS) [3] and min-max (MM) [4] with message truncation [5] or skimming [6], introduce performance degradations, and the effect of the high-order Galois field may be diminished; 2) practical decoder implementations inject non-idealities, such as finite word length and fixed-point quantization effects [7]. Therefore, it is critically important to evaluate the code and decoder design for each new application that is brought into consideration, to ensure the efficiency gain from deploying NB-LDPC codes.

FPGA emulations are widely used to accelerate simulations, showing orders-of-magnitude speedup. A typical simulation setup for NB-LDPC contains a decoder under study and peripherals such as a source generator and a channel model. Dedicated emulations have been developed recently for specific NB-LDPC codes to enhance the simulation throughput from less than a hundred kb/s on a microprocessor up to hundreds of Mb/s. A high-speed nonbinary LDPC decoder based on the trellis MM algorithm with a layered schedule achieved 630 Mbps for a (2304, 2048) NB-LDPC code over GF(16) in [8]. Prior work [6] also demonstrated FPGA emulation for (960, 480) regular-(2, 4) codes delivering 9.76 Mb/s throughput based on four parallel EMS decoders with message truncation and skimming. The (744, 653) quasi-cyclic code over GF(32) enabled a partial-parallel MM decoder that achieved a 9.3 Mb/s throughput [9], [10]. However, designing an FPGA emulation is not as easy as writing software code, and it is especially difficult to make design properties reconfigurable in hardware the way software parameters are. Besides, it also requires extensive effort to create a dedicated hardware architecture and run through FPGA synthesis. For high-complexity designs like NB-LDPC, it often takes weeks to months to get a working FPGA emulation model. These barriers render rapid FPGA prototyping inaccessible to NB-LDPC, which would otherwise benefit significantly in code and decoder evaluation.

In this work, we focus on creating an FPGA-based framework enabling end-to-end automated design, starting from code and decoder design parameters and ending with a working FPGA emulation. A high-throughput reconfigurable hardware emulation architecture incorporating EMS and MM decoders is proposed to address the challenges in creating FPGA emulations. The framework is built upon a library that consists of elementary building blocks, and a set of router scripts that assemble the building blocks into a complete emulation model. We also demonstrate the capability of the framework, especially in the low error rate regime, by experimenting with two popular NB-LDPC codes that have been considered for practical adoption: regular-(2, dc) codes [11] and quasi-cyclic codes [12]. In all cases the framework is able to complete decoder construction within hours, including FPGA synthesis. The resulting emulation models are tested on a Virtex-5 FPGA, delivering throughputs up to hundreds of Mbps and reaching a BER of 10^-9 in a day.

II. BACKGROUND

An NB-LDPC code is defined by a parity-check matrix H of size m×n, where n is the block length and m is the number of parity checks. The elements of the H matrix belong to the Galois field GF(q) [1], [2]. The H matrix can also be represented by a factor graph, where each column is mapped to a variable node (VN), each row to a check node (CN), and an edge connects variable node vj and check node ci if H(i, j) ≠ 0. A regular (dv, dc) NB-LDPC code has constant column weight dv and row weight dc. Our survey of the literature found that almost all the NB-LDPC codes considered for practical implementation have regular H matrices [12], [13], [14], [15] due to their significantly lower hardware complexity.

Several efficient algorithms and their variations have been proposed for NB-LDPC with various error-correcting performance and implementation complexity [1]-[7], [13]. Among them the EMS [3] and the MM [4] algorithms work remarkably well for hardware decoder design: they achieve a performance close to the original BP algorithm [1], [2] and their complexity is relatively low. Message truncation [5], bubble-check [16] and skimming [6] techniques claim even lower complexity and demonstrate great potential for practical use, but they also degrade the error-correction performance.

TABLE I NB-LDPC FPGA EMULATION PARAMETERS

Category                    Parameter  Description
Code parameters             q          GF field order
                            m          Number of rows in H matrix
                            n          Number of columns in H matrix
                            dv         Column degree or variable degree
                            dc         Row degree or check degree
                            p          Non-zero positions in H matrix
                            e          GF indices in H matrix
Decoder design parameters   nm         Message truncation number
                            Q          Number of quantization bits
                            LS-VN      VN sorter length
                            LS-CN      CN sorter length
Run-time parameters         L          Iteration limit
                            F          Frame limit
                            SNR        Signal-to-noise ratio in dB

[Figure: block diagram with a top controller (inputs Start, Frame limit, Iteration limit, SNR); a prior generator with GF LUT and 2 LLRV calculation channels (AWGN generators, muxes, adders, sorter); prior memory; a dual-path decoder with position LUT, entry LUT, Perm/Perm^-1 blocks, CN RAM, CN built from ECNs with FW/BW memory, c-v and v-c memories, VN RAM, VNs and posterior memory; decision block; error collector (FE, SE, BE) and error memory.]
Fig. 2. Reconfigurable emulation system architecture
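As a concrete illustration of the parity-check and factor-graph definitions in Section II, the following minimal Python sketch builds the CN and VN neighbor lists from H and checks whether the code is regular. The function names and the toy matrix are hypothetical and not taken from the paper; GF(q) entries are represented as plain integers, with 0 meaning "no edge".

```python
from typing import Dict, List, Optional, Tuple

Matrix = List[List[int]]


def factor_graph(H: Matrix) -> Tuple[Dict[int, List[int]], Dict[int, List[int]]]:
    """Return (cn_neighbors, vn_neighbors) for parity-check matrix H.

    cn_neighbors[i] lists the variable nodes j with H[i][j] != 0;
    vn_neighbors[j] lists the check nodes i with H[i][j] != 0.
    """
    m, n = len(H), len(H[0])
    cn_neighbors = {i: [j for j in range(n) if H[i][j] != 0] for i in range(m)}
    vn_neighbors = {j: [i for i in range(m) if H[i][j] != 0] for j in range(n)}
    return cn_neighbors, vn_neighbors


def regular_degrees(H: Matrix) -> Optional[Tuple[int, int]]:
    """Return (dv, dc) if every column has weight dv and every row weight dc."""
    cn_neighbors, vn_neighbors = factor_graph(H)
    col_weights = {len(rows) for rows in vn_neighbors.values()}
    row_weights = {len(cols) for cols in cn_neighbors.values()}
    if len(col_weights) == 1 and len(row_weights) == 1:
        return col_weights.pop(), row_weights.pop()
    return None


# Hypothetical toy 3x6 matrix with nonzero entries written as GF(8) indices 1..7.
H_TOY = [
    [1, 2, 3, 4, 0, 0],
    [0, 0, 5, 6, 7, 1],
    [2, 3, 0, 0, 4, 5],
]

if __name__ == "__main__":
    print(regular_degrees(H_TOY))  # column weight 2, row weight 4
```

Only the positions of the nonzero entries matter for the graph structure; the entry values matter for the permutation (GF multiplication) steps of decoding, which this sketch does not model.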
We briefly summarize the EMS and MM algorithms here for completeness. Both algorithms follow a five-step decoding process: (1) each variable node is initialized with sorted prior log-likelihood ratio vectors (LLRVs) along with their associated GF indices. The length of the LLRVs is determined by the message truncation number nm; (2) variable-to-check (v-c) messages are permuted based on the H matrix and sent to the check nodes. In the first iteration, the priors are used as the v-c messages; (3) for each adjacent variable node vj, check node ci computes the check-to-variable (c-v) message {Vij[k]}, k ∈ {0, …, nm - 1}, i.e., the reliabilities that the parity-check equation is satisfied if vj = Vij[k]. The computation is implemented as a forward-backward recursion: EMS computes sums through this recursion, while MM picks only the maxes without summation operations for even lower complexity. Note that c-v messages are sorted and only the nm highest probabilities are stored in both algorithms. The bubble-check technique [16] improves the check node latency as well as hardware utilization by reducing the sorter length from nm to LS-CN while maintaining equivalent functionality; (4) c-v messages are inverse permuted before being sent to the variable nodes; (5) variable node vj computes the v-c message {Uji[k]}, k ∈ {0, …, nm - 1}, for each adjacent check node ci based on the prior LLRVs and the permuted c-v messages. The skimming technique [6] skims less reliable probabilities and reduces the VN sorter length from nm to LS-VN. The procedure repeats from step (2) until the iteration limit L is reached.

Decoder architectures implementing EMS or MM have been developed for FPGA emulation of various NB-LDPC codes [6], [8], [9], [10], and most of them offer flexibility only for parameters like the iteration limit L. Important parameters that strongly affect error-correction performance and throughput, like q and nm, can only be studied in software simulations, which take weeks to months to reach the low-BER region. Reconfigurability for these parameters in hardware involves complicated architecture and schedule changes, and the extensive effort and time must be repeated for every possible parameter combination. These challenges call for an automated design flow with a new decoder and emulation architecture that enables full reconfigurability and delivers high throughput.

III. RECONFIGURABLE EMULATION

Reconfigurable emulation for NB-LDPC requires addressing parameters of three categories: code parameters, decoder design parameters and run-time parameters. We summarize the parameters with their descriptions in Table I.

A. Emulation System Design

Assuming without loss of generality that the all-zero codeword is transmitted, we introduce a fully reconfigurable emulation system with a high-throughput decoder. Fig. 2 shows the architecture of the proposed emulation system. A top controller implementing a finite state machine orchestrates the emulation. The emulation system stays in the IDLE state until the input Start jumps from 0 to 1. The system then enters the RUN state. There are two sub-states in the RUN state: (1) Prior Generation (PG), (2) Decode and Decision (DD). They iterate for each frame, and a counter COUNT keeps track of the RUN state and increments by 1 every time the system reaches the end of the DD state. The state transition diagram is shown in Fig. 3.

In the PG state, LLRV calculation channels in the prior generator compute sorted LLRVs of length nm along with their corresponding GF indices and store them to a dual-port prior memory. In each channel, log2(q) parallel AWGN generators produce log2(q) parallel Q-bit LLRs and send them to a multiplexer array. The multiplexer array also reads log2(q) bits that represent a GF(q) symbol from the GF LUT; note that each bit is associated with an LLR. The multiplexer array selects the LLR if the associated bit is 1 and passes a 0 if the associated bit is 0. A log2(q)-input adder sums the outputs of the multiplexer array to form the symbol LLR and sends the result to a sorter of length q. It takes q cycles to complete the sorting and another nm cycles to complete the LLRV writes into the prior memory. Two channels are instantiated in our design to make full use of the two ports on the prior
memory. Assuming the latency overhead from the AWGN generators to the input of the sorter is Toverhead, it takes Toverhead+(n/2)×(q+nm) cycles to complete the entire prior memory initialization.

The EMS or MM decoder is the most complex block of the decoder emulation model. Upon completion of the PG state, the decoder is fired up for decoding and the system enters the DD state. The DD state consists of ITER iterations, in which priors are decoded in a layered row-by-row manner as shown in Fig. 3.

[Figure: state transition diagram (IDLE to RUN when Start goes from 0 to 1; PG and DD sub-states; COUNT++ at the end of DD; back to IDLE when COUNT == Frame limit; DD ends when ITER == Iteration limit) and a pipeline timing chart over iterations 1 to ITER and H matrix rows 1 to m, covering CN RAM write/read, FW/BW read/write, merge, perm^-1, c-v memory write/read, posterior read, VN RAM write, VN operation, and v-c memory write with posterior update. Timing annotations: Toverhead+(n/2)×(q+nm) cycles for PG; 2+LS-CN+nm cycles; nm cycles for merge; 2+LS-VN+nm cycles; (dc/2-2)×(2+LS-CN+nm)+2LS-CN+2nm+3 cycles; dc/2×(2+LS-VN+nm) cycles.]
Fig. 3. Scheduling of the reconfigurable emulation system

For the first iteration, VNs read the prior LLRVs according to the position LUT and bypass them to the permutation blocks. The permutation blocks perform GF multiplications and divisions based on the permutation LUT and output permuted v-c messages into the v-c memory of size m×dc×(Q+q) bits. A dc-banked dual-port RAM of size (Q+q)×nm bits each is used as a buffer and provides the required memory access bandwidth for an optimal pipeline schedule like the one in Fig. 3. For both the EMS and MM decoders, the CN implements the forward-backward recursion on a dc-stage trellis in three elementary steps: (1) forward step (F), (2) backward step (B), and (3) merge step (M). Each step in the recursion is done by an elementary CN (ECN) with a sorter of length LS-CN. Note that the maximum number of elementary steps that can run simultaneously is 4. Hence, we designed a CN with 4 ECNs and a pair of forward and backward memories that store intermediate messages on the trellis. Two ECNs perform the forward and backward steps, respectively, and each elementary step takes 2+LS-CN+nm cycles to complete. When the forward and backward steps reach the middle of the trellis at dc/2, the other two ECNs start the merge operations and send out two c-v messages to the inverse-permutation blocks.

Inverse-permuted c-v messages are then stored into the c-v memory of size m×dc×(Q+q) bits and at the same time written to the VN RAMs. VNs are implemented similarly to [6], and they wait until the completion of the VN RAM write since they require full-length LLRVs for address look-up; the latency of each VN operation is 2+LS-VN+nm cycles. The latency of decoding a complete row is shown in Fig. 3, and the next iteration starts by reading the v-c memory upon completion of all rows in the previous iteration. Note that the posterior memory is completely updated only at the end of an iteration, once all neighboring c-v messages have been incorporated for each VN. Decisions are made concurrently with the last-row posterior memory write. The decoder and peripheral co-design enables perfect interleaving of the dual path with the optimal pipeline schedule shown in Fig. 3.

IV. AUTOMATED FLOW FOR RAPID PROTOTYPING

We developed an end-to-end automated design flow based on the proposed reconfigurable emulation architecture, targeting the Xilinx Virtex FPGA platform. The flow involves three automation steps: (1) pre-processing, (2) decoder library generation, and (3) top-level routing.

[Figure: flow diagram. Code and decoder parameters (NB-LDPC code H matrix with position and entry LUTs; decoder design parameters such as Q = 6 and nm = 16) feed a pre-processing script that configures elementary modules from the elementary library (System Generator in Simulink); local routing scripts produce the decoder library; global routing scripts produce the emulation model; synthesis/download produces the Virtex bit file for a Xilinx Virtex FPGA evaluation board; a Matlab interface core reports FER/BER and hardware utilization. Annotated memory sizes: F/B memory [(dc-3)×nm]×[Q+log2(q)]; prior memory [n×nm]×[Q+log2(q)].]
Fig. 4. Automated design flow based on Xilinx FPGA Platform

Prior to the automation steps, a Simulink elementary library has been developed that consists of the elementary blocks that make up an emulation system: ECN, VN, LUTs, etc. The elementary blocks are designed using the Xilinx blockset so that they can be readily synthesized, and they are quick to design and easily reusable. Note that controllers are designed with Mcode blocks that can be programmed directly by finite-state machines written in Matlab code.

The first step involves a pre-processing Matlab script that computes the configuration parameters referenced in the elementary library. It also picks the ECN and VN with sorter lengths LS-CN and LS-VN, respectively, and creates a list of the elementary blocks that are required for the given NB-LDPC code and decoder parameters.
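To make the pre-processing step more tangible, here is a minimal Python sketch of how configuration values could be derived from the Table I parameters, using only the latency expressions quoted in Section III and the memory sizes annotated in Fig. 4. The paper's actual script is written in Matlab and is not shown; the class, function and key names below are hypothetical, and the numbers in the usage example are illustrative rather than results from the paper.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class EmulationParams:
    """Subset of the Table I parameters (field names are hypothetical)."""
    q: int      # GF field order, assumed to be a power of two
    n: int      # number of columns in H (block length in symbols)
    dc: int     # row (check) degree
    nm: int     # message truncation number
    Q: int      # number of quantization bits
    ls_cn: int  # CN sorter length LS-CN
    ls_vn: int  # VN sorter length LS-VN


def symbol_bits(q: int) -> int:
    """Return log2(q), rejecting field orders that are not powers of two."""
    bits = q.bit_length() - 1
    if q < 2 or (1 << bits) != q:
        raise ValueError("q must be a power of two")
    return bits


def derive_config(p: EmulationParams, t_overhead: int = 0) -> Dict[str, int]:
    """Evaluate the cycle counts and memory sizes stated in the text."""
    entry_bits = p.Q + symbol_bits(p.q)  # one LLR plus one GF index
    return {
        # Two channels fill the prior memory, hence n/2 (n assumed even).
        "prior_init_cycles": t_overhead + (p.n // 2) * (p.q + p.nm),
        "ecn_step_cycles": 2 + p.ls_cn + p.nm,
        "vn_op_cycles": 2 + p.ls_vn + p.nm,
        "prior_mem_bits": p.n * p.nm * entry_bits,
        "fb_mem_bits": (p.dc - 3) * p.nm * entry_bits,
    }


# Illustrative values only; the sorter lengths and t_overhead are made up.
EXAMPLE = EmulationParams(q=32, n=192, dc=4, nm=8, Q=6, ls_cn=4, ls_vn=4)

if __name__ == "__main__":
    for name, value in derive_config(EXAMPLE, t_overhead=10).items():
        print(f"{name}: {value}")
```

Keeping the derived quantities in one dictionary mirrors the idea of a single set of configuration parameters that the library blocks reference, although the real data layout used by the framework is not described in the text.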
TABLE II FPGA MAPPING RESULTS
(BASED ON XILINX VIRTEX-5 XC5VLX155T)

Resource          nm = 8, q = 32   nm = 8, q = 64   nm = 12, q = 16
Slice Registers   13,929 (15%)     16,742 (18%)     16,046 (17%)
Slice LUTs        17,210 (17%)     19,511 (19%)     17,908 (17%)
Occupied Slices   6,832 (28%)      8,744 (36%)      7,167 (30%)
BRAMs             55 (26%)         59 (28%)         53 (25%)

Fig. 5. Performance of rate-1/2 960-bit (2,4)-regular NB-LDPC codes with 6-bit EMS decoder

In the second step, a set of local routers connects elementary blocks together and produces larger building blocks for top-level routing. Complete prior generator, CN and Perm/Perm^-1 processing blocks are obtained, and a decoder library is established that consists of modules ready for top-level routing.
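One of the larger blocks assembled in this step is the prior generator. Its LLRV calculation channel, as described in Section III (multiplexer array, adder, sorter, truncation to nm), can be sketched behaviorally in Python as follows. This is a software model under stated assumptions, not the hardware design: the function names are hypothetical, a symbol's bits are taken to be the binary digits of its integer index (the paper reads them from a GF LUT), and the sort direction is left as a parameter because the text does not state the LLR sign convention.

```python
from typing import List, Sequence, Tuple


def symbol_llrs(bit_llrs: Sequence[float]) -> List[float]:
    """Compute one LLR per GF(q) symbol from log2(q) bit LLRs.

    Mirrors the mux-plus-adder structure: for symbol a, bit b contributes
    bit_llrs[b] if bit b of a is 1 and contributes 0 otherwise.
    """
    p = len(bit_llrs)
    return [
        sum(bit_llrs[b] for b in range(p) if (a >> b) & 1)
        for a in range(1 << p)
    ]


def truncated_llrv(
    bit_llrs: Sequence[float], nm: int, smaller_is_better: bool = True
) -> List[Tuple[float, int]]:
    """Sort all q symbol LLRs and keep the first nm (LLR, GF index) pairs.

    smaller_is_better=True is an assumption about the LLR convention.
    """
    llrs = symbol_llrs(bit_llrs)
    ranked = sorted(
        ((llr, a) for a, llr in enumerate(llrs)),
        reverse=not smaller_is_better,
    )
    return ranked[:nm]


if __name__ == "__main__":
    # GF(4) toy example: two bit LLRs, keep the nm = 2 best entries.
    print(truncated_llrv([1.5, -0.5], nm=2))
```

In hardware the same computation is spread over q sorting cycles and nm write cycles per LLRV; the sketch only reproduces the values that end up in the prior memory under the assumptions above.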
Upon completion of all the modules in the decoder library, the top-level router connects them into a complete emulation model in the third step. Interfaces are also added to provide run-time parameters and capture FER/BER. The three-step automation is followed by synthesis that generates a bit file, which takes up to several hours based on the experiments we carried out. Excluding the initial effort in making the reusable elementary library, the proposed framework, including synthesis, completes within hours.

V. EXPERIMENTS AND ANALYSIS

We show the capability of the proposed automated framework by experiments with two popular NB-LDPC codes considered for hardware implementation: regular (2, dc) codes and quasi-cyclic (QC) codes.

Experiment I: Suppose we aim to design a 6-bit decoder based on a rate-1/2 960-bit (2, 4)-regular NB-LDPC code [11], targeting a BER of 10^-6 at an SNR of 4.4 dB with 10 decoding iterations. We studied the BER performance by varying two important parameters, q and nm, based on the EMS algorithm without message skimming. With the proposed framework, we ran emulations on a Xilinx Virtex-5 FPGA at a clock frequency of 120 MHz, sweeping nm from 4 to 16 for various q. Note that varying q only changes the values of the non-zero entries and does not affect their positions in the H matrix. Fig. 5 shows the BER vs. SNR for each (nm, q) combination. As discussed in Section III, decoding latency grows with nm. Hence a smaller nm is desired for higher throughput, provided the BER spec is met. Table II shows the hardware utilization for nm ≥ 8 with various q values. Bigger q and nm result in larger hardware utilization. A combination of nm = 8 and q = 32 gives the best throughput and area for the given BER spec.

Experiment II: We also tested our framework with 1024-bit GF(16) QC NB-LDPC codes [12] with code rate R varying from 1/2 to 7/8. Note that for QC codes we can maintain regularity of the H matrix if we add or remove a complete row of sub-matrices; hence the code rate R is determined by the number of rows m in the H matrix. Parameter dv also changes with varying R. Fig. 6 shows the coding gain for each (R, Q) combination based on a Q-bit MM decoder with nm = 16. With this study, decoder designers can easily pick the best decoder configuration for a desired coding gain.

Fig. 6. Coding gain of 1024-bit QC-regular NB-LDPC codes with Q-bit MM decoder and nm = 16

VI. CONCLUSION

We present an FPGA-based framework with an automated design flow for rapid prototyping of NB-LDPC codes. To the best of our knowledge, this work is the first automated FPGA emulation framework for NB-LDPC codes enabling full reconfigurability of code and decoder design parameters. A co-designed emulation architecture with EMS/MM decoder and peripherals is proposed. The framework accepts parameter specifications and produces a complete FPGA emulation model. Experiments on a Xilinx Virtex-5 FPGA demonstrate the capability of the framework in evaluating practical NB-LDPC code and decoder designs. With parallel copies of decoders mapped onto this FPGA device, the platform delivers excellent error-correction performance with throughput up to hundreds of Mb/s.

REFERENCES

[1] M. C. Davey and D. Mackay, “Low-density parity check codes over GF(q),” IEEE Commun. Lett., vol. 2, no. 6, pp. 165-167, Jun. 1998.
[2] M. C. Davey, “Error-correction using low-density parity-check codes,” Ph.D. dissertation, Univ. Cambridge, Cambridge, UK, 1999.
[3] D. Declercq and M. Fossorier, “Extended min-sum algorithm for decoding LDPC codes over GF(q),” in IEEE Int. Symp. Information Theory, Adelaide, Australia, Sep. 2005, pp. 464-468.
[4] V. Savin, “Min-max decoding for non binary LDPC codes,” in IEEE Int. Symp. Information Theory, Toronto, Canada, Jul. 2008, pp. 960-964.
[5] A. Voicila, D. Declercq, F. Verdier, M. Fossorier, and P. Urard, “Low-complexity decoding for non-binary LDPC codes in high order fields,” IEEE Trans. Commun., vol. 58, no. 5, pp. 1365-1375, May 2010.
[6] Y. Tao, Y. Park and Z. Zhang, “High-throughput architecture and implementation of regular (2, dc) nonbinary LDPC decoders,” in IEEE Int. Symp. Circuits Syst., Seoul, South Korea, May 2012, pp. 2625-2628.
[7] H. Wymeersch, H. Steendam, and M. Moeneclaey, “Computational complexity and quantization effects of decoding algorithms of LDPC codes over GF(q),” in Proc. ICASSP, Montreal, Canada, May 2004, pp. 772-776.
[8] J. O. Lacruz, F. García-Herrero, M. J. Canet, J. Valls and A. Pérez-Pascual, “A 630 Mbps non-binary LDPC decoder for FPGA,” in 2015 IEEE International Symposium on Circuits and Systems (ISCAS), Lisbon, 2015, pp. 1989-1992.
[9] X. Zhang and F. Cai, “Efficient partial-parallel decoder architecture for quasi-cyclic nonbinary LDPC codes,” IEEE Trans. Circuits Syst. I: Reg. Papers, vol. 58, no. 2, pp. 402-414, Feb. 2011.
[10] X. Zhang and F. Cai, “Reduced-complexity decoder architecture for non-binary LDPC codes,” IEEE Trans. Very Large Scale Integr. Syst., vol. 19, no. 7, pp. 1229-1238, Jul. 2011.
[11] C. Poulliat, M. Fossorier, and D. Declercq, “Design of regular (2, dc)-LDPC codes over GF(q) using their binary images,” IEEE Trans. Commun., vol. 56, no. 10, pp. 1626-1635, Oct. 2008.
[12] B. Zhou, J. Kang, S. Song, S. Lin, K. Abdel-Ghaffar, and M. Xu, “Construction of non-binary quasi-cyclic LDPC codes by arrays and array dispersions,” IEEE Trans. Commun., vol. 57, no. 6, pp. 1652-1662, Jun. 2009.
[13] C. Spagnol, E. M. Popovici, and W. P. Marnane, “Hardware implementation of GF(2^m) LDPC decoders,” IEEE Trans. Circuits Syst. I: Reg. Papers, vol. 56, no. 12, pp. 2609-2620, Dec. 2009.
[14] J. Lin, J. Sha, Z. Wang, and L. Li, “Efficient decoder design for nonbinary quasicyclic LDPC codes,” IEEE Trans. Circuits Syst. I: Reg. Papers, vol. 57, no. 5, pp. 1071-1082, May 2010.
[15] J. Lin and Z. Yan, “An efficient fully parallel decoder architecture for nonbinary LDPC codes,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. PP, no. 99, pp. 1, Dec. 2013.
[16] E. Boutillon and L. Conde-Canencia, “Bubble check: a simplified algorithm for elementary check node processing in extended min-sum non-binary LDPC decoders,” IEE Electron. Lett., vol. 46, no. 9, pp. 633-634, Aug. 2010.
