Transactions Briefs: A Nonbinary LDPC Decoder Architecture With Adaptive Message Control
IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 20, NO. 11, NOVEMBER 2012
Weiguo Tang, Jie Huang, Lei Wang, and Shengli Zhou
Abstract: A new decoder architecture for nonbinary low-density parity-check (LDPC) codes is presented in this paper to reduce the hardware operational complexity in VLSI implementations. The low decoding complexity is achieved by employing adaptive message control (AMC), which dynamically trims the message length of belief information to reduce the number of memory accesses and arithmetic operations. To implement the proposed AMC, we develop the architecture of a horizontal sequential nonbinary LDPC decoder. Key components in the architecture have been designed for variable message lengths to leverage the benefit of the proposed AMC. Simulation results demonstrate that the proposed nonbinary LDPC decoder architecture can significantly reduce hardware operations and power consumption compared with existing work, with negligible performance degradation.
Index Terms: Adaptive control, decoding, Galois field, Min-Sum, nonbinary low-density parity-check (LDPC) codes, VLSI architecture.
I. INTRODUCTION
Low-density parity-check (LDPC) codes [1], [2] are considered among the most powerful capacity-approaching codes. LDPC codes can be constructed in both the binary domain and Galois fields (i.e., $\mathrm{GF}(2^m)$, where $m > 1$). Binary LDPC codes have been studied extensively [3]-[5] and adopted in many communication standards, such as DVB-T2 and WiMAX. In general, a very long code length is required for binary LDPC codes to approach the channel capacity. Nonbinary LDPC codes constructed in Galois fields [6] offer improved performance at a moderate code length. In addition, nonbinary LDPC codes can be combined with high-order modulations [7], [17] to increase the bandwidth efficiency. Due to these features, the design and implementation of nonbinary LDPC codes have become critical for many emerging applications such as underwater acoustic communications [17].
A key challenge in the application of nonbinary LDPC codes is their high decoding complexity, as each symbol in the codeword is decoded using a long message (e.g., of length $2^m$ in $\mathrm{GF}(2^m)$). Considerable research effort aims at reducing the decoding complexity of nonbinary LDPC codes at the algorithm level [7]-[9], [11]. To deal with the problem that the computational complexity increases exponentially with $m$, the extended Min-Sum (EMS) algorithm was proposed in [10], where only the $n_m$ most significant entries in a message are used in the decoding. A decoding technique developed in [12] conducts the EMS with a reduced complexity of $O(n_m \log_2 n_m)$ with minor performance degradation. It should be noted that these algorithm-level techniques do not explicitly consider the complexity of implementing nonbinary LDPC decoders. While many hardware-efficient VLSI architectures have been proposed for binary LDPC decoders [3]-[5], few results [14]-[16] exist for nonbinary LDPC decoders.

Manuscript received December 05, 2010; revised March 04, 2011 and June 14, 2011; accepted August 12, 2011. Date of publication September 15, 2011; date of current version July 27, 2012.
The authors are with the Department of Electrical and Computer Engineering, University of Connecticut, Storrs, CT 06269 USA (e-mail: weiguo.tang@engr.uconn.edu).
Digital Object Identifier 10.1109/TVLSI.2011.2165346

Unlike existing work, which targets hardware implementation cost, this paper focuses on reducing the hardware operational complexity of nonbinary LDPC decoder architectures. This enables efficient decoding suitable for emerging applications such as underwater acoustic sensor networks [17], which operate under severe resource (e.g., energy) constraints. It was reported in [4] that memory accesses and arithmetic operations are the two major contributors to the operating cost of LDPC decoders. As the number of memory accesses and arithmetic operations is largely determined by the message length, reducing the message length is an effective way to achieve efficient decoding. Based on this observation, our past work [18] proposed adaptive message control (AMC) to reduce the decoding complexity. Unlike the EMS, which maintains a constant message length for every symbol, the proposed AMC adjusts the message length adaptively, shortening messages while maintaining the required performance.
In this paper, we develop a horizontal sequential VLSI architecture
for the nonbinary LDPC decoder employing the AMC. The design
of the key components in this architecture, such as variable node and
check node update units, is optimized by exploiting the variable length
sorters, which can be dynamically configured in different functional
units to accommodate variable message lengths. The AMC is implemented by a low-complexity approximation method to avoid hardware
overheads and performance impact. A mapping table based approach
is proposed to conduct searching operations with low complexity. We apply AMC to EMS to address the memory and throughput issues caused by the worst-case message length. Note that AMC can also be employed with other decoding update rules, such as the Min-Max algorithm. In addition, the proposed AMC can be applied to most existing decoder architectures (sequential, partially parallel, and fully parallel).
II. ADAPTIVE MESSAGE CONTROL
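A nonbinary LDPC code is defined by its parity check matrix (PCM) $H$ with elements $h$ in $\mathrm{GF}(2^m)$. The code can be represented by $N$ variable nodes $v_i$, $1 \le i \le N$, and $M$ check nodes $c_j$, $1 \le j \le M$, connected according to the nonzero entries of $H$. The elementary operation at a check node can be expressed as

$$ r = q_1 \boxplus q_2 \qquad (1) $$

where $q_1$ and $q_2$ are the length-$2^m$ variable node messages, and the basic operation $\boxplus$ is defined as

$$ r_{\alpha} = \max_{\alpha_1 + \alpha_2 = \alpha} \left( q_{1,\alpha_1} + q_{2,\alpha_2} \right), $$

where $\alpha, \alpha_1, \alpha_2 \in \mathrm{GF}(2^m)$, and $r_{\alpha}$, $q_{1,\alpha_1}$, and $q_{2,\alpha_2}$ are the entries in messages $r$, $q_1$, and $q_2$ (in the log domain) corresponding to $\alpha$, $\alpha_1$, and $\alpha_2$, respectively.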
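The check node operation in (1) can be illustrated with a minimal Python sketch. It assumes that field elements are indexed by their $m$-bit vector representation, so that addition in $\mathrm{GF}(2^m)$ becomes a bitwise XOR of indices; the function name `check_node_elementary` and the example values are hypothetical, and the multiplications by the PCM coefficients are left out.

```python
import math

def check_node_elementary(q1, q2):
    """Combine two full-length log-domain messages as in (1).

    r[a] is the maximum of q1[a1] + q2[a2] over all pairs with
    a1 + a2 = a in GF(2^m). Elements are indexed by their bit-vector
    form, so field addition is XOR (an assumption of this sketch).
    """
    size = len(q1)
    if size != len(q2) or size & (size - 1):
        raise ValueError("messages must share a power-of-two length")
    r = [-math.inf] * size
    for a1, v1 in enumerate(q1):
        for a2, v2 in enumerate(q2):
            a = a1 ^ a2                # addition in GF(2^m)
            r[a] = max(r[a], v1 + v2)
    return r

# GF(4) example with made-up log-domain beliefs.
q1 = [0.0, -1.0, -3.0, -2.0]
q2 = [-0.5, 0.0, -4.0, -1.5]
r = check_node_elementary(q1, q2)
print(r)  # [-0.5, 0.0, -2.0, -1.5]
```

The double loop visits every pair of entries, so the work grows with the square of the message length, which is why shortening messages pays off most at the check node.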
On the other hand, each variable node improves the fidelity of belief
information based on the received messages from multiple check nodes
connected by the PCM. The elementary operation at a variable node can
be expressed as [10], [12]
$$ q = r_1 + r_2 \qquad (2) $$
which sums up the belief information associated with the same finite-field element.
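For comparison with the check node, the variable node operation in (2) reduces to an entry-wise addition. The following Python sketch assumes full-length messages stored as lists indexed by field element; the name `variable_node_elementary` and the values are illustrative only.

```python
def variable_node_elementary(r1, r2):
    """Entry-wise sum of two log-domain check node messages, as in (2)."""
    if len(r1) != len(r2):
        raise ValueError("messages must have the same length")
    # Entries at the same index belong to the same finite-field element.
    return [a + b for a, b in zip(r1, r2)]

r1 = [-0.5, 0.0, -2.0, -1.5]
r2 = [-1.0, -0.25, 0.0, -3.0]
q = variable_node_elementary(r1, r2)
print(q)  # [-1.5, -0.25, -2.0, -4.5]
```

Here the number of additions equals the message length, so a shorter message translates directly into fewer operations and memory accesses.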
The hardware complexity in implementing (1) and (2) is determined
by the lengths of variable node messages (VNMs) and check node messages (CNMs). Intuitively, when the distribution of belief information
is more concentrated, a shorter message might be sufficient to retain
most of the belief information.
The basic idea of AMC is to keep as few entries in a message as
possible without incurring much information loss. It has been demonstrated [18] that message truncation can be implemented by finding the
minimal n that satisfies
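$$ q(n+1) \le q(1) + \ln\frac{1-\theta}{\cdots} \qquad (3) $$
where $q(1)$ and $q(n+1)$ are the largest and the $(n+1)$th largest entries of the message, and $\theta$ stands in for the AMC control parameter (neither its original symbol nor the denominator of the fraction is recoverable from the source text).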
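The truncation rule can be sketched in Python as a search for the shortest sorted prefix that passes a threshold test against the largest entry. This is only a sketch under stated assumptions: the logarithmic term of (3) is passed in as a single negative constant `offset`, since its exact form is not reproduced here, and the names `amc_truncate` and `offset` as well as the sample messages are hypothetical.

```python
def amc_truncate(message, offset):
    """Keep the shortest sorted prefix allowed by a threshold test.

    `message` maps field elements to log-domain beliefs. Entries are
    sorted in descending order, and the minimal n is chosen such that
    q(n+1) <= q(1) + offset. `offset` (negative) is a stand-in for the
    logarithmic term of (3), which is an assumption of this sketch.
    """
    ranked = sorted(message.items(), key=lambda kv: kv[1], reverse=True)
    if not ranked:
        return []
    limit = ranked[0][1] + offset
    n = len(ranked)                    # fall back to the full message
    for idx in range(1, len(ranked)):
        if ranked[idx][1] <= limit:    # ranked[idx] plays the role of q(n+1)
            n = idx
            break
    return ranked[:n]

concentrated = {0: 0.0, 1: -9.5, 2: -10.0, 3: -12.0}
flat = {0: -1.2, 1: -1.3, 2: -1.5, 3: -1.6}
print(amc_truncate(concentrated, -7.0))   # [(0, 0.0)]
print(len(amc_truncate(flat, -7.0)))      # 4
```

A concentrated message collapses to a single entry, whereas a flat message keeps its full length, which matches the intuition that the message length should follow the spread of the belief information.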
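[The text introducing the decoding update equations is missing from the source; only the equations below are recoverable, with $\boxplus$ denoting the check node operation in (1) and $\mathrm{AMC}(\cdot)$ the message truncation.]

$$ r_{ij} = \mathrm{AMC}\Big( \mathop{\boxplus}_{k \in M(i) \setminus j} q_{ik} \Big) \qquad (5) $$

$$ q_{ij} = \mathrm{AMC}\Big( f_j + \sum_{k \in N(j) \setminus i} r_{kj} \Big) \qquad (6) $$

$$ \hat{c}_j = \max\Big( f_j + \sum_{k \in N(j)} r_{kj} \Big). \qquad (7) $$

A second truncation condition of the form $q(n+1) \le \ln\frac{1-\theta}{\cdots}$ also appears among these fragments without its equation number.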
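The variable node update and the final decision can be sketched in Python as follows. The sketch assumes full-length list messages, reads (7) as selecting the field element with the largest accumulated belief, and uses a fixed-length stand-in `keep_top2` where the decoder would apply the AMC truncation; all function names and numbers are hypothetical.

```python
def sum_messages(f_j, messages):
    """Add check node messages entry-wise onto the channel message f_j."""
    total = list(f_j)
    for r_kj in messages:
        total = [t + r for t, r in zip(total, r_kj)]
    return total

def variable_node_update(f_j, incoming, exclude, truncate):
    """Message from variable node j to check node `exclude`, as in (6)."""
    extrinsic = [r for k, r in incoming.items() if k != exclude]
    return truncate(sum_messages(f_j, extrinsic))

def hard_decision(f_j, incoming):
    """Index of the field element with the largest total belief, per (7)."""
    total = sum_messages(f_j, incoming.values())
    return max(range(len(total)), key=total.__getitem__)

def keep_top2(message):
    """Stand-in truncation (assumption): keep the two largest entries."""
    ranked = sorted(enumerate(message), key=lambda kv: kv[1], reverse=True)
    return ranked[:2]

f_j = [-0.25, -2.0, -3.0, -4.0]
incoming = {
    0: [-0.5, 0.0, -2.0, -1.5],
    1: [0.0, -1.0, -2.5, -3.0],
    2: [-3.0, 0.0, -1.0, -2.0],
}
q_to_check2 = variable_node_update(f_j, incoming, exclude=2, truncate=keep_top2)
decision = hard_decision(f_j, incoming)
print(q_to_check2)  # [(0, -0.75), (1, -3.0)]
print(decision)     # 1
```

The outgoing message leaves out the message received from the destination check node, while the decision uses all incoming messages, so the two can favor different field elements, as they do in this example.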
Fig. 2. Implementation of the variable length sorter, where only the data path
of the belief information is shown for simplicity.
Fig. 3. Implementation of the VNU unit.
A. Decoding Performance
Our past work [18] has shown that applying AMC to MS incurs less than 0.1 dB loss at a block-error-rate (BLER) of $10^{-4}$ with the AMC control parameter set to 0.9999, and that the performance loss increases to 0.35 dB when the parameter is set to 0.999. In comparison, reducing $n_m$ in EMS from 16 to 12 and 10 results in performance losses of 0.15 and 0.3 dB, respectively, at the same BLER level. One issue with AMC-MS is that the hardware complexity and throughput are determined by the worst-case message length, even though the operational complexity can be significantly reduced. To address this issue, we propose to apply AMC with EMS in this paper. Fig. 5 shows the BLER performance of AMC with EMS initialization. AMC with the control parameter set to 0.9999 and 0.999 incurs about 0.03 and 0.1 dB performance loss, respectively, relative to the conventional EMS. However, AMC can further reduce the hardware operations of EMS under the same throughput and memory size.
B. Complexity Reduction by the AMC
We choose three cases to evaluate the reduction in hardware operational complexity: EMS with $n_m = 10$, AMC-MS with the control parameter set to 0.999, and AMC-EMS with the control parameter set to 0.9999 and $n_m = 10$. The results are listed in Table I, relative to MS at an SNR of 2.8 dB, where the BLER can reach $10^{-5}$. Clearly, AMC-MS is more effective in reducing hardware operations than EMS; i.e., the number of major operations in AMC-MS is less than 50% of that of EMS. By applying AMC to EMS, the hardware operations can be further reduced at the expense of negligible performance loss (see Fig. 5). By reducing hardware operations, the proposed AMC is also expected to enable a significant reduction in energy consumption.
TABLE I
HARDWARE OPERATION REDUCTION OF EMS, AMC-EMS, AND AMC-MS COMPARED WITH MS
TABLE II
ESTIMATED HARDWARE MEASURES OF MS, EMS, AMC-EMS ($m = 4$, $n_m = 10$, $K = 5$, AMC CONTROL PARAMETER $= 0.9999$)
C. Comparison of Hardware Implementations
Although the physical implementation of the proposed decoder architecture is beyond the scope of this paper, hardware-related measures
are necessary to evaluate the decoder efficiency. In this subsection, we
will study the hardware cost, throughput (latency), and power savings
of the proposed decoder architecture. Similar to the existing work [14],
[15], these results will be estimated and compared with existing decoders.
As discussed in Section III, the AMC-based decoder needs sorters for truncation operations. However, AMC reduces the number of hardware operations at the CNU and VNU, which can offset the overhead of the sorters. Since the CNU dominates the decoder hardware complexity, we focus on this unit when comparing the hardware cost. The major components in the CNU are: 1) status table and message RAM; 2) registers; 3) MUX; 4) comparators; 5) adders; and 6) decoder for sorter control, as shown in Fig. 2. Compared with EMS, AMC-EMS needs an additional decoder that generates the control signal for the variable length sorter. As shown in Table II, this decoder takes about 2.2% of the total transistor count in the CNU. In addition, the slightly larger memory size (3.76%) of AMC-EMS compared with EMS comes from the overhead of recording the length of each variable-length message, whereas EMS keeps messages at a constant length. The overhead in the VNU for AMC is also very small, consisting only of a decoder and a comparator. Both AMC-EMS and EMS ($n_m = 10$ and $K = 5$-bit quantization) save about 30% of the hardware cost compared with MS. Note that since the proposed AMC is applied to the CNU and VNU only, the hardware cost of the other units in Fig. 1 is more or less the same across the different implementations, and thus we do not include them in the comparison.
For all three decoders in Table II, the critical path is in the sorter, which comprises a comparator, an XOR gate, and a 2-to-1 MUX. The critical path consists of 12 2-input XOR gates for MS and EMS, assuming a 5-bit ripple-carry chain comparator implementation. For AMC-EMS, the cell enable (CE) signal of the sorter is generated by the outputs from both the decoder and comparator, thereby adding one more 2-input AND gate on the critical path. In a fully parallel decoder implementation, all the messages are updated simultaneously. Thus, the worst-case message length determines the number of clock cycles needed to finish one decoding iteration, although the message length for AMC-EMS is reduced during the iterations. AMC-EMS has slightly lower throughput (about 4%) than EMS due to the longer critical path in the sorter, while it takes the same number of clock cycles to finish one iteration. As MS needs to compute 32 messages while AMC-EMS (and EMS) only needs to compute 20 in the CNU, truncation actually improves the throughput by 1.5× compared with MS. On the other hand, in a sequential implementation the messages are updated in series. As the message length varies considerably in AMC-EMS, so does the number of clock cycles spent on updating the messages in each iteration. Thus, in a sequential architecture, the total number of clock cycles for finishing the decoding iterations is determined by the average message length. The average AMC-EMS message length is about 60% of that of EMS, indicating about 1.6× higher throughput than EMS due to the smaller amount of message computation. Compared with MS, sequential AMC-EMS can improve the throughput by about 2.4×.
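The throughput figures above can be checked with a back-of-envelope calculation. The Python sketch below assumes that throughput scales inversely with the number of messages computed (or with the average message length in the sequential case) and that the roughly 4% clock penalty of AMC-EMS applies in both implementations; this simple model and the variable names are assumptions, not the estimation procedure used for Table II.

```python
# Figures quoted in the text.
ms_messages = 32          # messages computed in the CNU by MS
ems_messages = 20         # messages computed by EMS and AMC-EMS
clock_penalty = 0.04      # AMC-EMS slowdown from the longer sorter path
avg_length_ratio = 0.60   # average AMC-EMS message length relative to EMS

# Fully parallel AMC-EMS versus MS: fewer messages, slightly slower clock.
parallel_vs_ms = (ms_messages / ems_messages) * (1 - clock_penalty)

# Sequential AMC-EMS versus EMS: cycles follow the average message length.
sequential_vs_ems = (1 / avg_length_ratio) * (1 - clock_penalty)

# Sequential AMC-EMS versus MS: EMS-over-MS gain times the ratio above.
sequential_vs_ms = (ms_messages / ems_messages) * sequential_vs_ems

print(round(parallel_vs_ms, 2))     # 1.54
print(round(sequential_vs_ems, 2))  # 1.6
print(round(sequential_vs_ms, 2))   # 2.56
```

Under these assumptions the first two ratios agree with the reported 1.5× and 1.6×. The third comes out slightly above the reported 2.4×, which may reflect rounding of the intermediate figures or factors that this simple model does not capture.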
Although AMC-EMS has slightly higher hardware cost and lower throughput (in a fully parallel implementation) than EMS, it enables significant power savings, which are critical to emerging applications such as underwater acoustic communications [17], where energy is one of the scarcest resources. Note that, so far, no measured power consumption results from implementations of nonbinary LDPC decoders can be found in the literature. However, the power consumption can be reasonably estimated by considering the major hardware operations that dominate it, such as memory access, addition, comparison, shifting, and Galois field multiplication and division, as shown in Table I. Assume that each kind of operation consumes about the same power in MS, EMS, and AMC-EMS. Note that this assumption is conservative because EMS and AMC-EMS use a smaller memory size. As shown in Table II, among all the major operations, AMC-EMS shows the smallest reduction in memory accesses and real additions, which indicates that AMC-EMS can potentially reduce the power consumed by memory accesses and real additions by about 65% and 50% compared with MS and EMS, respectively, under the same clock frequency. The power reduction in other operations, such as comparison and shifting, is even larger. Thus, we expect the overall power reduction of AMC-EMS to reach about 65% and 50% compared with MS and EMS, respectively, assuming an equivalent implementation.
V. CONCLUSION
In this paper, we proposed a new nonbinary LDPC decoding architecture based on AMC, which can significantly reduce the hardware
operations and power consumption in VLSI implementations by adaptively adjusting the message length of belief information while maintaining the required performance. A truncation scheme was developed
to implement the proposed AMC. The architecture design was optimized to fully exploit the benefit of the proposed AMC. Further work is being directed toward a full-fledged ASIC implementation for underwater acoustic sensor networks subject to stringent energy constraints.
REFERENCES
[1] R. G. Gallager, Low-Density Parity-Check Codes. Cambridge, MA: MIT Press, 1963.
[2] D. J. C. MacKay and R. Neal, "Good codes based on very sparse matrices," in Proc. Cryptography Coding, 5th IMA Conf., 1995, pp. 100-111.
[3] T. Zhang and K. K. Parhi, "VLSI implementation-oriented (3,k) regular low-density parity-check codes," in Proc. IEEE Workshop Signal Process. Syst. (SIPS), 2001, pp. 25-36.
[4] M. M. Mansour and N. R. Shanbhag, "High-throughput LDPC decoders," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 11, no. 6, pp. 976-996, Dec. 2003.
[5] J. Sha, Z. Wang, M. Gao, and L. Li, "Multi-Gb/s LDPC code design and implementation," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 17, no. 2, pp. 262-268, Feb. 2009.
[6] M. Davey and D. J. C. MacKay, "Low density parity check codes over GF(q)," IEEE Commun. Lett., vol. 2, no. 6, pp. 165-167, Jun. 1998.
[7] R. Peng and R. Chen, "Application of nonbinary LDPC codes for communication over fading channels using higher order modulations," in Proc. IEEE Globecom, 2006, pp. 1-5.
[8] V. Savin, "Min-Max decoding for nonbinary LDPC codes," in Proc. IEEE ISIT, 2008, pp. 960-964.
[9] H. Song and J. R. Cruz, "Reduced-complexity decoding of q-ary LDPC codes for magnetic recording," IEEE Trans. Magn., vol. 39, no. 2, pp. 1081-1087, Mar. 2003.
[10] D. Declercq and M. Fossorier, "Decoding algorithms for nonbinary LDPC codes over GF(q)," IEEE Trans. Commun., vol. 55, no. 4, pp. 633-643, Apr. 2007.
[11] H. Wymeersch, H. Steendam, and M. Moeneclaey, "Log-domain decoding of LDPC codes over GF(q)," in Proc. IEEE ICC, 2004, pp. 772-776.
[12] A. Voicila, D. Declercq, F. Verdier, M. Fossorier, and P. Urard, "Low-complexity, low memory EMS algorithm for non-binary LDPC codes," in Proc. ICC, 2007, pp. 671-676.
[13] Y. Chang, A. Vila Casado, M. Chang, and R. D. Wesel, "Lower-complexity layered belief-propagation decoding of LDPC codes," in Proc. ICC, 2008, pp. 1155-1160.
[14] X. Zhang and F. Cai, "Efficient partial-parallel decoder architecture for quasi-cyclic nonbinary LDPC codes," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 58, no. 2, pp. 402-414, Feb. 2010.
[15] J. Lin, J. Sha, and Z. Wang, "An efficient VLSI architecture for nonbinary LDPC decoders," IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 57, no. 1, pp. 51-55, Jan. 2010.
[16] A. Voicila, F. Verdier, D. Declercq, M. Fossorier, and P. Urard, "Architecture of a low-complexity non-binary LDPC decoder for high order fields," in Proc. ISCIT, 2007, pp. 1201-1206.
[17] J. Huang, S. Zhou, and P. Willett, "Nonbinary LDPC coding for multicarrier underwater acoustic communication," IEEE J. Sel. Areas Commun., vol. 26, no. 9, pp. 1684-1696, Sep. 2008.
[18] W. Tang, J. Huang, L. Wang, and S. Zhou, "Nonbinary LDPC decoding by Min-Sum with adaptive message control," in Proc. Int. Conf. Acoust., Speech, Signal Process. (ICASSP), 2011, pp. 3164-3167.