A Low-Complexity Three-Error-Correcting BCH Decoder With Applications in Concatenated Codes
Abstract—Error correction coding (ECC) for optical communication and persistent storage systems requires high rate codes that enable high data throughput and low residual errors. Recently, different concatenated coding schemes were proposed that are based on binary Bose-Chaudhuri-Hocquenghem (BCH) codes that have low error correcting capabilities. Commonly, hardware implementations for BCH decoding are based on the Berlekamp-Massey algorithm (BMA). However, for single, double, and triple error correcting BCH codes, Peterson's algorithm can be more efficient than the BMA. The known hardware architectures of Peterson's algorithm require Galois field inversion. This inversion dominates the hardware complexity and limits the decoding speed. This work proposes an inversion-less version of Peterson's algorithm. Moreover, a decoding architecture is presented that is faster than decoders that employ inversion or the fully parallel BMA at a comparable circuit size.

I. INTRODUCTION

Concatenated codes using BCH codes of moderate length and with low error correcting capability have recently been applied for error correction in optical communication as well as in storage systems. Such coding systems require high code rates, very high throughput with hard-input decoding, and low residual error rates. These requirements can be met by generalized concatenated codes, product codes, half-product codes, or staircase codes. For instance, generalized concatenated codes with inner BCH codes were investigated in [1], [2], [3], [4], [5]. Moreover, product code constructions based on BCH codes were proposed in [6], [7], [8], [9]. Hardware architectures for such codes were proposed for instance in [10], [11], [12], [13], [14]. Similarly, implementations for fast decoding of staircase codes require fast BCH decoding [15], [16].

Due to the required code rates, BCH codes that can only correct single, double, or triple errors are used. The decoding of the concatenated codes typically requires multiple rounds of BCH decoding. Hence, the achievable throughput depends strongly on the speed of the BCH decoder. Moreover, BCH codes that correct only two or three errors are used in random-access memory (RAM) applications [17], [18], [19], which require high data throughput and a very low decoding latency.

BCH decoding consists of three steps: syndrome calculation, calculation of the error location polynomial, and the Chien search, which determines the error positions. For BCH codes of moderate length (over Galois fields GF(2^6), ..., GF(2^12)), the syndrome calculation and the Chien search can be performed in parallel structures that calculate all syndrome values and all error positions within a single clock cycle, whereas the calculation of the error location polynomial is often performed using the Berlekamp-Massey algorithm (BMA), which requires several iterations. Alternatively, decoders based on Peterson's algorithm [20] were proposed in [21], [22], [10], [11]. Such decoders can be more efficient than the BMA for BCH codes with small error correcting capabilities, i.e. single, double, and triple error correcting codes.

In this work, we propose an inversion-less version of Peterson's algorithm for triple error correcting BCH codes. This algorithm is more efficient than the decoders employing Galois field inversion [21], [11]. Moreover, the proposed inversion-less Peterson's algorithm provides more flexibility regarding the hardware implementation and enables pipelining to speed up the decoding. A decoding architecture for such a pipelined decoder is presented.

The paper is organized as follows. In the next section, we introduce the notation and briefly discuss Peterson's algorithm, which is the basis of the proposed decoding procedure. The calculation of the error location polynomial for single, double, and triple errors along with the proposed inversion-less algorithm is presented in Section III. In Section IV, we propose a hardware architecture for this algorithm and compare its speed and its area consumption with other algorithms.

II. PETERSON'S ALGORITHM

In this section, we briefly revise Peterson's algorithm and introduce the notation. The received vector is r(x) = v(x) + e(x), where v(x) = v0 + v1 x + ... + v_{n−1} x^{n−1} is a codeword of length n and e(x) = e0 + e1 x + ... + e_{n−1} x^{n−1} is the error vector. S1, S2, ..., S_{2t−1} denote the syndrome values, which are defined as

   Si = r(α^i) = e(α^i),    (1)

where α is a primitive element of the Galois field GF(2^m). For binary BCH codes, the following relation holds:

   S_{2i} = Si^2.    (2)

Let ν be the actual number of errors and t the error correcting capability of the BCH code. The coefficients of the error location polynomial σ(x) = σ0 + σ1 x + ... + σν x^ν satisfy a set of equations called Newton's identities. In matrix form these equations are

   Aν Δν = Sν.    (3)
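The syndrome definitions (1) and (2) can be illustrated with a small software model. The following Python sketch is illustrative only, not the hardware structure discussed later; the field GF(2^4) with primitive polynomial x^4 + x + 1 and the error positions are arbitrary choices for the example.

```python
# Minimal GF(2^4) model: elements are 4-bit integers, alpha = x = 0b0010.
PRIM, M, N = 0b10011, 4, 15   # primitive polynomial x^4 + x + 1, n = 2^4 - 1
ALPHA = 0b0010

def gf_mul(a, b):
    """Carry-less 'peasant' multiplication with reduction modulo PRIM."""
    p = 0
    while b:
        if b & 1:
            p ^= a
        a <<= 1
        if a & (1 << M):
            a ^= PRIM
        b >>= 1
    return p

def gf_pow(a, e):
    r = 1
    for _ in range(e):
        r = gf_mul(r, a)
    return r

def syndrome(error_positions, i):
    """S_i = e(alpha^i), Equation (1), for a binary error vector."""
    s = 0
    for p in error_positions:
        s ^= gf_pow(ALPHA, (i * p) % N)
    return s

errors = [2, 5, 9]   # hypothetical error positions
S = {i: syndrome(errors, i) for i in range(1, 7)}
```

Evaluating the even-indexed syndromes confirms relation (2), S_{2i} = Si^2, so a hardware implementation only needs to compute the odd syndromes.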
DOI: 10.30420/454862002
With σ0 = 1, the (i × i) matrix

        ( 1      0       0       0    ...  0
          S2     S1      1       0    ...  0
   Ai =   S4     S3      S2      S1   ...  0    ,    (4)
          ...    ...     ...     ...  ...  ...
          S2i    S2i−1   S2i−2   ...         )

the vector of coefficients

   Δi = (σ1, σ2, ..., σi)^T,    (5)

and the syndrome vector

   Si = (−S1, −S3, ..., −S_{2i+1})^T.    (6)

Note that the matrix Ai is singular for i > ν. Hence, Peterson's algorithm first calculates the number of errors ν. Starting with i = t, the determinant Di = det(Ai) is calculated. If Di = 0, then the algorithm reduces the size of the matrix Ai (decreases i) until Di = det(Ai) ≠ 0 holds and Equation (3) can be solved.

Finally, the Chien search determines the error positions by searching for the roots of the error location polynomial. The calculation of σ(α^i) for i = 0, ..., n−1 can be conducted in parallel using simple logic operations [13], [17].

III. CALCULATING THE ERROR LOCATION POLYNOMIAL FOR SINGLE, DOUBLE, AND TRIPLE ERRORS

For single, double, and triple errors, the following direct solutions of Newton's identities follow [23]:

   σ(x) = 1 + S1 x    for ν = 1    (7)

   σ(x) = 1 + S1 x + ((S3 + S1^3)/S1) x^2    for ν = 2    (8)

   σ(x) = 1 + S1 x + ((S1^2 S3 + S5)/(S3 + S1^3)) x^2
        + (S1^3 + S3 + S1 (S1^2 S3 + S5)/(S3 + S1^3)) x^3    for ν = 3    (9)

These solutions are used in [21], [24], [11] for decoding BCH codes. The main difference between [21] and [11] is the implementation of the Galois field inversion in Equation (9). For instance, in [11] a parallel hardware implementation is proposed. This architecture requires only 4 Galois field multipliers, but additionally a Galois field inversion is required. The complexity and the throughput of this architecture are determined by the inversion. For the Galois field GF(2^10), the size of the inversion is about twice the size of a multiplier and the length of the critical path is four times longer than that of a multiplier. In [21] the inversion is implemented using a look-up table, which is only efficient for small Galois fields, because the table size is of order O(m 2^m). Even for moderate Galois field sizes, e.g. m = 8, ..., 12, such look-up tables are costly if multiple instances of the decoder are required.

In the following, we propose an algorithm for triple errors that omits the Galois field inversion, similar to the approach in [24] that considers double errors. Omitting the inversion reduces the hardware complexity and speeds up the calculation. First, we consider the case of single and double errors. Note that the roots of the error location polynomial do not change if we multiply all coefficients with a non-zero factor. For instance, multiplying the right hand side of Equation (8) with S1 ≠ 0, we obtain an equivalent solution

   σ(x) = S1 + S1^2 x + D2 x^2    (10)

for ν = 2 with the determinant

   D2 = S3 + S1^3.    (11)

Note that for ν = 1 and ν = 2, S1 is non-zero. For a single error in position i we have S1 = α^i ≠ 0. Similarly, for two errors in positions i and j, we have S1 = α^i + α^j ≠ 0, because α^i ≠ α^j. Equation (10) is also a solution for ν = 1, because D1 = S1 ≠ 0 and D2 = 0 hold for a single error.

Next, we consider the case ν ≥ 2. For ν = 2 and ν = 3, we have D2 ≠ 0 [21]. To see this, first consider ν = 2, where we have S1 = α^i + α^j and S3 = α^{3i} + α^{3j}. Hence,

   S1^3 + S3 = (α^i + α^j)^3 + α^{3i} + α^{3j}
             = α^{i+2j} + α^{2i+j} ≠ 0    for i ≠ j.    (12)
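The determinant test and the scaled solution (10) can be verified numerically. The sketch below is a software model under assumed parameters: the field GF(2^4) and the error positions are arbitrary example choices, and the root search stands in for the parallel Chien search.

```python
PRIM, M, N, ALPHA = 0b10011, 4, 15, 0b0010   # GF(2^4), x^4 + x + 1

def gf_mul(a, b):
    # carry-less multiplication with reduction modulo PRIM
    p = 0
    while b:
        if b & 1:
            p ^= a
        a <<= 1
        if a & (1 << M):
            a ^= PRIM
        b >>= 1
    return p

def gf_pow(a, e):
    r = 1
    for _ in range(e):
        r = gf_mul(r, a)
    return r

def syndromes13(errs):
    # S1 and S3 of a binary error pattern, Equation (1)
    s1 = s3 = 0
    for p in errs:
        s1 ^= gf_pow(ALPHA, p % N)
        s3 ^= gf_pow(ALPHA, (3 * p) % N)
    return s1, s3

def d2(s1, s3):
    # D2 = S3 + S1^3, Equation (11)
    return s3 ^ gf_mul(s1, gf_mul(s1, s1))

def sigma_roots(s1, s3):
    # positions p with sigma(alpha^(-p)) = 0 for Equation (10)
    D2 = d2(s1, s3)
    roots = []
    for p in range(N):
        x = gf_pow(ALPHA, (N - p) % N)   # alpha^(-p)
        y = s1 ^ gf_mul(gf_mul(s1, s1), x) ^ gf_mul(D2, gf_mul(x, x))
        if y == 0:
            roots.append(p)
    return roots
```

For a single error D2 vanishes and the polynomial degenerates to S1 + S1^2 x, while for a double error D2 ≠ 0 and the two roots point at the error positions, matching the discussion around Equations (10)–(12).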
Similarly, for ν = 3 we have S1 = α^i + α^j + α^k and S3 = α^{3i} + α^{3j} + α^{3k}. Consequently,

   S1^3 + S3 = (α^i + α^j + α^k)^3 + α^{3i} + α^{3j} + α^{3k}
             = α^{i+2j} + α^{i+2k} + α^{j+2k}
             + α^{j+2i} + α^{k+2i} + α^{k+2j}.    (13)

This sum is the determinant of the following matrix:

   ( 1   α^i   α^{2i}
     1   α^j   α^{2j}    .    (14)
     1   α^k   α^{2k} )

This matrix has full rank, because the columns are linearly independent. Hence, D2 ≠ 0 holds for ν = 3.

Now, multiplying the right hand side of Equation (9) by D2, we obtain an equivalent solution for ν = 3 as

   σ(x) = D2 + S1 D2 x + δ2 x^2 + D3 x^3    (15)

with

   δ2 = S1^2 S3 + S5    (16)

and the determinant

   D3 = S1 (S2 S3 + S1 S4) + S3^2 + S1 S5.    (17)

Using (2), (11), and (16) we obtain

   D3 = S1 (S1^2 S3 + S5) + S1^6 + S3^2
      = S1 δ2 + D2^2.    (18)

The decoding procedure is summarized in Algorithm 1. This algorithm can easily be adapted to the decoding of single and double error correcting codes, e.g. by setting D3 = 0 for double error correcting BCH codes. This is important for the decoding of GC codes that use nested inner codes [13], [14], where the error correcting capability increases from level to level. In [14], Chase decoding of the inner BCH codes is used for soft-input decoding. This requires multiple BCH decoding operations for each received BCH codeword. Due to the complexity of the soft-input decoding and the small performance gain for the better protected levels, the Chase decoding is limited to the first three levels in [14]. These are single, double, and triple error correcting BCH codes.

Algorithm 1 Inversion-less Peterson algorithm
   calculate D2, δ2, D3
   if D3 == 0 then
      return σ(x) = S1 + S1^2 x + D2 x^2
   else
      return σ(x) = D2 + S1 D2 x + δ2 x^2 + D3 x^3
   end if

IV. HARDWARE ARCHITECTURE

In this section, we present a hardware architecture for the proposed decoding algorithm and compare its speed (critical path length) and its area consumption with other algorithms. Note that the critical path length and the circuit size are dominated by the Galois field multipliers and the Galois field inversion. The size of a bit-parallel multiplier grows with order O(m^2) and the critical path with O(m). The Galois field inversion is often implemented using Fermat's little theorem, which requires only a single multiplier and a squaring operation, but m−1 clock cycles [25]. Hence, the total number of basic logic operations per inversion is of order O(m^3). On the other hand, the addition and squaring operations are of order O(m) with a critical path length of O(1). Consequently, these two operations are neglected in the following discussion.

Algorithm 1 can be implemented performing all operations in parallel. Such an implementation requires four multipliers and has a critical path length of two multipliers. It is more efficient than the implementation proposed in [11]. The architecture in [11] uses three multipliers and one inversion, where the logic for the inversion is about twice the size of a multiplier (for GF(2^10)) and the critical path length of the inversion is equivalent to four multiplications. The total critical path in [11] has a length that is equivalent to six multiplications. Hence, at a smaller size the proposed algorithm has a significantly shorter critical path.

Moreover, the proposed algorithm enables pipelined architectures that can speed up the decoding, whereas the Galois field inversion is an atomic operation that limits the efficiency of pipelining. Figure 2 presents such a pipeline (without control logic). The pipeline requires four multipliers and additional registers (three registers of width m bits) to store intermediate results. The pipeline reduces the critical

Fig. 2. Hardware architecture of the decoder pipeline.
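As a functional reference for Algorithm 1 (a software model, not the hardware pipeline), the complete inversion-less decoding flow can be sketched in Python, with a brute-force root search standing in for the Chien search. The field GF(2^4) and the tested error patterns are arbitrary example choices.

```python
PRIM, M, N, ALPHA = 0b10011, 4, 15, 0b0010   # GF(2^4), x^4 + x + 1

def gf_mul(a, b):
    # carry-less multiplication with reduction modulo PRIM
    p = 0
    while b:
        if b & 1:
            p ^= a
        a <<= 1
        if a & (1 << M):
            a ^= PRIM
        b >>= 1
    return p

def gf_pow(a, e):
    r = 1
    for _ in range(e):
        r = gf_mul(r, a)
    return r

def syndromes(errs):
    # odd syndromes S1, S3, S5 of a binary error pattern, Equation (1)
    out = []
    for i in (1, 3, 5):
        s = 0
        for p in errs:
            s ^= gf_pow(ALPHA, (i * p) % N)
        out.append(s)
    return out

def peterson_inversionless(s1, s3, s5):
    """Algorithm 1: coefficients (sigma_0, ..., sigma_3), no GF inversion."""
    s1sq = gf_mul(s1, s1)
    D2 = s3 ^ gf_mul(s1, s1sq)                 # Equation (11)
    delta2 = gf_mul(s1sq, s3) ^ s5             # Equation (16)
    D3 = gf_mul(s1, delta2) ^ gf_mul(D2, D2)   # Equation (18)
    if D3 == 0:
        return (s1, s1sq, D2, 0)               # Equation (10)
    return (D2, gf_mul(s1, D2), delta2, D3)    # Equation (15)

def chien(coeffs):
    # error at position p iff sigma(alpha^(-p)) = 0
    found = []
    for p in range(N):
        x = gf_pow(ALPHA, (N - p) % N)
        acc, xp = 0, 1
        for c in coeffs:
            acc ^= gf_mul(c, xp)
            xp = gf_mul(xp, x)
        if acc == 0:
            found.append(p)
    return found

def decode(errs):
    return chien(peterson_inversionless(*syndromes(errs)))
```

The D3 == 0 branch covers ν ≤ 2 as discussed above, so the same routine handles single, double, and triple error patterns.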
TABLE I
Results for the FPGA implementation for the proposed algorithm.
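The inversion cost that the proposed algorithm avoids can be illustrated with a short sketch of the Fermat-based inverter mentioned in Section IV: each step performs one squaring and one multiplication, and after m−1 steps the result is a^(2^m − 2) = a^(−1). The field GF(2^4) is again an arbitrary example choice.

```python
PRIM, M = 0b10011, 4   # GF(2^4), primitive polynomial x^4 + x + 1

def gf_mul(a, b):
    # carry-less multiplication with reduction modulo PRIM
    p = 0
    while b:
        if b & 1:
            p ^= a
        a <<= 1
        if a & (1 << M):
            a ^= PRIM
        b >>= 1
    return p

def gf_inv(a):
    """Inversion via Fermat's little theorem: a^(2^m - 2) = prod of a^(2^i).

    One squaring and one multiplication per step, m - 1 steps in total,
    modeling the m - 1 clock cycles of the sequential circuit [25].
    """
    t, r = a, 1
    for _ in range(M - 1):
        t = gf_mul(t, t)   # t = a^(2^i)
        r = gf_mul(r, t)
    return r
```

This sequential structure is the atomic operation referred to in Section IV: it cannot be split into shorter pipeline stages the way the multipliers of the proposed decoder can.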
[20] W. Peterson, "Encoding and error-correction procedures for the Bose-Chaudhuri codes," IRE Transactions on Information Theory, vol. 6, no. 4, pp. 459–470, September 1960.
[21] E.-H. Lu, S.-W. Wu, and Y.-C. Cheng, "A decoding algorithm for triple-error-correcting binary BCH codes," Information Processing Letters, vol. 80, no. 6, pp. 299–303, 2001.
[22] S. Lin and D. J. Costello, Error Control Coding. Upper Saddle River, NJ: Prentice-Hall, 2004.
[23] Y. Jiang, A Practical Guide to Error-Control Coding Using Matlab. Artech House, 2010.
[24] N. Ahmadi, M. H. Sirojuddiin, A. D. Nandaviri, and T. Adiono, "An optimal architecture of BCH decoder," in 2010 4th International Conference on Application of Information and Communication Technologies, Oct 2010, pp. 1–5.
[25] T. Itoh and S. Tsujii, "A fast algorithm for computing multiplicative inverses in GF(2^m) using normal bases," Information and Computation, vol. 78, no. 3, pp. 171–177, 1988.
[26] W. Liu, J. Rho, and W. Sung, "Low-power high-throughput BCH error correction VLSI design for multi-level cell NAND flash memories," in IEEE Workshop on Signal Processing Systems Design and Implementation (SIPS), Oct. 2006, pp. 303–308.
[27] J. Freudenberger and J. Spinner, "A configurable Bose-Chaudhuri-Hocquenghem codec architecture for flash controller applications," Journal of Circuits, Systems, and Computers, vol. 23, no. 2, pp. 1–15, Feb 2014.
[28] J. Freudenberger, M. Rajab, and S. Shavgulidze, "A soft-input bit-flipping decoder for generalized concatenated codes," in 2018 IEEE International Symposium on Information Theory (ISIT), June 2018, pp. 1301–1305.