
1214 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 31, NO. 8, AUGUST 2023

A Triple Burst Error Correction Based on Region Selection Code

Felipe Silva, Alan Pinheiro, Jarbas A. N. Silveira, and César Marcon, Senior Member, IEEE

Abstract: The evolution of microelectronics boosts more scalable and complex circuit designs, providing high processing speed and greater storage capacity. However, reliability issues have grown significantly as electronic devices scale down, increasing the fault rate, mainly in critical applications exposed to radiation. Memories are sensitive to charged particles, which can corrupt data due to the transient effects. Error correction codes (ECCs) are highly applied to mitigate data failures, increasing memory reliability. The matrix region selection code (MRSC) is an ECC designed to correct a high rate of adjacent errors in memory but less effectively for nonadjacent errors. However, MRSC has a 2-D structure that makes it challenging to implement in memory where one address is accessed at a time. This article introduces the triple burst error correction based on region selection code (TBEC-RSC), an ECC that uses MRSC concepts, converting the MRSC format to a 1-D structure. TBEC-RSC was implemented and evaluated in a 16-bit data version; however, the code is easily extensible to the higher base-2 data words (e.g., 64 bits). Experimental results showed that TBEC-RSC corrects 100% of triple burst errors and more than 40% of 8-bit burst errors.

Index Terms: Critical application, error correction code (ECC), multiple cell upset (MCU), reliable memory.

Manuscript received 24 July 2022; revised 31 December 2022 and 21 March 2023; accepted 27 April 2023. Date of publication 19 May 2023; date of current version 26 July 2023. This work was supported in part by the Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq) and Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES) under Finance Code 001. (Corresponding author: Felipe Silva.)
Felipe Silva, Alan Pinheiro, and Jarbas A. N. Silveira are with the Department of Teleinformatics, Federal University of Ceará, Fortaleza, Ceará 60455-970, Brazil (e-mail: [email protected]; [email protected]; [email protected]).
César Marcon is with the Polytechnic School, Pontifical Catholic University of Rio Grande do Sul, Porto Alegre, Rio Grande do Sul 90619-900, Brazil (e-mail: [email protected]).
Color versions of one or more figures in this article are available at https://doi.org/10.1109/TVLSI.2023.3273085.
Digital Object Identifier 10.1109/TVLSI.2023.3273085
1063-8210 © 2023 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See https://www.ieee.org/publications/rights/index.html for more information.

I. INTRODUCTION

A MODERN and complex integrated circuit (IC) has high integration levels and processing power, enclosing complex modules implemented by numerous components. This enhancement emerged driven by the shrinkage of the CMOS technology in the nanoscale era, which brings together the increase of single event effects (SEEs) induced by the impact of charged particles. Additionally, single cell upset (SCU) is a type of SEE that causes memory bitflip [1], [2]. Modern ICs usually contain memory devices whose occurrence of an SCU can lead to a total operation failure; thus, dealing with SEE is a vital concern for safety-critical applications [3], [4].

Single error correction-double error detection (SEC-DED) is a type of error correction code (ECC) widely applied to increase data reliability [5], [6]. In aggressive environments, such as space, a multiple cell upset (MCU) induced by the strike of high-energy cosmic particles becomes more likely to occur with CMOS technology reduction, especially in SRAM memories. Typically, an MCU appears in memory as a burst error pattern, which means that an energized particle may affect a group of neighbor cells, provoking either continuous errors (adjacent errors) or spaced errors by n bits in a word (nonadjacent errors) [7], [8], [9]. Since SEC-DEDs do not correct an MCU properly, more efficacious ECCs have become a design trend.

Several ECC techniques focusing on mitigating errors due to the occurrence of MCUs in memory have been proposed recently [10], [11], [12], [13], [14], [15], [16], [17], [18], [19], [21], [22], [23], [24], [25], [26], [27], [28]. Some of them proposed exploring elaborated 2-D ECCs for increasing correction capacity [10], [11], [12], [13], [14], [15], [16], [17], [18], [19], [21], [22]; others explore the error correction efficacy targeting a specific memory hierarchy level, i.e., cache [23], [24], [25], [26], [27], [28]. Notwithstanding the error correction improvements being a significant contribution of these ECCs, the designers should also consider energy, area, and delay aspects caused by the ECC usage, since complex ECCs can compromise the performance of applications with limited energy and time resources, such as space ones.

In this context, matrix region selection code (MRSC) [10] is an efficacious and low-cost ECC for MCU correction, which uses a 2-D parity encoding scheme. The MRSC decoding divides the data bits into regions and uses logic equations to specify the location of the errors, creating a region selection algorithm (RSA). MRSC is a 2-bit error corrector that also covers several adjacent error patterns with lower synthesis costs when compared to other robust ECCs. The same authors of MRSC propose eMRSC [11], an extended version of MRSC that explores the RSA capacities to improve the ECC efficacy.

This article proposes the triple burst error corrector based on region selection code (TBEC-RSC), a 2-D ECC scheme mapped in a 1-D physical structure. TBEC-RSC employs the main concepts presented in the MRSC algorithm, although with some modifications for improving error correction efficacy, enabling the correction of burst errors with high coverage and similar implementation costs compared with the original MRSC. TBEC-RSC was implemented and evaluated in a 16-bit data version; however, the code is easily extensible to higher base-2 data words (e.g., 64 bits). Experimental results showed that TBEC-RSC corrects 100% of triple burst errors

Authorized licensed use limited to: Amrita School of Engineering. Downloaded on August 18,2023 at 08:40:28 UTC from IEEE Xplore. Restrictions apply.
SILVA et al.: TRIPLE BURST ERROR CORRECTION BASED ON REGION SELECTION CODE 1215

and more than 40% of 8-bit burst errors. The experiments focused on the memories that store a data block per address; although the tested results might differ slightly, this same technique can be applied to different memory structures.

II. RELATED WORK

Most ECCs for protecting memory data consider adjacent and burst errors, which depend on the amount of energy and how the radiation strikes the cells. An adjacent error affects multiple cells in a near neighborhood of a memory block; errors with up to one cell are considered adjacent [9]. A burst error encompasses noncontinuous bitflips. Gracia-Morán et al. [16] state that multiple errors span l bits (burst length) in a word of continuous bits where, at least, the first and last bits are in error, meaning that a burst error can affect adjacent or nonadjacent cells. Therefore, adjacent is one possible pattern of a burst error. Equation (1) calculates the number of burst error patterns (NBPs) for l bits, and Table I presents some of these patterns for l bits

$$\mathrm{NBP} = 2^{(l-2)}. \tag{1}$$

TABLE I
BURST ERROR PATTERNS AVAILABLE FOR l RANGING FROM 2 TO 8

On the one hand, codes focusing on adjacent error correction, such as [10], [11], [12], [13], [14], [15], reduce the synthesis cost compared to other approaches that deal with complex error patterns but are inefficacious in dealing with burst errors. On the other hand, burst error correction codes (BECCs) like [16], [17], [18], [19], [20], [21] require more complex decoding algorithms that usually imply more area and energy consumption.

BECCs are developed regarding radiation particle incidence and memory bit organization; thus, these ECCs have started to replace SEC-DED usage for critical applications. Several BECCs apply a syndrome matchup table to identify each error type with a single syndrome value. Next, we clarify the BECC techniques presented in [16] and [17] to compare with the TBEC-RSC proposals.

Gracia-Morán et al. [16] developed ECCs with high error correction capability and low redundancy using the flexible unequal error control (FUEC) methodology presented in [22]. They proposed the FUEC-double adjacent error corrector (DAEC), FUEC-triple adjacent error corrector (TAEC), and FUEC-quadruple adjacent error corrector (QUAEC). These codes use unique parity-check matrices to encode a 16-bit word, producing the FUEC-DAEC, FUEC-TAEC, and FUEC-QUAEC 23-, 24-, and 25-bit words, respectively.

Li et al. [17] proposed a 3-bit BECC extendable to QUAEC, which is based on error space satisfiability (ESS) and unique syndrome satisfiability (USS). ESS estimates the number of syndromes required to reach an error correction rate, allowing the authors to scale the ECC redundancy. USS is applied in linear block codes to guarantee that the parity-check matrix H will produce unique syndrome values for each error pattern. They considered two criteria to optimize the ECCs: 1) smallest Hamming weight (SHW) of H, aiming to decrease the number of logic gates for reducing the area and power costs; and 2) SHW of the heaviest row of H, aiming to decrease the encoding and decoding delay. By applying those steps, the authors produced two versions of the 3-bit BECC with QUAEC.

III. FOUNDATIONS OF MRSC

MRSC is a low-cost 2-D ECC proposed in [10] that uses parity and check bits to encode 16-bit data in a 32-bit codeword. Fig. 1 shows the codeword structure of MRSC.

Fig. 1. MRSC structure: 16-bit data and 32-bit codeword [10].

The MRSC codeword comprises four sets of bits: 1) 16-bit data divided into A–D words of four bits each; 2) four diagonal bits ($Di_1$–$Di_4$); 3) four parity bits ($P_1$–$P_4$); and 4) eight check bits ($XA_{1,3}$, $XA_{2,4}$, $XB_{1,3}$, $XB_{2,4}$, $XC_{1,3}$, $XC_{2,4}$, $XD_{1,3}$, $XD_{2,4}$).

Equations (2)–(5) display how the check bits (XA, XB, XC, and XD) are calculated using XOR (⊕) operations among data columns indexed by v = {1, 2} and w = {3, 4} (e.g., $XA_{1,3} = A_1 \oplus A_3$)

$$XA_{v,w} = A_v \oplus A_w \tag{2}$$
$$XB_{v,w} = B_v \oplus B_w \tag{3}$$
$$XC_{v,w} = C_v \oplus C_w \tag{4}$$
$$XD_{v,w} = D_v \oplus D_w. \tag{5}$$

Equation (6) computes the Di bits using x = {1, 2, 3, 4} and y = {2, 1, 4, 3} to select the data columns (e.g., $Di_1 = A_1 \oplus B_2 \oplus C_1 \oplus D_2$). Equation (7) calculates the parity bits P according to the data columns indexed by x = {1, 2, 3, 4} (e.g., $P_1 = A_1 \oplus B_1 \oplus C_1 \oplus D_1$)

$$Di_x = A_x \oplus B_y \oplus C_x \oplus D_y \tag{6}$$
$$P_x = A_x \oplus B_x \oplus C_x \oplus D_x. \tag{7}$$

Fig. 2 illustrates that MRSC organizes the data bits by splitting the matrix into three regions ($R_1$–$R_3$).

MRSC estimates the matrix region with more errors and corrects it with check bits, performing three steps.

1) The syndromes of the redundancy bits are computed by applying XOR operations between the original values of


the redundancy bits (Di, P, and X) and the recalculated bits after the occurrence of errors (RDi, RP, and RX)

$$SDi = Di \oplus RDi \tag{8}$$
$$SP = P \oplus RP \tag{9}$$
$$SX = X \oplus RX. \tag{10}$$

2) Error correction if any of the following conditions are true:
a) diagonal syndrome (SDi) and parity syndrome (SP) must have at least one value equal to one, and
b) two or more check bit syndromes (SX) are equal to one.

3) Error region selection according to the correction process and Table II criteria (only one criterion is satisfied).

Fig. 2. MRSC regions of data bits [10].

TABLE II
CRITERIA FOR SELECTING MRSC(16,32) REGION

The errors are corrected by performing XOR operations with the SX bit matrix in the selected region. Fig. 3(a) and (b) present the SX matrix organizations for correcting the $R_1$ and $R_2$ regions, and the $R_3$ region, respectively.

Fig. 3. SX matrix organizations to correct the data bits of the regions (a) $R_1$ and $R_2$ and (b) $R_3$.

The MRSC logic can fully correct up to 2-bit burst errors, although the code loses efficacy as the burst length increases, especially for nonadjacent error situations. Moreover, the MRSC correction efficacy depends on the 2-D scheme; thus, the same structure must be applied in a memory device to achieve the same correction results, hindering the ECC implementation.

IV. TRIPLE BURST ERROR CORRECTOR BASED ON REGION SELECTION CODE (TBEC-RSC) ALGORITHM

The 16-bit MRSC was previously evaluated for the adjacent errors, presenting about 70% correction efficiency for the triple errors [11]. Inspired by MRSC, we proposed TBEC-RSC, which enhances the decoding algorithm to deal with burst errors (adjacent and nonadjacent errors). Besides, TBEC-RSC does not require the 2-D scheme implementation of MRSC and provides 100% of 3-bit burst error correction.

TBEC-RSC uses an encoding algorithm similar to that of MRSC, with modifications in the placement of the codeword bits, the computation of the redundancy bits, and the decoding algorithm. This section explains the alterations employed in MRSC to create TBEC-RSC. We depict some correction examples comparing the MRSC and TBEC-RSC operations, thus clarifying the changes implemented and highlighting the TBEC-RSC benefits.

Applying the 2-D ECC format has two drawbacks: 1) the correction efficacy drops for nonadjacent errors, and 2) the implementation is hindered since the codeword is divided into smaller words, requiring multiple memory addresses to replicate the codeword format. We changed the 2-D MRSC codeword to the 1-D TBEC-RSC format to mitigate these drawbacks, as shown in Fig. 4. Regarding the 4 × 4 data bits matrix, every column was employed as part of the first 16 bits of the codeword to keep the errors in the same region.

Fig. 4. TBEC-RSC codeword structure.

We adjusted the placement of the redundancy bits to maximize the correction efficacy. The equations used to encode the Di and CB bits are the same as MRSC, although we modified the equations of the P bits to extend the TBEC-RSC correction efficacy. TBEC-RSC uses (11) and (12) to compute the P bits, with x = {1, 3} and y = {2, 4} (e.g., $P_3 = A_3 \oplus A_4 \oplus B_3 \oplus B_4$)

$$P_x = A_x \oplus A_y \oplus B_x \oplus B_y \tag{11}$$
$$P_y = C_x \oplus C_y \oplus D_x \oplus D_y. \tag{12}$$

These changes allow P bits to detect the nonadjacent burst errors of l = 3 (101 and 111). For instance, if this pattern occurs in bits $A_1$ and $C_1$ of the MRSC codeword, both SP and SDi would not detect errors, since parity does not detect double errors. The new P bits equation splits the errors into different equations, enabling SP to detect and correct the error region. The TBEC-RSC decoding algorithm adds a new error condition to cover a specific situation, i.e., the second decoding step presented in Section III (Error correction).

The redundancy bits of the original MRSC only partially cover a triple adjacent error pattern. For instance, following the TBEC-RSC structure presented in Fig. 4, if a triple error occurs in bits $P_3$, $XA_{13}$, and $XA_{24}$, condition b of the second decoding step is true, implying an erroneous change in data bits. Thus, the decoder checks whether the special condition occurs before analyzing conditions a and b, guaranteeing to correct all adjacent triple errors. This special condition is


verified by checking whether SDi is null, SP has at least one value equal to one, and SX has two or more values equal to one. If this special condition returns false, decoding conditions a and b are checked to continue the correction process.

Fig. 5. 16-bit word example in TBEC-RSC and MRSC formats.

Fig. 6. TBEC-RSC correcting a nonadjacent triple burst error.

Fig. 7. MRSC not correcting a nonadjacent triple burst error.

Fig. 8. TBEC-RSC correcting an adjacent triple burst error.

The total correction capacity of MRSC requires storing the codeword in four memory addresses, implying writing and reading delay overhead. TBEC-RSC optimizes the logic presented in MRSC and increases its correction efficacy, while the entire codeword can be stored in a single address.
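
The effect of the parity change in (11) and (12) can be checked with a short Python sketch. It is only an illustration, not the authors' Verilog implementation: the dictionary layout, the function names, and the example data values are assumptions made here, and only data bits are corrupted (the stored redundancy bits are taken as error-free). The sketch flips $A_1$ and $C_1$, the 101 pattern discussed in this section, and compares the syndromes produced by the MRSC parity (7) with those produced by the TBEC-RSC parity.

```python
# Rows A-D of the 4 x 4 data matrix, four bits each (list index 0 is column 1).
# The layout and all names below are illustrative assumptions.
Y_OF_X = {1: 2, 2: 1, 3: 4, 4: 3}  # column pairing used by Eq. (6)


def check_bits(m):
    """Eqs. (2)-(5): per row, XOR of column pairs (1,3) and (2,4)."""
    return [m[r][v - 1] ^ m[r][w - 1]
            for r in "ABCD" for v, w in ((1, 3), (2, 4))]


def diagonal_bits(m):
    """Eq. (6): Di_x = A_x ^ B_y ^ C_x ^ D_y with y = {2, 1, 4, 3}."""
    return [m["A"][x - 1] ^ m["B"][Y_OF_X[x] - 1]
            ^ m["C"][x - 1] ^ m["D"][Y_OF_X[x] - 1]
            for x in (1, 2, 3, 4)]


def parity_mrsc(m):
    """Eq. (7): P_x = A_x ^ B_x ^ C_x ^ D_x."""
    return [m["A"][i] ^ m["B"][i] ^ m["C"][i] ^ m["D"][i] for i in range(4)]


def parity_tbec_rsc(m):
    """Eqs. (11)-(12): P_x covers rows A and B, P_y covers rows C and D."""
    p = [0] * 4
    for x, y in ((1, 2), (3, 4)):
        p[x - 1] = m["A"][x - 1] ^ m["A"][y - 1] ^ m["B"][x - 1] ^ m["B"][y - 1]
        p[y - 1] = m["C"][x - 1] ^ m["C"][y - 1] ^ m["D"][x - 1] ^ m["D"][y - 1]
    return p


def syndrome(stored, recomputed):
    """Eqs. (8)-(10): XOR of stored and recomputed redundancy bits."""
    return [s ^ r for s, r in zip(stored, recomputed)]


def compare_parity_schemes(data):
    """Flip A1 and C1 and return the syndromes seen by each scheme."""
    corrupted = {row: bits[:] for row, bits in data.items()}
    corrupted["A"][0] ^= 1
    corrupted["C"][0] ^= 1
    return {
        "SDi": syndrome(diagonal_bits(data), diagonal_bits(corrupted)),
        "SX": syndrome(check_bits(data), check_bits(corrupted)),
        "SP_MRSC": syndrome(parity_mrsc(data), parity_mrsc(corrupted)),
        "SP_TBEC_RSC": syndrome(parity_tbec_rsc(data), parity_tbec_rsc(corrupted)),
    }


# Arbitrary example data; the outcome does not depend on the stored values.
EXAMPLE_DATA = {"A": [1, 1, 1, 0], "B": [0, 0, 0, 1],
                "C": [1, 1, 1, 1], "D": [0, 0, 0, 0]}
result = compare_parity_schemes(EXAMPLE_DATA)
print(result)
```

Under these assumptions, SDi and the MRSC parity syndrome stay all-zero because both flipped bits fall into the same XOR equations, whereas the TBEC-RSC parity raises $P_1$ and $P_2$ since $A_1$ and $C_1$ now belong to different equations.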

A. Examples of Error Correction Comparing MRSC and TBEC-RSC Efficacies

This section describes and discusses examples of triple errors to highlight the TBEC-RSC improvements compared to MRSC. Consider the 16-bit word "1110000111110000" being applied in the examples, which, following the TBEC-RSC and MRSC organizations described in Fig. 4, results in the data distribution illustrated in Fig. 5.

Fig. 6 shows the codeword containing a nonadjacent burst error of l = 3 from the bits $B_1$ to $D_1$ of the TBEC-RSC codeword (surrounded by the red rectangle). Fig. 6 also shows the decoding results and the selected region.

Notice that $R_1$ was selected since SP detected the burst error. As a comparison, consider the 16-bit word encoded by MRSC and its codeword with the same burst error pattern corrected by TBEC-RSC. Fig. 7 displays the MRSC codeword and decoding results, emphasizing that neither SDi nor SP of MRSC detects errors in the data bits. Despite one of the decoding conditions of MRSC being respected (SX with two values equal to one), the algorithm cannot identify the error region, resulting in a noncorrection.

Fig. 8 depicts an adjacent triple error, requiring a unique decoding condition to prevent TBEC-RSC miscorrection. Observe that the special condition proposed returns true, thus concluding the correction procedure. Since the data bits were not affected, the decoding returns the correct information. Fig. 9 presents the same error pattern in a similar situation for the MRSC codeword. This last case satisfies the error decoding conditions, allowing us to proceed with the correction algorithm, which results in a miscorrection.

Fig. 9. MRSC not correcting an adjacent triple burst error.

B. TBEC-RSC Scalability

The TBEC-RSC encoding divides the data bits into four groups. Let S be the encoded data size and n be the number of bits per group, then $S = 2^k\ \forall k \in \mathbb{N},\ k \geq 4$, and $n = S/4$. This format allows TBEC-RSC to scale to the base-2 data words larger than S = 16, e.g., 32 and 64 bits. Fig. 10 shows how the TBEC-RSC scales.

A remarkable feature that is observed when extending TBEC-RSC is that the overall error correction rate of the code increases as the codeword grows. This increase occurs because MRSC and TBEC-RSC correct errors concentrated in a region. Increasing the data bits also raises the region size, meaning


that larger errors are more likely to be found in a single region. Fig. 11 presents the correcting regions for the 32-bit TBEC-RSC version, which can be compared to the 16-bit version of Fig. 2.

Fig. 10. TBEC-RSC scalability scheme.

Fig. 11. TBEC-RSC regions examples for 16-bit and 32-bit formats.

In summary, more extensive regions increase the chance of errors being concentrated in a single region, raising the error correction rate. Section V presents the scalability effect on the error correction rate.

V. CORRECTION EFFICACY EVALUATION

This section compares the correction efficacy of TBEC-RSC, MRSC, and the other robust techniques presented in Section II, i.e., FUEC-DAEC, FUEC-TAEC, and FUEC-QUAEC [16], and TBEC-QUAEC with lower synthesis cost [17]. We designed the ECCs in Verilog and performed an experiment that applied burst error patterns in all codeword positions of the ECCs, then checked whether the correction was successful.

Table III shows the total burst error possibilities per burst length for each ECC, considering the burst error patterns presented in Table I.

The set of possible errors may differ for all ECCs since the number of error possibilities depends on the size of the codeword. Besides, each burst length has a unique value of error possibilities. For instance, Fig. 12 shows that considering a 32-bit codeword, a burst length of 2 can occur in 31 places. Notice that all ECCs were evaluated under the same error patterns, although Table III presents the exact number of pattern possibilities of a burst length for each ECC.

Fig. 12. Pattern possibilities with a burst length of 2.

Fig. 13 presents the correction efficacy of all ECCs evaluated. The proposed TBEC-RSC corrected all 3-bit error patterns. On the contrary, the MRSC correction efficacy is significantly reduced when nonadjacent errors, some illustrated in Section IV-A, were injected. TBEC-RSC has the highest correction rates from 5 to 8 burst errors; it is only surpassed in the 4-bit scenario by FUEC-QUAEC, which corrects nearly 15% more errors. Meanwhile, the compared codes preserved the results shown in their primary literature, ensuring the reliability of the simulation setup applied.

Fig. 13. ECC correction efficacy according to the burst error length.

Note that, except for the MRSC, the compared codes are linear, meaning that the redundancy bits produce unique syndrome values, enabling them to correct only specific error situations. Therefore, these codes exhibit a significant loss in the correction performance as the error pattern becomes more aggressive, correcting the errors that affect only the redundancy bits. The TBEC-RSC correction efficacy is smoothly reduced due to the code region algorithm that makes a linear approach work like a matrix approach, enabling it to detect and correct multiple errors not covered by the other codes. For instance, none of the compared codes corrected more than 20% of 5-bit burst errors, while TBEC-RSC corrected more than 80% of 5-bit error patterns.

Fig. 14 shows the error correction rate of TBEC-RSC versions implemented with 16, 32, and 64 data bits. As expected, all versions fixed scenarios with three or fewer errors. Besides, the 32- and 64-bit versions achieve a higher error correction rate for a more significant number of errors, as there are more bits


of redundancy and less chance of bits of data being affected (Section IV-B explains this finding).

Fig. 14. ECC correction efficacy according to the burst error length: TBEC-RSC scaling analysis.

TABLE III
PATTERN POSSIBILITIES WITH BURST LENGTH VARYING FROM 1 TO 8 BITS
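
The error-injection space used in this section can be enumerated with a few lines of Python. This is a minimal sketch and not the authors' test bench: the function names are hypothetical, the single-bit case (l = 1) is treated as one pattern, and the total per burst length is assumed to be the number of patterns from (1) multiplied by the number of start positions in the codeword, which may not match how Table III counts them.

```python
from itertools import product


def burst_patterns(l):
    """Error patterns of burst length l: first and last bits are in error."""
    if l < 1:
        raise ValueError("burst length must be at least 1")
    if l == 1:
        return ["1"]  # single-bit error, outside the range of Eq. (1)
    # The l - 2 inner bits are free, which gives the 2^(l-2) patterns of Eq. (1).
    return ["1" + "".join(inner) + "1" for inner in product("01", repeat=l - 2)]


def burst_positions(codeword_bits, l):
    """Number of start positions where a burst of length l fits in the codeword."""
    return max(codeword_bits - l + 1, 0)


def injected_errors(codeword_bits, l):
    """Assumed total: every pattern applied at every start position."""
    return len(burst_patterns(l)) * burst_positions(codeword_bits, l)


print(burst_patterns(3))        # the two 3-bit patterns named in Section IV
print(burst_positions(32, 2))   # places for a 2-bit burst in a 32-bit codeword
```

For l = 3 the sketch yields the patterns 101 and 111, and for a 32-bit codeword it reports 31 places for a burst of length 2, in line with the text.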

VI. RELIABILITY AND MTTF ESTIMATIONS

This section presents the reliability and mean time to failure (MTTF) evaluations regarding each MCU resulting from SEE. We performed this experiment following the definitions employed in works [11], [22], [29] and using the same equations of [11] described next.

Fig. 15. Reliability for 4096-word memory ($\lambda = 10^{-5}$ upsets/bit/day).

Let w and c be the data and redundancy bits of an ECC, w + c be the codeword size, t be the time in days, and λ be the error rate by day, then (13) calculates the probability of having a codeword with errors at a time t. Besides, λ depends on memory technology and the environment in which the device was inserted

$$pEM(t) = 1 - e^{-(w+c)\lambda t}. \tag{13}$$
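
As a quick numeric check of (13), the following Python sketch evaluates the probability for a codeword with 16 data bits and 16 redundancy bits. The function and parameter names are chosen here for illustration, and the 1000-day horizon is an arbitrary example value; the authors report running their own experiment in MATLAB.

```python
import math


def p_em(t_days, data_bits, redundancy_bits, upset_rate):
    """Eq. (13): probability that a (w + c)-bit codeword holds errors at time t."""
    return 1.0 - math.exp(-(data_bits + redundancy_bits) * upset_rate * t_days)


# 32-bit codeword (w = 16, c = 16), lambda = 1e-5 upsets/bit/day, t = 1000 days.
print(round(p_em(1000, 16, 16, 1e-5), 4))  # about 0.27
```

The exponent here is 32 × 10⁻⁵ × 1000 = 0.32, so roughly 27% of such codewords would contain at least one upset after 1000 days under this rate, which is why the correction probabilities per error count matter for the memory-level reliability.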


Let i be the number of errors; then, (14) estimates the probability of occurring i errors in a w + c codeword at time t

$$pE_i(t) = \binom{w+c}{i} \times \left(1 - e^{-\lambda t}\right)^{i} \times e^{-\lambda (w+c-i)t}. \tag{14}$$

Equation (15) calculates the codeword reliability r(t) at time t. $pCM_i$ is the correction probability for i errors extracted from the results presented in Fig. 13. Let Me be the maximum number of errors that can arise (our experiments use Me = 8); then, $\sum_{i=1}^{Me} pE_i(t) \times pCM_i$ computes the probability of occurring errors on a codeword that a given ECC can correct at time t. Moreover, $1 - pEM(t)$ is the probability of not having errors in a codeword over time t

$$r(t) = 1 - pEM(t) + \sum_{i=1}^{Me} pE_i(t) \times pCM_i. \tag{15}$$

Let M be the number of codewords in the memory; then, (16) calculates the memory device reliability R(t) through the product of all codeword reliabilities at time t

$$R(t) = r(t)^{M}. \tag{16}$$

Equation (17) estimates the MTTF of a memory device protected by an ECC through the R(t) integration over time

$$\mathrm{MTTF} = \int_{0}^{\infty} R(t)\,dt. \tag{17}$$

The reliability experiment was designed in MATLAB, considering memories with M × (w + c) format. Fig. 15 presents the R(t) curve over 4000 days with M = 4096. In addition, Table IV displays the MTTF results according to λ ($10^{-5}$, $10^{-4}$, $10^{-3}$ upsets/bit/day) and M (4096, 8192, and 16 384 codewords), therefore enabling us to assess the effect of varying both λ and M on the ECC reliability.

On the one hand, reliability is directly proportional to code efficacy, which implies the insertion of redundancy bits. On the other hand, reliability is inversely proportional to the increase of the codeword bits. Consequently, there is a tradeoff while planning for efficacy versus redundancy to achieve optimal reliability. The combined effect of this tradeoff on the occurrence of errors over time produces the results in Fig. 15 and Table IV, which reveal that TBEC-RSC has the second-highest reliability among all the evaluated ECCs. FUEC-QUAEC presented the highest reliability, with nearly 35% less MTTF than TBEC-RSC, mainly because it requires fewer redundancy bits.

The MTTF analysis itself does not consider the synthesis cost, which is one of the main attributes of an ECC. The design of an ECC, especially for those focused on critical applications with low resources, must consider the overall cost that the technique brings in encoding and decoding procedures. Section VII shows the estimate of synthesis cost and the tradeoff evaluation.

VII. SYNTHESIS AND TRADEOFF EVALUATIONS

This section presents the synthesis results of the analyzed codes. Besides, we estimated through two different approaches


their tradeoff between coverage and synthesis cost: 1) CSC based on the metric presented in [10], [11], and [22]; and 2) CSCR based on the metric presented in [16].

TABLE IV
MTTF RESULTS FOR THE EVALUATED ECCS REGARDING M = 4096, 8192, AND 16384 CODEWORDS, AND $\lambda = 10^{-5}$, $10^{-4}$, $10^{-3}$ UPSETS/BIT/DAY

A. Synthesis Results

The 16-bit ECCs were designed in Verilog and synthesized to 28 and 65 nm CMOS technology by Cadence Genus Synthesis Solution. Table V presents the synthesis results of the encoders and decoders of the six evaluated ECCs, enabling us to compare area, power, and delay according to CMOS technology variation.

TABLE V
SYNTHESIS RESULTS OF THE ENCODERS AND DECODERS OF THE EVALUATED ECCS REGARDING 28 AND 65 NM CMOS TECHNOLOGY

All analyzed ECCs have simple encoding processes that do not require extensive logic; thus, the encoder circuit is negligible compared to the decoder circuit. The decoding methodology applied in TBEC-RSC is based on the MRSC algorithm that requires simple logic equations to detect and correct the errors. Consequently, TBEC-RSC presented equivalent synthesis results to the original MRSC. At the same time, TBEC-RSC presents the highest overall error correction rate among all the evaluated codes. As explained in Section II, the ECCs proposed in [16] and [17] employ a syndrome matchup table; thus, as the code correction capacity rises, the amount of logic required increases, raising the area consumption, power dissipation, and delay. Finally, we observed that this pattern was maintained on both 28 and 65 nm.

B. Tradeoff Evaluation

Silva et al. [10], [11], and Argyrides et al. [22] presented the equations that represent a tradeoff between error coverage and synthesis cost. Meanwhile, the authors in [16] also considered the percentage increase in redundancy bits as a part of the tradeoff metric. We evaluated the codes applying both metrics, and the results are presented next.

1) Error Correction Coverage Per Synthesis Cost: This experiment considers the synthesis cost (SC) represented by (18), which reflects the parameters computed in Section VII-A

$$SC = \mathrm{Area} \times \mathrm{Power} \times \mathrm{Delay}. \tag{18}$$

The encoder circuit is negligible compared to the decoder circuit; therefore, we only consider the decoder SC. Equation (19) describes the error correction coverage per synthesis cost (CSC), a metric that correlates SC with the error correction coverage (CC), i.e., the results presented in Fig. 13

$$CSC = \frac{CC}{SC}. \tag{19}$$

Fig. 16 depicts the CSC results of each ECC for the burst error scenarios regarding a 65 nm CMOS technology. We normalized the results for better visualization and comprehension.

Fig. 16. CSC results per burst error length.

TBEC-RSC reaches the highest efficacy results across all the error patterns. Since all the codes, except MRSC and TBEC-RSC, require matchup tables to correct errors, reducing correction power reduces the CSC results since they were the codes with the lowest synthesis results. The compared codes have a limited scope of correction and use a syndrome look-up


Since RI is 1 (100%) for TBEC-RSC, its CSCR results for


the first two error scenarios are nearly 60% lower than FUEC-
DAEC; as for the other codes, the RI is below 1. However,
TBEC-RSC surpasses all ECCs for the remaining errors.
Even with lower redundancy, the synthesis costs of FUEC-
TAEC, FUEC-QUAEC, and TBEC-QUAEC still represents
a heavyweight in their results. Codes that use a matchup
table decoding algorithm could improve their error correction
efficacy by adding new values in the correction table. However,
this would require the addition of more conditions and logic,
which increases the synthesis cost, and, for this metric, the
evaluation could backfire.
MRSC and TBEC-RSC are not constrained by the matchup
logic, which also shows how efficient synthesis cost and
error coverage are for the RSA applied in these codes. This
efficiency enables them to maintain good CSC and CSCR
results throughout the entire experiment, even with the last
Fig. 17. CSCR results per burst length.
one pulling down the tradeoff due to these codes having a
higher number of redundancy bits.
table to correct errors; this reduces their efficiency per cost for errors outside their correction scope. For instance, the tradeoff results of FUEC-DAEC diminish as the number of errors grows, even though its cost is similar to that of TBEC-RSC and MRSC.

2) Error Correction Coverage Per Cost of Synthesis and Redundancy: The CSC metric assesses only the ECC decoder costs, not considering the costs of data and redundancy stored in the memory. However, depending on the memory technology, the area and power costs can be orders of magnitude higher than those attained with the decoder, which may lead the reader to wrong conclusions. To mitigate this problem, we propose the error correction coverage per cost of synthesis and redundancy (CSCR) metric, which considers the impact of redundancy on the memory. CSCR imposes a penalty on an ECC proportional to the ratio between the number of redundancy and data bits. Thus, regardless of technology and memory size, ECCs with more redundancy are penalized more.

The redundancy impact (RI) is calculated by (20), and CSCR is computed by (21), which aggregates the effect of RI to CSC

$$\mathrm{RI} = \frac{\text{Number of redundancy bits}}{\text{Number of data bits}} \tag{20}$$

$$\mathrm{CSCR} = \frac{\mathrm{CSC}}{\mathrm{RI}}. \tag{21}$$

Fig. 17 shows the results achieved for each ECC considering all error scenarios tested and normalized. Table VI presents the RI values for each analyzed ECC.

TABLE VI
RI VALUES FOR EACH ECC

VIII. CONCLUSION
This article introduced TBEC-RSC, an ECC that extends and enhances MRSC [11], [16]. TBEC-RSC has a new codeword format, changing from a 2-D to a 1-D structure; minor algorithm and bit positioning changes significantly increased error correction efficacy without impacting the synthesis and redundancy costs. Besides, our approach is extensible to larger numbers of data bits. We assessed the efficacy of correcting adjacent and nonadjacent errors with TBEC-RSC, MRSC, and some FUEC codes. TBEC-RSC presented the highest average error correction rate and practically the same synthesis cost as MRSC, the ECC with the lowest overhead of all analyzed. TBEC-RSC showed the second-best results regarding MTTF. Despite having more redundancy bits than the other ECCs analyzed, it achieved the best overall results regarding error coverage per synthesis cost.

REFERENCES
[1] E. Ibe, H. Taniguchi, Y. Yahagi, K.-I. Shimbo, and T. Toba, “Impact of scaling on neutron-induced soft error in SRAMs from a 250 nm to a 22 nm design rule,” IEEE Trans. Electron Devices, vol. 57, no. 7, pp. 1527–1538, Jul. 2010.
[2] P. A. Ferreyra, C. A. Marques, R. T. Ferreyra, and J. P. Gaspar, “Failure map functions and accelerated mean time to failure tests: New approaches for improving the reliability estimation in systems exposed to single event upsets,” IEEE Trans. Nucl. Sci., vol. 52, no. 1, pp. 494–500, Feb. 2005.
[3] M. Nicolaidis, Soft Errors in Modern Electronic Systems. Berlin, Germany: Springer, Nov. 2012.
[4] R. Baumann, “Radiation-induced soft errors in advanced semiconductor technologies,” IEEE Trans. Device Mater. Rel., vol. 5, pp. 301–316, Sep. 2005.
[5] V. Gherman, S. Evain, F. Auzanneau, and Y. Bonhomme, “Programmable extended SEC-DED codes for memory errors,” in Proc. IEEE VLSI Test Symp. (VTS), May 2011, pp. 140–145.
[6] R. Hentschke, F. Marques, F. Lima, L. Carro, A. Susin, and R. Reis, “Analyzing area and performance penalty of protecting different digital modules with Hamming code and triple modular redundancy,” in Proc. 15th Symp. Integr. Circuits Syst. Design, 2002, pp. 95–100.
[7] A. J. Olazábal and J. P. Guerra, “Multiple cell upsets inside aircrafts. New fault-tolerant architecture,” IEEE Trans. Aerosp. Electron. Syst., vol. 55, no. 1, pp. 332–342, Feb. 2019.


[8] B. Varghese, S. Sreelal, P. Vinod, and A. R. Krishnan, “Multiple bit error correction for high data rate aerospace applications,” in Proc. IEEE Conf. Inf. Commun. Technol., Apr. 2013, pp. 1086–1090.
[9] C. Ogden and M. Mascagni, “The impact of soft error event topography on the reliability of computer memories,” IEEE Trans. Rel., vol. 4, pp. 966–979, Dec. 2017.
[10] F. Silva, W. Freitas, J. Silveira, O. Lima, F. Vargas, and C. Marcon, “An efficient, low-cost ECC approach for critical-application memories,” in Proc. 30th Symp. Integr. Circuits Syst. Design (SBCCI), Aug. 2017, pp. 198–203.
[11] F. Silva, W. Freitas, J. Silveira, C. Marcon, and F. Vargas, “Extended matrix region selection code: An ECC for adjacent multiple cell upset in memory arrays,” Microelectron. Rel., vol. 106, Mar. 2020, Art. no. 113582.
[12] J. Li, P. Reviriego, L. Xiao, Z. Liu, L. Li, and A. Ullah, “Low delay single error correction and double adjacent error correction (SEC-DAEC) codes,” Microelectron. Rel., vol. 97, pp. 31–37, Jun. 2019.
[13] S. Choi, H. K. Ahn, B. K. Song, J. P. Kim, S. H. Kang, and S. Jung, “A decoder for short BCH codes with high decoding efficiency and low power for emerging memories,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 27, no. 2, pp. 387–397, Feb. 2019.
[14] D. Freitas, D. Mota, C. Marcon, J. Siveira, and J. Mota, “LPC: An error correction code for mitigating faults in 3D memories,” IEEE Trans. Comput., vol. 70, no. 11, pp. 2001–2013, Nov. 2021.
[15] F. Garcia-Herrero, A. Sánchez-Macián, and J. A. Maestro, “Low delay non-binary error correction codes based on orthogonal Latin squares,” Integration, vol. 76, pp. 55–60, Jan. 2021.
[16] J. Gracia-Morán, L. J. Saiz-Adalid, D. Gil-Tomás, and P. J. Gil-Vicente, “Improving error correction codes for multiple-cell upsets in space applications,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 26, no. 10, pp. 2132–2142, Oct. 2018.
[17] J. Li, P. Reviriego, L. Xiao, C. Argyrides, and J. Li, “Extending 3-bit burst error-correction codes with quadruple adjacent error correction,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 26, no. 2, pp. 221–229, Feb. 2018.
[18] J. Li, L. Xiao, J. Guo, and X. Cao, “Efficient implementations of multiple bit burst error correction for memories,” in Proc. 14th IEEE Int. Conf. Solid-State Integr. Circuit Technol. (ICSICT), Oct. 2018, pp. 1–3.
[19] J. Li, P. Reviriego, and L. Xiao, “Low delay 3-bit burst error correction codes,” J. Electron. Test., vol. 35, no. 3, pp. 413–420, Jun. 2019.
[20] A. Das and N. A. Touba, “Online correction of hard errors and soft errors via one-step decodable OLS codes for emerging last level caches,” in Proc. IEEE Latin Amer. Test Symp. (LATS), Mar. 2019, pp. 1–6.
[21] A. Das and N. Touba, “A new class of single burst error correcting codes with parallel decoding,” IEEE Trans. Comput., vol. 69, no. 2, pp. 253–260, Feb. 2020.
[22] C. Argyrides, H. R. Zarandi, and D. K. Pradhan, “Matrix codes: Multiple bit upsets tolerant method for SRAM memories,” in Proc. 22nd IEEE Int. Symp. Defect Fault-Tolerance VLSI Syst. (DFT), Sep. 2007, pp. 340–348.
[23] H. Farbeh, L. Delshadtehrani, H. Kim, and S. Kim, “ECC-United Cache: Maximizing efficiency of error detection/correction codes in associative cache memories,” IEEE Trans. Comput., vol. 70, no. 4, pp. 640–654, Apr. 2021.
[24] D. Yoon and M. Erez, “Memory-Mapped ECC: Low-cost error protection for last-level caches,” ACM SIGARCH Comput. Archit. News, vol. 3, pp. 116–127, Jun. 2009.
[25] S. Paul, F. Cai, X. Zhang, and S. Bhunia, “Reliability-driven ECC allocation for multiple bit error resilience in processor cache,” IEEE Trans. Comput., vol. 60, no. 1, pp. 20–34, Jan. 2011.
[26] J. Kim, N. Hardavellas, K. Mai, B. Falsafi, and J. Hoe, “Multi-bit error tolerant caches using two-dimensional error coding,” in Proc. 40th Annu. IEEE/ACM Int. Symp. Microarchitecture (MICRO), Dec. 2007, pp. 197–209.
[27] E. Cheshmikhani, H. Farbeh, and H. Asadi, “ROBIN: Incremental oblique interleaved ECC for reliability improvement in STT-MRAM caches,” in Proc. 24th Asia South Pacific Design Autom. Conf., Jan. 2019, pp. 173–178.
[28] S. G. Ghaemi, I. Ahmadpour, M. Ardebili, and H. Farbeh, “SMARTag: Error correction in cache tag array by exploiting address locality,” in Proc. Design, Autom. Test Eur. Conf. Exhib. (DATE), Mar. 2018, pp. 1658–1663.
[29] C. Argyrides, R. Chipana, F. Vargas, and D. K. Pradhan, “Reliability analysis of H-Tree random access memories implemented with built in current sensors and parity codes for multiple bit upset correction,” IEEE Trans. Rel., vol. 60, no. 3, pp. 528–537, Sep. 2011.

Felipe Silva received the master’s degree in teleinformatics engineering from the Federal University of Ceará (UFC), Fortaleza, Brazil, in 2018, where he is currently working toward the Ph.D. degree.
His research interests are in the fields of error correction codes for embedded systems, fault-tolerant systems, and real-time systems.

Alan Pinheiro received the M.Sc. degree in teleinformatics engineering from the Federal University of Ceará (UFC), Fortaleza, Brazil, in 2019, where he is currently working toward the Ph.D. degree in teleinformatics engineering.
His research interests are on-chip communication architectures, fault tolerance, embedded systems, and real-time systems.

Jarbas A. N. Silveira received the Ph.D. degree in teleinformatics engineering from the Federal University of Ceará (UFC), Fortaleza, Brazil, in 2015.
He has been an Adjunct Professor with the Teleinformatics Department, UFC, since 2009, where he is with the Engineering Laboratory Computer Systems. His research interests are embedded systems on digital circuits, computer architecture, on-chip communication architectures, fault tolerance, and real-time systems.

César Marcon (Senior Member, IEEE) received the Ph.D. degree in computer science from the Federal University of Rio Grande do Sul (UFRGS), Porto Alegre, Brazil, in 2005.
He has been a Professor at the School of Computer Science, Pontifical Catholic University of Rio Grande do Sul (PUCRS), Porto Alegre, since 1995, where he is an Advisor of Ph.D. graduate students at the Graduate Program in Computer Science. He has more than 150 papers published in prestigious journals and conference proceedings.
Dr. Marcon is a Brazilian Computer Society (SBC) Member.
