Design Techniques For Fault Tolerance

While system reliability is conventionally achieved through component replication, we have developed a fault-tolerance approach for FPGA-based systems that comes at a reduced cost in terms of design time, volume, and weight. We partition the physical design into a set of tiles. In response to a component failure, we capitalize on the unique reconfiguration capabilities of FPGAs and replace the affected tile with a functionally equivalent tile that does not rely on the faulty component.

Uploaded by

Noah Okitoi

Fault Tolerance

Designing Fault-Tolerant Techniques for SRAM-Based FPGAs

Fernanda Gusmão de Lima Kastensmidt, State University of Rio Grande do Sul
Gustavo Neuberger, Renato Fernandes Hentschke, Luigi Carro, and Ricardo Reis, Federal University of Rio Grande do Sul

IEEE Design & Test of Computers. 0740-7475/04/$20.00 © 2004 IEEE. Copublished by the IEEE CS and the IEEE CASS.

Editors’ note: FPGAs have become prevalent in critical applications in which transient faults can seriously affect the circuit’s operation. This article presents a fault tolerance technique for transient and permanent faults in SRAM-based FPGAs. This technique combines duplication with comparison (DWC) and concurrent error detection (CED) to provide a highly reliable circuit while maintaining hardware, pin, and power overheads far lower than with classic triple-modular-redundancy techniques.
Dimitris Gizopoulos, University of Piraeus; and Yervant Zorian, Virage Logic

ICs are sensitive to upsets that occur in aerospace. More recently, ICs have also become sensitive to upsets at ground level because of the continual evolution of fabrication technology for semiconductors. Drastic device shrinkage, power supply reduction, and increasing operating speeds significantly reduce noise margins and thus reliability because of the internal noise sources that very deep-submicron ICs face [1]. This trend is approaching a point at which it will be infeasible to produce ICs that are free from these effects. Consequently, fault tolerance is no longer a matter exclusively for aerospace designers; it’s important for the designers of next-generation ground-level products as well.

FPGAs are popular for design solutions because they improve logic density and performance for many applications. SRAM-based FPGAs, in particular, are highly flexible because they are reprogrammable, allowing onsite design changes. However, because the reprogrammability leads to a high logic density in terms of SRAM memory cells, SRAM-based FPGAs are also sensitive to radiation and require protection to work in harsh environments [2].

Our high-level fault tolerance technique combines time and hardware redundancy to cope with upsets in SRAM-based FPGAs. This technique reduces the number of I/O pads, and therefore power dissipation, in the interface compared to the well-known triple modular redundancy (TMR) solution. Our goal is to reduce hardware overhead from the TMR level (three times the area of the unprotected design) to close to twice the original area, maintaining the same reliability and consequently reducing power dissipation. We’ve evaluated our technique in two types of circuits: multipliers and digital filters.

Radiation effects on SRAM-based FPGAs

A radiation environment contains various charged particles, generated by sun activity, that interact with silicon atoms, exciting and ionizing the atomic electrons [3]. At ground level, neutrons are the most frequent causes of upsets [4]. When a single heavy ion strikes the silicon, it loses its energy through the production of free electron-hole pairs, resulting in a dense ionized track in the local region. Protons and neutrons can cause a nuclear reaction when passing through the material. The recoil also produces ionization, generating a transient current pulse that can cause an upset in the circuit.

A single particle can hit either the combinational or the sequential logic in the silicon [5]. When a charged particle strikes a memory cell’s sensitive nodes, such as a drain in an off-state transistor, it generates a transient current pulse that can mistakenly turn on the opposite
transistor’s gate. The effect can invert the stored value, that is, produce a bit flip in the memory cell. This effect is called a single-event upset (SEU) or soft error, and it’s a major concern in digital circuits. When a charged particle hits the combinational logic block, it also generates a transient current pulse. This phenomenon is called a single-event transient (SET).

In FPGAs, an upset has a peculiar effect when it hits the combinational and sequential logic mapped into the programmable architecture. For example, consider SRAM-based FPGAs such as those from Xilinx’s Virtex series, one of the most popular series of programmable devices on the market. Virtex devices include a flexible, regular architecture comprising an array of configurable logic blocks (CLBs) surrounded by programmable I/O blocks, all interconnected by a hierarchy of fast and versatile routing resources [2]. The CLBs provide the functional elements for constructing logic; the I/O blocks provide the interface between the package pins and the CLBs. A general routing matrix interconnects the CLBs. This matrix includes an array of routing switches located at the intersections of horizontal and vertical routing channels. Virtex devices also dedicate 4,096-bit memory blocks called block-select RAMs, clock delay-locked loops (DLLs) for clock-distribution delay compensation and clock domain control, and two tristate buffers associated with each CLB.

Users can quickly program a Virtex device by loading a configuration bitstream (a collection of configuration bits) into it. They can change device functionality at any time by loading in a new bitstream. The bitstream contains all the information to configure the programmable storage elements in the matrix located in the lookup tables (LUTs), flip-flops, CLB configuration cells, and interconnections, as Figure 1 shows. All these configuration bits are potentially sensitive to SEUs; hence, we targeted them in our investigation.

[Figure 1 schematic: Xilinx Virtex CLB tile with block RAM, a lookup table with inputs F1 through F4, a flip-flop, and configuration memory cells (M); a soft-error upset is a bit flip in one of these cells.]
Figure 1. Bits sensitive to single-event upsets (SEUs) in the configurable-logic-block tile schematic. Inputs F1 through F4 are the four 1-bit input signals of the lookup table. M is the configuration memory cell.

In an ASIC, the effect of a particle hitting either the combinational or the sequential logic is transient; the only variation is how long the fault lasts. A fault in the combinational logic is a transient logic pulse in a node that can disappear according to the logic delay and topology. In other words, a storage cell might or might not latch a transient fault from the combinational logic. Faults in the sequential logic manifest themselves as bit flips, which remain in the storage cell until the next load.

In an SRAM-based FPGA, customizable memory cells (the SRAM cells in Figure 1) implement both the user’s combinational and sequential logic. When an upset occurs in the combinational logic synthesized in the FPGA, it corresponds to a bit flip in one of the LUT’s cells or in the cells that control the routing. An upset in an LUT memory cell modifies the implemented combinational logic, as Figure 2a shows. This upset has a permanent effect, and is correctable only at the next load of the configuration bitstream. This effect is similar to a stuck-at fault at 1 or 0 in the combinational logic defined by that LUT. Thus, a storage cell latches the upset from the FPGA’s combinational logic, unless the FPGA uses some detection technique. An upset in the routing can connect or disconnect a wire in the matrix, as Figure 2b shows. It also has a permanent effect, which can result in an open or a short circuit in the combinational logic implemented by the FPGA. The configuration bitstream’s next load corrects this fault.
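To make the LUT case concrete, the following minimal Python sketch models a four-input LUT as 16 configuration bits and flips one of them. It is only an illustration under stated assumptions: the helper names (make_config, lut_eval, inject_seu, affected_inputs), the bit ordering of the inputs, and the choice of a 4-input AND function are hypothetical and do not come from the article or from any Xilinx tool.

```python
def make_config(func):
    """Pack the truth table of a 4-input Boolean function into 16 configuration bits."""
    config = 0
    for index in range(16):
        f1, f2, f3, f4 = ((index >> k) & 1 for k in range(4))
        if func(f1, f2, f3, f4):
            config |= 1 << index
    return config


def lut_eval(config, f1, f2, f3, f4):
    """Read the configuration cell addressed by inputs F1..F4 (assumed bit order)."""
    index = f1 | (f2 << 1) | (f3 << 2) | (f4 << 3)
    return (config >> index) & 1


def inject_seu(config, cell):
    """Flip one configuration cell, as a particle strike would."""
    return config ^ (1 << cell)


def affected_inputs(golden, current):
    """List the input combinations whose output differs from the golden LUT."""
    return [i for i in range(16) if ((golden ^ current) >> i) & 1]


golden = make_config(lambda f1, f2, f3, f4: f1 & f2 & f3 & f4)  # 4-input AND
upset = inject_seu(golden, cell=5)

print(hex(golden), hex(upset))         # configuration before and after the upset
print(affected_inputs(golden, upset))  # input combinations that now misbehave
print(lut_eval(upset, 1, 0, 1, 0))     # F1=1, F2=0, F3=1, F4=0 addresses cell 5

scrubbed = golden                      # reloading the bitstream restores the cell
print(affected_inputs(golden, scrubbed))
```

Under these assumptions, the flipped cell changes the implemented function for exactly one input combination, and the wrong output persists on every evaluation until the golden configuration is loaded again, which is the stuck-at-like, permanent behavior described above.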

November–December 2004
[Figure 2 diagram: (a) a lookup table with inputs A, B, C, and D whose stored bits are altered by an upset; (b) an array of configurable logic blocks in which an upset creates an undesired routing connection.]
Figure 2. Example upsets in the SRAM-based FPGA architecture: an upset in the lookup table because of logic modification (a), and an upset in the routing because of an undesirable connection (b).

When an upset occurs in the user sequential logic synthesized in the FPGA, it has a transient effect because the CLB flip-flop’s next load corrects it. An upset in the embedded block RAM has a permanent effect, and fault tolerance techniques must correct it. Engineers must apply these techniques in the architectural or high-level description, because the bitstream’s load can’t change the memory state without interrupting the application’s normal operation. SETs can also occur in the CLB’s combinational logic, such as the input and output multiplexers for routing control. (Rebaudengo, Reorda, and Violante also discuss the effects of upsets in the FPGA architecture [6].)

Radiation tests on Xilinx FPGAs show the effects of SEUs in the design application and prove the necessity for using fault-tolerant techniques in aerospace applications [7]. A fault-tolerant system designed into SRAM-based FPGAs must cope with the peculiarities just discussed: transient and permanent effects of an SEU in the combinational logic, short and open circuits in the design connections, and bit flips in the flip-flops and memory cells. Ohlsson et al. also analyzed the effect of neutrons in an SRAM-based FPGA from Xilinx [8]. At that time, FPGAs were not very susceptible to neutrons, but now as transistor size decreases and logic density increases, FPGAs are becoming more vulnerable.

Fault tolerance in SRAM-based FPGAs

There are two ways to implement fault-tolerant designs in SRAM-based FPGAs. The first is to design a new FPGA matrix of fault-tolerant elements. These new elements can replace the old ones in the same architecture topology, or you could develop a new architecture to improve robustness. Either way, the cost will be high, though it will vary according to development time, number of required engineers, and foundry technology. Another option is to protect the high-level description using redundancy, targeting the FPGA architecture. You could use a commercial FPGA to implement the design, and apply the SEU mitigation technique to the design description before synthesizing the redundant blocks in the FPGA. This approach is far less expensive than the previous one because here users are responsible for protecting their own designs; new chip development and fabrication are not necessary. Thus, the user can choose the fault tolerance technique, and consequently control the area, performance, and power dissipation overheads.

The high-level SEU mitigation technique used most often today to protect designs synthesized in the Virtex architecture is based mainly on TMR combined with scrubbing, which continuously reloads the configuration bitstream. The TMR mitigation scheme uses three identical logic circuits (redundant blocks 0, 1, and 2) synthesized in the FPGA. These circuits perform the same task in tandem, with a majority voter circuit comparing corresponding outputs. Details of the TMR technique for Virtex are available elsewhere [9], and Lima et al. present more examples [10]. The correct implementation of TMR circuitry in the Virtex architecture depends on the type of data
structure you need to mitigate. Logic falls into four different structure types: throughput, state machine, I/O, and special features (select-RAM blocks, DLLs, and so on).

Throughput logic is a logic module of any size or functionality, synchronous or asynchronous, where all logic paths flow from the module’s inputs to its outputs without forming a logic loop. In this case, all that’s necessary is to triplicate the logic, creating three redundant logic parts (0, 1, and 2). No voters are required, because the FPGA output will be voted on later by default.

State-machine logic is any structure where a registered output, at any register stage in the module, feeds back into any prior stage in the module, forming a registered logic loop. This structure is common in accumulators, counters, and any custom state machine or state sequencer in which each internal register’s state depends on its own previous state. In this case, it’s necessary to triplicate the logic and to have majority voters in the outputs. To ensure that a register doesn’t lock on the wrong value, each redundant logic part in the feedback path has a voter so that the system can recover itself. One LUT can easily implement the majority voter. For designs constrained by available logic resources, you can implement the majority voter using Virtex tristate buffers rather than LUTs.

The primary purpose of using a TMR design methodology is to remove all single points of failure from the design. Therefore, each redundant part that uses FPGA inputs should have its own set of inputs. Thus, if an input suffers a failure, it affects only one of the redundant logic parts. The outputs are the key to the overall TMR strategy. Because full TMR generates every logic path in triplicate, it’s necessary to bring these three logic paths back to a single path that doesn’t create a single point of failure. You can do this by placing TMR output voters inside the output logic block. Figure 3 illustrates the TMR technique.

[Figure 3 block diagram: three sets of input pads feed combinational logic tr0, tr1, and tr2 with check voters (Check 0, 1, and 2); sequential-logic flip-flops clocked by Clock 0, 1, and 2 are each followed by a majority voter; the voted signals go to the output pads.]
Figure 3. Triple modular redundancy for Xilinx FPGAs. The throughput logic is triplicated, represented by TMR combinational modules tr0, tr1, and tr2. The registers are also triplicated and are voted on by majority voters; they also have a mechanism to correct upsets in the multiplexers.

Scrubbing lets a system repair SEUs in the configuration memory without disrupting operations. The Virtex Select-MAP interface performs this scrubbing. When an FPGA is in this mode, an external oscillator generates a configuration clock that drives the programmable ROM (PROM) and the FPGA. At each clock cycle, new data is available on the PROM data pins. One example is the Flash PROM XQR18V04, which provides a parallel data rate of up to 264 Mbps at 33 MHz. The scrubbing cycle time depends on the configuration clock frequency and the readback bitstream size.

Previous results based on fault injection and radiation ground testing show the Virtex TMR design techniques’ reliability [8, 11]. However, the TMR technique has limitations, such as high area overhead, three times more input and output pins, and a significant increase in power dissipation. Many applications can accept these limitations, but some cannot.

Reducing TMR overheads by combining hardware and time redundancy

To reduce the number of pins used by the TMR approach and to deal with permanent upset effects, we present a new technique based on time and hardware
redundancy to protect the user’s combinational logic. TMR still protects the sequential logic to avoid the accumulation of faults, because scrubbing doesn’t change the content of a user’s memory cell.

Lubaszewski and Courtois discussed TMR’s reliability and safety compared to self-checking-based fault-tolerant schemes [11]. Their experiments indicate that the higher the module’s complexity, the greater the difference in reliability between TMR and the self-checking scheme. The self-checking fault-tolerant scheme is more reliable than TMR if it does not exceed the self-checking overhead bound of 73%.

We extend the idea of using a self-checking fault-tolerant scheme to FPGAs. Our method combines duplication with comparison (DWC) with a concurrent error detection (CED) machine based on time redundancy that works as a self-checking block. DWC detects faults in the system, and CED detects which blocks are fault free. Figure 4 shows the general scheme. Two combinational logic blocks run simultaneously in the DWC technique: modules dr0 and dr1. A comparator in the output can detect a mismatch and signal a fault detection. If a mismatch occurs, the CED block evaluates whether the logic is fault free by analyzing the combinational logic’s properties.

[Figure 4 block diagram: inputs A and B feed two combinational logic blocks, dr0 and dr1, each with its own CED block that reports a fault-free status; a comparator (=) on outputs out0 and out1 signals fault detection.]
Figure 4. Duplication with comparison (DWC) combined with concurrent error detection (CED).

Researchers have proposed many methods for using CED blocks based on time redundancy to detect permanent faults. These include bitwise inversion, recomputing with shifted operands (RESO), and recomputing with swapped operands (REWSO). We implement the CED block using Patel and Fung’s RESO technique [12]. This RESO method includes encoding and decoding blocks and a register.

During normal operation (time t0), dr0 and dr1 work simultaneously. The CED block stores the outputs in sample registers for further comparison, and the voter block continually compares the dr0 and dr1 outputs, as Figure 5a shows. If a mismatch occurs between these outputs, the output registers hold their original value for an extra clock cycle, while the CED block’s RESO detects the fault. During this second clock cycle, the operands shift prior to use such that errors from permanent faults in the combinational logic are different in the first calculation than in the second. Comparing the results can identify these different errors, as Figure 5b shows. The encoding blocks are simple multiplexers, and the decoding blocks are simple connections.

[Figure 5 block diagram, two panels: (a) Cycle I, normal operation, and (b) Cycle II, detection operation. In each panel, inputs A and B pass through encode blocks selected by ST into dr0 and dr1; decode blocks, time comparators tc0 and tc1, and hardware comparator Hc feed the voter, which outputs ST, Enable_dr0, Enable_dr1, and ST_error.]
Figure 5. DWC combined with CED technique for SRAM-based FPGAs: normal (a) and fault detection (b) operation. ST is a state signal from the voter block that puts the system in detection operation (ST = 1); dr0 and dr1 are the combinational logic blocks; Tc0 and Tc1 are time comparisons; and Hc is the hardware comparison. The voter block also generates a state error signal (ST_error) and signals to enable the fault-free block (enable_dr0 and enable_dr1).

For registered outputs, each output goes directly to the input of the user’s TMR register. Figure 6a shows the logic scheme. Block dr0 connects to TMR combinational module tr0, and block dr1 connects to module tr1. While the circuit searches for faults, the user’s TMR register holds its previous value. When the circuit finds the fault-free module, tr2 receives that module’s output, and continues receiving it until the next chip reconfiguration (fault correction). By default, the circuit starts passing the output of dr0 to tr2. For unregistered outputs, the circuit can drive the signals directly to the next combinational module or to the I/O pads, as Figure 6b shows.

[Figure 6 block diagram, two panels: (a) the dr0 and dr1 outputs, selected by Enable_dr0 and gated by enable signals en0, en1, and en2, feed TMR registers tr0, tr1, and tr2 (Clock 0, 1, and 2), whose outputs go to three majority voters producing trv0, trv1, and trv2; (b) dr0 and dr1 drive the pads through buffers controlled by Enable_dr0 and Enable_dr1.]
Figure 6. Example implementations when the combinational output is registered (a) or in the pad (b). Each majority voter block receives the signal from the tr0, tr1, and tr2 registers. The enable_dr0 and enable_dr1 signals decide which fault-free blocks should pass through the logic output to the registers or to the pads. To improve reliability in routing, there are three enable signals (en0, en1, and en2, as labeled in the figure), each a 1-bit signal with a logic value of 0 or 1. The outputs from the majority voter blocks are trv0, trv1, and trv2.

The important characteristic of our method is that it doesn’t incur a high performance penalty when the system has no faults or only a single fault. This method needs only one clock cycle in a hold operation to detect the faulty module; then it operates normally again without performance penalties. The final clock period is the original clock period plus the propagation delay of the encoders, decoders, and output comparator.

The voter block contains comparators and a small state machine to identify the operation’s fault-free state or to signal an error. Figure 7 shows this logic’s state diagram. The state machine’s inputs are hardware comparison Hc and time comparisons Tc0 and Tc1, represented by the 2-bit signal, Tc. The state machine’s outputs constitute a 4-bit vector (shown in Figure 7 after the slash) indicating the detection state (ST), the error state, enable_dr0, and enable_dr1. Signals enable_dr0 and enable_dr1 are used for the unregistered outputs (Figure 6b); when the output is registered, only enable_dr0 is used (Figure 6a).
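The two-cycle detection flow described above can be sketched in Python. This is a simplified behavioral model under stated assumptions, not the authors’ hardware: it uses an adder instead of the article’s multipliers and filters, emulates a permanent fault by forcing one output bit, and treats each time comparison (Tc0, Tc1) as a check of a block’s first result against its own shifted recomputation. All function names (add_unit, recompute_shifted, dwc_ced, faulty) are hypothetical; the status strings borrow the state names from Figure 7.

```python
def add_unit(a, b, stuck_bit=None, stuck_value=0):
    """Adder model. If stuck_bit is set, that output bit is forced to
    stuck_value, emulating a permanent fault in one bit slice."""
    result = a + b
    if stuck_bit is not None:
        mask = 1 << stuck_bit
        result = (result | mask) if stuck_value else (result & ~mask)
    return result


def recompute_shifted(unit, a, b):
    """RESO second pass: encode (shift operands left), compute, decode (shift right)."""
    return unit(a << 1, b << 1) >> 1


def dwc_ced(dr0, dr1, a, b):
    """Return (output, status) for one operation of the duplicated blocks."""
    out0, out1 = dr0(a, b), dr1(a, b)                # cycle I: normal operation
    hc = int(out0 != out1)                           # hardware comparison
    if hc == 0:
        return out0, "no upset"
    # Cycle II: detection operation, one extra clock cycle.
    tc0 = int(out0 != recompute_shifted(dr0, a, b))  # time comparison for dr0
    tc1 = int(out1 != recompute_shifted(dr1, a, b))  # time comparison for dr1
    if (tc0, tc1) == (0, 1):
        return out0, "dr0 fault free"
    if (tc0, tc1) == (1, 0):
        return out1, "dr1 fault free"
    return None, "error"


def faulty(a, b):
    """Hypothetical faulty copy of the adder: output bit 3 stuck at 0."""
    return add_unit(a, b, stuck_bit=3, stuck_value=0)


print(dwc_ced(faulty, add_unit, 5, 3))  # fault activated: 5 + 3 sets bit 3
print(dwc_ced(faulty, add_unit, 1, 1))  # fault not activated by these operands
```

In this model the shift moves the faulty bit slice to a different position of the result, so a fault that corrupts the first computation cannot produce the same value in the second. The sketch relies on unbounded Python integers; a hardware unit needs one extra bit of width for the shifted operands, which matches the article’s use of a 9-bit multiplier for 8-bit operands.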

[Figure 7 state diagram. States: no upset, upset detection, dr0 fault free, and dr1 fault free. Transition labels recovered from the figure: Hc = '0' / 0000; Hc = '1' / 1011; Tc = "00" / 1011; Tc = "11" / 1111; Tc = "10" / 0010; Tc = "01" / 0001; Hc = '1' / 0010; Hc = '1' / 0001.]
Figure 7. State diagram of the DWC-CED voter circuitry. Numbers after a slash indicate the 4-bit vector outputs. Numbers with single quotation marks indicate Tc or Hc values. Double quotation marks indicate that if the input is that value, the output is the one after the corresponding slash; if the input is another value (also indicated by double quotation marks), the output is the one after that corresponding slash.

In both TMR and our method, scrubbing corrects upsets in the user’s combinational logic, and the CLB flip-flops’ TMR scheme corrects upsets in the user’s sequential logic. Scrubbing must be continuous to guarantee that only one upset has occurred between two reconfigurations in the design. Some constraints are necessary for our method to function properly, just as with TMR. First, there must not be upsets in more than one redundant module, including the state machine’s detection and voting circuit. Consequently, we must assign area constraints to reduce the probability of short circuits between dr0 and dr1. Second, the scrubbing rate should be fast enough to avoid the accumulation of upsets in two different redundant blocks. Upsets in the detection and voting circuit don’t interfere with the system’s proper execution because the logic is already duplicated. In addition, upsets in this logic’s latches are not critical, because they’re refreshed every clock cycle. Assuming a single upset occurs per chip between scrubbing cycles, it doesn’t matter if an upset alters the voting, provided upsets do not occur in both redundant blocks.

Experimental results

To evaluate our technique’s fault coverage, we chose two arithmetic-based circuits: a multiplier and a canonical finite impulse response (FIR) digital filter. The tools we developed automatically generated the multipliers and filters protected by DWC-CED. We evaluated these case study circuits in terms of fault coverage, area, performance, and power dissipation.

Fault coverage

We developed a fault coverage test system to evaluate the DWC-CED technique’s robustness in the presence of upsets. The system automatically inserted structures to enable automatic fault injection in high-level descriptions, placing at every design node a fault injection component, a 4-to-1 multiplexer, so that users can insert all types of faults and as many as necessary. If the multiplexer’s select signal is 00, the original signal goes to the output; if the signal is 01, the output is a constant 0 (stuck-at-0 emulation); if the signal is 10, a constant 1 propagates (stuck-at-1 emulation).

For the first case study, we chose an 8-bit multiplier, along with a 9-bit multiplier to apply the RESO technique without losses in the most significant bit. We implemented multipliers using cascaded full adders. The 8-bit multiplier had 528 faulty nodes, 1,056 faults in total (stuck-at 0 or 1). The 9-bit multiplier had 675 faulty nodes, 1,350 faults in total. In both cases, the two original operands had 8 bits, resulting in 2^16 (65,536) combinations of input vectors. We injected all combinations of faults and input vectors: 69,206,016 for the 8-bit multiplier, and 88,473,600 for the 9-bit version.

We chose a canonical digital FIR filter circuit for our second case study; the multipliers had constant coefficients, resulting in an optimized area and minimal faulty nodes. Our developed system automatically generated a 9-tap, 8-bit FIR canonical filter. The multiplier coefficients were 2, 6, 17, 32, and 38. Because of the 8-bit input, there were 2^8 (256) combinations of input vectors
to test. The total number of faulty nodes in the FIR filter, including all multipliers and adders, was 4,208. We tested all possible combinations of input vectors and faults, a total of 1,077,248.

The system exhaustively injected the faults in all nodes of the test circuits for each input vector, each sensitive node, and each of the redundant blocks mult_dr0 and mult_dr1. The fault injection system operated with two clocks, one to control the change of input vectors, and the other to control the change of faults. A counter controlled the total number of combinations of input vectors and faults inserted in the circuit. We injected all possible combinations.

In all cycles, the voter block from the DWC-CED technique compared the outputs of modules dr0 and dr1. If the outputs were equal (Hc = 0), then a fault occurring in one of the circuits did not generate an error in the output. Therefore, for real-time operations, we could ignore this fault, and no detection operation was necessary. If a fault generated an error in the output (Hc = 1), the voter compared the output of dr1 with the recomputing circuit’s decoded output. If the outputs were not equal (Tc1 = 1), the technique under test was able to detect the fault. The voter also compared the output of dr0 to the recomputing circuit’s decoded output. If the outputs were equal (Tc0 = 0), the technique was able to detect a fault-free module.

A fault was undetected if there was a mismatch in the output of dr0 and dr1 (Hc = 1), and the technique could detect neither the faulty module (status Tc1 = 0) nor the fault-free module (status Tc0 = 1). A counter, incremented for each such case, recorded the total number of undetected faults. Reading this counter from the prototype board, we calculated the percentage of undetected faults. The results in Table 1 show that the RESO-based technique achieved fault coverage of 99.95% or higher for these arithmetic-based circuits.

Table 1. Fault coverage of recomputing with shifted operands (RESO) techniques in SRAM-based FPGAs.

Circuit            No. of injected faults   No. of detected faults   Detected faults (percent)
8-bit multiplier   69,206,016               69,176,011               99.95
9-bit multiplier   88,473,600               88,473,600               100.00
8-bit FIR filter   1,077,248                1,077,248                100.00

Area, performance, and power dissipation

To check area, performance, and power dissipation, our first test circuit was a 16-bit multiplier with a register in the output. We compared three implementations of this circuit in the XCV300-PQ240 FPGA: no fault tolerance, TMR, and our technique (DWC-CED for permanent faults using RESO). The application was to multiply a set of input numbers for 2,000 ns, with the inputs changing every 100 ns. We evaluated each circuit’s power dissipation using Xilinx’s XPower tool.

Table 2 shows the results in terms of area, performance, and power dissipation for these multipliers. Using our DWC-CED method, we reduced not only the number of I/O pins but also the area. The prototype board used a Virtex part with 240 I/O pins (166 available for the user). With TMR, we were unable to synthesize the (16 × 16)-bit multiplier. However, implementing the same multiplier with our technique, we could fit it into the chip and occupy less area.

In terms of performance, the standard multiplier without fault tolerance had a maximum delay of 54 ns for the specific application, the TMR version had a delay of 56 ns, and our DWC-CED method had a delay of 62 ns, representing an 11% degradation in performance relative to TMR. Power dissipation was less in the DWC-CED than the TMR technique, mainly because of differences in the logic, connections, and I/Os.

Table 2. Results for a 16-bit multiplier with a register in the output implemented in an XCV300-PQ240 FPGA.

Fault tolerance  Maximum     No. of    No. of four-  No. of      Estimated power dissipation (mW)
technique        delay (ns)  I/O pads  input LUTs    flip-flops  Clock  Nets  Logic  Inputs  Outputs  Total
None             54          67        495           32          7      88    186    2       29       312
TMR              56          201       1,709         96          22     305   718    7       88       1,140
DWC-CED          62          169       1,706         162         22     282   542    5       83       934

The second test circuit was an 11-tap, 9-bit, digital low-pass filter, shown in Figure 8. We multiplied the original
IN_tr0 R2_tr0 R3_tr0 R4_tr0 R5_tr0 R6_tr0 R7_tr0 R8_tr0 R9_tr0 R10_tr0 R11_tr0 Sequential
logic
Pads

C1_dr0 C2_dr0 C3_dr0 C4_dr0 C5_dr0 C6_dr0 C5_dr0 C4_dr0 C3_dr0 C2_dr0 C1_dr0
Combinational
X X X X X X X X X X X
logic
Pads
+ + + + + + + + + +
OUT_dr0

Figure 8. Digital, low-pass filter with 11 taps and 9 bits. The figure represents only one redundant block (dr0) out
of two for the combinational logic, and one redundant block (tr0) out of three for the sequential logic. IN_tr0 is the
input to TMR combinational module tr0; R2_tr0 through R11_tr0 are the registers of tr0; C1_dr0 through C6_dr0 are
constants of the filter. In these labels, dr0 indicates that DWC protects the combinational logic such that only dr0
and dr1 (not shown) are necessary, and OUT_dr0 is the output of dr0.

Table 3. Results for a digital, 11-tap, 9-bit FIR filter implemented in the XCV300-PQ240 FPGA.

Fault No. of
tolerance Maximum No. of four-input No. of Estimated power dissipation (mW)
technique* delay (ns) I/O pads LUTs flip-flops Clock Nets Logic Inputs Outputs Total
None 48 27 508 90 8 85 145 1 748 987
TMR 58 93 1,779 270 32 350 504 2 823 1,711
DWC-CED 63 75 1,738 308 25 324 530 2 19 900
* DWC-CED stores the output in registers, whereas the standard (no fault tolerance) technique and TMR do not.

coefficients calculated using Matlab (http://www.matlab.com) by a constant of 512. The final multiplier coefficients were 1, –1, –9, 6, 73, and 120.

Table 3 compares the results in terms of area, performance, and power dissipation for this digital filter implemented with no fault tolerance, TMR, and our DWC-CED technique. In this case, TMR also protected the registers, whereas the DWC-CED using RESO protected the combinational logic (multipliers and adders). The CED block resides at the outputs, where it votes on the correct pad output from dr0 or dr1. Results show that the FIR filter occupies slightly less area in the FPGA when DWC-CED rather than TMR protects it. The results also show that our method uses 19% fewer pins than TMR. In terms of performance, TMR had a maximum delay of 58 ns for this test application, 20% higher than the standard (no fault tolerance) approach. Our DWC-CED technique had a maximum delay of 63 ns (8% higher than TMR) for this application.

The DWC-CED technique’s power dissipation was considerably less than with TMR. But DWC-CED’s power dissipation was also less than the standard (no fault tolerance) approach because our technique uses fewer input and output pins compared to TMR, uses less logic, and stores the output in a register, whereas the standard approach has the combinational logic going directly to the output pads. The DWC-CED technique also saves power because the output voter passes only one of the logic-registered outputs to the pads while the other one waits to be used in case of a fault. TMR does not register the outputs but rather votes on them in the output pads, consuming more power.

WE’VE DISCUSSED only SEUs occurring in the SRAM programmable cells that are permanent until the next reconfiguration. However, a circuit operating in outer space can suffer from total ionizing dose and other effects that can cause permanent physical damage to the circuit. We hope to explore these areas in the future.

References
1. A.H. Johnston, “Scaling and Technology Issues for Soft Error Rates,” Proc. 4th Ann. Research Conf. Reliability,

560 IEEE Design & Test of Computers


Stanford Univ., 2000; http://parts.jpl.nasa.gov/docs/Scal-00.pdf.
2. “Virtex 2.5 V Field Programmable Gate Arrays,” DS003, v2.5, Product Specification, 2 Apr. 2001, Xilinx; http://direct.xilinx.com/bvdocs/publications/ds003.pdf.
3. J. Barth, C. Dyer, and E. Stassinopoulos, “Space, Atmospheric, and Terrestrial Radiation Environments,” IEEE Trans. Nuclear Science, vol. 50, no. 3, June 2003, pp. 466-482.
4. E. Normand, “Single Event Upset at Ground Level,” IEEE Trans. Nuclear Science, vol. 43, no. 6, Dec. 1996, pp. 2742-2750.
5. D. Alexandrescu, L. Anghel, and M. Nicolaidis, “New Methods for Evaluating the Impact of Single Event Transients in VDSM ICs,” Proc. IEEE Int’l Symp. Defect and Fault Tolerance in VLSI Systems, IEEE CS Press, 2002, pp. 99-107.
6. M. Rebaudengo, M.S. Reorda, and M. Violante, “Simulation-Based Analysis of SEU Effects of SRAM-Based FPGAs,” Proc. Int’l Workshop Field-Programmable Logic and Applications, IEEE CS Press, 2002, pp. 607-615.
7. E. Fuller, M. Caffrey, and P. Blain, “Radiation Test Results of the Virtex FPGA and ZBT SRAM for Space Based Reconfigurable Computing,” Proc. Int’l Conf. Military and Aerospace Applications of Programmable Logic Devices (MAPLD 02), NASA Office of Logic Design, 2002, pp. 1-8.
8. M. Ohlsson et al., “Neutron Single Event Upsets in SRAM-Based FPGAs,” Proc. IEEE Nuclear Space Radiation Effects Conf. (NSREC 98), IEEE Press, 1998, pp. 1-4; http://www.xilinx.com/appnotes/FPGA_NSREC98.pdf.
9. C. Carmichael, “Triple Module Redundancy Design Techniques for Virtex Series FPGA,” Xilinx Application Notes 197, v1.0, Mar. 2001, p. 137; http://www.xilinx.com/bvdocs/appnotes/xapp197.pdf.
10. F. Lima et al., “A Fault Injection Analysis of Virtex FPGA TMR Design Methodology,” Proc. European Conf. Radiation and Its Effects on Components and Systems (RADECS 01), IEEE CS Press, 2001, pp. 275-282.
11. M. Lubaszewski and B. Courtois, “A Reliable Fail-Safe System,” IEEE Trans. Computers, vol. 47, no. 2, Feb. 1998, pp. 236-241.
12. J. Patel and L. Fung, “Multiplier and Divider Arrays with Concurrent Error Detection,” Proc. Int’l Symp. Fault-Tolerant Computing, IEEE CS Press, 1982, pp. 325-329.


Fernanda Gusmão de Lima Kastensmidt is a professor in the Department of Digital Systems Engineering at the State University of Rio Grande do Sul in Guaíba, Brazil, and also an associate professor at the Institute of Informatics of the Federal University of Rio Grande do Sul in Porto Alegre, Brazil. Her research interests include VLSI testing and design, fault effects, fault-tolerant techniques, and programmable architectures. Kastensmidt has a BS in electrical engineering, and an MS and a PhD in computer science and microelectronics, all from the Federal University of Rio Grande do Sul. She is a member of the IEEE.

Gustavo Neuberger is a PhD student at the Institute of Informatics of the Federal University of Rio Grande do Sul. His research interests include fault tolerance, radiation effects, DFT, and SRAM memories. Neuberger has a BS in computer engineering from the Federal University of Rio Grande do Sul. He is a member of the ACM.

Renato Fernandes Hentschke is a PhD student at the Institute of Informatics of the Federal University of Rio Grande do Sul. His research interests include design automation for physical design; and algorithms for placement, routing, and congestion estimation. Hentschke has an MS and a BS in computer science from the Federal University of Rio Grande do Sul. He is a member of the ACM.

Luigi Carro is a professor in the Electrical Engineering Department and the graduate program at the Institute of Informatics of the Federal University of Rio Grande do Sul. His research interests include mixed-signal design, DSP, mixed-signal and analog testing, and fast system prototyping. Carro has a BSc in electrical engineering, an MSc in computer science, and a PhD in computer science, all from the Federal University of Rio Grande do Sul. He is a member of the IEEE and the ACM.

Ricardo Reis is a professor at the Institute of Informatics of the Federal University of Rio Grande do Sul, and the Latin America liaison for IEEE Design & Test. His research interests include VLSI design, CAD, physical design, design methodologies, and fault-tolerant techniques. Reis has a BSc in electrical engineering from the Federal University of Rio Grande do Sul, and a PhD in computer science and microelectronics from the Institut National Polytechnique de Grenoble, France. He is a vice president of the International Federation for Information Processing and a member of the IEEE.

Direct questions and comments about this article to Fernanda Gusmão de Lima Kastensmidt, PO Box 15064, Porto Alegre – RS – Brasil, 91501-970; [email protected].

For more information on this or any other computing topic, visit our Digital Library at http://www.computer.org/publications/dlib.
