
Fault Tolerance

Fault tolerance is the ability of a system to continue operating correctly even if some of its components fail. This is important for mission-critical applications such as medical devices. When deciding which components to make fault tolerant, designers consider how critical each component is, how likely it is to fail, and how expensive it is to protect. Common fault tolerance techniques include redundancy, self-checking circuits, and reconfiguration. Redundancy uses extra hardware for error detection and recovery. Self-checking circuits produce an error signal when their output is invalid. Reconfiguration locates faulty parts and replaces them. The best technique balances area, cost, power, and speed for the application.

Uploaded by cmohamedyousuf
Copyright © All Rights Reserved

A Survey of Fault Tolerance in FPGAs
By C. Mohamed Yousuf
• Fault Tolerance
– The ability of a system to continue error-free operation in the presence of unexpected faults
– Refers to a design that can continue operating, possibly at a reduced level, rather than failing completely when some part of the system fails

• Important in mission-critical and safety-critical applications
– E.g., medical, aviation, banking
– Errors are very costly
• Several questions must be examined to determine which components should be fault tolerant:
– How critical is the component? In a car, the radio is not critical, so it has less need for fault tolerance.
– How likely is the component to fail? Some components, like the drive shaft in a car, are unlikely to fail, so no fault tolerance is needed.
– How expensive is it to make the component fault tolerant? Requiring a redundant car engine, for example, would likely be too expensive, both economically and in terms of weight and space, to be considered.
Faults
• Permanent faults (hard faults)
– Caused by manufacturing defects, early-life failures, and wear-out failures
– Wear-out failures arise from various mechanisms
• e.g., electromigration, hot-carrier degradation, dielectric breakdown
• Temporary faults (soft faults)
– Present only for a short period of time
– Caused by external disturbances or marginal design parameters
Fault Tolerance
• Redundancy
– Static Redundancy
– Dynamic Redundancy
– Hybrid redundancy
• Self Checking and Testing
• Reconfigurable Architecture
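Static redundancy is commonly realized as triple modular redundancy (TMR), which the slides describe later under system-level techniques. The sketch below is a minimal software model under stated assumptions, not the author's design: three replicas of a hypothetical module (`square8`) compute the same result, a bitwise 2-out-of-3 majority voter produces the output, and the hypothetical `faulty` / `fault_mask` parameters flip bits in one replica to emulate a fault. For independent replicas of reliability R and a perfect voter, TMR reliability is 3R^2 − 2R^3, which exceeds R only when R > 1/2.

```python
from typing import Callable, List, Optional, Tuple


def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-out-of-3 majority: each output bit takes the value held by at least two inputs."""
    return (a & b) | (b & c) | (a & c)


def tmr(module: Callable[[int], int], x: int,
        faulty: Optional[int] = None, fault_mask: int = 0) -> Tuple[int, List[int]]:
    """Run three replicas of `module` on `x` and vote on the result.

    `faulty` and `fault_mask` are hypothetical fault-injection knobs: the output of
    replica `faulty` is XORed with `fault_mask`. Returns the voted output and the
    indices of the replicas that disagree with it.
    """
    outputs = [module(x) for _ in range(3)]
    if faulty is not None:
        outputs[faulty] ^= fault_mask
    voted = majority_vote(*outputs)
    disagreeing = [i for i, y in enumerate(outputs) if y != voted]
    return voted, disagreeing


def square8(v: int) -> int:
    """Hypothetical replicated module: 8-bit square."""
    return (v * v) & 0xFF


print(tmr(square8, 7))                               # (49, [])
print(tmr(square8, 7, faulty=2, fault_mask=0b1000))  # (49, [2])
```

The voter masks the single fault (the output stays correct), and the list of disagreeing replicas shows which module failed; the failed replica itself is not repaired.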
Redundancy
• Hardware redundancy
– Addition of replicated modules
– Use of extra circuits for fault detection
• Information redundancy
– Addition of extra information to data to allow error detection and correction, as in self-checking circuits
– Self-checking
• Only a fault-free circuit produces a valid code word
[Figure: self-checking circuit. Blocks: Functional Logic (m inputs, k outputs), Check Bit Generator (c check bits), Checker (Error Indication)]
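The self-checking structure can be illustrated with parity prediction, one common way to build the check-bit generator; the slides do not say which code is used, so this is an assumption. In this minimal sketch the functional logic is a hypothetical 4-bit binary-to-Gray converter. Its output bits are g3 = x3, g2 = x3 XOR x2, g1 = x2 XOR x1, g0 = x1 XOR x0, and their XOR collapses to x0, so a single check bit (c = 1) can be predicted from the inputs alone. The checker recomputes the output parity and raises the error indication on a mismatch. `fault_mask` is a hypothetical fault-injection parameter.

```python
from typing import Tuple


def functional_logic(x: int, fault_mask: int = 0) -> int:
    """Hypothetical functional logic: 4-bit binary-to-Gray conversion.
    `fault_mask` XORs output bits to emulate a fault."""
    return ((x ^ (x >> 1)) & 0xF) ^ fault_mask


def parity(v: int) -> int:
    """XOR of all bits of v (1 if v has an odd number of 1s)."""
    return bin(v).count("1") & 1


def check_bit_generator(x: int) -> int:
    """Predict the output parity from the inputs alone: parity(Gray(x)) == x0."""
    return x & 1


def checker(output: int, check_bit: int) -> bool:
    """Error indication: True when the output parity disagrees with the predicted check bit."""
    return parity(output) != check_bit


def self_checking_unit(x: int, fault_mask: int = 0) -> Tuple[int, bool]:
    """Return (functional output, error indication)."""
    y = functional_logic(x, fault_mask)
    return y, checker(y, check_bit_generator(x))


print(self_checking_unit(5))                     # (7, False)
print(self_checking_unit(5, fault_mask=0b0100))  # (3, True)
```

Parity prediction catches any fault that flips an odd number of output bits; an even number of flipped bits, or a fault inside the checker itself, goes unnoticed, which is the weakness listed for self-checking circuits in the comparison below.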
[Figure: error-correcting memory. Write: Data In → Generate Check Bits → Memory stores the Data Word and its Check Bits. Read: Generate Check Bits from the read Data Word, compare them with the stored Check Bits to form the Syndrome, Correct → Data Out]
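The error-correcting memory path can be sketched with a Hamming(7,4) code, a classic single-error-correcting code chosen here as an assumption, since the slides do not name the code. Check bits sit at positions 1, 2 and 4 of a 7-bit word, and check bit p covers every position whose index has bit p set. On a read, the check bits are recomputed; the resulting syndrome equals the position of a single flipped bit, or 0 when no error is found. The function names are illustrative.

```python
from typing import List

DATA_POSITIONS = [3, 5, 6, 7]   # 1-based positions of the 4 data bits
PARITY_POSITIONS = [1, 2, 4]    # 1-based positions of the 3 check bits


def encode(data: List[int]) -> List[int]:
    """Write path: generate check bits for a 4-bit data word; returns the 7-bit stored word."""
    word = [0] * 8  # index 0 unused so indices match positions 1..7
    for pos, bit in zip(DATA_POSITIONS, data):
        word[pos] = bit
    for p in PARITY_POSITIONS:
        # Check bit p gives even parity over all positions whose index has bit p set.
        word[p] = sum(word[i] for i in range(1, 8) if i & p and i != p) % 2
    return word[1:]


def syndrome(stored: List[int]) -> int:
    """Read path: recompute the checks; a nonzero result is the 1-based error position."""
    word = [0] + list(stored)
    s = 0
    for p in PARITY_POSITIONS:
        if sum(word[i] for i in range(1, 8) if i & p) % 2:
            s |= p
    return s


def correct(stored: List[int]) -> List[int]:
    """Correct a single-bit error (if any) and return the 4 data bits."""
    word = [0] + list(stored)
    s = syndrome(stored)
    if s:
        word[s] ^= 1
    return [word[pos] for pos in DATA_POSITIONS]


stored = encode([1, 0, 1, 1])   # [0, 1, 1, 0, 0, 1, 1]
stored[4] ^= 1                  # flip position 5
print(syndrome(stored))         # 5
print(correct(stored))          # [1, 0, 1, 1]
```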
• Software redundancy
– Implemented through programming
• Time redundancy
– Hardware and information redundancy require extra hardware. This can be avoided by performing an operation several times in the same module and checking the results, instead of performing it in parallel on several modules and comparing the outputs. This reduces the amount of hardware at the expense of additional time.
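Time redundancy can be sketched as repeated execution on a single module followed by a majority decision. The module below is a hypothetical adder whose `faulty_calls` parameter, an assumption for illustration, injects a transient fault on selected calls. Three runs with a majority vote mask a fault that appears in only one run. A permanent fault would return the same wrong value on every run and would not be caught by this simple form, so it targets temporary faults.

```python
from collections import Counter
from typing import Callable, Set


def time_redundant(op: Callable[[], int], runs: int = 3) -> int:
    """Execute the same operation `runs` times on one module and return the majority result."""
    results = [op() for _ in range(runs)]
    value, count = Counter(results).most_common(1)[0]
    if count <= runs // 2:
        # No value appeared in a strict majority of runs: report instead of guessing.
        raise RuntimeError(f"no majority among results {results}")
    return value


def make_flaky_adder(a: int, b: int, faulty_calls: Set[int]) -> Callable[[], int]:
    """Hypothetical module computing a + b; on the listed call numbers (1-based)
    bit 0 of the result is flipped to emulate a transient fault."""
    calls = 0

    def op() -> int:
        nonlocal calls
        calls += 1
        result = a + b
        return result ^ 1 if calls in faulty_calls else result

    return op


print(time_redundant(make_flaky_adder(2, 3, faulty_calls={1})))  # 5
```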
At the circuit level, redundancy is used in:
• Duplication, where complementary logic structures that produce the same responses are compared
• Self-checking
• Reconfigurable structures
– Locating faults within replaceable units
– Reconfiguring the FPGA structure according to the diagnostic data obtained
– Two forms: 1. fully reconfigurable, 2. partially reconfigurable
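The difference between full and partial reconfiguration can be modeled as a toy placement problem. This sketch assumes a hypothetical region of `num_blocks` identical logic blocks, a list of named functions, and diagnostic data given as a set of faulty block indices; it is not an FPGA vendor flow. Full reconfiguration re-places every function on the fault-free blocks, while partial reconfiguration moves only the functions that sit on faulty blocks to unused spare blocks.

```python
from typing import Dict, List, Set


def full_reconfigure(functions: List[str], num_blocks: int, faulty: Set[int]) -> Dict[str, int]:
    """Full reconfiguration: place every function from scratch on fault-free blocks."""
    healthy = [b for b in range(num_blocks) if b not in faulty]
    if len(healthy) < len(functions):
        raise RuntimeError("not enough fault-free blocks")
    return dict(zip(functions, healthy))


def partial_reconfigure(placement: Dict[str, int], num_blocks: int,
                        faulty: Set[int]) -> Dict[str, int]:
    """Partial reconfiguration: move only functions on faulty blocks to unused, fault-free spares."""
    used = set(placement.values())
    spares = [b for b in range(num_blocks) if b not in used and b not in faulty]
    new_placement = dict(placement)
    for fn, block in placement.items():
        if block in faulty:
            if not spares:
                raise RuntimeError(f"no spare block left for {fn}")
            new_placement[fn] = spares.pop(0)
    return new_placement


functions = ["adder", "mult", "ctrl"]
initial = full_reconfigure(functions, 5, set())
print(initial)                               # {'adder': 0, 'mult': 1, 'ctrl': 2}
print(partial_reconfigure(initial, 5, {1}))  # {'adder': 0, 'mult': 3, 'ctrl': 2}
print(full_reconfigure(functions, 5, {1}))   # {'adder': 0, 'mult': 2, 'ctrl': 3}
```

When block 1 is diagnosed as faulty, partial reconfiguration relocates only `mult` and leaves `ctrl` in place, whereas full reconfiguration also moves `ctrl`.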
On the system level:
• Replication (TMR)
– Masks a fault rather than repairing it: the faulty module is outvoted, not recovered
– A 2-out-of-3 majority voter determines the correct result
• Built-In Self-Test (BIST)
– Online BIST (during normal operation)
• Error detection and correction
– Offline testing (normal function suspended)
• Functional test: based on the functional description of the circuit under test (CUT), ensures that its logic behaves properly
• Structural test: ensures that the circuit is free from physical faults
[Figure: BIST architecture. Blocks: BIST Control Unit, Test Pattern Generation (TPG), Circuitry Under Test (CUT), Test Response Analysis (TRA)]
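The BIST loop can be modeled in software under stated assumptions. The TPG is a 4-bit linear feedback shift register (LFSR) whose feedback taps make it step through all 15 nonzero states. The TRA compacts the responses into a 4-bit signature in the style of a multiple-input signature register (MISR). The control unit compares that signature with a golden value computed from the fault-free circuit. The circuit under test (an add-3 circuit) and its stuck-at-0 fault are hypothetical examples. Signature compaction can alias, meaning a faulty circuit may occasionally produce the golden signature, so a match does not prove the circuit is fault-free.

```python
from typing import Callable, List


def lfsr_next(state: int) -> int:
    """One step of a 4-bit Fibonacci LFSR: shift left, feed back bit3 XOR bit2."""
    feedback = ((state >> 3) ^ (state >> 2)) & 1
    return ((state << 1) | feedback) & 0xF


def tpg(seed: int = 0b0001, count: int = 15) -> List[int]:
    """Test Pattern Generation: successive LFSR states used as CUT inputs."""
    patterns, state = [], seed
    for _ in range(count):
        patterns.append(state)
        state = lfsr_next(state)
    return patterns


def tra_signature(responses: List[int]) -> int:
    """Test Response Analysis: MISR-style compaction (shift the signature, XOR in each response)."""
    signature = 0
    for r in responses:
        signature = lfsr_next(signature) ^ r
    return signature


def cut(x: int) -> int:
    """Hypothetical circuit under test: 4-bit add-3."""
    return (x + 3) & 0xF


def cut_stuck_at_0(x: int) -> int:
    """Same circuit with output bit 0 stuck at 0 (injected fault)."""
    return cut(x) & 0b1110


# Golden signature from the fault-free circuit (normally precomputed at design time).
GOLDEN = tra_signature([cut(p) for p in tpg()])


def bist_control(circuit: Callable[[int], int]) -> bool:
    """BIST control unit: apply all patterns, compact responses, compare with the golden signature."""
    return tra_signature([circuit(p) for p in tpg()]) == GOLDEN


print(bist_control(cut))             # True  (pass)
print(bist_control(cut_stuck_at_0))  # False (fault detected)
```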


Fault-Tolerant Techniques: Merits, Issues, Applications
• Redundancy
– Merits: maximizes MTTF; suits long-life applications; easy to implement
– Issues: more area; more power; costly
– Applications: satellites, spacecraft, implanted biomedical devices
• Self-checking circuits
– Merits: high speed; reduced chip area; reduced power; reduced cost
– Issues: complex logic; a problem arises if the self-checking circuit itself fails
– Applications: reliable real-time systems, satellites, spacecraft, implanted biomedical devices
• Reconfiguration
– Merits: no extra hardware circuits; improved performance; reduced area, etc.
– Issues: synchronization issues after reconfiguration; may require further floorplanning
– Applications: mainstream low-cost systems, consumer electronics, personal computers
Conclusion
The parameters to consider for an efficient fault-tolerant circuit are:
• Area overhead
• Cost
• Power consumption
• Speed
By weighing the advantages of the different methods and combining them, one can design fault-tolerant circuits that balance the parameters above.
