Fault Injection Emulation for Systems in FPGAs: Tools, Techniques and Methodology, a Tutorial
Abstract
1. Introduction
- Based on hardware fault injection.
- Based on software fault injection.
- Based on simulation fault injection.
- Based on emulation fault injection.
Graphical Index of the Paper Structure
2. Reliability Assurance for Communication System Modules
3. Background Concepts
- Controller is the element that generates the error campaign and computes the obtained results.
- Fault injector carries out the fault injections defined by the controller.
- Design under test (DUT) is the target circuit to be studied in the presence of SEUs.
- Monitor is the message passing interface to trace all the system interactions.
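The interplay of these four elements can be sketched in a few lines of Python. This is only an illustrative toy, not the framework described later in the tutorial: the `ToyDUT` class (an 8-bit register whose output is its upper nibble), the `Monitor` log format and the function names are all assumptions introduced here.

```python
import random
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Monitor:
    """Traces every interaction so the campaign can be inspected afterwards."""
    log: List[Tuple[str, int]] = field(default_factory=list)

    def record(self, event: str, bit: int) -> None:
        self.log.append((event, bit))


class ToyDUT:
    """Hypothetical DUT: an 8-bit register that only exposes its upper nibble."""

    def __init__(self, value: int = 0b1010_0101) -> None:
        self.state = value

    def output(self) -> int:
        return self.state >> 4


def inject_bit_flip(dut: ToyDUT, bit: int) -> None:
    """Fault injector: emulate an SEU by flipping one stored bit."""
    dut.state ^= 1 << bit


def run_campaign(n_faults: int, seed: int = 0) -> Tuple[int, Monitor]:
    """Controller: generate the campaign, run it and count observed errors."""
    rng = random.Random(seed)
    monitor = Monitor()
    errors = 0
    for _ in range(n_faults):
        dut = ToyDUT()                 # start each run from a fault-free DUT
        golden = dut.output()          # reference output before the fault
        bit = rng.randrange(8)         # fault location chosen by the controller
        inject_bit_flip(dut, bit)
        monitor.record("inject", bit)
        if dut.output() != golden:     # the fault propagated to the output
            errors += 1
            monitor.record("error", bit)
    return errors, monitor
```

In this toy, flips in the lower four bits never reach the output, which mirrors the fact that not every injected fault produces an observable error.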
3.1. Hardware-Based Fault Injection
3.2. Software-Based Fault Injection
- A timeout, where the expiry of a timer launches the injection.
- A software trap, where control is transferred to the fault injector module.
- A code insertion, where altered program instructions cause the fault injection.
3.3. Simulation-Based Fault Injection
- Those which modify the high-level description of the target design with a saboteur module, which is in charge of the fault injection process.
- Those which use the built-in commands of a simulator like “force”, to inject errors into the simulation of the design, not in the hardware description of the design itself.
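The saboteur idea from the first approach can be illustrated with a small behavioral model. The sketch below is written in Python rather than in a hardware description language, and the `Saboteur` class and the `adder_stage` function are hypothetical names used only for illustration: the saboteur sits on a signal path and is transparent until it is armed, at which point it flips one bit of the value passing through it.

```python
class Saboteur:
    """Hypothetical saboteur placed on a signal of a given bit width."""

    def __init__(self, width: int) -> None:
        self.width = width
        self.mask = 0                      # 0 means "no fault injected"

    def arm(self, bit: int) -> None:
        """Select which bit of the signal will be flipped."""
        if not 0 <= bit < self.width:
            raise ValueError("bit index outside the signal width")
        self.mask = 1 << bit

    def disarm(self) -> None:
        self.mask = 0

    def __call__(self, value: int) -> int:
        """Pass the signal through, flipping the armed bit if any."""
        return (value ^ self.mask) & ((1 << self.width) - 1)


def adder_stage(a: int, b: int, saboteur: Saboteur) -> int:
    """Toy 8-bit adder whose output net goes through the saboteur."""
    return saboteur((a + b) & 0xFF)
```

With the saboteur disarmed, `adder_stage(3, 4, sab)` returns 7; after `sab.arm(0)` the same call returns 6, because the least significant bit of the output is flipped.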
3.4. Emulation-Based Fault Injection
4. Radiation Effects on SRAM-Based FPGAs
- Application layer: includes the logic and memory elements managed by the user’s design.
- Configuration layer: includes the logic and memory elements that allow the designer to configure the logic and routing resources in the application layer.
- SEUs induced in the Application layer appear as transient errors that can alter the stored data or the state of the user logic memory elements such as Flip-Flops or BRAMs (ASIC failure model).
- SEUs affecting the Configuration layer produce persistent errors that can be reverted using a reconfiguration process. This kind of error consists of a bit flip which, if it affects an essential bit, may change the design functionality. This can have two consequences:
  - Changing a routing element, which creates a wrong connection or disconnects internal wires.
  - Changing a logic element, which modifies the behavior of a LUT belonging to a CLB.
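The second consequence can be made concrete with a toy model of a LUT. The sketch below assumes a 2-input LUT whose truth table is stored as a 4-bit integer (the names `lut_eval`, `flip_bit` and `AND2` are illustrative, not taken from any vendor tool); flipping a single configuration bit silently turns an AND gate into a different function.

```python
from typing import Sequence


def lut_eval(init: int, inputs: Sequence[int]) -> int:
    """Evaluate a LUT: the inputs form an index into the truth table `init`."""
    index = 0
    for position, bit in enumerate(inputs):
        index |= (bit & 1) << position     # inputs[0] is the least significant
    return (init >> index) & 1


def flip_bit(init: int, bit: int) -> int:
    """Emulate an SEU in the configuration bit that stores one table entry."""
    return init ^ (1 << bit)


# Truth table of a 2-input AND: only index 3 (a = 1, b = 1) holds a 1.
AND2 = 0b1000

# An upset in entry 1 (a = 1, b = 0) gives the table 0b1010.
corrupted = flip_bit(AND2, 1)
```

After the flip, the LUT outputs 1 whenever `a` is 1 regardless of `b`, so the gate no longer computes AND but simply forwards its first input.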
5. Fault-Tolerant Techniques for FPGAs Based on Reconfiguration
- Scrubbing is a technique used to correct and prevent errors in the information stored in memory. In FPGAs, scrubbing can be used to mitigate both persistent errors in SRAM cells (i.e., the configuration memory) and transient errors in user-memory elements such as BRAMs. To perform configuration memory scrubbing, the configuration memory data must be read sequentially from the start to the end and compared to the original configuration bitstream or an error check code such as a cyclic redundancy check (CRC) [43].
- Dynamic partial reconfiguration allows run-time reconfiguration without application layer interruption. This technique cannot detect errors by itself, so it must be combined with other error detection techniques such as those based on redundancy. These correction techniques take advantage of the subdivision of the configuration memory into frames, which contain information related to the configuration of specific parts of the design.
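A minimal sketch of CRC-based scrubbing over a frame-organized memory is shown below. It is a software model only: the configuration memory is represented as a list of `bytes` objects, the 16-byte frame size is a toy value, and the function names are assumptions. Detection uses a per-frame CRC computed with `zlib.crc32`, and repair rewrites only the affected frame from a golden copy, in the spirit of frame-level partial reconfiguration.

```python
import zlib
from typing import List


def frame_crcs(frames: List[bytes]) -> List[int]:
    """Compute one reference CRC per frame from the original bitstream."""
    return [zlib.crc32(frame) for frame in frames]


def scrub(config_memory: List[bytes],
          golden_frames: List[bytes],
          golden_crcs: List[int]) -> List[int]:
    """Read every frame in order, repair mismatches, return repaired indices."""
    repaired = []
    for index, frame in enumerate(config_memory):
        if zlib.crc32(frame) != golden_crcs[index]:
            config_memory[index] = golden_frames[index]   # rewrite one frame
            repaired.append(index)
    return repaired


def upset(frame: bytes, byte_index: int, bit: int) -> bytes:
    """Emulate an SEU by flipping a single bit inside a frame."""
    data = bytearray(frame)
    data[byte_index] ^= 1 << bit
    return bytes(data)


# Toy memory: 4 frames of 16 bytes each (sizes chosen only for illustration).
golden = [bytes([i] * 16) for i in range(4)]
crcs = frame_crcs(golden)
memory = list(golden)
memory[2] = upset(memory[2], byte_index=5, bit=2)
repaired_frames = scrub(memory, golden, crcs)
```

Here `repaired_frames` contains only the index of the corrupted frame, and the memory matches the golden copy again after the pass.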
6. An Emulation Framework for Fault Injection
- The LogiCORE IP Soft Error Mitigation (SEM) Controller version 4.1 [44].
- A Nexys 4 DDR board based on the latest Artix-7™ FPGA from Xilinx [45].
- A design under test (DUT) implemented for FPGA, in order to measure its dependability in case of soft errors.
- A universal asynchronous receiver transmitter (UART) module to implement communication between the FPGA and the host.
6.1. Soft Error Mitigation (SEM) IP Controller
- The clock input signal (clk) should be mapped to the global clock of the design.
- The Monitor Interface is a UART that serializes the status information generated by the SEM IP controller for transmission between the host computer and the FPGA over the TX/RX lines:
  - The Monitor RX signal receives the commands that the SEM IP core interprets in order to perform functions such as error injection.
  - The Monitor TX signal reports the state that the SEM IP core has entered, which is one of the following: idle, initialization, observation, injection, correction, fatal error and classification.
6.2. Nexys 4 DDR Based on Artix-7 FPGA
- Peripheral module (Pmod) ports, used for serial communication between the FPGA and the computer, supporting monitoring functions such as sending and receiving data. Pmod is an input/output interface, developed by Digilent, that enables a simple connection between the FPGA and the standardized sockets of external devices or even computers. Pmod ports avoid soldering wires to the FPGA, since modules are simply plugged in.
- FPGA configuration reset button, which resets the FPGA after each fault injection. Because an emulated SEU persistently alters the configuration layer, the FPGA must be reset and then configured again by loading the original bitstream. For this purpose, a flash memory included on the board holds a copy of the design, so that configuration is automated after each injected fault and a manual load through the Vivado tool is avoided.
6.3. Design for the Experimental Set-Up
6.4. Universal Asynchronous Receiver-Transmitter
- DUT composed of ROM, CIRCUIT 1, CIRCUIT 2 and the CHCK.
- UART for communication of the errors.
- SEM IP core included in Vivado.
7. Emulation Workflow Step by Step
sem_0_sem_cfg, sem_0_sem_example, sem_0_sem_mon, sem_0_sem_mon_fifo, sem_0_sem_mon_piso, sem_0_sem_mon_sipo and sem_0_sem_example.xcd.
- Initialization (01): Once configuration has completed, the FPGA sends the global set/reset signal and the SEM IP controller starts. If initialization completes correctly, the controller moves to the observation state and the monitor shows the following (Figure 22):
- Observation (02): In the observation state, the status_observation variable is set and the SEM IP controller scans the FPGA configuration for errors. If an error is found, the controller moves to the correction state to recover the original configuration automatically. If there is no error and the SEM IP controller receives a command, the command is executed. The “enter idle” command (which moves the controller to the idle state) and the “status report” command are supported in this state.
- Correction (04): In the correction state, the status_correction variable is set. If the SEM IP is set up for correction by repair or correction by enhanced repair, it tries to correct the error through algorithmic methods. If the error can be corrected, the SEM IP instance uses the partial reconfiguration feature to rewrite the affected frame with the correct data and clears the status_uncorrectable variable. If the error cannot be corrected, the controller sets the status_uncorrectable variable, and the FPGA must then be reconfigured. Once correction is completed, the controller moves back to the observation state.
- Idle (00): In this state, the controller is ready to execute the error injection and software reset commands, which are supported in this state.
- Injection (10): The controller enters this state when an error injection command is executed from the idle state. The strike of one SEU on the configuration memory is emulated by flipping the bit at the memory address given in the error injection command. After each injection, the controller returns automatically from the injection state to the idle state. Once error injection has finished, the controller is moved to the observation state (Figure 23).
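The state descriptions above can be summarized as a small transition table. The sketch below is a model for reasoning about the workflow, not the SEM IP implementation: the state codes are the ones quoted in the list, while the event names (`init_done`, `error_detected`, and so on) are hypothetical labels introduced here. The fatal error and classification states are left out because their codes are not given in this section.

```python
from enum import Enum
from typing import Dict, Iterable, List, Tuple


class State(Enum):
    IDLE = 0x00
    INITIALIZATION = 0x01
    OBSERVATION = 0x02
    CORRECTION = 0x04
    INJECTION = 0x10


# (current state, event) -> next state; event names are illustrative only.
TRANSITIONS: Dict[Tuple[State, str], State] = {
    (State.INITIALIZATION, "init_done"): State.OBSERVATION,
    (State.OBSERVATION, "error_detected"): State.CORRECTION,
    (State.OBSERVATION, "enter_idle"): State.IDLE,
    (State.CORRECTION, "correction_done"): State.OBSERVATION,
    (State.IDLE, "inject"): State.INJECTION,
    (State.INJECTION, "injection_done"): State.IDLE,
    (State.IDLE, "enter_observation"): State.OBSERVATION,
}


def step(state: State, event: str) -> State:
    """Return the next state, rejecting events not supported in `state`."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"{event!r} is not supported in state {state.name}")


def run(events: Iterable[str],
        start: State = State.INITIALIZATION) -> List[State]:
    """Replay a sequence of events and return every state visited."""
    trace = [start]
    for event in events:
        trace.append(step(trace[-1], event))
    return trace


# One full injection cycle: inject from idle, then observe and correct.
cycle = run(["init_done", "enter_idle", "inject", "injection_done",
             "enter_observation", "error_detected", "correction_done"])
codes = [f"{state.value:02X}" for state in cycle]
```

For this cycle, `codes` is `['01', '02', '00', '10', '00', '02', '04', '02']`, and trying to inject directly from the observation state raises a `ValueError`, which matches the rule that injection is only reachable from idle.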
- Baud: 9600
- Settings: 8-N-1
- Flow Control: None
- Terminal Setup: VT100
- TX Newline: CR (Terminal transmits CR [0x0D] as end of line)
- RX Newline: CR+LF (Terminal receives CR [0x0D] as end of line, and expands to CR+LF [0x0D, 0x0A])
- Local Echo: NO
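These settings also bound how fast the host and the controller can exchange text. As a rough worked example, 8-N-1 means each character costs one start bit, eight data bits, no parity bit and one stop bit, so 10 bits on the wire. The `SerialSettings` class below is a hypothetical helper written only for this estimate; it does not open a serial port.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SerialSettings:
    """Line parameters from the list above (9600 baud, 8-N-1)."""
    baud: int = 9600
    data_bits: int = 8
    parity_bits: int = 0      # "N" in 8-N-1: no parity bit
    stop_bits: int = 1

    @property
    def bits_per_char(self) -> int:
        """Bits sent per character, including the single start bit."""
        return 1 + self.data_bits + self.parity_bits + self.stop_bits

    def transfer_time(self, n_chars: int) -> float:
        """Seconds needed to send `n_chars` characters back to back."""
        return n_chars * self.bits_per_char / self.baud


settings = SerialSettings()
chars_per_second = settings.baud / settings.bits_per_char
```

With these values the link carries at most 960 characters per second, so a 96-character exchange takes about 0.1 s. This kind of estimate is useful when sizing a campaign with many injections, since every command and report travels over this link.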
- O command sends the controller to the observation state.
- I command sends the controller to the idle state.
- S command requests a status report.
- N command performs an error injection. This command is only supported in the idle state. The interface is: N {10-digit hex value}
- R command performs a software reset. This command is only supported in the idle state. The interface is: R {2-digit hex value}
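When the commands are sent from a script instead of a terminal, the two parameterized forms can be built with small helpers. The sketch below follows the interfaces quoted in the list (`N` plus a 10-digit hex value, `R` plus a 2-digit hex value) and the CR line ending from the terminal settings; the function names, the use of uppercase hex digits and the range checks are assumptions made here.

```python
def injection_command(address: int) -> bytes:
    """Build 'N {10-digit hex value}' terminated by CR."""
    if not 0 <= address < 1 << 40:          # 10 hex digits hold 40 bits
        raise ValueError("address does not fit in 10 hex digits")
    return f"N {address:010X}\r".encode("ascii")


def reset_command(value: int) -> bytes:
    """Build 'R {2-digit hex value}' terminated by CR."""
    if not 0 <= value < 1 << 8:             # 2 hex digits hold 8 bits
        raise ValueError("value does not fit in 2 hex digits")
    return f"R {value:02X}\r".encode("ascii")


def simple_command(letter: str) -> bytes:
    """Build a one-letter command such as 'I' or 'S', terminated by CR."""
    if len(letter) != 1 or not letter.isalpha():
        raise ValueError("expected a single letter")
    return f"{letter.upper()}\r".encode("ascii")
```

For example, `injection_command(0x1)` yields `b"N 0000000001\r"`; zero-padding matters because the interface expects exactly ten digits.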
8. ACME: Speeding Up the Injection Performance
- Model of board (Nexys 4 DDR for this tutorial).
- An EBD file that lists all the essential bits in the bitstream.
- Coordinates of the pBlock where the DUT is placed in the FPGA. This information can be defined in the constraint file, through the routine:
9. Automating the Fault Injection Process
10. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Artola, L.; Velazco, R.; Hubert, G.; Duzellier, S.; Nuns, T.; Guerard, B.; Peronnard, P.; Mansour, W.; Pancher, F.; Bezerra, F. In Flight SEU/MCU Sensitivity of Commercial Nanometric SRAMs: Operational Estimations. IEEE Trans. Nucl. Sci. 2011, 58, 2644–2651.
- Azambuja, J.R.; Nazar, G.; Rech, P.; Carro, L.; Kastensmidt, F.L.; Fairbanks, T.; Quinn, H. Evaluating Neutron Induced SEE in SRAM-Based FPGA Protected by Hardware- and Software-Based Fault Tolerant Techniques. IEEE Trans. Nucl. Sci. 2013, 60, 4243–4250.
- Baumann, R. Soft Errors in Advanced Computer Systems. IEEE Des. Test Comput. 2005, 22, 258–266.
- Irom, F.; Farmanesh, F.F.; Johnston, A.H.; Swift, G.M.; Millward, D.G. Single-event upset in commercial silicon-on-insulator PowerPC microprocessors. IEEE Trans. Nucl. Sci. 2002, 49, 3148–3155.
- Velazco, R.; Bessot, D.; Duzellier, S.; Ecoffet, R.; Koga, R. Two CMOS memory cells suitable for the design of SEU-tolerant VLSI circuits. IEEE Trans. Nucl. Sci. 1994, 41, 2229–2234.
- Díez-Acereda, V.; Khemchandani, S.L.; del Pino, J.; Mateos-Angulo, S. RHBD Techniques to Mitigate SEU and SET in CMOS Frequency Synthesizers. Electronics 2019, 8, 690.
- Carmichael, C. Triple modular redundancy design techniques for Virtex series FPGA. Presented at the Application Notes 197, San Jose, CA, USA, 6 July 2006.
- Reviriego, P.; Maestro, J.A.; Ruano, O. Efficient Protection Techniques Against SEUs for Adaptive Filters: An Echo Canceller Case Study. IEEE Trans. Nucl. Sci. 2008, 55, 1700–1707.
- Reddy, A.; Banerjee, P. Algorithm-based fault detection for signal processing applications. IEEE Trans. Comput. 1990, 39, 1304–1308.
- Ruano, O.; Maestro, J.A.; Reviriego, P. A Methodology for Automatic Insertion of Selective TMR in Digital Circuits Affected by SEUs. IEEE Trans. Nucl. Sci. 2009, 56, 2091–2102.
- Ziade, H.; Ayoubi, R.; Velazco, R.; Idriss, T. A New Fault Injection Approach to Study the Impact of Bitflips in the Configuration of SRAM-Based FPGAs. Int. Arab J. Inf. Technol. 2011, 8, 155–162.
- Haissam, Z.; Rafic, A.; Velazco, R. A Survey on Fault Injection Techniques. Int. Arab J. Inf. Technol. 2004, 1, 171–186.
- Aranda, L.A.; Sanchez-Macian, A.; Maestro, J.A. ACME: A Tool to Improve Configuration Memory Fault Injection in SRAM-Based FPGAs. IEEE Access 2019, 7, 128153–128161.
- Available online: https://www.intel.com/content/dam/www/programmable/us/en/pdfs/literature/ug/ug_fault_injection.pdf (accessed on 1 February 2021).
- Gao, Z.; Reviriego, P.; Pan, W.; Xu, Z.; Zhao, M.; Wang, J.; Maestro, J.A. Efficient Arithmetic-Residue-Based SEU-Tolerant FIR Filter Design. IEEE Trans. Circuits Syst. II Express Briefs 2013, 60, 497–501.
- Gao, Z.; Reviriego, P.; Zhao, M.; Wang, J.; Maestro, J.A. Efficient Single Event Upset-Tolerant FIR Filter Design Based on Residue Number for OBP Satellite Communication Systems. China Commun. 2013, 10, 55–67.
- Reviriego, P.; Bleakley, C.J.; Maestro, J.A. Structural DMR: A Technique for Implementation of Soft-Error-Tolerant FIR Filters. IEEE Trans. Circuits Syst. II Express Briefs 2011, 58, 512–516.
- Liu, S.-F.; Reviriego, P.; Maestro, J.A. Enhanced Implementations of Hamming Codes to Protect FIR Filters. IEEE Trans. Nucl. Sci. 2010, 57, 2112–2118.
- Maestro, J.A.; Reviriego, P.; Reyes, P.; Ruano, O. Protection against soft errors in the space environment: A finite impulse response (FIR) filter case study. Integration 2009, 42, 128–136.
- Reviriego, P.; Ruano, O.; Maestro, J.A. Implementing Concurrent Error Detection in Infinite-Impulse-Response Filters. IEEE Trans. Circuits Syst. II Express Briefs 2012, 59, 583–586.
- Gil, P.; Blanc, S.; Serrano, J.J. Pin-Level Hardware Fault Injection Techniques. In Fault Injection Techniques and Tools for Embedded Systems Reliability Evaluation. Frontiers in Electronic Testing; Benso, A., Prinetto, P., Eds.; Springer: Boston, MA, USA, 2003; Volume 23.
- Madeira, H.; Rela, M.; Moreira, F.; Silva, J.G. RIFLE: A General Purpose Pin-level Fault Injector. In Proceedings of the 1st European Dependable Computing Conference (EDCC-1); Springer-Verlag: Berlin, Germany, 1994; pp. 199–216.
- Choi, G.S.; Iyer, R.K. FOCUS: An Experimental Environment for Fault Sensitivity Analysis. IEEE Trans. Comput. 1992, 41, 1515–1526.
- Arlat, J. Validation de la Sûreté de Fonctionnement Par Injection de Fautes. Méthode Mise en Œuvre et Application. Ph.D. Thesis, LAAS-CNRS, Toulouse, France, December 1990.
- Martínez, R.J.; Gil, P.J.; Martín, G.; Pérez, C.; Serrano, J.J. Experimental Validation of High-Speed Fault-Tolerant Systems Using Physical Fault Injection. In Proceedings of the Dependable Computing for Critical Applications 7 (DCCA-7), San Jose, CA, USA, 6–8 January 1999; Volume 12, pp. 249–265.
- Gunnetlo, O.; Karlsson, J.; Tonn, J. Evaluation of Error Detection Schemes Using Fault Injection by Heavy-ion Radiation. In The Nineteenth International Symposium on Fault-Tolerant Computing. Digest of Papers; IEEE CS Press: Los Alamitos, CA, USA, 1989; pp. 340–347.
- Karlsson, J.; Arlat, J.; Leber, G. Application of Three Physical Fault Injection Techniques to the Experimental Assessment of the MARS Architecture. In Proceedings of the Fifth Annual IEEE International Working Conference on Dependable Computing for Critical Applications; IEEE CS Press: Los Alamitos, CA, USA, 1995; pp. 150–161.
- Kanawati, G.A.; Kanawati, N.A.; Abraham, J.A. FERRARI: A Tool for the Validation of System Dependability Properties. In Proceedings of the 22nd Annual International Symposium Fault-Tolerant Computing; IEEE CS Press: Los Alamitos, CA, USA, 1992; pp. 336–344.
- Dawson, S.; Jahanian, F.; Mitton, T. ORCHESTRA: A probing and fault injection environment for testing protocol implementations. In Proceedings of the IEEE International Computer Performance and Dependability Symposium, Urbana-Champaign, IL, USA, 4–6 September 1996; p. 56.
- Stott, D.T.; Kalbarczyk, Z.; Iyer, R.K. Using NFTAPE for Rapid Development of Automated Fault Injection Experiments; Research Report; Center for Reliable and High-Performance Computing: University of Illinois at Urbana Champaign: Champaign, IL, USA, 1999.
- Segall, Z.; Vrsalovic, D.; Siewiorek, D.; Yaskin, D.; Kownacki, J.; Barton, J.; Dancey, R.; Robinson, A.; Lin, T. FIAT-Fault Injection Based Automated Testing Environment. In Proceedings of the 18th International Symposium on Fault-Tolerant Computing (FTCS-18), Tokyo, Japan, 27–30 June 1988; pp. 102–107.
- Carreira, J.; Madeira, H.; Silva, J. Xception: A Technique for the Experimental Evaluation of Dependability in Modern Computers. IEEE Trans. Softw. Eng. 1998, 24, 125–136.
- Ruano, O.; Maestro, J.A.; Reyes, P.; Reviriego, P. A Simulation Platform for the Study of Soft Errors on Signal Processing Circuits through Software Fault Injection. In Proceedings of the 2007 IEEE International Symposium on Industrial Electronics, Vigo, Spain, 4–7 June 2007; pp. 3316–3321.
- Single Event Effects Simulation Tool. Available online: http://www.nebrija.es/aries/sst.htm (accessed on 1 February 2021).
- Jenn, E.; Arlat, J.; Rimén, M.; Ohlsson, J.; Karlsson, J. Fault Injection into VHDL Models: The MEFISTO Tool. In Predictably Dependable Computing Systems; ESPRIT Basic Research Series; Randell, B., Laprie, J.C., Kopetz, H., Littlewood, B., Eds.; Springer: Berlin/Heidelberg, Germany, 1995.
- Sieh, V.; Tschache, O.; Balbach, F. VERIFY: Evaluation of reliability using VHDL-models with embedded fault descriptions. In Proceedings of the IEEE 27th International Symposium on Fault Tolerant Computing, Seattle, WA, USA, 24–27 June 1997; pp. 32–36.
- Guzman-Miranda, H.; Tombs, J.N.; Aguirre, M.A. FT-UNSHADES-uP: A platform for the analysis and optimal hardening of embedded systems in radiation environments. In Proceedings of the 2008 IEEE International Symposium on Industrial Electronics, Cambridge, UK, 30 June–2 July 2008; pp. 2276–2281.
- Alderighi, M.; Casini, F.; D’Angelo, S.; Mancini, M.; Codinachs, D.M.; Pastore, S.; Poivey, C.; Sechi, G.R.; Sorrenti, G.; Weigand, R. Experimental Validation of Fault Injection Analyses by the FLIPPER Tool. IEEE Trans. Nucl. Sci. 2010, 57, 2129–2134.
- Cieslewski, G.; George, A. SPFFI: Simple, Portable FPGA Fault Injector. Available online: https://www.academia.edu/637875/SPFFI_Simple_Portable_FPGA_Fault_Injector (accessed on 1 February 2021).
- Harward, N.A. Measuring Soft Error Sensitivity of FPGA Soft Processor Designs Using Fault Injection. Available online: https://scholarsarchive.byu.edu/etd/5699/ (accessed on 1 February 2021).
- Tonfat, J. Frame-Level Redundancy Scrubbing Technique for SRAM-Based FPGAs. Available online: https://www.lume.ufrgs.br/handle/10183/143194 (accessed on 1 February 2021).
- Asadi, G.; Tahoori, M.B. Soft Error Rate Estimation and Mitigation for SRAM-based FPGAs. In Proceedings of the 2005 ACM/SIGDA 13th International Symposium on Field Programmable Gate Arrays, Monterey, CA, USA, 20–22 February 2005; pp. 149–160.
- Heiner, J.; Sellers, B.; Wirthlin, M.; Kalb, J. FPGA Partial Reconfiguration via Configuration Scrubbing. In Proceedings of the IEEE International Conference on Field Programmable Logic and Applications (FPL), Prague, Czech Republic, 31 August–2 September 2009; pp. 99–104.
- Xilinx. Soft Error Mitigation Controller v4.1 LogiCORE IP Product Guide. Available online: https://www.xilinx.com/support/documentation/ip_documentation/sem/v4_1/pg036_sem.pdf (accessed on 1 February 2021).
- DIGILENT. Nexys 4 DDR Reference Manual. Available online: https://reference.digilentinc.com/reference/programmable-logic/nexys-4-ddr/reference-manual (accessed on 1 February 2021).
- The ACME tool. Available online: http://www.nebrija.es/aries/acme.htm (accessed on 1 February 2021).
Layer | Resource | Element | SEU Consequence |
---|---|---|---|
Configuration layer | Routing | Muxes | Wrong input selection, open net, wrongly driven or left open |
Configuration layer | Routing | PIP | Wrong connection or disconnection between nets |
Configuration layer | Routing | Buffers | Output net wrongly driven or left open |
Configuration layer | Logic | LUT | Wrong function inputs and outputs |
Configuration layer | Logic | Control bits | Wrong function inputs and outputs |
Configuration layer | Logic | Tie Offs | Wrong function initialization |
Application layer | | RAM Blocks | Wrong application data |
Application layer | | CLB Flip-flops | Wrong application data or state |

Modules | Files |
---|---|
Design for the experimental set-up | rom.vhd, circuit1.vhd, circuit2.vhd, checker.vhd |
UART | uart.vhd |

Pmod JA | Pmod JB | Pmod JC | Pmod JD | Pmod JXADC |
---|---|---|---|---|
JA1: C17 | JB1: D14 | JC1: K1 | JD1: H4 | JXADC1: A13 (AD3P) |
JA2: D18 | JB2: F16 | JC2: F6 | JD2: H1 | JXADC2: A15 (AD10P) |
JA3: E18 | JB3: G16 | JC3: F2 | JD3: G1 | JXADC3: B16 (AD2P) |
JA4: G17 | JB4: H14 | JC4: G6 | JD4: G3 | JXADC4: B18 (AD11P) |
JA7: D17 | JB7: E16 | JC7: E7 | JD7: H2 | JXADC7: A14 (AD3N) |
JA8: E17 | JB8: F13 | JC8: J3 | JD8: G4 | JXADC8: A16 (AD10N) |
JA9: F18 | JB9: G13 | JC9: J4 | JD9: G2 | JXADC9: B17 (AD2N) |
JA10: G18 | JB10: H16 | JC10: E6 | JD10: F3 | JXADC10: A18 (AD11N) |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Share and Cite
Ruano, Ó.; García-Herrero, F.; Aranda, L.A.; Sánchez-Macián, A.; Rodriguez, L.; Maestro, J.A. Fault Injection Emulation for Systems in FPGAs: Tools, Techniques and Methodology, a Tutorial. Sensors 2021, 21, 1392. https://doi.org/10.3390/s21041392