Fault Tolerant CAN Bus Control System Implemented Into FPGA
4 authors, including:
Zdeněk Kotásek
Brno University of Technology
All content following this page was uploaded by Karel Szurman on 07 January 2015.
Abstract: For various types of applications, it is necessary to guarantee a maximal level of fault tolerance and high reliability of components; avionic and railway applications can serve as examples. In these devices, electronic components are exposed to environmental conditions, among which cosmic radiation in particular can have an undesired and destructive effect. In this paper, the basic ideas of the design and implementation of a CAN bus based control system in an FPGA platform are described. The bus control system uses the CANAerospace application protocol. The fault tolerance features of the developed system are improved by a TMR architecture. Then, experiments with SEU injection into the FPGA configuration memory with both non-TMR and TMR architectures are described, and the results are presented and evaluated. In these experiments, an SEU injection framework developed during our previous research was used, which injects SEU failures into a running FPGA design.

I. INTRODUCTION

The complexity of digital systems has a significant impact on the reliability and diagnostic features of these systems. High reliability is an important feature required in various applications of electronic components. Very often, digital systems are implemented as a Fault Tolerant System (FTS) [1]. Fault tolerance (FT) is an important system property for many applications, e.g. the space industry, aerospace, automotive, etc. [2].

FPGA-based systems are becoming increasingly popular for space-based applications due to their high-throughput capabilities and relatively low cost. When faults are detected in any part of a system implemented in an FPGA, it is possible to reconfigure it and extend its lifetime. For this purpose, the Partial Dynamic Reconfiguration (PDR) of the FPGA circuit can be used [3]. SRAM-based FPGAs are susceptible to radiation-induced Single Event Upsets (SEUs). An SEU is a change in the state of a digital memory element caused by an ionizing particle. As the ionizing particle passes through the device, charge can be transferred from one node to another. This charge transfer can lower the voltage of a memory cell and change its internal state. SEU occurrence in FPGA memory is a serious problem for many digital systems. Therefore, many FT techniques have been proposed and tested for mitigating SEUs in systems implemented in FPGAs. Most FT techniques use hardware redundancy to reduce the probability of failure. Triple Modular Redundancy (TMR) is a well-known fault mitigation technique that uses redundant hardware to tolerate faults, including those caused by SEUs [4]. A circuit protected by TMR has three redundant copies of the original circuit and a majority voter. A single fault in any of the redundant hardware modules will not produce an error at the output, as the majority voter will select the correct result from the remaining two correctly working modules [5].

The reliability of FPGA-based FT systems cannot be evaluated by means of benchmark programs or typical test methodologies only, but by observing the system behaviour after the appearance of an SEU. SEU simulation in an SRAM-based FPGA is very important for evaluating the properties of FTS designs, together with a verification of the correct behaviour of the fault recovery. For these purposes, several SEU simulation techniques were developed [6], [7]. A bitstream SEU simulator was created in order to study the impact of upsets within the configuration memory on FPGA designs, and its properties were presented in [8]. Radiation-induced upsets are simulated by modification of the configuration memory contents. An important goal was the development of a fault simulator which is able to insert SEUs into the configuration bitstream. In [9], the authors present a framework for the evaluation of fault-tolerant designs implemented in SRAM-based FPGAs using emulated SEUs. The SEU injection process is performed by inserting emulated SEUs in the device using its configuration bitstream file.

II. MOTIVATION AND GOALS OF THE RESEARCH

Our previous activities were oriented towards creating a new methodology of FT systems design for SRAM-based FPGA platforms, where the main principles of PDR were used as the recovery mechanism in situations when an SEU occurs in the implementation. The principles of the methodology, together with properties and experiments of several FT architectures, were presented in [10]. A technique for the design of a highly dependable communication structure in an SRAM-based FPGA was presented in [11]. The architecture of the multicore system and the structure of a fault tolerant bus with cache memories were demonstrated. The next goal of our research was to develop an external SEU generator and verify its ability to insert an SEU at the required position in the bitstream. This gives us the opportunity to test the behaviour of FT architectures and their reaction to SEUs. The developed SEU test platform allows us to insert multiple SEUs in one run and simulate the occurrence of a higher number of SEUs. The properties of the external SEU generator and experiments with the SEU test platform were presented in [12]. The results gained from SEU experiments together with investigated FT architectures can
be used to compute reliability parameters of dependability models.

To summarize, so far we have primarily concentrated on developing fault tolerant designs composed of functional units. For these designs, a platform for the verification of their fault tolerant qualities was developed by our team. We also realize that buses and their control systems are integral parts of digital systems. Therefore, we felt the need to design a CAN bus based control system (which uses the CANAerospace application protocol), implement it in an FPGA, and use our test platform for the verification of the effects of SEU injection on the correct function of the bus system. We worked with two versions of the design, a TMR based design and a non-TMR design (i.e. one which just covers the required function), under the assumption that we have no clear information about the relation between the position of a particular bit in the bitstream and the function implemented in the FPGA. The goal of the research was to verify how successful the injection into both versions of the implementation is. Evidently, SEU injection into TMR based applications will be more successful than the injection into non-TMR applications (that is, an injected SEU is more likely to hit the design) because the area occupied in the FPGA is greater. Anyway, the goal was to verify this hypothesis and gain precise data. The results of our targeted activities in the area of CAN bus control system design and the verification of its resilience against SEU attacks are presented in this paper.

III. CAN BUS CONTROL SYSTEM DESIGN

The implemented CAN Bus Control System allows FPGA-based systems to be connected through a CAN bus, which is interfaced on a small PCB (Printed Circuit Board) module by the SPI interface. This module consists mainly of a CAN bus transceiver and the CAN controller MCP2515 [13] with an integrated SPI interface. It implements CAN protocol version 2.0B [14] with a maximal communication speed of 1 Mbit/s. The MCP2515 contains 2 buffers for received frames and 3 buffers for transmitted frames. It also contains several filters and masks for control of the receiving process. The MCP2515 circuit is controlled via the SPI interface by several instructions which can be used for reading from or writing to registers and buffers. The SPI interface supports the 0/0 and 1/1 modes and is able to communicate with a maximal clock of 10 MHz. Asynchronous events on the CAN bus are handled by the interrupt system.

Our CAN control system supports the standard 11-bit ID of transmitted CAN frames. The CAN ID value is also used for defining priorities. Frames with a lower ID have higher priority and are transferred preferentially. A CAN frame contains an 8-byte data field without any information about its meaning. To increase the information value and the usability of the CAN frame, we applied the CANAerospace application protocol [15]. The protocol definition is widely open to user-defined message types and protocol implementations. The CANAerospace message extends the data field in the CAN frame. The message is divided into a header and a data part; its specific structure is shown in Fig. 1. The message header contains the Node ID for identification of the transmitting or addressed station, the Data type for definition of the message data format and size, the Service ID for specification of the used node service, and the Message code for order identification during sequential transfer of messages. The CANAerospace protocol uses the CAN ID for identification of 7 basic types of messages and their priority. Each type of message has an allocated specific channel defined by a range of IDs. Further specification of the application protocol and messages is in [15].

Fig. 1. Format of CANAerospace message

The architecture of the control system is composed of an application part and a communication part. In the communication part, the CAN bus is controlled by the CAN CTRL unit, which uses the MCP2515 driver and the SPI Master unit for communicating with the CAN module and controlling the MCP2515 circuit by SPI instructions. The main function of the CAN CTRL unit is to read and write CAN frames, to handle interrupts and to provide the configuration sequence for the MCP2515 circuit. The application part is formed by the CANAerospace calc. It provides basic mathematical functions for distributed computing via the CAN bus. Each function operates as a CANAerospace service in one defined channel. Besides the mathematical functions, the calc implements the basic IDS service for its identification, as required by the CANAerospace protocol for each such application.

A. TMR Implementation of CAN Bus Control

For the experiments, the CAN bus control system was implemented as a TMR system to increase its fault tolerance parameters; the architecture is shown in Fig. 2. The control system is included in the calc unit. The unit can be replicated and all its inputs can be interconnected. The units have the following input signals: SPI interface, interrupt, clock and asynchronous reset. All the outputs of the calc units are brought to the inputs of the majority voter, which selects the correct inputs and propagates them to the output.

Fig. 2. TMR architecture of CAN Bus Control System

IV. SEU INJECTION AND IMPLEMENTATION DETAILS

The configuration memory of the Virtex 5 FPGA is divided into frames; a frame is the smallest addressable unit. It
means that every operation with the configuration memory must be done with the complete frame, each frame being divided into two parts - upper and lower. Each frame has a predefined length of 1312 bits (41 words, each consisting of 32 bits). For addressing the contents of the configuration memory, the Frame Address Register (FAR) is used. The FAR is subdivided into 5 fields: block type, one bit indicating the upper or lower part of the frame, row address, column address and minor address. The block type identifies interconnect and block configuration of the FPGA. The procedure of injecting SEUs into the configuration memory consists of the following steps:
• Frame selection - the frame into which the SEU will be injected is selected.
• Reading the frame - the contents of the configuration memory belonging to the selected frame are read out.
• SEU injection - in the contents of the selected frame, one or more bits are modified; the position is determined randomly or on the basis of some algorithm.
• Loading the frame back into the configuration memory - the frame is loaded back into the configuration memory. It is important to say that the correct operation of the design was not corrupted by these steps themselves.

An SEU injected into the design can have various impacts on the functionality of the design; some SEUs corrupt the functionality, some do not.

Due to the fact that FPGA producers do not usually provide users with information about the internal structure of the FPGA, the placement of our design in the configuration memory is not a deterministic procedure (the approximate placement of CLBs in the configuration memory can be obtained with the Xilinx PlanAhead tool). Another aspect is also very important: as the complete frame must be read out from the configuration memory, it becomes very convenient to place the complete design into the smallest possible area. For this purpose, an "area constraint" limitation can be used; it must be included in the User Constraints File (UCF) of the design. Both of the implemented versions of the CAN bus control system were placed into the same narrowed area. Therefore, all logic blocks must be grouped into one physical block; the AREA_GROUP constraint can be used for this purpose. The block is then assigned a size and a position. The limitation for the TMR based design is done in the following way:
INST "calc_unit_tmr0/*" AREA_GROUP = "pblock_1";
INST "calc_unit_tmr1/*" AREA_GROUP = "pblock_1";
INST "calc_unit_tmr2/*" AREA_GROUP = "pblock_1";
INST "tmr_voter_i/*" AREA_GROUP = "pblock_1";
AREA_GROUP "pblock_1" RANGE=SLICE_X12Y80:SLICE_X31Y99;
AREA_GROUP "pblock_1" PLACE=CLOSED;

Verification of the fault tolerance qualities and the resilience against SEU injections of the CAN bus control system was performed on the ML506 development board with a Virtex5 component (see Fig. 3). The experiments consist of the following steps:
1) A CAN module was connected to the ML506; the module is controlled by the implemented system. The Virtex5 FPGA was interfaced with a PC through a JTAG cable and through the CAN bus, which is connected to an RS232/CAN converter.
2) The SEU framework repeatedly sends the identification request to the application in the FPGA; the IDS service is used for this purpose. It is checked whether any response from the application is delivered.
3) SEUs are randomly injected into the area where the CAN bus control system design is implemented. SEUs are injected as long as the correct function is indicated, i.e. as long as the CAN bus control system works.
4) Both the ordinary (non-TMR) and TMR versions are loaded into the FPGA alternately. As a result, an equal number of implementations is tested for both versions.

Fig. 3. SEU injection into the CAN bus control system

V. EXPERIMENTS AND RESULTS

During the experiments, SEUs were injected into both versions of the control system; the previously described framework was used for this purpose. The non-TMR and TMR implementations were both placed by means of UCF constraints into a predetermined region of the FPGA configuration memory. This region contains 1600 LUTs. The non-TMR version occupies 26% (412 LUTs) of the total block area while the TMR version requires 77% (1221 LUTs). Thus, it can be derived that the function of the non-TMR version is attacked, when compared with the TMR version, by roughly every third injected SEU (412 / 1221 ≈ 0.34).

Fig. 4. Histogram of error states of the TMR and non-TMR architecture

For each of the implementations, more than 500 results were gained. SEUs were injected into the design as long as the function was not corrupted in some way. The results for both non-TMR and TMR architectures are seen in the histogram in Fig. 4. The histogram demonstrates how many different values appeared on system output (caused by an SEU injection) in a
CAN Bus control and how often they occurred. The graph in Fig. 5 shows how the probability that the FT system fails increases with the increasing number of SEU occurrences for both non-TMR and TMR architectures. The result demonstrates that, starting with a certain number of SEUs injected into the design, the TMR architecture becomes less reliable than the non-TMR one.

Fig. 5. Probability of the FPGA design failure according to increasing number of SEUs

Two reasons exist why the TMR based design can possibly be attacked more often: 1) TMR based designs need a greater area than the non-TMR design; 2) TMR based designs contain a voter which is not protected against defects in any way in our experiments. As soon as the voter is attacked, the design is completely corrupted and does not work.

VI. CONCLUSIONS AND FUTURE RESEARCH

In this paper, the basic ideas of the design and implementation of a CAN bus based control system in an FPGA platform were described. The bus control system uses the CANAerospace application protocol. The fault tolerance features of the developed system were improved by a TMR architecture. The experiments with SEU injection into both non-TMR and TMR architectures were described. In the experiments, an SEU injection framework developed during our previous research was used, which injects SEU failures into a running FPGA design. Based on our experiments it can be concluded that:
1) the size of the implementation has a strong impact on resilience against SEUs (which is a well known experience); we gained absolute figures,
2) for the purposes of the verification of SEU injections into FPGA-based designs, it is extremely important to know the relation between the position of a particular bit in the bitstream and the function implemented in the FPGA, otherwise SEU injection is less successful.
Anyway, we see the research and its results presented in this paper as an important step for our future activities, for which we have the following intentions:
- we shall use specialized software which is able to reveal the relation between the position of a particular bit in the bitstream and the function implemented in the FPGA and thus increase the number of functions which will be corrupted during SEU injection (RapidSmith can serve as an example),
- we shall equip the design with techniques which will increase the fault tolerance qualities of components supporting the design (e.g. the voter).

ACKNOWLEDGMENT

This work was supported by the European Commission within the FP7 project "Efficient Systems and Propulsion for Small Aircraft" (ESPOSA), contract No. ACP1-GA-2011-284859-ESPOSA; project National COST LD12036 "Methodologies for Fault Tolerant Systems Design Development, Implementation and Verification"; project Centrum excelence IT4Innovations (ED1.1.00/02.0070) and grant FIT-S-11-1.

REFERENCES

[1] J. A. Cheatham, J. M. Emmert, and S. Baumgart, “A survey of fault tolerant methodologies for FPGAs,” ACM Trans. Des. Autom. Electron. Syst., vol. 11, no. 2, pp. 501–533, 2006.
[2] L. Sterpone, M. Aguirre, J. Tombs, and H. Guzmán-Miranda, “On the design of tunable fault tolerant circuits on SRAM-based FPGAs for safety critical applications,” in DATE ’08: Proceedings of the conference on Design, automation and test in Europe. New York, NY, USA: ACM, 2008, pp. 336–341.
[3] B. Osterloh, H. Michalik, S. A. Habinc, and B. Fiethe, “Dynamic partial reconfiguration in space applications,” Adaptive Hardware and Systems, NASA/ESA Conference on, vol. 0, pp. 336–343, 2009.
[4] C. Bolchini, A. Miele, and M. D. Santambrogio, “TMR and partial dynamic reconfiguration to mitigate SEU faults in FPGAs,” in DFT ’07: Proceedings of the 22nd IEEE International Symposium on Defect and Fault-Tolerance in VLSI Systems. Washington, DC, USA: IEEE Computer Society, 2007, pp. 87–95.
[5] R. Oliveira, A. Jagirdar, and T. J. Chakraborty, “A TMR scheme for SEU mitigation in scan flip-flops,” in ISQED ’07: Proceedings of the 8th International Symposium on Quality Electronic Design. Washington, DC, USA: IEEE Computer Society, 2007, pp. 905–910.
[6] P. Kenterlis, N. Kranitis, A. Paschalis, D. Gizopoulos, and M. Psarakis, “A low-cost SEU fault emulation platform for SRAM-based FPGAs,” in 12th IEEE International On-Line Testing Symposium (IOLTS’06). New York, NY, USA: ACM, 2006, pp. 235–241.
[7] M. Rebaudengo, M. S. Reorda, and M. Violante, “Simulation-based analysis of SEU effects on SRAM-based FPGAs,” in Proceedings of the Reconfigurable Computing Is Going Mainstream, 12th International Conference on Field-Programmable Logic and Applications, ser. FPL ’02. London, UK: Springer-Verlag, 2002, pp. 607–615.
[8] E. Johnson, M. Caffrey, P. Graham, N. Rollins, and M. Wirthlin, “Accelerator validation of an FPGA SEU simulator,” Nuclear Science, IEEE Transactions on, vol. 50, no. 6, pp. 2147–2157, 2003.
[9] G. Asadi, S. G. Miremadi, H. R. Zarandi, and A. Ejlali, “Evaluation of fault-tolerant designs implemented on SRAM-based FPGAs,” in Proceedings of the 10th IEEE Pacific Rim International Symposium on Dependable Computing (PRDC’04). Washington, DC, USA: IEEE Computer Society, 2004, pp. 327–332.
[10] M. Straka, J. Kastil, and Z. Kotasek, “Fault tolerant structure for SRAM-based FPGA via partial dynamic reconfiguration,” in 13th EUROMICRO Conference on Digital System Design DSD 2010. Washington, DC, USA: IEEE Computer Society, 2010.
[11] M. Straka, J. Kastil, J. Novotny, and Z. Kotasek, “Advanced fault tolerant bus for multicore system implemented in FPGA,” in Design and Diagnostics of Electronic Circuits Systems (DDECS), 2011 IEEE 14th International Symposium, Apr. 2011, pp. 397–398.
[12] M. Straka, L. Miculka, J. Kastil, and Z. Kotasek, “Test platform for fault tolerant systems design properties verification,” in 15th IEEE International Symposium on Design and Diagnostics of Electronic Circuits and Systems. New York, NY, USA: IEEE Computer Society, 2012, pp. 1–6.
[13] Microchip Technology Inc, “MCP2515 - Stand-Alone CAN Controller with SPI Interface,” November 2005.
[14] Robert Bosch GmbH, “CAN Specification 2.0,” BOSCH, Stuttgart, Technical specification, 1991.
[15] Michael Stock, “CANAerospace - Interface specification for airborne CAN applications V 1.7,” Stock Flight Systems, 82335 Berg/Farchach, Germany, Technical specification, 2006.
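
As an illustration of the four-step injection procedure from Section IV (frame selection, reading the frame, SEU injection, loading the frame back), the following minimal Python sketch models the read-modify-write of one configuration frame. It is not the framework used in the experiments, which accesses the real device through JTAG. The class `ConfigMemoryModel` and the function `inject_seu` are hypothetical names introduced only for this example, and the memory is a plain in-memory list. The frame geometry (41 words of 32 bits, 1312 bits in total) is taken from the text.

```python
import random

FRAME_WORDS = 41                       # words per configuration frame (from the text)
WORD_BITS = 32                         # bits per word (from the text)
FRAME_BITS = FRAME_WORDS * WORD_BITS   # 1312 bits per frame


class ConfigMemoryModel:
    """Hypothetical in-memory stand-in for the FPGA configuration memory."""

    def __init__(self, n_frames):
        self.frames = [[0] * FRAME_WORDS for _ in range(n_frames)]

    def read_frame(self, index):
        # A frame is the smallest addressable unit, so it is read as a whole.
        return list(self.frames[index])

    def write_frame(self, index, words):
        # Likewise, a frame can only be written back as a whole.
        if len(words) != FRAME_WORDS:
            raise ValueError("a frame must be written as a whole")
        self.frames[index] = list(words)


def inject_seu(memory, rng):
    """Flip one randomly chosen bit in one randomly chosen frame."""
    frame_index = rng.randrange(len(memory.frames))  # 1) frame selection
    words = memory.read_frame(frame_index)           # 2) reading the frame
    bit = rng.randrange(FRAME_BITS)                  # 3) SEU injection
    word, offset = divmod(bit, WORD_BITS)
    words[word] ^= 1 << offset
    memory.write_frame(frame_index, words)           # 4) loading the frame back
    return frame_index, bit
```

In a campaign along the lines of Section IV, `inject_seu` would be called repeatedly while the design still answers the identification request; the check itself is left out here because it depends on the hardware setup.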