TR986 Full
All content following this page was uploaded by Yuliya Prokhorova on 05 June 2014.
Integrating FMEA into Event-B Development
of Safety-Critical Control Systems
Yuliya Prokhorova
Åbo Akademi University, Department of Information Technologies
Joukahaisenkatu 3-5 A, FIN-20520 Turku, Finland
[email protected]
Elena Troubitsyna
Åbo Akademi University, Department of Information Technologies
Joukahaisenkatu 3-5 A, FIN-20520 Turku, Finland
[email protected]
Linas Laibinis
Åbo Akademi University, Department of Information Technologies
Joukahaisenkatu 3-5 A, FIN-20520 Turku, Finland
[email protected]
TUCS Laboratory
Distributed Systems Laboratory
1. Introduction
A widespread use of software for controlling critical applications necessitates
development of techniques for ensuring its correctness. In other words, these techniques
should guarantee that software behaves according to its specification. However, to
achieve a high degree of system dependability, we should address not only software
correctness but also ensure that safety requirements are adequately represented in a
software specification.
Safety [11] is a property of the system requiring that it will not harm its environment
or users. It is a system-level property that can be achieved via a combination of various
techniques for safety analysis. The aim of safety analysis is to uncover possible ways in
which the system might breach safety and then devise the means to avoid these situations or
mitigate their consequences. There is a wide spectrum of techniques that facilitate the
analysis of possible hazards associated with the system, the means for introducing fault
tolerance to prevent the occurrence of dangerous faults, as well as the techniques for
deriving functional requirements from the conducted safety analysis.
In this paper we focus on the use of Failure Modes and Effects Analysis (FMEA), a
widely used inductive technique for safety analysis [4, 11]. We propose a
methodology that allows us to incorporate the results of FMEA into a formal system
specification. FMEA aims at a systematic study of the causes of component faults,
their local and global effects, and the means to cope with these faults. Since the fault
tolerance mechanisms are often implemented as a part of the developed software, this
information constitutes the necessary requirements that the controlling software should
fulfil.
Since safety is a system-level property, it requires modelling techniques that are
scalable to analyse the entire system. Scalability in the system analysis is achieved via
abstraction, proof and decomposition. The Event-B formalism [1] provides a suitable
framework that satisfies all these requirements. Event-B is a state-based formalism for
development of highly-dependable systems. The main development technique of Event-
B is refinement. In Event-B, we start system modelling at a highly-abstract level and, by
a number of correctness-preserving transformations called refinement steps, arrive at a
system specification that is close to the eventual implementation. Correctness of each
refinement step is verified by proofs.
In this paper we show how to incorporate the results of FMEA into the formal Event-B
development. Our approach enables elicitation and traceability of the safety
requirements and can thus potentially enhance system dependability. The proposed
methodology is illustrated by a small case study.
The paper is structured as follows. In Section 2 we briefly present the Event-B
method and describe modelling of control systems in Event-B. In Section 3 we propose
a methodology for integrating the results of FMEA into the Event-B development.
Section 4 illustrates the proposed approach by a case study, a heater controller.
Section 5 gives an overview of the related work. In Section 6 we give concluding
remarks and discuss our future work.
2. Modelling Control Systems in Event-B
a result of the non-deterministic assignment, x gets any value from S or it obtains such a
value x′ that Q(v, x′) is satisfied.
The semantics of Event-B events is defined using before-after predicates [8]. A
before-after predicate describes a relationship between the system states before and after
execution of an event. The formal semantics provides us with a foundation for
establishing correctness of Event-B specifications. To verify correctness of a
specification, we need to prove that its initialization and all events preserve the
invariant.
To check consistency of an Event-B machine, we should verify two types of
properties: event feasibility and invariant preservation. Formally, for any event e,
Inv(v) ∧ g_e(v) ⇒ ∃v′ . BA_e(v, v′)    (feasibility)
Inv(v) ∧ g_e(v) ∧ BA_e(v, v′) ⇒ Inv(v′)    (invariant preservation)
where Inv is the model invariant, g_e is the guard of the event e and BA_e is the
before-after predicate of the event e.
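For a small finite model, the two obligations can be checked by brute-force enumeration rather than by proof. The sketch below is only an illustration: the bounded counter, the constant `LIMIT`, and all function names are assumptions made for this example and do not come from the report or from the Rodin platform.

```python
from itertools import product

# Hypothetical toy machine: one variable v, one event that increments it.
LIMIT = 5
STATES = range(0, LIMIT + 3)  # deliberately wider than the invariant allows


def inv(v):
    """Model invariant Inv(v)."""
    return 0 <= v <= LIMIT


def guard(v):
    """Guard of the event: increment only below the limit."""
    return v < LIMIT


def before_after(v, v_next):
    """Before-after predicate BA(v, v') of the event."""
    return v_next == v + 1


def feasible(guard_fn=guard):
    """Inv(v) and guard(v) imply that some v' with BA(v, v') exists."""
    return all(
        any(before_after(v, w) for w in STATES)
        for v in STATES
        if inv(v) and guard_fn(v)
    )


def preserves_invariant(guard_fn=guard):
    """Inv(v), guard(v) and BA(v, v') together imply Inv(v')."""
    return all(
        inv(w)
        for v, w in product(STATES, STATES)
        if inv(v) and guard_fn(v) and before_after(v, w)
    )
```

Passing a weaker guard, for example `lambda v: True`, makes `preserves_invariant` fail, because the event can then step from `LIMIT` to `LIMIT + 1`. Enumeration of this kind only works for tiny state spaces; Event-B relies on proof precisely because real models are not finite in this sense.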
The main development methodology of Event-B is refinement, the process of
transforming an abstract specification to gradually introduce implementation details
while preserving its correctness. Refinement allows us to reduce non-determinism
present in an abstract model as well as introduce new concrete variables and events. The
connection between the newly introduced variables and the abstract variables that they
replace is formally defined in the invariant of the refined model. For a refinement step
to be valid, every possible execution of the refined machine must correspond to some
execution of the abstract machine.
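The correspondence between refined and abstract executions can be illustrated on a tiny finite example. The sketch below assumes an abstract machine with a single Boolean `failure` and a refined machine with two Boolean fault flags, linked by a gluing invariant; the event shapes are simplified (guards omitted) and every name is hypothetical. It checks, by enumeration, that each refined step can be matched by some abstract step.

```python
from itertools import product

BOOL = (False, True)


def glue(failure, concrete):
    """Assumed gluing invariant: abstract failure iff some component fault."""
    sensor_fault, actuator_fault = concrete
    return failure == (sensor_fault or actuator_fault)


def abstract_detection(failure):
    """Abstract event: non-deterministic assignment to the failure flag."""
    return {False, True}


def concrete_detection(concrete):
    """Refined events (simplified): flag a sensor fault, flag an actuator
    fault, or leave both flags unchanged."""
    sensor_fault, actuator_fault = concrete
    return {
        (True, actuator_fault),
        (sensor_fault, True),
        (sensor_fault, actuator_fault),
    }


def is_refinement(abstract_steps=abstract_detection,
                  concrete_steps=concrete_detection):
    """Every refined step from a glued pair of states must be matched by an
    abstract step that re-establishes the gluing invariant."""
    for failure, concrete in product(BOOL, product(BOOL, BOOL)):
        if not glue(failure, concrete):
            continue
        for concrete_next in concrete_steps(concrete):
            if not any(glue(f_next, concrete_next)
                       for f_next in abstract_steps(failure)):
                return False
    return True
```

If the abstract event were restricted to always assign `False`, the check would fail, since the refined machine could raise a fault that the abstract machine cannot account for.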
The consistency of Event-B models as well as correctness of refinement steps should
be formally demonstrated by discharging proof obligations. The Rodin platform [3], a
tool supporting Event-B, automatically generates the required proof obligations and
attempts to prove them automatically. Sometimes the user has to assist by invoking
the interactive prover. However, in general the tool achieves a high level of automation
(usually over 90%) in proving.
the variables modelling the sensors. In contrast, the controller reads the variables
modelling sensors and assigns the variables modelling the actuators (Fig. 1). We assume
that the reaction of the controller takes a negligible amount of time, so the controller can
react properly to changes of the plant state.
The operation (event) Environment is used for modelling the plant. The operation
Detection models error occurrence by a non-deterministic assignment to the variable
System_Failure. The operation Prediction is used for modelling the expected values of
variables. The operation Error_Recovery aborts the system if a system failure is
detected, i.e., the variable System_Failure equals TRUE. Such behaviour essentially
represents a failsafe system: error recovery is performed by forcing the system
permanently to a safe though non-operational state (obviously, this strategy is only
appropriate where shutdown of the system is possible). The routine control is specified by
the operation Normal_Operation.
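One possible executable reading of this abstract specification is sketched below. The ordering of the operations within a cycle, the `stopped` flag, and the `log` list are assumptions made for illustration; the report does not fix them in this text. The non-deterministic assignment is simulated by a random choice.

```python
import random


class AbstractControlSystem:
    """Hypothetical simulation of the abstract failsafe control loop."""

    def __init__(self, rng=None):
        self.rng = rng if rng is not None else random.Random()
        self.system_failure = False  # corresponds to System_Failure
        self.stopped = False         # assumed flag: system is in the safe state
        self.log = []                # names of the operations executed so far

    def environment(self):
        # The plant evolves; the abstract model keeps no plant state.
        self.log.append("Environment")

    def detection(self):
        # Non-deterministic assignment to System_Failure.
        self.system_failure = self.rng.choice([False, True])
        self.log.append("Detection")

    def error_recovery(self):
        # Failsafe recovery: force the system permanently to the safe state.
        self.stopped = True
        self.log.append("Error_Recovery")

    def normal_operation(self):
        self.log.append("Normal_Operation")

    def prediction(self):
        self.log.append("Prediction")

    def cycle(self):
        """Run one control cycle; a stopped system does nothing."""
        if self.stopped:
            return
        self.environment()
        self.detection()
        if self.system_failure:
            self.error_recovery()
        else:
            self.normal_operation()
            self.prediction()


def run(system, max_cycles):
    """Run cycles until shutdown or until max_cycles is reached."""
    cycles = 0
    while not system.stopped and cycles < max_cycles:
        system.cycle()
        cycles += 1
    return cycles
```

Because a stopped system ignores further calls to `cycle`, the shutdown is permanent, which is the defining feature of the failsafe strategy described above.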
In this paper we consider safety-critical control systems; therefore, safety properties
(formalized as safety invariants) should be verified formally, starting from the abstract
specification. Two safety invariants are added to the abstract specification.
The first one states that, as long as no failure has occurred, the system is not stopped. The
second requires that, when a system failure is detected, the system has to be stopped by
the controller.
3.1. A Methodology
The development of safety-critical systems starts by identifying possible hazards and
proceeds with accumulating the detailed description of them, containing also the
necessary means to cope with the identified hazards.
Our methodology is based on incorporating the FMEA results into an Event-B
specification of a control system, as shown in Fig. 3.
Each refinement step may introduce one or a few system components into our formal
specification. According to our methodology, this introduction consists of three steps.
We start by performing FMEA, which results in a worksheet for each component. It allows
us to identify failure modes, possible causes, and local and system effects. Then, as an
intermediate form, we build an Event-B counterpart worksheet for each component in
order to represent each FMEA table field in Event-B terms.
Finally, the obtained results are incorporated into the refined specification. Note
that system components can be introduced on different abstraction levels, which
means that an abstract component, once introduced, may later be refined, e.g., replaced
by several concrete ones.
Figure 5: FMEA worksheet for an actuator
For example, to represent the sensor in our example, we declare the following
variables (Fig. 6): Sensor_Value and Sensor_Fault. These variables are used in the
following events: Environment, Detected_Sensor_Fault, Detected_No_Fault.
The identified failure mode can be formally defined using the constants
Sensor_max_threshold and Sensor_min_threshold (added into the model context). It
is detected in the dedicated event Detected_Sensor_Fault. The condition
corresponding to the failure mode is Sensor_Fault = TRUE.
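The detection condition can be sketched as a simple range check. The numeric threshold values below are placeholders, and treating a reading exactly on a threshold as valid is an assumption; this text names the constants but gives neither their values nor the exact comparison used.

```python
# Placeholder values for Sensor_min_threshold and Sensor_max_threshold.
SENSOR_MIN_THRESHOLD = 0.0
SENSOR_MAX_THRESHOLD = 100.0


def detected_sensor_fault(sensor_value,
                          min_threshold=SENSOR_MIN_THRESHOLD,
                          max_threshold=SENSOR_MAX_THRESHOLD):
    """Return the new value of Sensor_Fault for the given reading.

    Assumption: a reading is faulty when it lies strictly outside the
    interval [min_threshold, max_threshold].
    """
    return sensor_value < min_threshold or sensor_value > max_threshold
```

For instance, `detected_sensor_fault(150.0)` reports a fault with the placeholder thresholds, while `detected_sensor_fault(50.0)` does not.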
In this paper we do not consider a situation when component faults can be recovered
without shutdown of the whole system. Therefore, any sensor or actuator fault leads to
system failure. In Event-B this is represented via the safety invariant
System_Failure = TRUE ⇔ Sensor_Fault = TRUE ∨ Actuator_Fault = TRUE.
In other words, when a sensor fault occurs, the system has to be stopped. The special event
Error_Recovery models this situation.
As described above, to detect an actuator fault, we compare the received
sensor value with the predicted one. The corresponding detection events model the system
reaction when the guard Sensor_Value > next_s_value_max ∨ Sensor_Value <
next_s_value_min is true. The remedial action for the actuator is the same as for the
sensor (i.e., system shutdown).
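The actuator guard and the safety invariant relating component faults to system failure can be written as two small predicates. This is a sketch under the assumption that the predicted bounds are supplied by the caller; the function names are hypothetical, while the parameter names follow the identifiers used in the text.

```python
def detected_actuator_fault(sensor_value, next_s_value_min, next_s_value_max):
    """Guard of the actuator fault detection: the observed sensor value
    falls outside the predicted interval."""
    return sensor_value > next_s_value_max or sensor_value < next_s_value_min


def system_failure(sensor_fault, actuator_fault):
    """Safety invariant read as a definition: the system has failed
    exactly when at least one component is faulty."""
    return sensor_fault or actuator_fault
```

With a predicted interval of 18 to 22, a reading of 25 gives `detected_actuator_fault(25, 18, 22) == True`, and `system_failure(False, True)` is then `True`, so the shutdown branch applies.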
4. Case Study
To illustrate the proposed methodology, we describe a failsafe control system, which
has a controller, a sensor and an actuator. Our example is a heater controller: the
sensor is a temperature sensor and the actuator is a heater. The controller receives a
temperature value from the sensor and switches the heater to one of two possible states
(ON or OFF) depending on the given temperature range.
Following our methodology, we analyse system components and their faults, build an
FMEA table and represent the FMEA table fields in Event-B terms, then proceed by
refining an abstract specification using the obtained results.
Temp_Sensor_Fault and the variable Actuator_Fault is Heater_Fault in the renewed
case study. The variables and invariants of the refined specification are shown in Fig. 8.
In the refinement we also replace the variable System_Failure modelling error
occurrence by the variables representing faults of system components, i.e.,
Temp_Sensor_Fault and Heater_Fault. It is an example of data refinement. This data
refinement expresses our modelling assumption that the system error occurs only when
one or several system components fail. The refinement relation defines the connection
between the newly introduced variables and the variables that they replace. While
refining the specifications, we add this refinement relation as an additional invariant of
the refined machine:
The operation Environment, which is shown in Fig. 9, is used for modelling the
plant (i.e., the environment) of the heater. The variable Temp_Sensor_Value is updated
non-deterministically to model possible value change of the temperature sensor.
The operation Detected_Actuator_Fault also refines the operation Detection. We
strengthen the operation guard by adding new guards according to the results of
FMEA, shown in Fig. 7. The non-deterministic assignment to the variable
System_Failure is replaced by the deterministic assignment of TRUE to the variable
Heater_Fault. Detected_No_Fault is another refinement of
the operation Detection. Here, however, the non-deterministic assignment to the variable
System_Failure is not replaced by an assignment to either of the two fault variables,
because both are already equal to FALSE.
After the execution of one of the detection events discussed above, the system has
three ways to continue its execution. The first case is when a temperature sensor fault or a
heater fault occurs and, as a result, the system has to be stopped. Thus, the operation
Error_Recovery, which is identical to its abstract counterpart, becomes enabled. The
other two cases are when there is no fault and the system is functioning in the normal mode
(Fig. 11). These two events differ from each other by their guards and respective
actions. In one case, if the temperature sensor value is less than the maximum value but
greater than or equal to the middle value, the variable Heater_Value is assigned OFF. In the
other case, if the temperature sensor value is greater than the minimum value but less than
the middle value, the variable Heater_Value is assigned ON.
In the next section we make our model more fault tolerant by introducing the triple
modular redundancy (TMR) arrangement for our sensor.
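The two normal-mode events amount to a threshold rule on the temperature. The sketch below encodes the guards exactly as stated in the text; the function name, the parameter names, and the use of `None` for "neither event is enabled" are assumptions for illustration.

```python
def normal_operation(temp_sensor_value, t_min, t_mid, t_max):
    """Return the new Heater_Value ("ON" or "OFF"), or None when neither
    normal-mode event is enabled for this reading."""
    if t_mid <= temp_sensor_value < t_max:
        return "OFF"  # warm enough: switch the heater off
    if t_min < temp_sensor_value < t_mid:
        return "ON"   # too cold: switch the heater on
    return None       # outside the open range (t_min, t_max)
```

With the hypothetical bounds `t_min=10`, `t_mid=20`, `t_max=30`, a reading of 15 switches the heater ON, a reading of 20 or 25 switches it OFF, and readings of 10 or 30 enable neither event.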
4.2. TMR Implementation of the Temperature Sensor
In the specification obtained at the previous refinement step all errors are considered to
be equally critical, i.e., leading to the shutdown. While introducing redundancy at our
next refinement step, we obtain a possibility to distinguish between criticality of errors
and mask a single error of a system component. Application of Triple Modular
Redundancy (TMR) [11] in that case allows us to mask faults of a single sensor. TMR is
a well-known mechanism based on static redundancy. The general principle is to
triplicate a system module and introduce the majority voting to obtain a single result of
the module, as shown in Fig. 12.
Fig. 13 shows the control system described in Section 3.2 with three temperature
sensors. In our case study we model the temperature sensors and a voter as parts of a
plant. The controller only receives the result of voting and does not see particular
sensors.
Figure 13: The case study system with the temperature sensor TMR
Figure 14: The FMEA table for the temperature sensor TMR
The representation of FMEA results in Event-B is shown in Fig. 15. The temperature
sensor TMR is modelled by using the following variables: Temp_Sensor1_Value,
Temp_Sensor2_Value, Temp_Sensor3_Value and the variable Temp_Sensor_Fault,
which corresponds to the variable Temp_Sensor_Fault of the previous refinement step, and
the events Environment1 and Environment2_1 … Environment2_5, which are
shown in Fig. 16. The last five events are used for modelling the TMR voter.
Figure 15: Event-B representation of FMEA for the temperature sensor TMR
Faults of the three temperature sensors are introduced in the operation
Environment1 by non-deterministic assignment to the corresponding variables. When
new sensor values are assigned, the voter can make a decision by identifying the failed
sensor and taking the majority view. The operations Environment2_1,
Environment2_2 and Environment2_3 are similar. Their guards check
whether two temperature sensor values are equal, and their actions assign
one of the equal values to the variable Temp_Sensor_Value. The operation
Environment2_4 checks that all three sensors have equal values, while its action assigns
one of the values to the variable Temp_Sensor_Value. The operation Environment2_5
checks that all sensor values differ and assigns to the variable Temp_Sensor_Value
the constant Sensor_Err_val, the value of which is less than Sensor_min_threshold.
This means that, if more than one temperature sensor is faulty, the
system has to be stopped.
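The five voter events can be collapsed into a single majority-voting function. The sketch below follows the description above, using exact equality of readings; the function name and the concrete numbers in the usage note are assumptions, and `sensor_err_val` stands for the constant Sensor_Err_val, which must lie below Sensor_min_threshold so that the sensor fault detection fires.

```python
def tmr_vote(value1, value2, value3, sensor_err_val):
    """Majority vote over three sensor readings.

    Returns the value shared by at least two sensors, or sensor_err_val
    when all three readings differ (more than one sensor is faulty).
    """
    if value1 == value2 or value1 == value3:
        return value1  # sensor 1 agrees with at least one other sensor
    if value2 == value3:
        return value2  # sensors 2 and 3 agree, sensor 1 is masked
    return sensor_err_val  # no majority: signal an out-of-range value
```

For example, with a hypothetical minimum threshold of 0 and `sensor_err_val = -1`, the call `tmr_vote(21, 21, 80, -1)` masks the third sensor and yields 21, whereas `tmr_vote(21, 35, 80, -1)` yields -1, which falls below the threshold and therefore leads to shutdown.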
In this paper, we applied the proposed methodology to the heater case study. The
resulting specifications were proven to show that the final specification of the system
meets all safety requirements, in particular, that system failure always leads to the
necessary error recovery actions.
5. Related Work
Integration of the safety analysis techniques with formal system modelling has
attracted significant research attention over the last few years. There are a number of
approaches that aim at direct integration of the safety analysis techniques into formal
system development. For instance, the work of Ortmeier et al. [9] focuses on using
statecharts to formally represent the system behaviour. It aims at combining the results
of FMEA and FTA to model the system behaviour and reason about component failures
as well as overall system safety. Moreover, the approach specifically addresses formal
modelling of the system failure modes. In our approach we define general guidelines for
integrating results of FMEA into a formal Event-B specification and the Event-B
refinement process. The available automatic tool support for the top-down Event-B
modelling ensures better scalability of our approach.
In our previous work, we have proposed an approach to integrating safety analysis
into formal system development within the Action System formalism [10, 13]. Since
Event-B incorporates the ideas of Action Systems into the B Method, the current work
is a natural extension of our previous results.
The research conducted by Troubitsyna [12] aims at demonstrating how to use
statecharts as a middle ground between safety analysis and formal system specifications
in the B Method. In our future work we will rely on this research to define patterns for
formal representation of system components as formal specifications in Event-B.
Another strand of research aims at defining general guidelines for ensuring
dependability of software-intensive systems. For example, Hatebur and Heisel [5] have
derived patterns for representing dependability requirements and ensuring their
traceability in the system development. In our approach we rely on specific safety
analysis techniques rather than on the requirements analysis in general to derive
guidelines for modelling dependable systems.
6. Conclusions
In this paper we presented an approach to integrating the safety analysis techniques into
the formal system development in Event-B. We demonstrated how to derive safety
requirements from FMEA in such a way that they could be easily captured in a formal
system specification. Our methodology facilitates requirements elicitation as well as
supports traceability of safety requirements within the formal development process. The
proposed guidelines for modelling components in Event-B demonstrate how to relate
specific fields in FMEA worksheets with the corresponding elements of an Event-B
specification. As a result, the proposed approach integrates the means for fault
avoidance and fault tolerance and hence can potentially enhance dependability of
safety-critical control systems.
In our future work we are planning to create a library of formal models representing
typical components (sensors and actuators), error detecting mechanisms and recovery
actions. Such a library would allow us to define the typical refinement transformations
supporting correct incorporation of the safety analysis results into a formal system
specification. Moreover, it would also enable automation of the refinement process
to support such pre-defined model transformations. We aim at exploring this approach
within a certain dedicated domain of critical systems.
In this paper we focused on analysing the requirements originating from the
inductive safety techniques. However, safety analysis usually combines several different
techniques that allow the designers to explore different aspects of system safety. While
FMEA provides us with a systematic way to analyse the failure modes of components, it
is unable to address the analysis of multiple system failures. In our future work we aim
at investigating how to combine the FMEA approach with such techniques as fault tree
analysis to guarantee safety in the presence of several component failures.
References
[1] J.-R. Abrial, “Modeling in Event-B: System and Software Engineering”,
Cambridge University Press, 2010.
[2] J.-R. Abrial, “The B-Book: Assigning Programs to Meanings”, Cambridge
University Press, 1996.
[3] Event-B and the Rodin Platform. Retrieved from http://www.event-b.org/, 2010.
[4] FMEA Info Centre. Retrieved from http://www.fmeainfocentre.com/, 2009.
[5] D. Hatebur and M. Heisel, “A Foundation for Requirements Analysis of
Dependable Software”, Proceedings of the International Conference on
Computer Safety, Reliability and Security (SAFECOMP), Springer, 2009,
pp. 311-325.
[6] Industrial use of the B method. Retrieved from ClearSy:
http://www.clearsy.com/pdf/ClearSy-Industrial_Use_of_%20B.pdf, 2008.
[7] L. Laibinis, and E. Troubitsyna, “Refinement of fault tolerant control systems
in B”, TUCS Technical Report, No. 603, 2004.
[8] C. Métayer, J.-R. Abrial, and L. Voisin, “Rigorous Open Development
Environment for Complex Systems (RODIN). Event-B”. Retrieved from
http://rodin.cs.ncl.ac.uk/deliverables/D7.pdf, 2005.
[9] F. Ortmeier, M. Guedemann and W. Reif, “Formal Failure Models”,
Proceedings of the IFAC Workshop on Dependable Control of Discrete Systems
(DCDS 07), Elsevier, 2007.
[10] K. Sere, and E. Troubitsyna, “Safety analysis in formal specification”. In
J. Wing, J. Woodcock, & J. Davies (Eds.), FM’99 – Formal Methods.
Proceedings of World Congress on Formal Methods in the Development of
Computing Systems, Lecture Notes in Computer Science 1709, II, 1999,
pp. 1564-1583.
[11] N. Storey, “Safety-critical computer systems”, Addison-Wesley, 1996.
[12] E. Troubitsyna, “Elicitation and Specification of Safety Requirements”,
Proceedings of the Third International Conference on Systems (ICONS 2008),
2008, pp. 202-207.
[13] E. Troubitsyna, “Integrating Safety Analysis into Formal Specification of
Dependable Systems”, Proceedings of the International Parallel and Distributed
Processing Symposium (IPDPS’03), 2003, p. 215b.
ISBN 978-952-12-2476-8
ISSN 1239-1891