Overview of IEC 61508 - Design of electrical / electronic / programmable electronic
safety-related systems
Simon Brown

The author is with the Health & Safety Executive, Magdalen House, Bootle,
Merseyside, L20 3QZ, UK

This paper was originally published in the Computing & Control Engineering Journal,
vol. 11, no. 1, February 2000 (Institution of Electrical Engineers, London)

This article reviews the principal requirements of IEC 61508 relating to the
specification and design of hardware and software in programmable electronic
systems intended for use in safety-related applications.

Introduction

The aim of the international standard IEC 61508 [1] is to provide a route whereby
safety-related systems can be implemented using electrical, electronic or
programmable electronic technology in such a way that an acceptable level of
functional safety is achieved. The strategy of the standard is first to derive the safety
requirements of the safety-related system from a hazard and risk analysis, and then to
design the safety-related system to meet those safety requirements, taking into
account all possible causes of failure, including random hardware faults, systematic
faults in both hardware and software, and human factors.

This article reviews the concept of the safety lifecycle and the way in which the
safety requirements specification for an electrical / electronic / programmable
electronic system is developed. The methodology of IEC 61508 for the design of
hardware and software is described. In particular, the requirements of the standard
relating to quantified failure probability, hardware fault tolerance and avoidance /
control of systematic faults are explained.

The application of IEC 61508 will influence the requirements for subsystems (such
as sensors, programmable logic controllers or actuators) used in any part of a safety-
related system. The way in which such subsystems will need to be characterised, so
that compliance with IEC 61508 can be claimed, is discussed.

Scope of the standard

In principle, the standard can be applied to the implementation and operation
(including maintenance) of any safety-related control or protection system based on
electrical / electronic / programmable electronic (E/E/PE) technology (the emphasis
being on the more complex systems using computer-based technology). In broad
terms a system can be considered as ‘safety-related’ if a failure of the system to
function correctly can lead to a situation where a person is exposed to a hazard (a
potential source of harm). Table 1 shows a number of systems which could be
classified as ‘safety-related’ on this basis.

• Process plant emergency shut-down system
• Fire & gas detection system
• Machinery guard / access interlocking system
• Machinery emergency stop
• Crane automatic safe load indicator
• Railway signalling
• Steam boiler controls
• Fairground roller-coaster control system

Table 1 Some examples of safety-related systems

23/07/01
IEESBrown.DOC

IEC 61508 can be applied both to systems which operate ‘on demand’ (usually due
to some fault) as well as those which are required to operate continuously to
maintain a safe state. An example of a demand mode system would be an
emergency shut-down system on a chemical process plant which operates valves on
the plant to move the process to a safe state in the event of the pressure in a vessel
exceeding some limit. Demand mode systems are sometimes referred to as
‘protection systems’ because they act to protect against hazardous situations.

An example of a safety-related system which operates in continuous mode is a
motor drive control system on a paper making machine where it is necessary to
maintain the rotation speed of the paper feed at a slow crawl speed whilst the
machine operators work close to the moving rollers during maintenance activities.

The aim is to address all the possible causes of dangerous failures. Such failures
could arise due to faults in hardware or software in any part of the safety-related
system, or from human error. Further, faults can be introduced at any stage of the
lifecycle of a system, from its initial concept, through design, installation and
operation to eventual decommissioning.

The scope and boundary of the system to which the standard is applied are entirely
within the hands of those who wish to claim compliance with the standard. Therefore,
a very important first activity is to clearly define the system boundaries. This leads to
a clear view as to which hazards should be considered during the later stages of the
safety lifecycle.

It should be noted that whilst IEC 61508 recognises that it is of primary importance to
eliminate hazards at source, the principles of inherent safety are outside the scope of
IEC 61508.

Safety lifecycle

IEC 61508 uses the ‘safety lifecycle’ as a framework to structure its own
requirements and it is a basic requirement of the standard that a similar (though not
necessarily identical) lifecycle is used to structure the activities relating to the
specification, design, integration, operation, maintenance and eventual
decommissioning of an E/E/PE safety-related system. The essence is that all
activities relating to functional safety are managed in a planned and methodical way,
with each phase having defined inputs and outputs. This enables a process of
verification whereby a check is made at the conclusion of each phase to confirm that
the required outputs have in fact been produced as planned. The ability to check (or
validate) that verification has been properly implemented throughout the safety
lifecycle is one of the foundations of functional safety. The premise is that such a
structured approach will minimise the number of systematic faults which are ‘built-in’
to the safety-related system. This is particularly important for programmable
systems because it cannot be assumed that testing alone will reveal potentially
dangerous faults.

Figure 1 shows the Overall Safety Lifecycle. The use of the term ‘overall’ reflects the
fact that it is necessary to develop the safety requirements for the E/E/PE safety-
related systems taking into account the contributions to safety which may result from
the use of other technology safety-related systems (such as pressure relief valves or
mechanical interlocks) as well as from external risk reduction facilities (such as fire
walls and bunds).

The design and integration of all the necessary safety-related systems and risk
reduction facilities comes within the realisation phase of the Overall Safety
Lifecycle. However, IEC 61508 only addresses in detail the realisation of safety-
related systems based on E/E/PE technology. It is during the realisation phase that
the hardware & software of the E/E/PE safety-related system(s) is designed and
integrated to meet the safety requirements.

Safety functions and Safety integrity levels (SILs)

Essentially, a safety function is an action which is required to ensure that the risk
associated with a particular hazard is tolerable. A safety function is specified in
terms of its functionality (the action required) and its safety integrity (the required
probability that the specified action will be carried out in order to achieve the required
risk reduction). An accurate specification of the safety functions in terms of
functionality and safety integrity is a corner-stone of IEC 61508. The specification for
a safety function is derived taking into account the nature of the hazard, and the risks
(in terms of likelihood and consequence) which the hazard presents in the absence
of the safety function. It is also necessary to form a view as to what is the tolerable
risk associated with each hazard. In the UK, in order to meet safety legislation, the
need for, and the required extent of, risk reduction will need to be assessed taking
into account the “ALARP” (as low as reasonably practicable) principle [2].

This assessment is undertaken for each hazard which falls within the defined system
boundaries. The result is a set of safety functions which together is called the
“Overall Safety Requirements Specification”. This process is illustrated in Figure 2
for an example where there are 3 hazards (H1, H2, H3) within the system boundary,
with each hazard having an associated unacceptable risk (R1, R2, R3) which is
reduced to a tolerable level by the action of a safety function (SF1, SF2, SF3). The
Overall Safety Requirements Specification in this example consists of the Safety
Functions Requirements and Safety Integrity Requirements for each of the safety
functions, SF1, SF2 and SF3.
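
As a rough worked illustration of how a safety integrity requirement might follow from the risk figures, suppose (purely as an assumption, the article gives no numbers) that risk is expressed as the frequency of the hazardous event, that the required risk reduction is taken as the ratio of the unprotected risk to the tolerable risk, and that SF1 is a demand mode function. With hypothetical values R1 = 10^-1 per year and TR1 = 10^-4 per year:

```latex
\Delta R_1 = \frac{R_1}{TR_1}
           = \frac{10^{-1}\ \text{per year}}{10^{-4}\ \text{per year}}
           = 10^{3},
\qquad
\mathrm{PFD}_{\mathrm{avg}}(SF1) \le \frac{1}{\Delta R_1} = 10^{-3}
```

Under these assumptions the safety integrity requirement SIR1 would be an average probability of failure on demand of no more than 10^-3.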

Allocation of safety requirements

The next stage is to decide how each of the safety functions is going to be
implemented, in terms of the type of safety-related system technology or external
risk reduction facility. This is the ‘Safety Requirements Allocation’ phase of the
Overall Safety Lifecycle. Each safety function is allocated to one or more safety-
related systems or risk reduction facilities in such a way as to meet the safety
functions requirements and safety integrity requirements for that function. The result
of the allocation process is, for each safety-related system or risk reduction facility, a
set of safety functions and associated safety integrity requirements.

In the example shown in Figure 2, safety function SF1 is allocated to both an E/E/PE
safety-related system and to an ‘other technology’ safety-related system. In this
case a single programmable electronic system (PES 1) is used to perform all the
safety functions allocated to E/E/PE safety-related systems. Safety function SF2 is
also allocated to E/E/PE technology and hence will also be performed by PES 1.
Consequently, PES 1 is required to perform 2 safety functions, SF1a and SF2, having
safety integrity requirements SIR1a and SIR2 respectively. Note that SIR1a will differ
from the safety integrity requirement of the safety function SF1, because SF1 has been
allocated to 2 different safety-related systems, each of which will take a share of the
integrity requirement.
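
The way the integrity requirement is shared can be sketched numerically. The following is a minimal sketch only: it assumes the two safety-related systems fail independently (so that their probabilities of failure on demand multiply), and the function name and figures are hypothetical rather than taken from IEC 61508.

```python
def remaining_pfd(overall_pfd_target: float, other_system_pfd: float) -> float:
    """PFD the E/E/PE system may have, given what the other system achieves.

    Assumes the two systems fail independently, so that
    overall PFD = PFD(E/E/PE system) * PFD(other technology system).
    """
    if not (0 < overall_pfd_target <= 1 and 0 < other_system_pfd <= 1):
        raise ValueError("probabilities must lie in (0, 1]")
    # A result of 1.0 means the other system already meets the target alone.
    return min(1.0, overall_pfd_target / other_system_pfd)


# Hypothetical figures: SF1 needs an overall PFD of 1e-3 and the
# 'other technology' system (e.g. a relief valve) achieves 1e-1.
sir1a = remaining_pfd(1e-3, 1e-1)  # roughly 1e-2, less onerous than SIR1
```

If the two systems could fail from a common cause, the independence assumption would not hold and the share taken by PES 1 would need to be more demanding.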

Safety integrity levels (SILs)

The final stage in the development of the E/E/PES safety requirements is to translate
the safety integrity requirements of those safety functions implemented in E/E/PE
safety-related systems into safety integrity levels (SILs). If the safety integrity
requirements have been developed on a quantitative basis, then the SIL for a safety
function is determined simply by reference to Tables 2 & 3 according to whether the
safety integrity requirement is expressed in terms of:

a) the average probability of failure on demand (PFD), or

b) the probability of a dangerous failure per hour.

The SIL forms the basis for the qualitative grading of the techniques and measures
used for the avoidance and control of systematic faults in both hardware and
software whilst the quantitative target failure measure provides the upper limit for the
quantified estimate of failure probability.

Safety integrity level    Target failure measure
                          (Average probability of failure to perform its
                          design function on demand)
4                         ≥ 10⁻⁵ to < 10⁻⁴
3                         ≥ 10⁻⁴ to < 10⁻³
2                         ≥ 10⁻³ to < 10⁻²
1                         ≥ 10⁻² to < 10⁻¹

Table 2 Safety integrity levels for safety functions operating in the low demand mode
of operation

Safety integrity level    Target failure measure
                          (Probability of a dangerous failure per hour)
4                         ≥ 10⁻⁹ to < 10⁻⁸
3                         ≥ 10⁻⁸ to < 10⁻⁷
2                         ≥ 10⁻⁷ to < 10⁻⁶
1                         ≥ 10⁻⁶ to < 10⁻⁵

Table 3 Safety integrity levels for safety functions operating in the high demand / continuous
mode of operation
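
The lookup described by Tables 2 and 3 can be sketched as follows. This is an illustrative sketch only: the band limits are transcribed from the two tables, while the function name and the mode labels `"low_demand"` and `"high_demand"` are hypothetical choices made for this example.

```python
from typing import Optional

# (SIL, inclusive lower limit, exclusive upper limit), from Table 2 (PFD)
LOW_DEMAND = [(4, 1e-5, 1e-4), (3, 1e-4, 1e-3), (2, 1e-3, 1e-2), (1, 1e-2, 1e-1)]
# (SIL, inclusive lower limit, exclusive upper limit), from Table 3 (per hour)
HIGH_DEMAND = [(4, 1e-9, 1e-8), (3, 1e-8, 1e-7), (2, 1e-7, 1e-6), (1, 1e-6, 1e-5)]


def sil_for_target(target: float, mode: str) -> Optional[int]:
    """Return the SIL whose band contains the quantitative target, else None."""
    bands = {"low_demand": LOW_DEMAND, "high_demand": HIGH_DEMAND}[mode]
    for sil, lower, upper in bands:
        if lower <= target < upper:
            return sil
    return None  # target lies outside the ranges tabulated


print(sil_for_target(5e-3, "low_demand"))   # PFD of 5e-3 falls in the SIL 2 band
print(sil_for_target(2e-8, "high_demand"))  # 2e-8 per hour falls in the SIL 3 band
```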

Qualitative determination of safety integrity levels

IEC 61508 accepts that a quantitative approach towards the determination of the SIL
of a safety function is not always appropriate. In such situations, a qualitative
approach, such as a risk graph, can be used. Such an approach, however, requires
great care to ensure that adequate risk reduction is achieved. IEC 61508-5 provides
general guidance on the use of such techniques. If this approach is used there is still
a need to adopt a quantitative target failure measure for the purpose of failure
probability modelling. In this case, the quantitative target failure measure is taken to
be the highest probability of failure associated with the SIL, according to Table 2 or
3.
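
As a small illustration of this rule, the sketch below returns the upper limit of the band for a SIL that was determined qualitatively. The limits are those of the low demand and high demand / continuous SIL tables; the dictionary and function names are hypothetical.

```python
# Upper limit of each SIL band, used as the modelling target when the SIL
# itself was obtained qualitatively (e.g. from a risk graph).
UPPER_LIMIT = {
    "low_demand": {1: 1e-1, 2: 1e-2, 3: 1e-3, 4: 1e-4},   # average PFD
    "high_demand": {1: 1e-5, 2: 1e-6, 3: 1e-7, 4: 1e-8},  # dangerous failures per hour
}


def modelling_target(sil: int, mode: str) -> float:
    """Quantitative target failure measure to adopt for a qualitatively derived SIL."""
    return UPPER_LIMIT[mode][sil]


# A risk graph result of SIL 2 for a low demand function gives a PFD target of 1e-2.
target = modelling_target(2, "low_demand")
```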

Design of the E/E/PE safety-related system

Having specified the safety requirements for the E/E/PE safety-related system(s), the
task is then to design and integrate the hardware and software to meet those
requirements. IEC 61508 has requirements in 3 key areas, each of which must be
met in order for compliance with the standard to be claimed. These are:

a) quantified failure probability, and

b) hardware fault tolerance, and

c) avoidance and control of systematic faults

These are briefly reviewed as follows:

Quantified failure probability

IEC 61508 requires that a quantitative analysis is undertaken to estimate the
probability of failure of each safety function in an E/E/PE safety-related system. This
is to ensure that the probability of failure of each safety function due to the following
failure causes is lower than the specified target failure measure:

a) random failures of hardware components within the E/E/PES, and

b) common cause failures within the E/E/PES, and

c) failures of any data communication processes used to support safety functions.

The effect of random hardware failures can be modelled using traditional reliability
and availability analysis techniques. For example, IEC 61508-6 gives guidance on
the use of reliability block diagrams, and other techniques such as Markov analysis
may also be used. The analysis should take into account the use of any automatic
diagnostics, and any periodic proof testing to reveal failures not detected by
diagnostics.

[Automatic diagnostic testing is a technique which can be put to particularly good
effect in electronic equipment where it is possible to detect a very high percentage of
those failures which could lead to danger by inhibiting the operation of the safety-
related system. However, for diagnostics to be effective, it is essential that the
E/E/PE safety-related system is able to detect faults and take action quickly
enough to prevent a hazardous situation. A particularly important factor in this
consideration is the diagnostic test interval, which needs to be short enough in
relation to the expected demand rate or, in the case of a high demand / continuous
safety function, to the ‘process safety time’. The process safety time is the time it
would take for a fault in the equipment or equipment control system to develop into a
hazardous situation in the absence of the safety function.]

[Periodic proof testing is important where there is a possibility of failures occurring in
the safety-related system which will not be detected by automatic diagnostic tests.
For example, consider 2 limit switches wired in series such that the opening of a guard
on a machine opens both sets of switch contacts. The use of 2 switches provides
protection against the failure of one of the switches in the closed position. But in
order to ensure that the required level of protection is maintained over a period of
time it would be necessary to manually check the operation of each switch
individually. Proof tests will often require the partial disassembly of the equipment to
test the operation of redundant items.]
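
The combined effect of diagnostics and proof testing on a single channel can be sketched with a widely used simplification, PFDavg ≈ λDU × T / 2, where λDU is the rate of dangerous failures not detected by diagnostics and T is the proof test interval. This is a rough sketch under stated assumptions, not the full treatment in IEC 61508-6: repair times and the contribution of detected failures are ignored, λDU × T is assumed to be much less than 1, and all names and figures are hypothetical.

```python
def pfd_avg_single_channel(lambda_d_per_hour: float,
                           diagnostic_coverage: float,
                           proof_test_interval_hours: float) -> float:
    """Approximate average PFD of one channel between proof tests."""
    if not 0.0 <= diagnostic_coverage <= 1.0:
        raise ValueError("diagnostic coverage must lie between 0 and 1")
    # Dangerous failures that the automatic diagnostics do NOT reveal.
    lambda_du = lambda_d_per_hour * (1.0 - diagnostic_coverage)
    # An undetected failure persists, on average, for half the test interval.
    return lambda_du * proof_test_interval_hours / 2.0


# Hypothetical channel: 2e-6 dangerous failures per hour, 90 % diagnostic
# coverage, proof tested once a year (8760 h).
pfd = pfd_avg_single_channel(2e-6, 0.9, 8760.0)  # about 8.8e-4
```

Halving the proof test interval, or raising the diagnostic coverage, reduces the estimate in direct proportion, which is why both factors have to be stated alongside any quoted failure probability.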

The requirement to take into account common cause failures is included because
redundancy is often used within an E/E/PE safety-related system to reduce the
probability of failure due to random hardware faults. In practice, the benefit to be
gained from the use of redundancy as a technique to enhance reliability will be limited
by the likelihood of faults occurring simultaneously due to a common cause (e.g.
over-heating). IEC 61508-6 gives an example of one methodology which may be
used to take account of common cause failures, but other methodologies may be
equally acceptable.
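
To show why common cause failures limit the benefit of redundancy, the sketch below assumes a simple beta-factor model, in which a fraction β of the undetected dangerous failures affects both channels at once. The formula is a commonly quoted simplification for two redundant channels with a shared proof test interval; it is offered as an illustration only, and the names and figures are hypothetical.

```python
from typing import Tuple


def pfd_avg_two_channels(lambda_du_per_hour: float,
                         proof_test_interval_hours: float,
                         beta: float) -> Tuple[float, float]:
    """Return (independent part, common cause part) of the average PFD."""
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie between 0 and 1")
    lt = lambda_du_per_hour * proof_test_interval_hours
    # Both channels fail independently within the same test interval.
    independent = ((1.0 - beta) * lt) ** 2 / 3.0
    # A single common cause event defeats both channels together.
    common_cause = beta * lt / 2.0
    return independent, common_cause


# Hypothetical figures: 2e-6 undetected dangerous failures per hour,
# yearly proof test, 10 % of failures attributed to common causes.
independent, common_cause = pfd_avg_two_channels(2e-6, 8760.0, 0.1)
total = independent + common_cause
```

With these figures the common cause term is roughly ten times the independent term, so adding further identical channels would bring little additional benefit.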

When any form of data communication system (e.g. field bus) is used to support
a safety function it is necessary to ensure that the likelihood of safety-related
information being corrupted, lost or excessively delayed by the communication
process is less than the target failure measure.

Whilst there is no specific requirement to undertake a quantitative analysis with
regard to human factors, there is a general requirement that the design of the system
should take into account human capabilities and limitations. It would therefore be
expected that if a safety function requires human action (e.g. response to an alarm
condition), then the likelihood of the correct action being taken should be
considered [3].

Hardware fault tolerance

An E/E/PE safety-related system is considered to be tolerant to a hardware fault if
the fault does not cause a loss of the safety function. For example, if a system
comprises 2 redundant parallel channels of identical hardware, each of which can
perform the safety function, then a random hardware fault in one of the channels will
not cause a loss of the safety function. Such a system can tolerate a single random
hardware fault and is referred to as having a hardware fault tolerance of 1. A 3
channel system, where a single channel can continue to perform the safety function
in the case of a fault in each of the other 2 channels, is considered to have a
hardware fault tolerance of 2.

The concept of hardware fault tolerance can also be applied to subsystems within
the E/E/PE safety-related system. For example, 2 sensors arranged in a redundant
configuration can be thought of as a single sensor subsystem having a hardware
fault tolerance of 1. This is sometimes referred to as a ‘single redundant’
architecture.
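
The examples above can be summarised in a short sketch. It uses the ‘M out of N’ voting notation, which is an assumption introduced here for illustration (the article does not use it): if at least M of N channels must work for the safety function to be performed, N minus M channels can fail without loss of the function.

```python
def hardware_fault_tolerance(m_required: int, n_channels: int) -> int:
    """Number of channel faults tolerated without losing the safety function."""
    if not 1 <= m_required <= n_channels:
        raise ValueError("need 1 <= m_required <= n_channels")
    return n_channels - m_required


print(hardware_fault_tolerance(1, 1))  # single channel: tolerance 0
print(hardware_fault_tolerance(1, 2))  # 2 redundant channels: tolerance 1
print(hardware_fault_tolerance(1, 3))  # 3 channels, any one suffices: tolerance 2
```

Note that a 3 channel arrangement in which 2 channels must agree would, on this reckoning, have a hardware fault tolerance of 1 rather than 2.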

IEC 61508-2 places an upper limit on the SIL which can be claimed for any safety
function on the basis of the fault tolerance of the subsystems which are used by the
safety function. These limits are referred to as ‘architectural constraints’ because
they are principally a function of the architecture of the subsystem. The limit which
applies to any particular subsystem is a function of:

a) the hardware fault tolerance of the subsystem, and

b) the fraction of failures of the subsystem which can be regarded as ‘safe’ because
they are either in a mode which does not cause a loss of the safety function, or are
detected by automatic diagnostic tests (the so-called safe failure fraction), and

c) the degree of confidence in the behaviour of the subsystem under fault conditions

Reference should be made to IEC 61508-2 for a full description of how to derive the
limit taking into account the above factors. However, Table 4 shows the limits which
apply to ‘worst case’ and ‘best case’ in terms of the above parameters. In the
absence of sufficient information it would be necessary to assume ‘worst case’, and a
single channel subsystem, having zero hardware fault tolerance, could not then be
used to support a safety function. However, provided that the specified criteria
can be fulfilled, in the ‘best case’ it would be possible to claim up to SIL 3 for a single
channel subsystem.

              Hardware fault tolerance of subsystem
              0              1        2
Worst case    Not allowed    SIL 1    SIL 2
Best case     SIL 3          SIL 4    SIL 4

Note: ‘Worst case’ is for a complex programmable subsystem, with low (or unknown) safe failure
fraction. ‘Best case’ is for a low complexity subsystem with a high safe failure fraction.

Table 4 SIL limits as determined by hardware fault tolerance of a subsystem

It is a central point of IEC 61508 that these architectural constraints impose a limit on
the SIL of a safety function which cannot be exceeded, even if the quantified
estimate of failure probability aligns with a higher SIL in terms of Table 2 or 3.
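
The capping effect of the architectural constraints can be sketched as below. Only the two extreme rows of Table 4 are encoded; intermediate cases need the full tables in IEC 61508-2. The function name is hypothetical, and the value 0 is used here (as a convention of this sketch) to mean that no SIL can be claimed.

```python
# SIL limits from Table 4, indexed by hardware fault tolerance (0, 1 or 2).
# 0 stands for 'Not allowed' in this sketch.
TABLE_4 = {
    "worst": {0: 0, 1: 1, 2: 2},
    "best": {0: 3, 1: 4, 2: 4},
}


def max_claimable_sil(sil_from_failure_probability: int,
                      hardware_fault_tolerance: int,
                      case: str) -> int:
    """SIL that may be claimed: the lower of the quantified SIL and the limit."""
    limit = TABLE_4[case][hardware_fault_tolerance]
    return min(sil_from_failure_probability, limit)


# A subsystem whose failure probability estimate aligns with SIL 3, but which is
# a complex programmable device with fault tolerance 1, is capped at SIL 1.
print(max_claimable_sil(3, 1, "worst"))
```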

Avoidance and control of systematic faults

Systematic faults are those faults, in either hardware or software, which will always
result in a failure when a particular combination of circumstances (e.g. environmental
conditions or input signal states) arises. Such faults are often introduced during the
specification and design phases, but can also result from errors introduced during
integration, operation and maintenance.

Unlike random hardware failures, the likelihood of systematic failures in hardware or
software cannot easily be estimated. Therefore, no analysis can be undertaken with
a view to confirming that a given design of hardware or software will result in a failure
rate, due to systematic faults, in line with the specified target failure measure.
Instead, the approach of IEC 61508 is to recommend that certain measures and
techniques are adopted during the design phase in an attempt to avoid the
introduction of systematic errors during design and also that the design of the
hardware and software should incorporate measures for the control of systematic
faults, should they arise during actual operation. These measures and techniques
have been selected on the basis of the judgement of the members of the IEC
working group which developed IEC 61508. They are graded according to the SIL
requirements and are presented in IEC 61508-2 for hardware and IEC 61508-3 for
software. Examples of some of these recommendations are shown in Tables 6 & 7.

Technique/measure                                            SIL1  SIL2  SIL3  SIL4
Observance of guidelines and standards                       HR    HR    HR    HR
Project management                                           HR    HR    HR    HR
Documentation                                                HR    HR    HR    HR
Structured design                                            HR    HR    HR    HR
Modularisation                                               HR    HR    HR    HR
Use of well-tried components                                 R     R     R     R
Semi-formal methods                                          R     R     HR    HR
Checklists                                                   -     R     R     R
Computer-aided design tools                                  -     R     R     R
Simulation                                                   -     R     R     R
Inspection of the hardware or walk-through of the hardware   -     R     R     R
Formal methods                                               -     -     R     R

R = recommended
HR = highly recommended

All the techniques marked ‘R’ in the grey shaded group are replaceable, but at least one of these is
required.

Table 6 Examples of recommendations to avoid introduction of faults during
design & development of E/E/PE safety-related system (IEC 61508-2)

     Technique/Measure                                      SIL1  SIL2  SIL3  SIL4
1    Fault detection and diagnosis                          ---   R     HR    HR
2    Error detecting and correcting codes                   R     R     R     HR
3a   Failure assertion programming                          R     R     R     HR
3b   Safety bag techniques                                  ---   R     R     R
3c   Diverse programming                                    R     R     R     HR
3d   Recovery block                                         R     R     R     R
3e   Backward recovery                                      R     R     R     R
3f   Forward recovery                                       R     R     R     R
3g   Re-try fault recovery mechanisms                       R     R     R     HR
3h   Memorising executed cases                              ---   R     R     HR
4    Graceful degradation                                   R     R     HR    HR
5    Artificial intelligence - fault correction             ---   NR    NR    NR
6    Dynamic reconfiguration                                ---   NR    NR    NR
7a   Structured methods including for example, JSD,
     MASCOT, SADT and Yourdon                               HR    HR    HR    HR
7b   Semi-formal methods                                    R     R     HR    HR
7c   Formal methods including for example, CCS, CSP,
     HOL, LOTOS, OBJ, temporal logic, VDM and Z             ---   R     R     HR
8    Computer-aided specification tools                     R     R     HR    HR

R = recommended
HR = highly recommended
NR = not recommended
--- = no recommendation for or against

Appropriate techniques/measures should be selected according to the safety integrity level. Alternate
or equivalent techniques/measures are indicated by a letter following the number. Only one of the
alternate or equivalent techniques/measures has to be satisfied.

Table 7 Examples of recommendations to avoid introduction of faults during software design &
development (IEC 61508-3)
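
One way of reading Table 7 against a proposed set of techniques is sketched below. The ratings are transcribed from the table, but the checking policy is an assumption made for this illustration and is not taken from IEC 61508: a numbered group counts as covered if any selected technique in it is rated R or HR at the SIL in question, and only groups containing at least one HR technique are reported as missing. All names are hypothetical.

```python
from typing import Iterable, List, Tuple

# Ratings for SIL1..SIL4; "-" = no recommendation, "NR" = not recommended.
TABLE_7 = {
    "1": ("-", "R", "HR", "HR"),    "2": ("R", "R", "R", "HR"),
    "3a": ("R", "R", "R", "HR"),    "3b": ("-", "R", "R", "R"),
    "3c": ("R", "R", "R", "HR"),    "3d": ("R", "R", "R", "R"),
    "3e": ("R", "R", "R", "R"),     "3f": ("R", "R", "R", "R"),
    "3g": ("R", "R", "R", "HR"),    "3h": ("-", "R", "R", "HR"),
    "4": ("R", "R", "HR", "HR"),    "5": ("-", "NR", "NR", "NR"),
    "6": ("-", "NR", "NR", "NR"),   "7a": ("HR", "HR", "HR", "HR"),
    "7b": ("R", "R", "HR", "HR"),   "7c": ("-", "R", "R", "HR"),
    "8": ("R", "R", "HR", "HR"),
}


def group_of(technique_id: str) -> str:
    """'3a' and '3b' are alternatives within group '3'; '4' is its own group."""
    return technique_id.rstrip("abcdefgh")


def review_selection(selected: Iterable[str], sil: int) -> Tuple[List[str], List[str]]:
    """Return (groups with an HR technique but nothing selected, NR selections)."""
    col = sil - 1
    chosen = list(selected)
    not_recommended = sorted(t for t in chosen if TABLE_7[t][col] == "NR")
    hr_groups = {group_of(t) for t, r in TABLE_7.items() if r[col] == "HR"}
    covered = {group_of(t) for t in chosen if TABLE_7[t][col] in ("R", "HR")}
    return sorted(hr_groups - covered), not_recommended


missing, discouraged = review_selection({"7b", "3a", "6"}, sil=3)
```

For the hypothetical SIL 3 selection above, groups 1, 4 and 8 are reported as missing and technique 6 (dynamic reconfiguration) is flagged as not recommended.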

Proven-in-Use

The measures and techniques for the avoidance & control of systematic faults as
recommended by IEC 61508 generally have to be incorporated as the system
progresses through the various phases of the safety lifecycle. It therefore may not
be possible to claim that an existing item of equipment, not designed according to
IEC 61508, is compliant with the standard in this regard. Nevertheless, there may
be a high degree of confidence, resulting from the previous use of the equipment in a
similar application, that the performance of the equipment, with regard to both
random hardware failures and systematic failures is such that the target failure
measure for the E/E/PE safety-related system can be achieved. In fact, the
previous use of both hardware and software can be a very effective way of proving
the suitability of equipment for use in a safety-related application.

However, this route should be used with extreme caution, especially in relation to
programmable electronic systems, because even minor differences between a
previous application and the intended one can be the cause of unrevealed systematic
faults. IEC 61508-2
defines the criteria which allow the use of such a ‘proven-in-use’ subsystem (which
might comprise both hardware & software). Key factors are the adequacy of the
records of past failures and the match between the previous conditions of use and
those which will be experienced in the intended application. Where there is any
mismatch it will be necessary to undertake analysis and/or testing to demonstrate
that the likelihood of unrevealed systematic faults is low enough.

Requirements for subsystems

Typically, an E/E/PE safety-related system will comprise a number of subsystems
such as sensors (e.g. fire or gas detectors), logic or signal processors (e.g.
programmable logic controllers (PLCs)), and actuators such as electrical contactors
or process control valves. Because the safety integrity requirements
relate to the safety function, and not to any individual item of equipment or
subsystem, it is not correct to refer to any individual item of equipment as having a
safety integrity level (SIL). Rather, the characteristics of the individual subsystems in
terms of the key parameters of quantified failure probability, hardware fault tolerance
and techniques used for the avoidance and control of systematic faults (or proven-in-use
qualities) should be available so that the designer of the complete E/E/PE safety-
related system can meet the E/E/PE safety requirements taking into account the
contributions from all the subsystems used by the safety functions.
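
How the subsystem characteristics might be brought together for one safety function is sketched below. It assumes, for illustration only, that the sensor, logic and actuator subsystems are all needed for the function (a series arrangement), so that their small average PFDs can be added, and that the most restrictive architectural limit applies to the function as a whole. The class, field names and figures are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Subsystem:
    name: str
    pfd_avg: float                # quantified average probability of failure on demand
    architectural_sil_limit: int  # highest SIL the subsystem architecture supports


def safety_function_summary(subsystems: List[Subsystem]) -> Tuple[float, int]:
    """Return (approximate PFD of the whole function, SIL cap from architecture)."""
    # For small probabilities, the PFD of a series chain is close to the sum.
    total_pfd = sum(s.pfd_avg for s in subsystems)
    # The function cannot claim more than its most restricted subsystem allows.
    sil_cap = min(s.architectural_sil_limit for s in subsystems)
    return total_pfd, sil_cap


chain = [
    Subsystem("gas detector", 4e-3, 2),
    Subsystem("PLC", 5e-4, 3),
    Subsystem("shut-off valve", 3e-3, 2),
]
total_pfd, sil_cap = safety_function_summary(chain)  # about 7.5e-3, capped at SIL 2
```

In this hypothetical chain the logic subsystem contributes little to the total, which is typical of why the characteristics of every subsystem, and not only the controller, need to be available to the designer.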

Conclusion

IEC 61508 provides a methodology for the determination of the safety requirements
specification of safety-related systems based on electrical / electronic /
programmable electronic technology. The aim is to ensure that the design and
performance of such systems is adequate to meet tolerable risk targets, taking into
account all sources of failure including random hardware faults and systematic faults in
both hardware and software.

For a safety-related system based on electrical / electronic / programmable
electronic (E/E/PE) technology to be considered compliant with IEC 61508 it must
meet key requirements relating to quantified failure probability, hardware fault
tolerance and the avoidance and control of systematic faults in both hardware and
software. Additionally, the design must take into account human capabilities and
limitations.

The standard is set to influence the requirements for subsystems such as sensors,
logic controllers, signal processing electronics and actuators used in safety-related
applications.

Acknowledgment

This article is based on the work of the many experts within the IEC working groups
(IEC SC65A WG9, WG10) responsible for the development of IEC 61508. The
author gratefully acknowledges that work as the basis for this article.

Further reading

STOREY, N: ‘Safety-critical computer systems’ (Addison Wesley Longman, 1996)

References

1 IEC 61508 - Functional safety of electrical / electronic / programmable electronic
safety-related systems (7 parts). Available as BS IEC 61508 from BSI, Milton
Keynes, UK, or from the IEC, Geneva (www.iec.ch).

2 Reducing Risks, Protecting People - A discussion document - Health & Safety
Executive, UK, 1999, HSE Books (www.hsebooks.co.uk)

3 Alarm Systems - A guide to design, management and procurement, The
Engineering Equipment and Materials Users Association, 1999, EEMUA publication
no.191

[Figure: flowchart of the lifecycle phases. Phases shown: Concept; Overall scope
definition; Hazard and risk analysis; Overall safety requirements; Safety requirements
allocation; Overall planning (overall operation and maintenance planning, overall safety
validation planning, overall installation and commissioning planning); Realisation of
safety-related systems: E/E/PES; Realisation of safety-related systems: other
technology; Realisation of external risk reduction facilities; Overall installation and
commissioning; Overall safety validation; Overall operation, maintenance and repair;
Overall modification and retrofit (back to appropriate Overall safety lifecycle phase);
Decommissioning or disposal.]

NOTE Activities relating to verification, management of functional safety and functional safety assessment are
not shown but are relevant to all the lifecycle phases.

Figure 1 Overall Safety Lifecycle

[Figure: within the system boundary, hazards H1, H2, H3 give rise to risks R1, R2, R3,
which are set against tolerable risks TR1, TR2, TR3 to give the required risk reductions
∆R1, ∆R2, ∆R3. The Overall Safety Requirements comprise the Safety Functions
Requirements SF1, SF2, SF3 and the Safety Integrity Requirements SIR1, SIR2, SIR3.
Allocation is to E/E/PES, Other Technology and External Risk Reduction Facilities,
with the note ‘Not considered further in IEC 61508’ against the non-E/E/PES
allocations. The E/E/PES Safety Requirements for PES 1 are SF1a (SIR1a, SIL x) and
SF2 (SIR2, SIL y), which lead to the E/E/PES requirements (IEC 61508-2) and the
software requirements (IEC 61508-3).]

Figure 2 - Development of Safety Requirements
