Research
Study on Developments in
Accident Investigation Methods:
A Survey of the “State-of-the-Art”
www.ski.se
Erik Hollnagel
Josephine Speziali
January 2008
STATENS KÄRNKRAFTINSPEKTION
Swedish Nuclear Power Inspectorate
POST/POSTAL ADDRESS SE-106 58 Stockholm
BESÖK/OFFICE Klarabergsviadukten 90
TELEFON/TELEPHONE +46 (0)8 698 84 00
Purpose
The objective of this project was to survey the main accident investigation methods that have been developed since the early or mid-1990s, and to develop well-grounded principles or criteria that could be used to characterise the chosen methods.
Result
The different methods were categorised along the dimensions of coupling, ranging from loose to tight, and interactions (tractability). This yielded four groups, where the nuclear industry falls into the group that is tightly coupled and intractable and therefore needs methods suited to such systems. Examples of such methods are FRAM (Functional Resonance Accident Model) and STAMP (Systems-Theoretic Accident Model and Processes).
The majority of incidents that happen and are investigated by the nuclear industry can, however, be assigned to the group that is less tightly coupled and more tractable. Methods that suit that group are, for example, CREAM (Cognitive Reliability and Error Analysis Method) and the MTO method. There are also many incidents/low-level events that can be investigated with even less powerful methods.
To get some guidance in choosing the right method, a number of questions can be asked,
for example:
1. Was the accident similar to something that has happened before, or was it new and unknown? (The reference should be the history of the installation, as well as industry-wide.)
2. Was the organisation ready to respond to the accident, in the sense that there
were established procedures or guidelines available?
3. Was the situation quickly brought under control or was the development
lengthy?
4. Was the accident and the material consequences confined to a clearly delimited
subsystem (technological or organisational) or did it involve multiple
subsystems, or the whole installation?
5. Were the consequences on the whole expected / familiar or were they novel /
unusual?
6. Were the consequences in proportion to the initiating event, or were they
unexpectedly large (or small)?
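As an illustration only (the report itself gives no scoring rule), the six questions can be read as a rough checklist: the more answers that point toward novelty, breadth, and disproportion, the more powerful the investigation method needs to be. A minimal sketch, with made-up thresholds:

```python
# Illustrative sketch only: the report does not prescribe a scoring rule.
# Each argument is True when the answer to the corresponding question
# points toward a complex, tightly coupled event (novel, unprepared,
# lengthy, widespread, unusual, disproportionate). The thresholds below
# are assumptions made for this example.

def method_guidance(novel_event, no_procedures, lengthy_development,
                    multiple_subsystems, novel_consequences,
                    disproportionate_consequences):
    score = sum([novel_event, no_procedures, lengthy_development,
                 multiple_subsystems, novel_consequences,
                 disproportionate_consequences])
    if score >= 4:
        return "systemic method, e.g. FRAM or STAMP"
    if score >= 2:
        return "intermediate method, e.g. CREAM or MTO"
    return "simpler method for low-level events"

# A familiar event, quickly contained within one subsystem:
print(method_guidance(False, False, False, False, False, False))
# -> simpler method for low-level events
```

The point of the sketch is only that the answers jointly position an event on the coupling-tractability dimensions; any real cut-off values would have to be set by the investigating organisation.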
Through the study, SKI has increased its knowledge of different methods and their range of use. The MTO method is suitable for incidents that are somewhat complex, but for simpler incidents/low-level events it might be too powerful and time-consuming. The important thing is to be aware of one's choices and how they affect the result, and to choose a method appropriate for the situation so that the root causes can be identified. No incident is, however, prevented merely by investigating it; there is also a need for an organisation that deals with the results and makes sure that the right countermeasures are taken.
Further research
There are today no further projects planned by SKI within this field. We are, however, following the research in the field that is done by others.
Project information
SKI project coordinator: Pia Jacobsson.
SKI reference: SKI 2007/1819, SSM2008/177
Project number: 200703011
SKI Report 2008:50
1 Objective
The complexity of socio-technical systems has for many decades been steadily growing
across all industrial domains, including nuclear power production. One tangible
consequence is that many of the incidents and accidents that occur today defy simple
explanations, for instance in terms of cause-effect chains. To explain what happens
requires more elaborate approaches – which means more sophisticated models and more
powerful methods. Accident models provide the principles that can be used to explain
how accidents happen. They are a convenient way of referring to the set of axioms,
assumptions, beliefs, and facts about accidents that form the basis for understanding and
explaining specific events. The methods describe – or even prescribe – how an
investigation should be performed in order to produce an explanation of the accident,
typically in a step-by-step fashion. The purpose of the methods is to ensure that the
model concepts are applied consistently and uniformly, thereby limiting the
opportunities for subjective interpretations and variations. An accident investigation
should clearly not depend on personal insights and skills, but should rely on generalised
public knowledge and institutionalised common sense.
The development of new methods and approaches has often been driven by the inability
of established methods to account for novel types of accidents and incidents. Another
motivation has been a lack of efficiency, in the sense that recommendations and
precautions based on the usual explanations have not lead to the desired effects and
improvements. A third motivation has been new theoretical insights, although this rarely
has happened independently of the former.
The objective of this project was to make a survey of the main accident investigation
methods that have been developed in the last decade or so, i.e., since the early or mid-
1990s. The work consisted of two equally important parts. One was to compile a list of
methods corresponding to the overall selection criteria, and from that to select a subset
for more detailed consideration. The other was to develop an argued set of principles or
criteria that could be used to characterise the methods. The aim of this survey has not
been to recommend any specific method as the overall ‘best’, but rather to provide an
analysis and synthesis that can serve as the basis for a choice in concrete cases.
Since understanding why an accident happened is, for obvious reasons, the primary concern, most methods emphasise that part and pay little or no attention to the other parts of the investigation.
2 Background
A previous SKI study (Harms-Ringdahl, 1996) surveyed fifteen methods for risk
assessment from an industrial perspective. Of these, the following four were
characterised as directly applicable to accident investigation:
• Deviation analysis (avvikelseanalys),
• Human Error Analytical Taxonomy (HEAT),
• Management Oversight and Risk Tree (MORT), and
• Safety Management and Organization Review Technique (SMORT),
while two were considered potentially applicable:
• CRisis Intervention in Offshore Production (CRIOP), and
• International Safety Rating System (ISRS).
• the emphasis on high reliability organisations (e.g., Weick et al., 1999),
• the changing perspective on causality, moving from sequential models to systemic models (Hollnagel, 2004),
• the associated change in view on “human error”, from the “old” look to the “new” look (Dekker, 2006),
• the change from training in specific skills to training in general communication and collaboration (Helmreich et al., 1999),
• the change from reactive to proactive safety, as marked by resilience engineering (Hollnagel, Woods & Leveson, 2006).
In the same period, i.e., since the mid-1990s, the growing complexity of socio-technical
systems has also necessitated the development of more powerful accident investigation
methods and analytical principles. This complexity, which was aptly diagnosed by
Perrow (1984), has unfortunately often been marked by serious accidents, and shows no
sign of abating. Some of the better known examples are the JCO accident at Tokai-
Mura, Japan (1999), the space shuttle Columbia disaster (2003), and the Überlingen
mid-air collision (2002) – plus literally thousands of small and large accidents in
practically every industrial domain. This development is not isolated to a specific
industrial domain, such as NPP, but has happened in many different industries and
service functions.
One consequence of this has been the realisation that accident investigation and risk
assessment are two sides of the same coin, in the sense that they consider the same
events or phenomena either after they have happened (retrospectively) or before they
happen (prospectively). In the prospective case there is, of course, the possibility that an
event may never occur; indeed, the main rationale for risk assessment is to ensure that
this is the case. The dependency between accident investigation and risk assessment has
been emphasised both by the so-called second generation HRA methods (in particular
ATHEANA, Cooper et al., 1996; CREAM, Hollnagel 1998; and MERMOS, Le Bot et
al., 1999), and is also a central premise for Resilience Engineering (Hollnagel, Woods,
& Leveson, 2006).
introduced the category of near-accidents, meaning those events that produced no injury
whatsoever although they had the potential power to do so (Ibid, p. 4). From the 1980s
and onwards it became common to refer to near misses, defined as situations “where an
accident could have happened had there been no timely and effective recovery” (van der
Schaaf & Kanse, 2004), and to incidents as something in between. (Depending on the
domain, the definitions often refer to the seriousness of the outcome, for instance
whether human life was lost.) This project has looked only at accidents, and has not
considered incidents or near misses. It is possible, and even likely, that the same
approach can be used to characterise how other outcome types are investigated, but to
argue this issue has been beyond the scope of the work reported here.
The purpose of an accident investigation is, of course, to understand why the accident
happened. This is often expressed as a question of finding the possible cause or causes,
and since the late 1970s or early 1980s it has been common both to look for clearly
recognisable causes (corresponding to Aristotle’s notion of effective cause¹) and to
point to the “human error” as a main cause of accidents (e.g., Hollnagel, 1998). As far
as the latter tendency is concerned, it is important to keep in mind that finding the
causes is a psychological rather than a logical process. In particular,
“... ‘human error’ is not a well defined category of human performance.
Attributing error to the actions of some person, team, or organisation is
fundamentally a social and psychological process and not an objective, technical
one.”
(Woods et al., 1994, p. xvii)
While there are few who will dispute the need to learn from experience, such learning
can come about in many different ways and may range from being thorough to being
quite superficial. To learn from experience requires more than collecting data from
accidents, incidents, and near-misses or building a company-wide database. Some
organisations nevertheless seem to believe that this is sufficient, probably because they
confuse data with experience. But whereas data are relatively easy to amass and can be
collected more or less as a routine or procedure, experience requires the investment of
considerable effort and time in a more or less continuous fashion. Accident
investigation is an important part of learning from experience. Some of the fundamental
issues that an investigation method must address are: what is reported, and when; how events are analysed; how the results are used and communicated; and what the effects are on safety and daily practice.
An accident investigation always follows a method or a procedure. There are many
different methods available, both between and within domains, that may differ with
respect to how well formulated and how well founded they are. The importance of
having a good method cannot be overstated. The method will direct the investigation to
look at certain things and not at others. A root cause analysis, for instance, will tend to
look for definitive causes while a ‘Swiss cheese’ or epidemiological analysis will tend
to look for latent conditions. It is simply not possible to begin an investigation with a
completely open mind, just as it is not possible passively to ‘see’ what is there. Accident
1 Aristotle proposed a distinction between four types of causes: (1) the material cause is that from
which something comes into existence, i.e., the parts of a system; (2) the formal cause tells us what
something is, the fundamental principles or general laws; (3) the efficient cause is that from which the
change or the ending of the change first starts, corresponding to the present day concept of a cause-effect
relation; and (4) the final cause, or the purpose, is that for the sake of which something exists or is done,
including both purposeful and instrumental actions and activities.
investigations, as well as searches in general, seem to conform to the What-You-Look-
For-Is-What-You-Find (WYLFIWYF) principle (Hollnagel, 2008). Since an
investigation method always will bias the investigation, it is important that investigators
not only know the methods they use, in the sense that they are proficient users, but also
that they acknowledge the explicit and implicit assumptions that every method makes.
(In terms of the terminology, it is common to find the terms analysis and investigation
used as if they were synonyms. This is, of course, not the case, since an accident
investigation always is more comprehensive than an accident analysis. In addition to
making the analysis, an investigation requires planning, data collection, registration,
recommendations, implementation, and evaluation. The objective of this project has
been to look at accident investigation methods, but to do so it has been necessary also to
consider some accident analysis methods.)
Perrow proposed two descriptive dimensions to characterise different types of
accidents: interactions and coupling. With regard to the interactions a complex system –
in contrast to a linear system – was characterised by the following:
• Indirect or inferential information sources.
• Limited isolation of failed components.
• Limited substitution of supplies and materials.
• Limited understanding of some processes (associated with transformation processes).
• Many control parameters with potential interaction.
• Many common-mode connections of components not in production sequence.
• Personnel specialization limits awareness of interdependencies.
• Proximate production steps.
• Tight spacing of equipment.
• Unfamiliar or unintended feedback loops.
According to Perrow, complex systems are difficult to understand and comprehend and
are furthermore unstable in the sense that the limits for safe operation (the normal
performance envelope) are quite narrow. Perrow contended that we have complex
systems basically because we do not know how to produce the same output by means of
linear ones. And once built, we keep them because we have made ourselves dependent
upon their products!
Systems can also be described with respect to their coupling, which can vary between
being loose or tight. The meaning of coupling is that subsystems and/or components are
connected or depend upon each other in a functional sense. Thus, tightly coupled
systems are characterised by the following:
• Buffers and redundancies are part of the design, hence deliberate.
• Delays in processing are not possible.
• Sequences are invariant.
• Substitutions of supplies, equipment, and personnel are limited and anticipated in the design.
• There is little slack possible in supplies, equipment, and personnel.
• There is only one method to reach the goal.
Tightly coupled systems are therefore difficult to control, because an event in one part of the system will quickly spread to other parts.
Perrow used these two dimensions of interactions and coupling to illustrate differences
among various types of systems, cf. Figure 1.
Figure 1: The coupling-interaction diagram (Perrow, 1984)
The worst possible combination with regard to the accident potential is, of course, a
complex and tightly coupled system. Perrow's prime example of that was the nuclear
power plant, with Three Mile Island accident as a case in point. Other systems that
belong to the same category were, e.g., aircraft and chemical plants. It is characteristic,
and probably not a coincidence, that all the systems Perrow describes in the book were
tightly coupled and only differed with respect to their complexity, i.e., they were mostly
in the second quadrant.
Perrow’s thesis, as expressed by Figure 1, is relevant for accident investigation
methods, since the explanation of an accident must be able to account for the nature of
interactions and the degree of coupling in the system. If we, for the sake of argument,
refer to the four quadrants of Figure 1, then it is clear that systems in quadrant 3 differ
in important respects from systems in quadrant 2. A method that may be adequate to
explain an accident in a quadrant 3 system, such as a person that is injured while
working at an assembly line, is unlikely to be sufficient to explain an accident in a
quadrant 2 system, such as an INES event at a nuclear power plant. (Even though the
converse is not necessarily true, it may be inefficient to use the more complex and
powerful methods to investigate accidents in simple systems.) The diagram therefore
provides an external frame of reference for accident investigation methods in addition
to the more traditional requirements such as consistency, reliability, usability, etc.
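The argument above can be made concrete with a small sketch (an illustration only, not part of Perrow's or the report's text): rate both the system and the candidate method on the two dimensions, and require that the method covers at least the system's degree of interaction complexity and coupling. The numeric coding is an assumption made for this example.

```python
# Illustrative sketch: 0 = linear interactions / loose coupling,
# 1 = complex interactions / tight coupling. A method is taken to be
# sufficient only if it covers the system on both dimensions.

def method_sufficient(method, system):
    """method and system are (interactions, coupling) pairs."""
    return method[0] >= system[0] and method[1] >= system[1]

assembly_line = (0, 0)    # quadrant 3: linear and loosely coupled
nuclear_plant = (1, 1)    # quadrant 2: complex and tightly coupled

simple_method = (0, 0)    # adequate for quadrant 3 systems
systemic_method = (1, 1)  # adequate for quadrant 2 systems

print(method_sufficient(simple_method, nuclear_plant))    # False
print(method_sufficient(systemic_method, assembly_line))  # True, though possibly inefficient
```

This encodes the asymmetry noted in the text: a quadrant 3 method does not suffice for a quadrant 2 system, while the converse holds but may waste effort.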
CCPS (1992), DOE (1999) and Sklet (2002). The methods already described by Harms-
Ringdahl (1996) – Deviation Analysis, HEAT, MORT, and SMORT – have not been
included in the set. Neither have methods that properly speaking were aimed at risk
assessment (e.g., Bayesian Belief Networks combined with Fault Trees), technological
malfunctions (e.g., Sneak Path Analysis), human reliability (e.g., ATHEANA), or safety
management methods (e.g., TRIPOD).
The first survey of the literature, applying the selection criteria described above, produced a list of 21 accident investigation or accident analysis methods. The methods
are briefly identified and described in Table 1 below.
Table 1: The surveyed methods (acronym, name, short description, and source).
ECFCA — Events and Causal Factors Charting and Analysis (DOE, 1999)
The events and causal factors chart may be used to determine the causal factors of an accident. This process is an important first step in later determining the root causes of an accident. Events and causal factors analysis requires deductive reasoning to determine which events and/or conditions contributed to the accident.
FRAM — Functional Resonance Accident Model (Hollnagel, 2004)
A method for accident investigation as well as risk assessment, based on a description of system functions. Non-linear propagation of events is described by means of functional resonance.
HERA (Isaac et al., 2002)
HERA is a method to identify and quantify the impact of the human factor in incident/accident investigation, safety management, and prediction of potential new forms of errors arising from new technology. Human error is seen as a potential weak link in the ATM system and, therefore, measures must be taken to prevent errors and their impact, and to maximise other human qualities such as error detection and recovery. HERA is predicated on the notion that human error is the primary contributor to accidents and incidents.
HFACS — Human Factors Analysis and Classification System (FAA/NTIS, 2000)
HFACS identifies the human causes of an accident and provides a tool not only to assist in the investigation process, but also to target training and prevention efforts. HFACS looks at four levels of human failure, referring to the "Swiss cheese" model. These levels include unsafe acts (operator error), preconditions for unsafe acts (such as fatigue and inadequate communication), unsafe supervision (such as pairing inexperienced aviators for a difficult mission), and organizational influences (such as lack of flight time because of budget constraints).
HFIT — Human Factors Investigation Tool (Gordon, Flin & Mearns, 2005)
HFIT was developed on a theoretical basis with reference to existing tools and models. It collects four types of human factors information: (a) the action errors occurring immediately prior to the incident, (b) error recovery mechanisms, in the case of near misses, (c) the thought processes which led to the action error, and (d) the underlying causes.
HINT / J-HPES (Takano et al., 1994)
HINT is a recent development of J-HPES, the Japanese version of INPO's Human Performance Evaluation System, cf. below. The overall principle is to use a root cause analysis of small events to identify trends, and to use these as a basis for proactive prevention of accidents. The method comprises a number of steps (similar to SAFER, cf. below): Step 1: Understand the event. Step 2: Collect and classify causal factor data. Step 3: Causal analysis, using root cause analysis. Step 4: Proposal of countermeasures.
HPES — Human Performance Enhancement System (INPO, 1989)
A method sponsored by INPO that utilizes a family of techniques to investigate events, with particular emphasis on determining human performance aspects. The HPES methodology incorporates many tools such as task analysis, change analysis, barrier analysis, cause and effect analysis, and event and causal factor charting. Additionally, many similar methodologies have been developed from HPES and adapted where necessary to suit the specific requirements of individual organizations.
MTO — Människa-Teknologi-Organisation (Rollenhagen, 1995; Bento, 1992)
The basis for the MTO analysis is that human, organisational, and technical factors should be given equal attention in an accident investigation. The method is based on HPES (Human Performance Enhancement System).
PEAT — Procedural Event Analysis Tool (Moodi & Kimball, 2004)
The objective of PEAT is to help airlines develop effective remedial measures to prevent the occurrence of future similar errors. The PEAT process relies on a non-punitive approach to identify key contributing factors to crew decisions. Using this process, the airline safety officer would be able to provide recommendations aimed at controlling the effect of contributing factors. PEAT includes database storage, analysis, and reporting capabilities.
RCA — Root Cause Analysis (e.g., IAEA, 1999)
Root cause analysis identifies underlying deficiencies in a safety management system that, if corrected, would prevent the same and similar accidents from occurring. Root cause analysis is a systematic process that uses the facts and results from the core analytic techniques to determine the most important reasons for the accident.
SAFER — SAFER 2007 (Yoshizawa, 1999)
SAFER is a generic method for accident investigation developed by TEPCO (Japan). Step 1: Understand HF engineering. Step 2: Make an event flow chart: arrange information to understand the detail of the event and to have a basis for communication and sharing of information. Step 3: Pick up problematic points. Step 4: Produce a background factors causality diagram that represents causality among the factors. Step 5: Think out measures to cut off the causality from background factors (according to the diagram or event flow chart). Step 6: Prioritize the measures. Step 7: Implement the measures. Step 8: Evaluate the effects.
SCAT — Systematic Cause Analysis Technique (Bird & Germain, 1985)
The International Loss Control Institute (ILCI) developed SCAT for the support of occupational incident investigation. The ILCI Loss Causation Model is the framework for the SCAT system. The result of an accident is loss, e.g. harm to people, properties, products, or the environment. The incident (the contact between the source of energy and the "victim") is the event that precedes the loss. The immediate causes of an accident are the circumstances that immediately precede the contact. They usually can be seen or sensed. Frequently they are called unsafe acts or unsafe conditions, but in the ILCI model the terms substandard acts (or practices) and substandard conditions are used.
STAMP — Systems-Theoretic Accident Model and Processes (Leveson, 2004)
The hypothesis underlying STAMP is that system theory is a useful way to analyze accidents, particularly system accidents. Accidents occur when external disturbances, component failures, or dysfunctional interactions among system components are not adequately handled by the control system. Safety is viewed as a control problem, and is managed via constraints by a control structure embedded in an adaptive socio-technical system. Understanding why an accident occurred requires determining why the control structure was ineffective. Preventing future accidents requires designing a control structure that will enforce the necessary constraints. Systems are viewed as interrelated components that are kept in a state of dynamic equilibrium by feedback loops of information and control.
STEP — Sequentially Timed Events Plotting (Hendrick & Benner, 1987)
Hendrick and Benner propose a systematic process for accident investigation based on multi-linear event sequences and a process view of the accident phenomenon. With the process concept, a specific accident begins with the action that started the transformation from the described process to an accident process, and ends with the last connected harmful event of that accident process.
Swiss cheese — Reason's Swiss Cheese model (Reason, 1990, 1997)
The Swiss Cheese model of accident causation is a model used in the risk analysis and risk management of human systems. It likens human systems to multiple slices of Swiss cheese, stacked together, side by side. It was originally propounded by the British psychologist James T. Reason in 1990, and has since gained widespread acceptance and use in healthcare, in the aviation safety industry, and in emergency service organizations. It is sometimes called the cumulative act effect.
TRACEr — Technique for Retrospective Analysis of Cognitive Errors (Shorrock & Kirwan, 1999; 2002)
TRACEr provides a human error identification technique specifically for use in the air traffic control domain. It builds on error models in other fields and integrates Wickens' (1992) model of information processing in ATC. TRACEr is represented in a series of decision flow diagrams. The method marks a shift away from the knowledge-based errors of other error analysis tools to better reflect the visual and auditory nature of ATM. It has proved successful in analysing errors in AIRPROX reports to derive measures for reducing errors and their adverse effects.
It is clear, even from the brief descriptions of the above list, that many methods are
related, in the sense that they refer to the same basic principles. Examples are the
various methods that look at barriers or the methods that look at root causes. It is
therefore necessary to make a selection of a smaller set of methods that deserve a closer
look. In order to do so it is necessary first to consider the criteria on which such a
selection can be made.
In a study commissioned by the Occupational Safety and Health Administration
(OSHA) in the US, Benner (1985) rated 14 different accident models and 17 different
accident investigation methods used by various US agencies. He began with a set of evaluation criteria and a rating scheme developed from user data, statutes, applications, and work products.
This led to a set of ten criteria that were used to rate the accident models as shown in
Table 2. Most of them actually refer to the quality of the outcome of the analysis
(realistic, definitive, satisfying, comprehensive, disciplining, consistent, direct, and
understandable or visible) rather than the model as such, although that in some sense
also reflects model characteristics. Two criteria relate more directly to the nature of the
accident model, namely that it should be functional and non-causal.
Benner’s initial assumption was that all accident investigation programs were driven by
accident models, and that the methods could therefore be evaluated against common
criteria. But his analyses led him to conclude that this assumption was not valid.
Instead, there were three types of relationships between the accident models and
investigation methodologies. In the first case, the accident model came before the
accident investigation methodology and hence determined it. In the second case the relation was reversed, i.e., the investigation methodology determined the accident
model. Finally, in the third case, a chosen (institutionalised or traditional) analysis
method would determine both the accident model and the investigation methodology,
without the model or investigation methodology particularly influencing each other. In
view of this conclusion Benner developed separate criteria for evaluating the accident
investigation methodologies. These criteria are listed in Table 3.
It is interesting to note also here that the criteria address aspects of the methods in use,
e.g., encouragement or initiatives, rather than aspects of a method qua method, e.g.,
reliability or independence of user knowledge.
(It may be of interest to note that the top three accident models, according to Benner’s
criteria, were the Events Process model, the Energy Flow Process model, and the Fault
Tree model. Similarly, the top three accident investigation methods were Events
Analysis, the MORT system, and Fault Tree Analysis. Benner concluded his survey by
recommending both that “significant accident investigation program changes should be
considered in agencies and organizations using lower-ranked accident models or
investigation methodologies”, and that “a compelling need exists for more exhaustive
research into accident model and accident investigation methodology selection
decisions.” Sklet (2002) looked at 15 different methods using the same criteria, but only
characterised them in a final table, without indicating any rank order.)
A different approach is found in a survey of accident models and error classifications
(Hollnagel, 1998), which proposed the following six criteria (Table 4).
Table 4: Hollnagel’s (1998) criteria for classifying accident models and methods.
• Analytic capability — the ability of each approach to support a retrospective analysis of events involving human erroneous actions. The specific outcome of a retrospective analysis should be a description of the characteristics of human cognition that are included in the set of assumed causes.
• Predictive capability — the capability of each approach to predict the probable type of erroneous actions (phenotype) in specific situations. If possible, predictions should also be made of the likely magnitude or severity of the erroneous actions. None of the models are actually very good for making predictions, because predictions require both a valid model and a reliable data base. While better models are gradually emerging, a reliable data base still awaits a concerted effort.
• Technical basis — the technical content, as the extent to which models generated from within each approach are grounded in a clearly identifiable model of human action.
• Relation to existing taxonomies — the relation to and/or dependence on existing classification schemes, as the extent to which each of the three approaches is linked to viable systems for classifying the erroneous actions that occur in a real-world processing environment.
• Practicality — the ease with which each approach can be turned into a practical method or made operational.
• Cost-effectiveness — the relative costs and benefits that are associated with each approach.
These criteria aim more directly at the qualities of the method, both with regard to the
theoretical basis and with regard to its efficacy. In some sense they consider both the
accident model (analytic capability, predictive capability, technical basis, relation to
existing taxonomies) and the investigation method (practicality, cost-effectiveness).
In addition to sets of criteria that aim to distinguish among methods, hence to serve as
the basis for a choice in a specific situation, there are also more practical criteria that are
common to all methods:
• Reliability – whether the method will give the same result if applied again (or to
a similar case), and the degree to which the method is independent of the
user/analyst and his/her knowledge and experience.
• Audit capabilities – whether it is possible to retrace the analysis and reconstruct
the choices, decisions, or categorisations made during the analysis. This
corresponds to Benner's criterion of comprehensiveness.
• Time to learn – how long it takes to learn to use the method and to become a
proficient user. Although this clearly is a one-time investment, it is sometimes
seen as an argument against adopting a new method.
• Resources needed – how difficult or easy it is to use the method. Among the
main resources are people (hours of work), time, information and documentation
needs, etc.
• Validity – whether the findings provided by the method are the proper ones.
This is a very contentious issue, since there is no easy way of establishing the
correctness of the findings. It is very unusual for the same accident to be
investigated in more than one way, and even then there are no obvious
independent criteria by which to rate the findings.
The motivation for comparing and rating different methods is to be able to choose the
method that is best suited to solve a given problem. Although criteria such as speed,
resource demands, and prevalence in an industry are not unimportant, the primary
concern must be whether an investigation method can do what it is supposed to do,
namely produce an adequate explanation or account of why an adverse event (an
accident or an incident) occurred. An investigation method is basically a tool, and it is
clearly crucial that the tool is well-suited to the task at hand. Although most tools can be
used for different purposes – a wrench can, for instance, be used as a hammer – it is
obviously better and more efficient if the tool fits the job precisely. This goes for
physical tools as well as for methods. It is therefore important to be able to characterise
methods with regard to how well they fit the task at hand, which in practice means how
well they can represent and account for the complexity of the actual situations.
Few of the criteria referred to above make any reference to this quality, the main
exception being Benner’s criteria of functional and non-causal. A good starting point
can, however, be found in Perrow’s (1984) description of the complexity of socio-
technical systems, cf. Figure 1. Perrow proposed the two dimensions of coupling, going
from loose to tight, and interactions, going from linear to complex. While the notion of
coupling is relatively straightforward, the notion of complexity must be used with some
care, since it can refer either to the ontological or the epistemological complexity²
(Pringle, 1951). For practical reasons it is preferable to use a different concept, namely
how easy it is to describe the system, where the extremes are tractable and intractable
systems. A system, or a process, is tractable if the principles of functioning are known,
if descriptions are simple and with few details, and most importantly if the system does
not change while it is being described. Conversely, a system or a process is intractable if
the principles of functioning are only partly known or even unknown, if descriptions are
elaborate with many details, and if the system may change before the description is
completed. A good example of a tractable system is a post office, or rather the normal
functions of a post office, or the operation of a home furnace. Similarly, a good example
of an intractable system is the outage at a NPP or the activities in a hospital emergency
department. In the latter cases the activities are not standardised and change so rapidly
that it is never possible to produce a detailed and complete description.
Using this modification of the terminology, we can propose a new version of Perrow’s
diagram, as shown in Figure 2. (Note that this also means that some of the examples
used by Perrow have to change position; in addition, some examples (e.g., nuclear
weapons accidents) have been deleted, while others (financial markets) have been
introduced. These changes are, however, illustrative rather than exhaustive.)
² Epistemological complexity can be defined as the number of parameters needed to define a system
fully in space and time. Ontological complexity has no scientifically discoverable meaning as it is not
possible to refer to the complexity of a system independently of how it is viewed or described.
Following this principle, accident investigation methods should be characterised in
terms of the systems – or conditions – they can account for. Despite Benner’s (1985)
concerns, this does depend on the underlying accident model. For instance, a simple
linear accident model – such as the domino model (Heinrich, 1931) – can be used to
account for certain types of accidents and not for others. The domino model is suitable
for systems – hence for accidents – that are loosely coupled and tractable. The reason is
simply that most systems were of that type at the time it was developed. Nuclear power
plants considered as systems are, however, tightly coupled and more or less intractable.
They therefore require accident models and accident investigation methods that are
capable of accounting for these features. It is therefore reasonable to characterise
investigation methods in terms of which applications they can account for. While this
will not by itself determine whether one method is “better” than another, it will make it
possible to choose a method that is suitable for a specific purpose and/or system and
thereby also to exclude methods that are unable to meet the requirements of an
investigation.
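This characterisation can be sketched as a simple lookup over the two dimensions. The quadrant placements below follow the discussion in this report (cf. Figure 3) and are illustrative rather than authoritative; treating each dimension as an ordered scale, where a method that handles the harder case also covers the easier one, is an assumption made here for the sake of the sketch.

```python
# Illustrative sketch: select investigation methods by the kind of system
# they can account for. Placements follow the report's discussion and are
# not authoritative.

METHODS = {
    # method name: the most demanding system type it can account for
    "Root cause analysis": ("loose", "tractable"),
    "Swiss cheese model": ("tight", "tractable"),
    "MTO": ("tight", "tractable"),
    "CREAM": ("tight", "tractable"),
    "STAMP": ("tight", "intractable"),
    "FRAM": ("tight", "intractable"),
}

COUPLING = {"loose": 0, "tight": 1}
TRACTABILITY = {"tractable": 0, "intractable": 1}

def suitable_methods(coupling: str, tractability: str) -> list[str]:
    """Return the methods whose capability covers the given system.

    Assumption: a method able to handle tight coupling also covers loose
    coupling, and one suited to intractable systems also covers tractable
    ones.
    """
    return [
        name
        for name, (c, t) in METHODS.items()
        if COUPLING[c] >= COUPLING[coupling]
        and TRACTABILITY[t] >= TRACTABILITY[tractability]
    ]

# An NPP considered as a whole is tightly coupled and intractable:
print(suitable_methods("tight", "intractable"))  # ['STAMP', 'FRAM']
```

The filter does not rank the surviving methods; it only excludes those unable to meet the requirements of an investigation, in line with the argument above.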
Technique), STEP (Sequentially Timed Events Plotting), and TRACEr (Technique for
Retrospective Analysis of Cognitive Errors).
7.1 Methods suitable for systems that are loosely coupled and tractable
In terms of frequency or numbers, most systems are loosely coupled and tractable even
today. Although a NPP clearly is not among them, and although few other industries of
concern are, many of the commonly used investigation methods nevertheless seem to be
best suited for – or even to assume – that the systems they describe are loosely coupled
and tractable. In practical terms this implies that it is possible both to have a more or
less complete description of the system and to account for events (e.g., failures or
malfunctions) one by one or element by element. While these assumptions make for
methods that are easier or simpler in terms of use, they also mean that such methods
are unable to account for complex phenomena, hence to produce practically useful
explanations of accidents of that nature.
Each method is described by means of the following characteristics:
• References: The main scientific reference or source of information that describes
the method.
• Related methods: Other methods of the same type or that use the same principle.
• Main principle: The main analytical principle on which the method is based.
• Procedure: The main steps in using the method.
• Type of results: The main outcomes that the method produces.
• Operational efficiency and methodological strength: How easy it is to use the
method in practice and how much the method depends on the user's knowledge
and experience.
• Theoretical grounding: How well founded the concepts and categories are – in
essence, which accident model the method implies.
• Practical value: How well the method supports effective recommendations.
There are several sub-categories of methods for loosely coupled and tractable systems.
In the following, four sub-categories are described: (1) methods that focus on the
identification of failed barriers, (2) methods that focus on human error, (3) methods that
focus on root causes in isolation, and (4) methods that focus on root causes in
combination.
Examples of methods that focus on barriers and/or defences and explain accidents as
the result of failed or deficient barriers
Examples of methods that focus on human error as the primary contributor to
adverse events
Examples of methods that focus on root causes
Examples of methods that combine multiple factors to explain accidents
7.2 Methods suitable for systems that are tightly coupled and tractable
The increasing frequency of non-trivial accidents during the 1980s and 1990s made it
clear that many of these could not be explained as a result of sequences or chains of
events, but that it was necessary to account for how combinations of multiple sequences
of events, or of events and latent conditions, could arise. This led to the proposal of
models that often are classified as epidemiological (Hollnagel, 2004). The prototype is
the Swiss cheese model.
Name: The Swiss cheese model (SCM)
References: Reason, J. T. (1990). Human Error. Cambridge University Press
Related methods: The TRIPOD concept and set of methods, which in a sense is also the
origin of the SCM. The idea behind TRIPOD is that organisational failures are the main
factors in accident causation. These factors are more "latent" and, when contributing to
an accident, are always followed by a number of technical and human errors.
HFACS (Human Factors Analysis and Classification System), used by the Federal
Aviation Administration (US).
Main principle: In the Swiss Cheese model, an organization's defences against failure are
modelled as a series of barriers, represented as slices of Swiss cheese. The
holes in the cheese slices represent individual weaknesses in individual parts
of the system, and are continually varying in size and position in all slices. The
system as a whole produces failures when all of the holes in each of the slices
momentarily align, permitting "a trajectory of accident opportunity", so that a
hazard passes through all of the holes in all of the defenses, leading to a
failure.
Procedure: The basic method for using the SCM is to trace backwards from the accident.
The analysis looks for two main phenomena: active failures, which are the
unsafe acts committed by people (slips, lapses, fumbles, mistakes, and
procedural violations); and latent conditions, which arise from decisions made
by designers, builders, procedure writers, and top level management. Latent
conditions can translate into error provoking conditions within the local
workplace and they can create long-lasting holes or weaknesses in the
defences. Unlike active failures, whose specific forms are often hard to
foresee, latent conditions can be identified and remedied before an adverse
event occurs. Understanding this leads to proactive rather than reactive risk
management.
Type of results: Identification and classification of active failures and latent conditions.
Operational efficiency and methodological strength: The method is initially easy to
use, but in its original form lacks operational details. This has been remedied in various
institutionalised versions (e.g., by SHELL), but it still requires an appreciable level of
experience to use effectively. The method is supported by a rather extensive set of
instructional materials, tutorials, web-based instructions, etc.
Theoretical grounding: The method represents a complex, linear model. It is quite
similar to a fault tree, although the common graphical representation is different – and
less detailed. The method focuses on human errors in combination with latent
operational conditions, and distinguishes between failures at the sharp and the blunt
ends.
Practical value: The model was originally propounded by James Reason, and has since
gained widespread acceptance and use in healthcare, in the aviation safety
industry, and in emergency service organizations. It has recently been called
into question by several authors.
Name: MTO (Människa-Teknologi-Organisation or Man-Technology-Organisation)
References: Rollenhagen, C. (1995). MTO – En Introduktion: Sambandet Människa,
Teknik och Organisation. Lund, Sweden: Studentlitteratur.
Bento, J.-P. (1992). Människa, teknik och organisation. Kurs i MTO-analys för
Socialstyrelsen. Studsvik, Nyköping: Kärnkraftsäkerhet och Utbildnings AB.
Worledge, D. (1992). Role of human performance in emergency systems
management. Annual Review of Energy and the Environment, 17, 285-300.
Related methods: The method is based on INPO's HPES (Human Performance
Enhancement System) described above.
Main principle: The basis for the MTO-analysis is that human, organisational, and
technical factors should receive equal attention in an accident investigation.
Procedure: An MTO investigation comprises three methods:
1. Structured analysis by use of an event- and cause-diagram.
2. Change analysis by describing how events have deviated from earlier
events or common practice.
3. Barrier analysis by identifying technological and administrative
barriers which have failed or are missing.
The first step in an MTO-analysis is to develop the event sequence
longitudinally and illustrate it in a block diagram. The next step is to identify
possible technical and human causes of each event and draw these vertically,
linked to the events in the diagram. Then a change analysis is made, i.e. an
assessment of how events in the accident progression deviated from the normal
situation or from common practice. Further, the analysis determines which
technical, human, or organisational barriers failed or were missing during the
accident progression. The basic questions in the analysis are:
• What may have prevented the continuation of the accident sequence?
• What may the organisation have done in the past in order to prevent the
accident?
accident?
The last step in the MTO-analysis is to identify and present recommendations.
These should be as realistic and specific as possible, and might be technical,
human or organisational.
Type of results: Details and clarification of factors that either led to or contributed to
the accident.
Operational efficiency and methodological strength: The use of the method is
supported by instruction materials and books. It is fairly easy to use, but is not
recommended for novices. The identification of specific causes and conditions relies
more on experience than on a well-defined set of categories. The method includes
several aspects of a full accident investigation, including the recommendations.
Theoretical grounding: The method refers to a complex, linear accident model. The
common representation is, however, more in the nature of a fishbone diagram than a
fault tree. The method tends to consider causal factors one by one, rather than in a
larger context.
Practical value: The MTO method has been used extensively by the Swedish NPPs.
The principle is also widely used in other domains, such as traffic safety and aviation.
The MTO method has many features in common with other methods (Swiss cheese,
HPES), but distinguishes itself from the single-factor methods.
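The three analyses that make up an MTO investigation can be sketched as simple data structures. In the sketch below the events, causes, deviations, and barriers are invented examples; only the structure (an event sequence with per-event causes, a change analysis, and failed or missing barriers) follows the method description above.

```python
# Illustrative sketch of the data an MTO-analysis assembles: an event
# sequence (block diagram), causes drawn vertically per event, deviations
# from common practice, and a barrier analysis. All concrete entries
# below are invented examples.

from dataclasses import dataclass, field

@dataclass
class Event:
    description: str
    causes: list = field(default_factory=list)       # technical/human causes
    deviation: str = ""                              # change analysis
    failed_barriers: list = field(default_factory=list)
    missing_barriers: list = field(default_factory=list)

sequence = [
    Event("Pump started in test mode",
          causes=["ambiguous procedure step"],
          deviation="test mode normally disabled during operation",
          missing_barriers=["interlock between test mode and operation"]),
    Event("Flow alarm not acknowledged",
          causes=["alarm flood masked the signal"],
          failed_barriers=["alarm prioritisation"]),
]

def barrier_findings(events):
    """Collect failed and missing barriers across the event sequence."""
    failed = [b for e in events for b in e.failed_barriers]
    missing = [b for e in events for b in e.missing_barriers]
    return failed, missing

failed, missing = barrier_findings(sequence)
print("Failed:", failed)    # ['alarm prioritisation']
print("Missing:", missing)  # ['interlock between test mode and operation']
```

The barrier findings would then feed the final step of the analysis, the formulation of realistic and specific recommendations.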
Name: Cognitive Reliability and Error Assessment Method (CREAM)
References: Hollnagel, E. (1998). Cognitive reliability and error analysis method. Oxford,
UK: Elsevier Science Ltd.
Related methods: CREAM is a so-called second generation HRA method, but differs
from other methods of the same type (ATHEANA, MERMOS) by being explicitly
developed for both accident investigation and risk assessment.
Main principle: CREAM was developed to be used both predictively and retrospectively.
CREAM uses the Contextual Control Model (COCOM) as a basis for defining
four different control modes (strategic, tactical, opportunistic, scrambled). It is
assumed that a lower degree of control corresponds to less reliable
performance. The level of control is mainly determined by the common
performance conditions (CPC). The retrospective use (accident analysis) is
based on a clear distinction between that which can be observed (called
phenotypes) and that which must be inferred (called genotypes). The
genotypes used in CREAM are divided into three categories: individual,
technological and organisational, corresponding to the MTO triplet.
Procedure: The procedure for CREAM comprises the following steps:
1. Produce a description of what actually happened
2. Characterise Common Performance Conditions
3. Produce a time-line description of significant events
4. Select all actions of interest
5. For each action, identify failure mode (this is done iteratively)
6. For each failure mode, find relevant antecedent-consequent links (this
is done recursively)
7. Provide overall description and draw conclusions.
Type of results: A graph, or a network, of antecedent actions (functions) and conditions
that together constitute an effective explanation of the accident. The graph shows how
various actions and conditions affected each other in the given situation.
Operational efficiency and methodological strength: The CREAM method is clearly
described, but not easy to use. This is due to the non-hierarchical nature of the method.
The method, however, produces a clear audit trail, which enhances the reliability. The
method has recently been supported by a computerised navigation tool, which makes it
easier to use once it has been learned.
Theoretical grounding: The method does not look for specific causes, but rather for the
operational conditions that can lead to a loss of control, hence to accidents. It is
grounded in cognitive systems engineering. Similar to other second generation methods,
it rejects "human error" as a meaningful causal category. The basis for the analysis is
the event as it happened, rather than preconceived causal factors.
Practical value: CREAM is a borderline method that in principle can also be applied to
accidents in intractable systems. However, the emphasis on the tractability of past
events, if not of the system itself, means that it should primarily be considered for use
with tractable systems.
CREAM has been used extensively in Norway and Sweden as a specific method for
traffic accidents under the name of DREAM (D stands for Driver). There have also
been a number of uses of the proactive version of CREAM for risk assessment, for
instance for NPP emergency procedures and space station operations.
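Step 6 of the procedure, finding antecedent-consequent links recursively, can be sketched in a few lines. The miniature link table below is invented for illustration; CREAM's published classification groups are far more extensive, but the recursion over them has the same shape.

```python
# Sketch of CREAM's recursive step 6: for each failure mode, follow
# antecedent-consequent links until no further antecedents are found.
# The link table is an invented miniature, not CREAM's actual tables.

LINKS = {
    # consequent: possible antecedents
    "wrong action": ["inadequate procedure", "distraction"],
    "distraction": ["noise", "competing task"],
    "inadequate procedure": ["design failure"],
}

def trace_antecedents(consequent, seen=None):
    """Recursively collect all antecedents of a failure mode,
    guarding against cycles in the link table."""
    seen = set() if seen is None else seen
    result = []
    for antecedent in LINKS.get(consequent, []):
        if antecedent not in seen:
            seen.add(antecedent)
            result.append(antecedent)
            result.extend(trace_antecedents(antecedent, seen))
    return result

print(trace_antecedents("wrong action"))
# ['inadequate procedure', 'design failure', 'distraction', 'noise', 'competing task']
```

The resulting set of antecedents, together with the observed phenotypes, is what the final CREAM step assembles into a network that constitutes the explanation of the event.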
7.3 Methods suitable for systems that are loosely coupled and intractable
There are no investigation methods in this category. The reason for that has to do with
the historical development of accident models and investigation methods. At the
beginning, effectively in the 1930s, industrial systems were loosely coupled and
tractable. As technologies and societies developed, systems became more tightly
coupled through vertical and horizontal integration, and at the same time less tractable
because new technologies allowed faster operations with more extensive automation.
The latter meant in particular that they became more or less self-regulating under
normal conditions, which reduced tractability. Since accidents ‘followed’ these
developments, methods were developed to be able to address the new problems.
Conversely, few if any accidents of note took place in loosely coupled, intractable
systems, hence no methods were developed to account for them. The basic reason is that
such systems are social rather than technological, e.g., universities, research companies,
and the like.
7.4 Methods suitable for systems that are tightly coupled and intractable
The continuously growing complexity of socio-technical systems, and the consequent
reduction of tractability, has led to a fundamental change in the approach to risk and
safety. The most prominent example of that is the development of resilience engineering
(Hollnagel, Woods & Leveson, 2006), which changes the focus from failures and
actions gone wrong to the usefulness of normal performance variability. With respect to
accident investigations this means that the aim is to understand how adverse events can
be the result of unexpected combinations of variations in normal performance, thereby
avoiding the need to look for a human error or root cause.
This view is often referred to as a systemic view. There are presently two main
proposals for a method, STAMP and FRAM.
Name: System-theoretic model of accidents (STAMP)
References: Leveson, N. G. (2004). A new accident model for engineering safer
systems. Safety Science, 42(4), 237-270.
Related methods: Some relation, but not strong, to control theoretic methods such as
Acci-map. Also some similarity to the Why-Because Analysis (WBA), cf.
http://www.rvs.uni-bielefeld.de/research/WBA/
Main principle: The hypothesis underlying STAMP is that system theory is a useful way to
analyze accidents, particularly system accidents. Accidents occur when
external disturbances, component failures, or dysfunctional interactions
among system components are not adequately handled by the control
system. Safety is viewed as a control problem, and is managed via
constraints by a control structure embedded in an adaptive socio-technical
system. Understanding why an accident occurred requires determining why
the control structure was ineffective. Preventing future accidents requires
designing a control structure that will enforce the necessary constraints.
Systems are viewed as interrelated components that are kept in a state of
dynamic equilibrium by feedback loops of information and control. STAMP
claims to be a general method for the explanation of mishaps in teleological systems.
Procedure: Uses a feedback control system as a specific causal model. The analysis
proceeds along the following lines:
1. In teleological systems, various subsystems maintain constraints
which prevent accidents
2. If an accident has occurred, these constraints have been violated
3. STAMP investigates the systems involved, especially human-organisational
subsystems, to identify missing or inappropriate features (those which fail
to maintain the constraints)
4. It proceeds through analysing feedback & control (F&C) operations
Type of results: The most basic component of STAMP is not an event, but a constraint.
Accidents are therefore viewed as resulting from interactions among components that
violate the system safety constraints. The control processes that enforce these
constraints must limit system behavior to the safe changes and adaptations implied by
the constraints. Inadequate control may result from missing safety constraints,
inadequately communicated constraints, or from constraints that are not enforced
correctly at a lower level.
Operational efficiency and methodological strength: STAMP can systematically
uncover organisational structures and direct the analyst to ask revealing questions.
Since STAMP is an analysis method only, it depends very much on the quality of the
investigation report (data, information). Due to the complexity of the underlying model
(cf. below), it requires a considerable effort to use, and in its present state is only fitted
for experienced users. A method for a structured presentation of results is not currently
available.
Theoretical grounding: STAMP uses a specific causal model, i.e., a feedback control
system. The basic principle is that an accident occurs when operational constraints have
been violated. STAMP investigates the systems involved, especially human-
organisational subsystems, to identify missing or inappropriate features (those which
fail to maintain the constraints). It proceeds through analysing feedback & control
(F&C) operations, which replaces the traditional chain-of-events model. The model
includes software, organizations, management, human decision-making, and migration
of systems over time to states of heightened risk.
Practical value: STAMP has not been widely used and must still be considered under
development. The pros and cons of the method have been debated in the
RISK forum (http://catless.ncl.ac.uk/risks).
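The control-theoretic principle behind STAMP can be sketched in a few lines. The constraints and controllers below are invented examples; only the logic (an accident indicates a violated safety constraint, and the analysis asks why the responsible part of the control structure failed to enforce it) follows the description above.

```python
# Sketch of STAMP's core idea: safety is a control problem, and an
# accident indicates that some safety constraint was not enforced.
# The control structure below is an invented miniature for illustration.

CONTROL_STRUCTURE = {
    # safety constraint: the controller responsible for enforcing it
    "tank pressure below limit": "relief valve controller",
    "valve maintained on schedule": "maintenance organisation",
    "deviations reported upward": "plant management",
}

def analyse(violated_constraints):
    """For each violated constraint, point the analysis at the part of
    the control structure whose enforcement was inadequate."""
    return {c: CONTROL_STRUCTURE.get(c, "no controller assigned!")
            for c in violated_constraints}

# Suppose the investigation finds two constraints were violated:
findings = analyse(["tank pressure below limit", "deviations reported upward"])
for constraint, controller in findings.items():
    print(f"Why did '{controller}' fail to enforce '{constraint}'?")
```

A constraint with no assigned controller is itself a finding: a missing safety constraint in the control structure, one of the sources of inadequate control listed above.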
8 Discussion and conclusions
One way of summarising the characterisation of the nine accident investigation methods
described in the preceding chapter is to map them onto the modified Perrow diagram of
Figure 2. The result is shown in Figure 3. This shows that most methods are applicable
to tractable systems, or rather that the assumption is that the systems are tractable.
Conversely, one may conclude that these methods should not be used for intractable
systems, since they will not be able to produce adequate explanations. Several of the
commonly used methods, including root cause analysis, AEB, and HERA, also require
that systems are only loosely coupled; in other words, they are unable to account for the
consequences of tight couplings, hence adequately to explain accidents in systems of
that type.
Figure 3: Characterisation of accident investigation methods
It is sensible to assume that any method would be just about adequate for the typical
type of problems at the time it was developed. Indeed, there would be little reason to
develop a method that was too complex or more powerful than required. As argued in
the beginning, new methods are usually developed because the existing methods at
some point in time encounter problems for which they are inefficient or inadequate.
This, in turn, happens because the socio-technical systems where accidents happen
continue to develop and to become more complex and more tightly coupled. The
inevitable result is that even new methods after a while become underpowered because
the nature of the problems change, although they may have been perfectly adequate for
the problems they were developed for in the first place.
The position of the various methods on the diagram in Figure 3 presents a
characterisation of the methods using the two dimensions of coupling and tractability,
and thereby indirectly represents the developments of socio-technical systems since the
1930s. Without going into the details of this development, the third quadrant can be
seen as representing industrial systems before the middle of the 20th Century, i.e., before
the large scale application of information technology. The development since then has
been one in terms of tighter coupling (moving up into the first quadrant) and a loss of
tractability (moving right into the second quadrant). This has in turn required the
development of new methods, as shown in the diagram.
The position of a method reflects the assumptions behind the method, specifically what
has been called the accident model. The arguments for each method were presented
above. To illustrate the significance of the position, consider for instance the two
extremes RCA and FRAM.
z Root cause analysis (RCA) assumes that adverse outcomes can be described as
the outcome of a sequence (or sequences) of events or a chain (or chains) of
causes and effects. The investigation is therefore a backwards tracing from the
accident, trying to find the effective cause(s). The method requires that the
system is tractable, since it otherwise would be impossible to carry out this
backwards tracing. The method also requires that the system is only loosely
coupled, since it otherwise would be impossible to feel confident that the
correction or elimination of the root cause would prevent a recurrence of the
accident.
z The functional resonance accident model (FRAM) assumes that adverse
outcomes are the result of unexpected combinations of normal variability of
system functions. In other words, it is the tight couplings that lead to adverse
outcomes and not sequences of cause(s) and effect(s). Since the investigation
furthermore looks for functions rather than structures, it is less problematic if the
description is intractable. Indeed, functions may come and go over time whereas
system structures must be more permanent. Functions are associated with the
social organisation of work and the demands of a specific situation. Structures
are associated with the physical system and equipment, which does not change
from situation to situation.
This characterisation does not mean that FRAM is a better method than RCA. (A
similar argument can be made for any other comparison of two methods.) But it does
mean that FRAM is well-suited for some kinds of problems and that RCA is well-suited
for others. (It of course also means that there are problems for which either method is
ill-suited.)
In order to choose the right method to investigate an accident it is necessary first of all
to characterise the accident. This can be achieved by asking a number of questions, for
example:
1. Was the accident similar to something that has happened before, or was it new
and unknown? (The reference should be the history of the installation, as well as
industry wide.)
2. Was the organisation ready to respond to the accident, in the sense that there
were established procedures or guidelines available?
3. Was the situation quickly brought under control or was the development
lengthy?
4. Were the accident and the material consequences confined to a clearly delimited
subsystem (technological or organisational) or did it involve multiple
subsystems, or the whole installation?
5. Were the consequences on the whole expected / familiar or were they novel /
unusual?
6. Were the consequences in proportion to the initiating event, or were they
unexpectedly large (or small)?
(When considering these questions one should bear in mind, of course, that the answers
rely on an initial and informal understanding of what may have happened. An
experienced accident investigator should be able to do this without being biased by
premature assumptions about the nature of the cause.)
The first three questions illustrate issues that relate to the dimension of tractability. If
the questions are answered positively, it indicates that the system was tractable, at least
to some degree. The opposite is the case if the questions were answered negatively.
Questions 4-6 illustrate issues that relate to the dimension of coupling. If the questions
are answered positively, it indicates that the system was of the loosely coupled type.
The opposite is the case if the questions were answered negatively.
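As a sketch, this screening can be turned into a small scoring function. The majority rule used here is an assumption made for illustration; the text above gives no formal aggregation rule, and an investigator would weigh the answers with judgment rather than arithmetic.

```python
# Illustrative sketch of the six screening questions. Each answer is
# True ("yes") or False ("no"); questions 1-3 probe tractability and
# questions 4-6 probe coupling, as described in the text. The majority
# rule is an assumption, not part of the report.

def characterise_accident(answers: list) -> tuple:
    """Map six yes/no answers to (tractability, coupling)."""
    if len(answers) != 6:
        raise ValueError("exactly six answers expected")
    tractable = sum(answers[:3]) >= 2   # questions 1-3
    loose = sum(answers[3:]) >= 2       # questions 4-6
    return ("tractable" if tractable else "intractable",
            "loose" if loose else "tight")

# A familiar, well-rehearsed event confined to one subsystem:
print(characterise_accident([True] * 6))   # ('tractable', 'loose')
# A novel, drawn-out event spanning the whole installation:
print(characterise_accident([False] * 6))  # ('intractable', 'tight')
```

The resulting pair places the accident in one of the quadrants of Figure 3 and thereby narrows the set of investigation methods worth considering.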
In conclusion, when faced with the need to investigate an accident it is important that
the method chosen is appropriate for the system and the situation, i.e., that it is capable
of providing an explanation. If the accident concerns the NPP operation as a whole, the
problems correspond to the characteristics of the second quadrant. The investigation
method must therefore be able to address systems of that nature. If the accident only
concerns the operation of a subsystem or a component, the problems may correspond to
the characteristics of the first or even the third quadrant. The investigation method can
therefore also be different. The six questions given above suggest how the
characteristics of the accident can be determined.
In addition to that other concerns may also play a role, such as resource demands, ease
of use, and consistency with other methods within the organisation or industry. While it
may be convenient, or even necessary, for an organisation to adopt a specific method as
its standard, this should always be done knowingly and with a willingness to reconsider
the choice when conditions so demand. Socio-technical systems, processes, and
organisations continuously change and develop, driven by internal and external forces
and demands. The methods that are available to manage those systems and to
investigate them when something goes awry, change at a much slower rate; such changes
are, furthermore, usually discrete rather than continuous. A commonly felt consequence
is that the available methods lag behind reality, often by as much as a decade or two.
The diagram of Figure 3 therefore only represents the situation at the time of writing,
i.e., around 2008. In five or ten years we must expect that the methods positioned in
quadrant 2 will slowly have been displaced towards quadrant 3, not because the
methods have changed but because the systems have. New and more powerful methods
will, hopefully, by then have been developed to accommodate this state of affairs.
9 Dictionary

ATHEANA: A Technique for Human Event Analysis (Swedish: en teknik för mänsklig händelseanalys)
CICA: Caractéristique Importante de la Conduite Accidentelle (Swedish: karakteristika för olycksanalys)
CPC: Common Performance Conditions (Swedish: kontextuella förutsättningar)
CREAM: Cognitive Reliability and Error Analysis Method (Swedish: kognitiv pålitlighets- och felanalysmetod)
DKV: Driftklarhetsverifiering (English: Operational Readiness Verification, ORV)
EFC: Error-Forcing Context (Swedish: felhandlingsdrivande kontext)
ETTO: Efficiency-Thoroughness Trade-Off (Swedish: effektivitets- och noggrannhetsavvägning)
FRAM: Functional Resonance Accident Model (Swedish: resonansolycksmodell)
HPES: Human Performance Enhancement System (Swedish: mänskligt handlingsförbättrande system)
INPO: Institute of Nuclear Power Operations (USA) (Swedish: institutet för kärnkraftsdrift)
MERMOS: Méthode d'Evaluation des Missions Opérateurs pour la Sécurité (Swedish: säkerhetsutvärderingsmetod för operatörer)
MTO: Man-Technology-Organisation (Swedish: Människa-Teknik-Organisation)
ORV: Operational Readiness Verification (Swedish: driftklarhetsverifiering)
WANO: World Association of Nuclear Operators (Swedish: världsorganisationen för kärnkraftsdrift)
10 References
Benner, L. Jr., (1985). Rating accident models and investigation methodologies. Journal
of Safety Research, 16, 105-126.
Bento, J.-P. (1992). Människa, teknik och organisation. Kurs i MTO-analys för
Socialstyrelsen. Studsvik, Nyköping: Kärnkraftsäkerhet och Utbildnings AB.
Bird, F. E. Jr. & Germain, G. L. (1985). Practical loss control leadership. Georgia,
USA: International Loss Control Institute.
CCPS (1992). Guidelines for Investigating Chemical Process Incidents. Center for
Chemical Process Safety of the American Institute of Chemical Engineers.
CISHC (Chemical Industry and Safety Council), (1977). A guide to hazard and
operability studies. London: Chemical Industries Association.
Cooper, S. E., Ramey-Smith, A. M., Wreathall, J., Parry, G. W., Bley, D. C., & Luckas,
W. J. (1996). A Technique for Human Error Analysis (ATHEANA). Washington, DC:
Nuclear Regulatory Commission.
Dekker, S. (2006). The field guide to understanding human error. Aldershot, UK:
Ashgate.
Dianous, V. D. & Fiévez, C. (2006). ARAMIS project: A more explicit demonstration
of risk control through the use of bow–tie diagrams and the evaluation of safety barrier
performance. Journal of Hazardous Materials, 130(3), 220-233.
DOE. (1999). Conducting Accident Investigations: DOE Workbook (Revision 2, May 1,
1999). Washington, DC: U.S. Department of Energy.
FAA/NTIS (2000). The Human Factors Analysis and Classification System – HFACS
(DOT/FAA/AM-00/7). Washington, DC: Federal Aviation Administration.
Gordon, R., Flin, R. & Mearns, K. (2005). Designing and evaluating a human factors
investigation tool (HFIT) for accident analysis. Safety Science, 43, 147–171.
Harms-Ringdahl, L. (1987). Säkerhetsanalys i skyddsarbetet - En handledning.
Folksam, Stockholm.
Harms-Ringdahl, L. (1993). Safety analysis - Principles and practice in occupational
safety. Elsevier, London.
Harms-Ringdahl, L. (1996). Riskanalys i MTO perspektiv: Summering av metoder för
industriell tillämpning (SKI Rapport 96:63). Stockholm, Sweden: SKI.
Heinrich, H. W. (1929). The foundation of a major injury. The Travelers Standard,
17(1), 1-10.
Heinrich, H. W. (1931). Industrial accident prevention: New York: McGraw-Hill.
Helmreich, R. L., Merritt, A. C. & Wilhelm, J. A. (1999). The evolution of Crew
Resource Management training in commercial aviation. International Journal of
Aviation Psychology, 9(1), 19-32.
Hendrick, K. & Benner, L. Jr. (1987). Investigating accidents with STEP. Marcel
Dekker.
Hollnagel, E. (1998). Cognitive reliability and error analysis method. Oxford, UK:
Elsevier Science Ltd.
Hollnagel, E. (2004). Barriers and accident prevention. Aldershot, UK: Ashgate.
40
Hollnagel, E. (2008). Investigation as an impediment to learning. In Hollnagel, E.,
Nemeth, C. & Dekker, S. (Eds.) Remaining sensitive to the possibility of failure
(Resilience engineering series). Aldershot, UK: Ashgate.
Hollnagel, E., Woods, D. D. & Leveson, N. G. (2006). Resilience engineering:
Concepts and precepts. Aldershot, UK: Ashgate.
IAEA (1999). Root cause analysis for fire events at nuclear power plants (IAEA-
TECDOC-1112). Vienna, Austria: IAEA.
INPO (1989). Human performance enhancement system: Coordinator manual (INPO
86-016, Rev. 02). Atlanta, GA: Institute of Nuclear Power Operations.
Isaac, A., Shorrock, S. & Kirwan, B. (2002) Human error in European air traffic
management: The HERA project. Reliability Engineering and System Safety, 75(2),
257-272.
Kirwan, B. (1994). A guide to practical human reliability assessment. London: Taylor
& Francis.
Le Bot, P., Cara, F. & Bieder, C. (1999). MERMOS, a second generation HRA method.
Proceedings of PSA '99, International Topical Meeting on Probabilistic Safety
Assessment, Washington, DC.
Leveson, N. G. (1995). Safeware - system safety and computers. Reading, MA:
Addison-Wesley.
Leveson, N. G. (2004). A new accident model for engineering safer systems. Safety
Science, 42(4), 237-270.
MIL-STD-1629A (1980). Procedures for performing a failure mode, effects and
criticality analysis. Washington, DC: Department of Defense.
Moodi, M. & Kimball, S. (2004). Example application of procedural event analysis tool
(PEAT). Seattle, WA: Boeing Company.
Nouvel, D.; Travadel, S. & Hollnagel, E. (2007). Introduction of the concept of
functional resonance in the analysis of a near-accident in aviation. Ispra, Italy,
November 2007, 33rd ESReDA Seminar: Future challenges of accident investigation.
Perrow, C. (1984). Normal accidents: Living with high risk technologies. New York:
Basic Books, Inc.
Pringle, J. W. S. (1951). On the parallel between learning and evolution. Behaviour, 3,
175-215.
Reason, J. T. (1990). Human error. Cambridge, UK: Cambridge University Press.
Reason, J. T. (1997). Managing the risk of organisational accidents. Aldershot, UK:
Ashgate.
Renborg, B., Jonsson, K., Broqvist, K. & Keski-Seppälä, S. (2007). Hantering av
händelser, nära misstag (SKI 2007:16). Stockholm: SKI.
Rollenhagen, C. (1995). MTO – En Introduktion: Sambandet Människa, Teknik och
Organisation. Lund, Sweden: Studentlitteratur.
Sawaragi, T.; Horiguchi, Y. & Hina, A. (2006). Safety analysis of systemic accidents
triggered by performance deviation. Bexco, Busan, South Korea, October 18-21. SICE-
ICASE International Joint Conference 2006.
Shorrock, S. T. & Kirwan, B. (1999). The development of TRACEr - A technique for the
retrospective analysis of cognitive errors in ATM. Proceedings of the 2nd International
Conference, 28-30 Oct. 1998, Oxford, UK. (Vol. 3, pp. 163-171).
41
Shorrock, S. T. & Kirwan, B. (2002). Development and application of a human error
identification tool for air traffic control. Applied Ergonomics, 33, 319–336.
Sklet, S. (2002). Methods for accident investigation (ROSS (NTNU) 200208).
Trondheim, Norway: NTNU.
Svensson, O. (2001). Accident and Incident Analysis Based on the Accident Evolution
and Barrier Function ( AEB) Model. Cognition, Technology & Work, 3(1), 42-52.
Swain, A. D. (1989). Comparative evaluation methods for human reliability analysis.
Köln, Germany: Gesellschaft für Reaktorsicherheit.
Takano, K., Sawayanagi, K. & Kabetani, T. (1994). System for analysing and
evaluating human-related nuclear power plant incidents. Journal of Nuclear Science
Technology, 31, 894-913.
van der Schaaf, T. & Kanse, L. (2004). Biases in incident reporting databases: an
empirical study in the chemical process industry. Safety Science, 42, 57-67.
Weick, K. E., Sutcliffe, K. M. & Obstfeld, D. (1999). Organising for high reliability:
processes of collective mindfulness. Research in Organisational Behaviour, 21, 81–
123.
Wickens, C. D. (1992). Engineering psychology and human performance. New York:
Harper-Collins.
Wilson, P. et al. (1993). Root cause analysis – A tool for total quality management.
Milwaukee, WI: Quality Press.
Woods, D. D., Johannesen, L. J., Cook, R. I. & Sarter, N. B. (1994). Behind human
error: Cognitive systems, computers and hindsight. Columbus, OH: CSERIAC.
Worledge, D. (1992). Role of human performance in emergency systems management.
Annual Review of Energy and the Environment, 17, 285-300.
Yoshizawa, Y. (1999). Activities for on-site application performed in human factors
group. Proceedings of 3rd International Conference on Human Factors in Nuclear Power
Operation (ICNPO-III), Mihama, Japan.
SKI Report 2008:50
May 2013
Copyright information
The information contained within this report is provided as guidance only.
While every reasonable care has been taken to ensure the accuracy of its
contents, the author and Loughborough University cannot accept any
responsibility for any action taken, or not taken, on the basis of this
information. The author and Loughborough University shall not be liable to
any person or organisation for any loss or damage which may arise from the
use of any of the information contained in this report.
© Loughborough University 2013
Contact information
Author contact details:
Peter Underwood, Loughborough Design School, CC.1.07, James France
Building, Loughborough University, Loughborough, Leicestershire, LE11 3TU,
UK.
E-mail address: [email protected]
Contents

Copyright information
Foreword
1. Introduction
References
Foreword
Accident analysis models and methods provide safety professionals with a
means of understanding why accidents occur. Choosing an analysis
technique is, however, not a simple process. A wide range of methods are
available; each offering various theoretical and practical benefits and
drawbacks. Furthermore, individuals engaged in accident investigation are
subjected to various factors, e.g. budgetary and time constraints, which can
influence their selection and usage of an analysis tool.
This report is based on an extensive review of the accident analysis literature
and an interview study conducted with 42 safety experts and has two aims.
Firstly, it provides an overview of the available analysis techniques and the
factors influencing an individual’s choice and usage of these methods. The
intention is to provide the reader with information that may enable them to
make a more informed selection of analysis tool. The second aim is to
present an analysis model currently used in industry. The intention is to
provide the reader with a validated method that can be readily employed, if
undertaking a detailed assessment of the available techniques is not
practicable.
1. Introduction
Understanding why accidents occur and how to prevent their recurrence is an
essential part of improving safety in any industry. Gaining this knowledge
requires determining why a certain combination of events, conditions and
actions lead to a specific outcome, i.e. accident analysis (Hollnagel et al.,
2008). Important tools used to achieve this understanding are the accident
causation model and accident analysis method. Analysis models provide a
conceptual representation of accident causation whereas analysis methods
provide a means of applying this theory.
The nature of accident causation has, however, become more complex over
time due to a number of factors, e.g. the rapid pace of technological advances
and more complex relationships between humans and technology (Leveson,
2011). Accident causation theory has also developed to capture this
increased complexity and numerous analysis models and methods have
emerged to apply this knowledge.
Selecting a technique to use from the wide range of analysis models and
methods presents a dilemma for any individual. The sheer number of
analysis tools (well in excess of 100) makes the task of assessing each one
impracticable. Other factors must, therefore, be considered when deciding
which technique is adopted and used, e.g. its usability and how well
established it is within industry.
1.1. Purpose and scope
The initial aim of this report is to provide an overview of the different
categories of analysis model and method which are available. The intention is
not to provide a detailed review of analysis techniques, as these are currently
available in the research and practitioner literature (e.g. Energy Institute,
2008; Johnson, 2003). Rather the purpose is to give the reader an
awareness of the general concepts underlying each category and provide a
focus for any further investigation they wish to undertake.
The report then presents a range of factors that influence an individual’s
approach to accident analysis and can prevent the adoption and usage of
analysis techniques. The aim is to provide the reader with an increased
awareness of the issues that shape their choice of method and provide a
framework to review their selection.
Finally, the report provides a description of an analysis technique that is
currently employed by a government accident investigation authority. The
method has been refined and validated over a period of years and enables
the application of accident causation theory in a practical and usable manner.
The purpose is to provide the reader with an ‘off-the-shelf’ analysis tool that
can be readily employed, if the identification and assessment of alternative
methods is not practicable.
2. Analysis models and methods
A key driver for the continued rise in analysis model and method numbers is
the ever-increasing complexity of socio-technical systems (which are
comprised of interacting human, technological and environmental
components) and the resulting change in accident causation mechanisms. As
researchers have sought to account for these changes, the ensuing
development of analysis techniques can be described as having gone through
three major phases, i.e. sequential, epidemiological and systemic. This
categorisation relates to the different underlying assumptions of accident
causation (Hollnagel and Goteman, 2004). This distinction is not obligatory
and other classification systems, based on differing accident characteristics,
exist (e.g. Kjellén, 2000; Katsakiori et al., 2009). However, it helps explain
the desire of researchers to introduce systems theory concepts into accident
analysis, as detailed in the following sections.
2.1. Sequential techniques
The sequential class of models and methods describe accidents as the result
of time-ordered sequences of discrete events. They assume that an
undesirable event, i.e. a ‘root cause’, initiates a sequence of events which
lead to an accident and that the cause-effect relation between consecutive
events is linear and deterministic. This implies that the accident is the result
of this root cause which, if identified and removed, will prevent a recurrence of
the accident. Examples include the Domino model (Heinrich, 1931), Fault
Tree Analysis (Watson, 1961 cited in Ericson, 1999) and the Five Whys
method (Ohno, 1988).
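The deterministic cause-effect assumption behind sequential techniques can be made concrete with a toy fault-tree evaluator: the top event occurs only if the logic of the basic events beneath it is satisfied. The events and gate structure below are invented purely for illustration.

```python
# Toy fault-tree evaluator; event names and the tree itself are
# illustrative assumptions, not taken from any real analysis.

def OR(*branches):
    # The top/intermediate event occurs if any branch occurs.
    return lambda events: any(b(events) for b in branches)

def AND(*branches):
    # The event occurs only if all branches occur together.
    return lambda events: all(b(events) for b in branches)

def basic(name):
    # A basic event occurs if it is present in the observed set.
    return lambda events: name in events

# Top event: pump failure if power is lost OR (valve stuck AND alarm missed).
pump_failure = OR(basic("power_lost"),
                  AND(basic("valve_stuck"), basic("alarm_missed")))

print(pump_failure({"valve_stuck"}))                  # False
print(pump_failure({"valve_stuck", "alarm_missed"}))  # True
```

The linearity the text describes is visible here: removing any basic event on the satisfied path (the "root cause") makes the top event evaluate to False.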
These methods work well for losses caused by physical component failures or
the actions of humans in relatively simple systems and generally offer a good
description of the events leading up to an accident (Leveson, 2004). However,
the cause-effect relationship between the management, organisational and
human elements in a system is poorly defined by these techniques and they
are unable to depict how these causal factors triggered the accident
(Rathnayaka et al., 2011). From the end of the 1970s it became apparent
that the sequential tools were unable to adequately explain a number of major
industrial accidents, e.g. Three Mile Island, Chernobyl and Bhopal.
Consideration for the role that organisational influences play in accidents was
required and resulted in the creation of the epidemiological class of analysis
tools.
2.2. Epidemiological techniques
Epidemiological models and methods view accidents as a combination of
‘latent’ and ‘active’ failures within a system, analogous to the spreading of a
disease (Qureshi, 2007). Latent conditions, e.g. management practices or
organisational culture, are likened to resident pathogens and can lie dormant
within a system for a long time (Reason et al., 2006). Such organisational
factors can create conditions at a local level, i.e. where operational tasks are
conducted, which negatively impact on an individual’s performance (e.g.
fatigue or high workload). The scene is then set for ‘unsafe acts’, such as
errors and violations, to occur. Therefore, the adverse consequences of
latent failures only become evident when they combine with unsafe acts, i.e.
active failures, to breach the defences of a system. The most well-known
epidemiological technique is the Swiss Cheese model (Reason, 1990, 1997),
which has formed the conceptual basis for various analysis methods, e.g. the
Human Factors Analysis & Classification System (HFACS) (Wiegmann and
Shappell, 2003) and Tripod Beta.
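The "aligned holes" metaphor of the Swiss Cheese model can be sketched as a toy model in which an accident trajectory succeeds only if every defence layer has a hole at the same position, i.e. latent and active failures line up. The layers and hole positions below are invented for illustration.

```python
# Illustrative sketch of the Swiss Cheese idea; layer contents and hole
# positions are assumptions, not data from any investigation.

def trajectory_penetrates(layers, position):
    """Each layer is a set of hole positions (latent or active failures).

    An accident occurs only if the trajectory's position lies in a hole
    of every layer, i.e. all defences fail at once.
    """
    return all(position in holes for holes in layers)

defences = [
    {2, 5},   # organisational layer: latent conditions
    {5, 7},   # supervisory layer
    {1, 5},   # front-line layer: unsafe acts (active failures)
]

print(trajectory_penetrates(defences, 5))  # True - holes aligned, accident
print(trajectory_penetrates(defences, 2))  # False - blocked by a later layer
```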
The epidemiological class of techniques better represent the influence of
organisational factors on accident causation, when compared with the
sequential tools. Given that they require an individual to look beyond the
proximal causes of an accident and examine the impact of a system’s latent
conditions, a more comprehensive understanding of an accident can be
achieved. However, many are still based on the cause-effect principles of the
sequential models, as they describe a linear direction of accident causation
(Hollnagel, 2004). From the late 1990s, a number of researchers (e.g.
Rasmussen, 1997; Leveson, 2001; Svedung and Rasmussen, 2002) argued
that these epidemiological techniques were no longer able to account for the
increasingly complex nature of socio-technical system accidents. The
application of systems theory was subsequently proposed as a solution to this
issue.
2.3. Systemic techniques
Systems theory is designed to understand the structure and behaviour of any
type of system. Rather than treating accidents as a sequence of cause-effect
events, it describes losses as the unexpected behaviour of a system resulting
from uncontrolled relationships between its constituent parts. In other words,
accidents are not created by a combination of latent and active failures; they
are the result of humans and technology operating in ways that seem rational
at a local level but unknowingly create unsafe conditions within the system
that remain uncorrected. From this perspective, simply removing a ‘root
cause’ from a system will not prevent the accident from recurring. A holistic
approach is required whereby safety deficiencies throughout the entire system
must be identified and addressed. A range of systemic tools exist which
enable the application of the systems approach, e.g. the Systems-Theoretic
Accident Model and Processes (STAMP) (Leveson, 2004, 2011), the
Functional Resonance Analysis Method (FRAM) (Hollnagel, 2004, 2012) and
the Accimap (Rasmussen, 1997).
Whilst these systemic techniques appear to provide a deeper understanding
of accident causation, various studies suggest they are more resource
intensive and require considerable amounts of domain and theoretical
knowledge to apply (e.g. Ferjencik, 2011; Johansson and Lindgren, 2008).
Furthermore, the latest version of the Swiss Cheese model (see Reason,
1997) acknowledges that active failures are not always required for an
accident to happen; long-standing latent conditions are sometimes all that is
required, as was the case in the King's Cross, Piper Alpha and the space
shuttles Challenger and Columbia accidents (see Reason et al., 2006). It also
acknowledges that latent conditions can be better described as organisational
factors, rather than management failures. This represents top-level
managerial decisions as ‘normal behaviour’ influenced by the local conditions,
resource constraints and objectives of an organisation.
The distinction between the epidemiological and systemic perspective of
accidents, therefore, seems to be a subtle one. However, a number of
studies have compared systemic methods with established Swiss Cheese
based methods, such as HFACS (Salmon et al., 2012) and the Systemic
Occurrence Analysis Methodology (e.g. Arnold, 2009) and commented that
the systemic techniques do provide a deeper understanding of how the
behaviour of the entire system can contribute to an accident.
Whilst the ‘systems approach’ is arguably the dominant concept within
accident analysis research, systemic models and methods are yet to gain
widespread acceptance within the practitioner community (Underwood and
Waterson, 2013).
2.4. Model and method category selection
In order to choose which category of analysis technique best suits an
individual’s needs, a useful starting point is to consider the type of system
being analysed. Systemic techniques are designed to provide a depth of
understanding for complex accidents that is greater than the sequential and
epidemiological models and methods. However, it may be inefficient to use
these more complex and powerful methods to investigate accidents in simple
systems. Therefore, understanding the complexity of the system in question
will help to identify the most suitable method. Hollnagel (2008) provides a
means of characterising systems, based on the work of Perrow (1984), which
considers their coupling and tractability (manageability).
The coupling of a system can vary between being loose and tight and refers
to how subsystems and/or components are functionally connected or
dependent upon each other. Tightly coupled systems can be described as
follows:
• Buffers and redundancies are purposively part of the design
• Delays in processing are not possible
• Process sequences are invariant
• The substitution of supplies, equipment, personnel is limited and
anticipated in the design
• There is little slack possible in supplies, equipment, and personnel
• There is only one method to reach the goal
• Tightly coupled systems are therefore difficult to control, because an event in one
part of the system will quickly spread to other parts
A system's manageability can vary from high (tractable) to low (intractable).
A tractable system can be characterised as follows:
• The principles of the system’s functioning are known
• System descriptions are simple and with few details
• The system does not change while it is being described, i.e. changes in
system activities are slow enough that the whole system can be described
completely and in detail
Hollnagel (2008) suggests that a good example of a tractable system is the
normal functioning of a post office, or the operation of a home furnace. He
also proposes that the outage at a nuclear power plant or the activities in a
hospital emergency department represent good examples of intractable
systems, given that their activities are not standardised and change so rapidly
that it is never possible to produce a detailed and complete description.
Using the dimensions of coupling and manageability, Hollnagel (2008)
characterises a number of systems (see Figure 1).
[Figure 1 - Systems characterised by coupling (tight-loose) and manageability (high-low). Tightly coupled, low-manageability systems include nuclear power plants, chemical plants, space missions and financial markets; tightly coupled, high-manageability systems include power grids, dams, air traffic control, railways and marine operations; loosely coupled systems include public services, mining, manufacturing and post offices (higher manageability) and R&D companies and universities (lower manageability).]
The locations of the systems presented in Figure 1 are illustrative and the list
is clearly not exhaustive. Therefore, the reader is encouraged to consider the
characteristics of their own organisation/system and its location on Figure 1.
Hollnagel and Speziali (2008) provide a number of questions to help
determine these characteristics:
Tractability questions:
1. Was the accident similar to something that has happened before, or was it new and unknown? (The answer should be based on the history of the organisation and the industry it operates in.)
   Tractable system: similar accident. Intractable system: new and unknown accident.
2. Was the organisation ready to respond to the accident, in the sense that there were established procedures or guidelines available?
   Tractable system: ready to respond. Intractable system: not ready to respond.

Coupling questions:
3. Were the accident and its material consequences confined to a clearly delimited subsystem (technological or organisational), or did they involve multiple subsystems or the whole installation?
   Loosely coupled: delimited subsystem. Tightly coupled: multiple subsystems or whole system.
4. Were the consequences on the whole expected/familiar, or were they novel/unusual?
   Loosely coupled: expected. Tightly coupled: novel.
5. Were the consequences in proportion to the initiating event, or were they unexpectedly large (or small)?
   Loosely coupled: proportional consequences. Tightly coupled: unexpected consequences.

Table 1 - System characteristics criteria (based on Hollnagel and Speziali, 2008)
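As a rough sketch of how the Table 1 criteria might feed the category choice discussed in Section 2, answers matching the "intractable" and "tightly coupled" columns push the selection toward more powerful techniques. The scoring rule and the mapping to the three categories are illustrative assumptions, not part of Hollnagel and Speziali's method.

```python
# Illustrative scoring sketch; thresholds and the category mapping are
# assumptions made for this example only.

def choose_category(tractable_votes, loose_votes,
                    n_tractability_questions=2, n_coupling_questions=3):
    """Votes = number of answers matching the 'tractable' / 'loosely
    coupled' columns of Table 1."""
    tractable = tractable_votes > n_tractability_questions / 2
    loosely_coupled = loose_votes > n_coupling_questions / 2
    if tractable and loosely_coupled:
        return "sequential"          # simple, well-understood events
    if not tractable and not loosely_coupled:
        return "systemic"            # complex, tightly coupled systems
    return "epidemiological"         # mixed characteristics

# All answers match the 'intractable' / 'tightly coupled' columns:
print(choose_category(tractable_votes=0, loose_votes=0))  # systemic
```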
addressed by their current analysis tool they should consider using an
alternative, more powerful, technique.
In addition, determining how much of a system will be analysed should also
be considered. If an individual system component, e.g. a single human
operator, or a sub-system, e.g. an aircraft fuel system, is to be analysed then
a simpler sequential method may be appropriate. If the entire system is to be
examined and the analysis will incorporate the organisational (and possibly
regulatory and governmental) contribution to an accident, then
epidemiological or systemic techniques should be considered.
3. Influences on analysis model and method selection
Whilst the selection of an analysis technique may be affected by the
characteristics of the system in which it is employed, a number of other
influential factors exist. A range of these additional issues were identified in a
study carried out by Underwood and Waterson (2013). Interviews were
conducted with 42 safety professionals based in ten countries. The nine full-time
accident investigators, 17 health and safety professionals, ten human
factors specialists and six researchers had experience of working in at least
one of 25 industries. The interviewees were asked about their current
approach to accident and/or risk analysis, their knowledge of analysis
techniques and their views on the communication between the researcher and
practitioner communities.
The factors that were considered to influence the selection of analysis
techniques are detailed in the remainder of Section 3.
3.1. Model and method awareness
In order to use an analysis technique it is clear that an individual must first
become aware of it. However, various issues exist which may prevent this
from occurring.
Some individuals simply have no desire to change their current approach
and, therefore, have no need for new information. In this case it is important
that the individual has evidence that their chosen analysis method provides a
sufficient understanding of accidents to develop safety recommendations that
prevent recurrence. If the same accidents keep occurring, despite efforts to
prevent them, then the individual should consider using a more powerful
analysis tool to gain a deeper understanding of why this is so. If this tool
provides further insights into the causes of the accidents then more effective
recommendations may be devised.
Awareness of analysis methods is also dictated, at least in part, by the level of
training received by an individual. The extent of training received appears to
be role-dependent. Full-time investigators, for example, sometimes receive
extensive training via university-level courses, whereas practitioners with
varying degrees of involvement with accident investigation may receive less
training or none at all.
Individuals who are not provided with training and undertake a search for
information regarding analysis methods face issues which may limit their
awareness. Such issues include the cost of information and the time required
to gather and read it. In addition, some accident analysis information
presented in the academic literature may be considered by some individuals to
be too conceptual and to provide little or no practical benefit.
Providing usage guidance for the various analysis methods is beyond the
scope of this report. However, for those individuals who require information
about accident analysis techniques, the reader is referred to the material
listed in Appendix A.
3.2. Model and method adoption
Even if a sufficient awareness of analysis methods is obtained by an
individual, various barriers may prevent a technique from being adopted. For
example, the needs of end users may not have been successfully accounted
for during the development of an analysis method. An individual’s decision to
adopt a method can also be based on personal selection criteria, such as how
well the technique’s approach suits their way of thinking or whether they have
previously used the method.
The analysis approach taken by an individual can be influenced by their need
to assign liability for an accident. Some individuals prefer (or are mandated)
to avoid seeking blame in favour of focusing on safety improvements. This
may lead them to using methods which focus on safety deficiencies
throughout an entire system, e.g. a systemic method (see Section 2.3).
However, others more concerned with the commercial and legal implications
of accidents may select a method which simplifies the task of singling out a
‘root cause’ to blame for an accident, e.g. a sequential technique (see Section
2.1). This is particularly evident when those who are conducting an
investigation may be deemed culpable and are incentivised to apportion
liability elsewhere.
The track record of use within industry that a method has established plays an
important part in whether it is adopted by individuals and organisations.
Without a history of application in practice, there can be reluctance to trial new
analysis methods, as their credibility may be questioned.
3.3. Model and method usage
The level of effort invested in an analysis will depend, at least in part, on the
resources available to an investigation team. Consequently this can affect
whether an individual/team employs more complex analysis techniques. In
addition to affecting which analysis method is used, the time and financial
constraints involved in accident investigation can also affect how it is used.
The depth of analysis that can be achieved, for example, is limited by the time
available to the investigation team.
The usability of an analysis method will affect whether an analysis is
performed effectively and efficiently. In order for a technique to have
adequate usability it must be easy to understand and apply. Consideration
should, therefore, be given to the availability and clarity of guidance material
as well as the training and resources required to use a given analysis method.
The graphical output of a method will affect the ability of an individual (or team
of investigators) to successfully perform an analysis. Graphically representing
an accident has been considered to be useful by both researchers (e.g. Sklet,
2004; Svedung and Rasmussen, 2002) and practitioners (e.g. Australian
Transport Safety Bureau, 2008) for a number of reasons. For example, it can
be easier to see the relationships between system components and identify
gaps/weaknesses in the analysis. Charting an accident can also be
useful for communicating the findings of complex investigations (Australian
Transport Safety Bureau, 2008). Therefore, it is important to consider if a
given method provides these benefits and the resources which are required to
graphically describe the accident. For example, does the analysis method
need specialised charting software or the simpler combination of sticky-notes
and a whiteboard?
A number of factors related to the reliability of a method (i.e. the consistency
of results obtained when a given accident is analysed separately by different
individuals or reanalysed by the same person) can also affect its usage. For
example, an individual’s background and experience can influence their
analysis approach and produce variation in investigation findings. Open
discussions and analysis reviews which result in a consensus on the
investigation findings can help minimise the biasing effects of individuals’
backgrounds; a process which is common with full-time investigators.
However, the qualitative nature of some analysis tools may increase the
difficulty of reaching such an agreement. The reliability of a method is further
affected by the availability and clarity of usage guidance. Less guidance
increases the flexibility of an analysis and gives an individual more freedom to
probe into different aspects of an event. Whilst this flexibility may be suited to
an experienced investigator, a more structured approach may improve the
consistency of analysis outputs of less experienced individuals.
Reliability is particularly important if accident trend analysis is to be
performed. The greater the reliability of a method and its outputs, the more
the results of any trend analysis can be trusted. The use of causal factor
taxonomies can greatly enhance the reliability of an analysis method, if the
taxonomy is appropriate for the industry in which the accident of interest occurred. Some methods (e.g. HFACS) have been devised with industry-specific taxonomies. However, taxonomies can be restrictive and may require
an investigator to ‘force fit’ a piece of information into the classification
system. Therefore, it is important to understand whether a given taxonomy
meets the needs of the investigation team.
Furthermore, individuals may not be able to gain access to the data required
for some of the more complex, e.g. systemic, methods. For example, such
information may exist outside of the organisation 'affected' by the accident (e.g. commercially sensitive documentation from an equipment supplier) or an individual may be in the 'wrong' position within an organisation to address the whole scope of an accident (e.g. unable to interview senior managers) (Dien
et al., 2012).
3.4. Organisational and industry influences on model and method usage
Some individuals have the freedom to choose which analysis technique they
adopt and use. However, in many cases, organisational policy dictates which
methods are used. Organisational policy can also impact on the resources
available for practitioners to learn and use new analysis methods.
The degree of regulation within a given industry can have a large influence on
what type of analysis techniques are used in accident investigation and risk
assessments. For example, regulation in the nuclear industry is prescriptive with regard to the use of analysis methods. Regulation in other industries, however (e.g. civil aviation), provides the investigator with a greater degree of
method selection flexibility, despite the adoption of a given analysis model by
the regulator (such as the Swiss Cheese model used by the International Civil
Aviation Organization).
The effort and cost of implementing a new analysis method within an
organisation, or throughout an industry, by means of new regulations can
create resistance to change. This inertia can also increase due to a number
of other factors, e.g. the level of industry regulation or the number of
stakeholders involved in effecting the change.
3.5. Method and model selection summary
Any of the factors described in Sections 3.1 – 3.4 may prevent an individual
from becoming aware of, adopting and/or using a new analysis technique.
However, it is likely that they all, to a greater or lesser extent, combine to inhibit
the application of new models and methods.
Some individuals may not be in a position to influence some/all of these
factors and, therefore, will have to continue using their current analysis
method. However, if the investigator has an opportunity to select which
technique they will use, considering the following questions may help them
reach a more informed decision.
• How complex is the system to be analysed, i.e. what is the level of
coupling and tractability of the system?
• How much of the system will be analysed?
• What is the type of method that I currently use (sequential, epidemiological
or systemic) and is it suitable for analysing the system I am interested in?
• What alternative methods are available and are they more suitable for the
current analysis?
• How easy is the method to understand and use?
• How much usage guidance material is available?
• What resources are required to use the method, e.g. specialist software?
• Does the graphical output of the method help facilitate the analysis, e.g.
identify evidence gaps?
• Does the method provide a useful means of communicating the findings of
an analysis with others, e.g. colleagues or non-experts?
• How reliable is the method?
• Does the method have a structured application process?
• Does the method have a taxonomy of factors which contribute to an
accident?
• Do I need to perform accident trend analysis and, if so, does a method
exist that uses a suitable taxonomy or do I need to devise my own
classification scheme?
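The trend-analysis question above can be made concrete with a small sketch. Assuming a home-grown classification scheme, tallying taxonomy codes across incident records is one simple way to surface recurring contributing factors; the codes and records below are hypothetical and purely illustrative.

```python
from collections import Counter

# Hypothetical contributing-factor codes from an invented taxonomy; the
# incident records below are illustrative, not from any real database.
incidents = [
    {"id": "INC-001", "factors": ["procedure_gap", "fatigue"]},
    {"id": "INC-002", "factors": ["procedure_gap", "equipment_design"]},
    {"id": "INC-003", "factors": ["fatigue", "supervision"]},
]

def factor_trends(records):
    """Tally how often each taxonomy code appears across incidents."""
    counts = Counter()
    for record in records:
        counts.update(record["factors"])
    return counts.most_common()  # most frequent factors first

print(factor_trends(incidents))
```

In practice the records would come from an organisational or regulatory database, and the codes from its established taxonomy rather than an ad hoc list.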
An important point to note is that, while analysis methods enable an individual
to apply a given view of accident causation to their evidence, no single
technique can capture the complexity of a system. Indeed, by definition,
analysis models (and their associated methods) are only a representation of
reality.
Therefore, individuals engaged in accident analysis should not consider that
one technique is necessarily appropriate to analyse every aspect of every
accident. The analyst should not force fit evidence into their analysis, or
reject it, simply to comply with the application requirements of their chosen
method. While a method will guide the analyst to collect evidence and help
interpret the data, the analysis should not be constrained by the method.
Therefore, it may be necessary to use more than one method so that the strengths of one technique will compensate for the weaknesses of another. For example, a sequential method may be more suitable for analysing the technical failures in a system, whereas a systemic technique may be more effective at analysing the wider organisational issues. This multi-method approach has been successfully utilised by both researchers (e.g. Ferjencik, 2011; Harris and Li, 2011) and practitioners (e.g. Australian Transport Safety Bureau, 2008 p.38; Dutch Transport Safety Board, 2012).
4. A useful analysis model
As described in Section 1, there are many accident analysis models and
methods available. Whilst this report has so far provided some guidance on
how to select an appropriate analysis technique, it is acknowledged that
individuals may not have time to perform a comprehensive method
comparison. Therefore, this section provides the reader with an ‘off-the-shelf’
analysis tool that can be readily employed.
The analysis technique in question is the Australian Transport Safety Bureau (ATSB) accident investigation model, which has been used in transport accident investigations by the ATSB since 2002 (ATSB, 2008). As such, the model
has been empirically validated by a governmental investigation agency, which
is highly regarded within the accident investigation community (ATSB, 2008).
Therefore, the ATSB model arguably represents a ‘tried and tested’ class-
leading analysis technique. Furthermore, a detailed (and publicly available)
description of the model and its use is provided by the ATSB (2008).
Therefore, the user of the model has free access to guidance material which
can enhance the usability and reliability of the model.
4.1. Description of the ATSB model
The ATSB investigation analysis model (referred to hereafter as the ‘ATSB
model’) is a modified version of the well-known Swiss Cheese model (SCM).
As per the SCM, the ATSB model provides a general framework that can
guide data collection and analysis activities during an investigation (ATSB,
2008 p.36). However, various alterations to the original SCM were made by the ATSB to improve its usability and the identification of potential safety issues. Such changes include an enhanced ability to integrate technical issues into the overall analysis, the use of neutral language and an emphasis on the impact of preventative, as well as reactive, risk controls. To highlight the changes made, the ATSB (2008) presented a later version of the SCM (see Fig. 3) and their adaptation of it (see Fig. 4).
Figure 3 – A later version of the SCM, showing successive layers of defences (adapted from ATSB (2008))
Figure 4 – ATSB adaptation of the SCM (adapted from ATSB (2008))
Figure 5 – The levels of the ATSB model and the broad question asked at each level (adapted from ATSB (2008)). An arrow labelled 'Investigation path' runs upwards through the levels; problems identified at the two upper levels are labelled as safety issues.
• Organisational Influences – What could have been in place to minimise problems with the risk controls?
• Risk Controls – What could have been in place to reduce the likelihood or severity of problems at the operational level?
• Local Conditions – What aspects of the local environment may have influenced the individual actions/technical problems?
• Occurrence Events (including technical problems) – What events best describe the occurrence?
occurrence events. These controls facilitate and guide performance at the
operational level and can include procedures, training, equipment design and
work rosters. Recovery controls are put in place to detect and correct (or
otherwise minimise) the adverse effects of local conditions, individual actions
and occurrence events. These ‘last line’ controls include warning systems,
emergency equipment and emergency procedures.
Organisational influences are those conditions which influence the
effectiveness of an organisation’s risk controls and can be classed as internal
organisational conditions or external influences. Internal organisational conditions are the safety management processes and other organisational characteristics which influence the effectiveness of an organisation's risk controls. Examples
of safety management processes include hazard identification, risk
assessment, change management and training needs analysis. External
influences are the processes and characteristics of external organisations
which impact on an organisation’s risk controls and its internal organisational
conditions. Various external influences exist, e.g. regulatory standards and
surveillance or pressures and standards provided by industry associations
and international standards organisations.
4.3. ATSB model usage
The ATSB suggest that the most effective way of using the model to identify
potential safety factors is to start at the bottom level and work upwards,
asking a series of strategic questions. Broad questions for each level are
included in Fig. 5. The ATSB (2008 p.49-56) also provide detailed guidance
on their investigation approach and how potential safety factors can be tested
for their existence, their influence on an accident and whether they require
further analysis.
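The bottom-up questioning sequence described above can be sketched as ordered data that an analyst (or a simple tool) walks through level by level. The level names and questions paraphrase the ATSB (2008) model; encoding them as a list is this sketch's own device, not part of the published method.

```python
# Levels of the ATSB model, ordered bottom-up as the ATSB recommend
# working. The wording paraphrases ATSB (2008); the data structure is
# purely illustrative.
LEVELS = [
    ("Occurrence Events", "What events best describe the occurrence?"),
    ("Local Conditions", "What aspects of the local environment may have "
                         "influenced the individual actions/technical problems?"),
    ("Risk Controls", "What could have been in place to reduce the likelihood "
                      "or severity of problems at the operational level?"),
    ("Organisational Influences", "What could have been in place to minimise "
                                  "problems with the risk controls?"),
]

def investigation_prompts():
    """Yield the strategic question for each level, bottom level first."""
    for level, question in LEVELS:
        yield f"{level}: {question}"

for prompt in investigation_prompts():
    print(prompt)
```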
Many accident analysis techniques use charts to graphically represent the
findings of an investigation and the ATSB model is no exception. Use of
analysis charts can make it easier to see the potential relationships between safety factors and to identify gaps in the analysis which require further explanation. Charts can also be useful for communicating the findings of
complex investigations. A charting format preferred by the ATSB is based on
the Accimap method (Rasmussen, 1997). It shows the occurrence events
involved in an accident from left to right and adds the contributing safety
factors to these events in a series of hierarchical layers. The influence that a
given safety factor has on others is indicated by a connecting arrow. An
example of such an analysis chart is presented in Fig. 6. In the ATSB’s
experience, the use of this charting format has considerably helped the
explanation of complex accidents and incidents to industry personnel during
presentations and courses.
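An Accimap-style chart of this kind can be represented as a simple directed graph, where each arrow records that one safety factor influenced another. The events and factors below are invented for illustration; one possible use of such a structure is flagging factors with no incoming arrow, which may need further explanation.

```python
# A minimal sketch of an analysis chart as a directed graph. All factor
# names are hypothetical and do not come from any real investigation.
influences = {
    # (from_factor, to_factor): an arrow meaning "from influences to"
    ("inadequate training needs analysis", "crew unfamiliar with procedure"),
    ("crew unfamiliar with procedure", "checklist step omitted"),
    ("time pressure on turnaround", "checklist step omitted"),
}

def unexplained_factors(arrows):
    """Factors with no incoming arrow: candidates for further explanation."""
    sources = {a for a, _ in arrows}
    targets = {b for _, b in arrows}
    return sorted(sources - targets)

print(unexplained_factors(influences))
```

On a whiteboard the same check is done by eye; the point of the sketch is only that the chart's arrows carry analysable structure, not just presentation value.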
Figure 6 – Example analysis chart (© 2008 Commonwealth of Australia)
As with any model of accident causation, the ATSB model has limitations. For
example, many safety factors can be proposed which do not neatly fall into
one of the levels. Furthermore, the limited descriptive nature of the model
does not fully explain the complex, dynamic nature of accident development.
An important example is the concept of local rationality. Actions and
decisions taken by people at all levels of a system are affected by their local
goals, resource constraints and external influences. To understand why an
individual (or team) took a decision or course of action, such activity must be
placed in context by examining the local conditions. The ATSB model
explicitly addresses this requirement at the operational level; however, the context in which organisational influences were generated is not incorporated into the model. Therefore, the user should investigate, if
possible, the local conditions that were present at the organisational level of a
system. This will help achieve a deeper understanding of an accident and
avoid the inappropriate blaming of an organisation’s management.
As well as investigating individual accidents and incidents, there is often a
need to analyse data from multiple events to identify trends in contributing
factors. The use of taxonomies to classify contributing factors is a convenient
way to achieve this, although they restrict the flexibility of an analysis (see Section 3.3). The ATSB model does not have a publicly available taxonomy, so the user, if free to do so, would need to devise an appropriate classification system for their organisation/industry. However, many users may already have a taxonomy in place, incorporated into an organisational and/or regulatory database, in which case devising a new scheme may not be possible.
Despite these limitations, the ATSB (2008) state that their experience of using
the model has shown that it provides an appropriate balance between ease of
use and full realism when identifying potential safety factors and
communicating the findings of safety investigations.
References
Arnold, R., 2009. A qualitative comparative analysis of SOAM and STAMP in
ATM occurrence investigation. Lund University.
Australian Transport Safety Bureau, 2008. Analysis, causality and proof in
safety investigations Aviation Research and Analysis Report AR-2007-053.
Canberra City: Australian Transport Safety Bureau.
Dien, Y., Dechy, N. and Guillaume, E., 2012. Accident investigation: From
searching direct causes to finding in-depth causes – problem of analysis
or/and of analyst? Safety Science, 50(6), pp. 1398-1407.
Dutch Transport Safety Board, 2012. Experiences and challenges in using
STAMP for accident analysis, First STAMP/STPA Workshop at MIT, April
2012, Massachusetts Institute of Technology.
Energy Institute, 2008. Guidance on investigating and analysing human and
organisational factors aspects of incidents and accidents. London: Energy
Institute.
Ericson, C.A., 1999. Fault tree analysis - A history, 17th International System
Safety Conference, 16-21 Aug 1999, System Safety Society, pp. 1-9.
Ferjencik, M., 2011. An integrated approach to the analysis of incident
causes. Safety Science, 49(6), pp. 886-905.
Harris, D. and Li, W.-C., 2011. An extension of the human factors analysis
and classification system for use in open systems. Theoretical Issues in
Ergonomics Science, 12(2), pp. 108-128.
Heinrich, H.W., 1931. Industrial accident prevention. New York, NY: McGraw-
Hill.
Hollnagel, E., 2012. FRAM – the functional resonance analysis method.
Farnham: Ashgate.
Hollnagel, E., 2008. The changing nature of risks. Ergonomics Australia, 22(1-
2), pp. 33-46.
Hollnagel, E., 2004. Barriers and accident prevention. Aldershot: Ashgate
Publishing Limited.
Hollnagel, E. and Goteman, Ö., 2004. The Functional Resonance Accident
Model, Cognitive System Engineering in Process Control 2004, 4 - 5 Nov
2004, CSEPC, pp. 155-161.
Hollnagel, E., Pruchnicki, S., Woltjer, R. and Etcher, S., 2008. Analysis of
Comair flight 5191 with the Functional Resonance Accident Model, 8th
International Symposium of the Australian Aviation Psychology Association, 8
- 11 Apr 2008, Australian Aviation Psychology Association.
Hollnagel, E. and Speziali, J., 2008. Study on developments in accident
investigation methods: A survey of the 'state-of-the-art'. SKI Report 2008:50.
Sophia Antipolis, France: Ecole des Mines de Paris.
Johansson, B. and Lindgren, M., 2008. A quick and dirty evaluation of
resilience enhancing properties in safety critical systems, E. Hollnagel, F. Pieri
and E. Rigaud, eds. In: Third Symposium on Resilience Engineering, 28 - 30
Oct 2008, École des mines de Paris.
Johnson, C., 2003. Failure in safety critical systems: A handbook of incident
and accident reporting. Glasgow: Glasgow University Press.
Katsakiori, P., Sakellaropoulos, G. and Manatakis, E., 2009. Towards an
evaluation of accident investigation methods in terms of their alignment with
accident causation models. Safety Science, 47(7), pp. 1007-1015.
Kjellén, U., 2000. Prevention of accidents through experience feedback.
London: Taylor and Francis.
Leveson, N., 2011. Engineering a safer world: Systems thinking applied to
safety. London: The MIT Press.
Leveson, N., 2004. A new accident model for engineering safer systems.
Safety Science, 42(4), pp. 237-270.
Leveson, N., 2001. Evaluating accident models using recent aerospace
accidents. Part I: Event-Based Models. Cambridge, MA: Massachusetts
Institute of Technology.
Ohno, T., 1988. Toyota production system: Beyond large-scale production.
Portland, OR: Productivity Inc.
Perrow, C., 1984. Normal accidents: Living with high-risk technologies. New
York, NY: Basic Books.
Qureshi, Z.H., 2007. A review of accident modelling approaches for complex
socio-technical systems, T. Cant, ed. In: 12th Australian Workshop on Safety
Related Programmable Systems, 2007, Australian Computer Society, pp. 47-
59.
Rasmussen, J., 1997. Risk management in a dynamic society: A modelling
problem. Safety Science, 27(2-3), pp. 183-213.
Rathnayaka, S., Khan, F. and Amyotte, P., 2011. SHIPP methodology:
Predictive accident modeling approach. Part I: Methodology and model
description. Process Safety and Environmental Protection, 89(3), pp. 151-164.
Reason, J., 1997. Managing the risks of organizational accidents. Aldershot:
Ashgate Publishing Ltd.
Reason, J., 1990. Human error. Cambridge: Cambridge University Press.
Reason, J., Hollnagel, E. and Paries, J., 2006. Revisiting the «Swiss cheese»
model of accidents. EEC Note No. 13/06. Brétigny-sur-Orge, France:
EUROCONTROL Experimental Centre.
Salmon, P.M., Cornelissen, M. and Trotter, M.J., 2012. Systems-based
accident analysis methods: A comparison of accimap, HFACS, and STAMP.
Safety Science, 50(4), pp. 1158-1170.
Sklet, S., 2004. Comparison of some selected methods for accident
investigation. Journal of Hazardous Materials, 111, pp. 29-37.
Svedung, I. and Rasmussen, J., 2002. Graphic representation of accident
scenarios: Mapping system structure and the causation of accidents. Safety
Science, 40(5), pp. 397-417.
Underwood, P. and Waterson, P., 2013. Systemic accident analysis:
Examining the gap between research and practice. Accident Analysis &
Prevention, 55, pp. 154-164.
Watson, H.A., 1961. Launch control safety study. Section VII, Volume I.
Murray Hill: Bell Laboratories.
Wiegmann, D.A. and Shappell, S.A., 2003. A human error approach to
aviation accident analysis: The human factors analysis and classification
system. Burlington, Vermont, USA: Ashgate Publishing Ltd.
Appendix A – Useful sources of accident analysis information
The sources of information provided below give a general coverage of accident analysis and its associated methods.
Free sources of information
• Australian Transport Safety Bureau (2008) – Analysis, Causality and Proof
in Safety Investigations
This document provides an overview of how the ATSB conduct investigations
and the analysis model they have developed, as well as a useful summary of
the Swiss Cheese model and how suitable it is for accident analysis.
http://www.atsb.gov.au/media/27767/ar2007053.pdf
• Energy Institute (2008) – Guidance on investigating and analysing human
and organisational factors aspects of incidents and accidents
This document offers practitioner-focused guidance on accident investigation,
analysis method selection and an overview of a number of different analysis
tools.
http://www.energyinstpubs.org.uk/tfiles/1354473348/817.pdf
• Erik Hollnagel’s website
This website provides details about the FRAM systemic analysis method and
a list of publications utilising the technique.
http://www.functionalresonance.com/
• Hollnagel and Speziali (2008) – Study on Developments in Accident
Investigation Methods: A Survey of the “State-of-the-Art”
This report provides a useful overview of some accident analysis methods
and their suitability for analysing systems with different complexities.
http://hal.archives-ouvertes.fr/docs/00/56/94/24/PDF/SKI-Report2008_50.pdf
• Johnson (2003) - Failure in safety critical systems: A handbook of incident
and accident reporting
This book provides a detailed description of various facets of accident
investigation, including accident analysis methods.
http://www.dcs.gla.ac.uk/~johnson/book/
• Nancy Leveson’s website
This website provides numerous articles and presentations about the use of
the systemic STAMP method for accident and hazard analysis.
sunnyday.mit.edu/
• Qureshi (2007) - A review of accident modelling approaches for complex
socio-technical systems
This article provides a comprehensive overview of the development of
accident causation theory and techniques. It is available from the following
website (a free account must be set up in order to access the full document).
http://dl.acm.org/citation.cfm?id=1387046&dl=ACM&coll=DL&CFID=2154499
66&CFTOKEN=66373980
• Reason et al. (2006) – Revisiting the Swiss Cheese model of accidents
This report, prepared for EUROCONTROL, gives a detailed account of the development and current status of the well-known Swiss Cheese model.
http://www.eurocontrol.int/eec/gallery/content/public/document/eec/report/200
6/017_Swiss_Cheese_Model.pdf
Other sources of information
• Dekker, S., 2006. The field guide to understanding human error. Aldershot:
Ashgate Publishing Limited.
This book by Sidney Dekker provides an accessible introduction to the ‘new
view’ of accidents, which promotes the avoidance of hindsight and blame.
• Leveson, N., 2011. Engineering a Safer World: Systems Thinking Applied
to Safety. The MIT Press, London.
Comprehensive coverage of systems theory, STAMP and its various
applications is contained in this book. Also, a description is provided as to
why systemic accident analysis is required.
• Hollnagel, E., 2012. FRAM—The Functional Resonance Analysis Method.
Ashgate, Farnham.
The FRAM method is described and demonstrated in this book, along with
information about its underlying theory and the need for systemic accident
analysis.
• Salmon et al., 2011. Human factors methods and accident analysis:
practical guidance and case study applications. Farnham: Ashgate
Publishing Limited.
This book provides a number of examples of accident analysis methods and
how they are applied.