Stanton Etal Applied Ergonomics


Using existing HEI techniques to predict pilot error: a comparison of SHERPA,


HAZOP and HEIST

Article · September 2002




ARTICLE IN PRESS

Applied Ergonomics xxx (2008) 1–8

Contents lists available at ScienceDirect

Applied Ergonomics
journal homepage: www.elsevier.com/locate/apergo

Predicting pilot error: Testing a new methodology and a multi-methods and analysts approach
Neville A. Stanton a, Paul Salmon b, *, Don Harris d, Andrew Marshall e, Jason Demagalski d,
Mark S. Young c, Thomas Waldmann f, Sidney Dekker g
a University of Southampton, Transportation Research Group, School of Civil Engineering and the Environment, Highfield, Southampton SO17 1BJ, UK
b Accident Research Centre, Monash University, Building 70, Clayton, Victoria 3800, Australia
c Brunel University, BIT-LAB, Uxbridge, Middlesex UB8 3PH, UK
d Cranfield University, Human Factors Group, School of Engineering, Cranfield, Bedford MK43 0AL, UK
e Marshall Associates, London, UK
f University of Limerick, Ireland
g Lund University, Sweden

Article info

Article history:
Received 12 December 2006
Accepted 10 October 2008

Keywords:
Human error
Human Error Identification
Error prediction
Reliability and validity

Abstract

The Human Error Template (HET) is a recently developed methodology for predicting design-induced pilot error. This article describes a validation study undertaken to compare the performance of HET against three contemporary Human Error Identification (HEI) approaches when used to predict pilot errors for an approach and landing task and also to compare analyst error predictions to an approach to enhancing error prediction sensitivity: the multiple analysts and methods approach, whereby multiple analyst predictions using a range of HEI techniques are pooled. The findings indicate that, of the four methodologies used in isolation, analysts using the HET methodology offered the most accurate error predictions, and also that the multiple analysts and methods approach was more successful overall in terms of error prediction sensitivity than the three other methods but not the HET approach. The results suggest that when predicting design-induced error, it is appropriate to use a toolkit of different HEI approaches and multiple analysts in order to heighten error prediction sensitivity.
© 2008 Elsevier Ltd. All rights reserved.

* Corresponding author.
E-mail address: [email protected] (P. Salmon).

0003-6870/$ – see front matter © 2008 Elsevier Ltd. All rights reserved.
doi:10.1016/j.apergo.2008.10.005

Please cite this article in press as: Stanton, N.A., Predicting pilot error: Testing a new methodology and a multi-methods and analysts approach, Applied Ergonomics (2008), doi:10.1016/j.apergo.2008.10.005

1. Introduction

Within complex sociotechnical systems around 75% of all accidents and safety compromising incidents are attributed, in part at least, to human error. There are many means of reducing or mitigating human error; one approach involves the use of structured methods to predict the errors that are likely to be made by operators during task performance. Human Error Identification (HEI) works on the premise that an understanding of both the work task and the characteristics of the technology being used allows analysts to predict, a priori, potential errors that may arise from the resulting interaction (Baber and Stanton, 1996; Stanton and Baber, 2002). The use of HEI techniques is now widespread, with applications in a wide range of domains including the nuclear power and petro-chemical processing industries (Kirwan, 1996), air traffic control (Shorrock and Kirwan, 2002), aviation (Harris et al., 2005), space operations (Nelson et al., 1998), health care (Lane et al., 2006) and public technology (Baber and Stanton, 1996).

Despite the superfluity of HEI techniques available (a methods review identified over 50 approaches – see Stanton et al., 2005) and their increased application, they are relatively rarely used in the domain of the civil flight deck. This is surprising since it has previously been established that the major cause of aviation accidents is human error (McFadden and Towell, 1999); around 75% of commercial aviation accidents are attributed to human error (Civil Aviation Authority, 1998). Further, a number of high profile aviation incidents have been attributed, at least in some part, to design-induced human error, including the Nagoya Airbus A300-600 accident (where the pilots could not disengage the go-around mode after inadvertent activation due to a lack of understanding of the automation and poor design of the operating logic in the autoland system), the Cali Boeing 757 accident (where the poor interface design of the flight management computer and a lack of logic checking led to a controlled flight into terrain accident) and the Strasbourg A320 accident at Mont St Odile (where the crew inadvertently set an excessive descent rate instead of manipulating the flight path angle as a result of both functions using a common control interface and an associated poorly designed display).

As part of a DTI/EUREKA! funded project investigating the prediction of pilot error, the authors developed a new HEI
methodology, the Human Error Template (HET; Marshall et al., 2003), to be used specifically for predicting design induced pilot error on civil flight decks during new flight deck technology certification. The impetus for this came from a US Federal Aviation Administration report (FAA, 1996), which, amongst other things, recommended that flight deck designs be evaluated for their susceptibility to design-induced flight crew errors and also to identify the likely consequences of those errors during the type certification process (Harris et al., 2005).

The aims of this study were twofold. First, we wished to assess the performance of the HET methodology against three contemporary HEI methods, SHERPA (Embrey, 1986), Human Error HAZOP and the Human Error Identification in Systems Tool (HEIST; Kirwan, 1994). The purpose of this was to validate the HET methodology as a tool for predicting design induced pilot error on civil flight decks. It was anticipated that the HET methodology would be more accurate at predicting design induced pilot error than the three contemporary methods, based upon the fact that it was developed specifically for use on flight decks, whereas the other three methods were developed for control room and nuclear power plant tasks. Second, we wished to compare the performance of an approach designed to enhance the accuracy of error predictions, namely the multiple methods and multiple analysts approach, in which the error predictions of different analysts using different methods are pooled in order to enhance error prediction sensitivity.

2. The Human Error Template

The HET methodology uses an external error mode (EEM) taxonomy that was developed from a review of existing HEI methods and an evaluation of incidences of design-induced pilot error. The HET EEM taxonomy comprises the following 12 error types:

• Fail to execute, e.g. pilot fails to perform a particular task or action.
• Task execution incomplete, e.g. pilot fails to perform a task or action in its entirety.
• Task executed in the wrong direction, e.g. pilot turns a knob or moves a lever in the wrong direction.
• Wrong task executed, e.g. pilot performs a wrong task or action.
• Task repeated, e.g. pilot presses the correct button twice.
• Task executed on the wrong interface element, e.g. pilot presses the wrong button.
• Task executed too early, e.g. pilot performs a task or action too early in a sequence.
• Task executed too late, e.g. pilot performs a task or action too late in a sequence.
• Task executed too much, e.g. pilot moves a lever or turns a knob too much.
• Task executed too little, e.g. pilot does not move a lever or turn a knob sufficiently.
• Misread information, e.g. pilot misreads the information presented by a display.
• Other.

The HET EEM taxonomy is applied to each bottom level task step in a Hierarchical Task Analysis (HTA; Stanton, 2006) of the flight task under analysis in order to identify any credible errors. The identification of credible errors is based on the analyst’s subjective judgement and involves the analyst either observing the task being performed or walking through the task themselves either with the flight deck interface itself or with functional drawings and photographs of the interface. For each credible error (i.e. those judged by the analyst to be possible) the analyst provides a description of the form that the error would take, such as, ‘pilot dials in the airspeed value using the heading knob’ or ‘pilot fails to lower the landing gear’. Next, the outcome or consequence associated with the error is described (e.g. the consequence of the pilot dialing in the airspeed using the heading knob would be that the aircraft inappropriately adjusts its heading to that of the erroneously entered speed value). Finally, judgements on the likelihood of the error occurring (Low, Medium or High) and the criticality of the error (Low, Medium or High) are made based on domain expertise and experience. If the identified error is given a ‘high’ rating for both likelihood and criticality, the interface technology in question is rated as a ‘fail’, meaning that it is not suitable for certification. An example HET pro-forma for the task step ‘Dial the speed/mach knob to enter 150 on the IAS/Mach display’ is presented in Table 1. A flowchart depicting the HET procedure is presented in Fig. 1.

3. Validating the Human Error Template

The validity of HEI techniques requires testing to ensure that they are accurate in the prediction of error, whilst the reliability of HEI techniques requires testing to ensure that the techniques offer the same error predictions when used by different analysts for the same task and when used by the same analyst more than once for the same task. Typically, HEI techniques place a great amount of dependence upon the judgement of the analyst and so different analysts may make different predictions regarding the same problem (inter-analyst reliability). Similarly, the same analyst may make different judgments on different occasions (intra-analyst reliability).

A number of HEI technique validation studies have been reported in the literature (e.g. Williams, 1989; Whalley and Kirwan, 1989; Kirwan, 1992a,b, 1998a,b; Kennedy, 1995; Baber and Stanton, 1996; Stanton and Stevenage, 1998). For example, Whalley and Kirwan (1989) evaluated six HEI methods for their ability to accurately predict the errors responsible for four incidents that had previously occurred in the nuclear industry. Similarly, Kennedy (1995) examined the ability of a number of HEI methods to predict the errors attributed as causal factors in 10 major disasters. In conclusion to an evaluation of 12 HEI approaches, Kirwan (1992b) recommended a combination of expert judgement and the Systematic Human Error Reduction and Prediction Approach (SHERPA; Embrey, 1986) as the most valid approach to HEI. Baber and Stanton (1996) tested the validity of SHERPA and Task Analysis For Error Identification (TAFEI; Baber and Stanton, 1996) when used to predict London Underground rail ticket machine errors. It was concluded that both SHERPA and TAFEI provided an acceptable level of validity based upon the data from two expert analysts. Stanton and Stevenage (1998) also tested the validity of SHERPA and a heuristic approach when used to predict error on a vending machine task. It was concluded that SHERPA provided a better means of predicting errors than the heuristic approach did. Moreover, it was reported that SHERPA returned a mean sensitivity index (SI) of 0.76 at Trial 1; 0.74 at Trial 2; and 0.73 at Trial 3, which represent very acceptable levels of validity.

4. Multiple methods and analysts

It is apparent from the validation studies described above that, although achieving acceptable levels of validity (e.g. SHERPA studies typically return sensitivity index scores of around 0.7) there is room for improvement in terms of the accuracy of HEI error predictions. One such approach could be to use a combination of multiple methods and multiple analysts, based on the notion that the accuracy of error predictions may be enhanced by using a range of different but complementary HEI approaches to predict human errors for the same task and also that pooling the error predictions made by a number of different analysts could also enhance the comprehensiveness of the errors predicted. The underlying
Table 1
Example HET output (source: Marshall et al., 2003).

Scenario: land A320 at New Orleans using the autoland system
Task step: 3.4.2. Dial the Speed/MACH knob to enter 150 in the IAS/MACH window
Interface elements: Speed/MACH knob, IAS/MACH display, auto pilot panel

Error mode | Description | Outcome | Likelihood (L/M/H) | Criticality (L/M/H) | Pass/Fail
Fail to execute | – | – | – | – | –
Task execution incomplete | – | – | – | – | –
Task execution in wrong direction | Pilot turns the Speed/MACH knob in the wrong direction | Aircraft decreases speed rather than increases speed | ✓ | ✓ | ✓
Wrong task executed | – | – | – | – | –
Task repeated | – | – | – | – | –
Task executed on the wrong interface element | Pilot dials in airspeed using the HDG knob rather than the Speed/MACH knob | Aircraft moves to HDG of 150 and stays at current speed | ✓ | ✓ | ✓
Task executed too early | – | – | – | – | –
Task executed too late | – | – | – | – | –
Task executed too much | Pilot turns the Speed/MACH knob too much | Aircraft takes on incorrect airspeed | ✓ | ✓ | ✓
Task executed too little | Pilot does not turn the Speed/MACH knob enough | Aircraft takes on incorrect airspeed | ✓ | ✓ | ✓
Misread information | – | – | – | – | –
Other | – | – | – | – | –
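
The pro-forma in Table 1 and the pass/fail rule from Section 2 can be sketched as a small data structure. This is a minimal illustration, not the authors' tooling: the names `Rating`, `HetEntry` and `passes` are hypothetical, and the likelihood and criticality values in the example are invented, since the article does not state which ratings were ticked for each row.

```python
from dataclasses import dataclass
from enum import Enum


class Rating(Enum):
    LOW = "L"
    MEDIUM = "M"
    HIGH = "H"


# The 12 HET external error modes listed in Section 2.
HET_ERROR_MODES = (
    "Fail to execute",
    "Task execution incomplete",
    "Task executed in the wrong direction",
    "Wrong task executed",
    "Task repeated",
    "Task executed on the wrong interface element",
    "Task executed too early",
    "Task executed too late",
    "Task executed too much",
    "Task executed too little",
    "Misread information",
    "Other",
)


@dataclass
class HetEntry:
    """One credible error recorded on the pro-forma (hypothetical structure)."""
    error_mode: str
    description: str
    outcome: str
    likelihood: Rating
    criticality: Rating

    def passes(self) -> bool:
        # Rule from the text: a 'fail' results only when BOTH likelihood
        # and criticality are rated High.
        return not (
            self.likelihood is Rating.HIGH and self.criticality is Rating.HIGH
        )


# Description and outcome are taken from Table 1; the two ratings are
# assumptions made up for this example.
entry = HetEntry(
    error_mode="Task executed on the wrong interface element",
    description="Pilot dials in airspeed using the HDG knob rather than the Speed/MACH knob",
    outcome="Aircraft moves to HDG of 150 and stays at current speed",
    likelihood=Rating.MEDIUM,
    criticality=Rating.HIGH,
)
print("PASS" if entry.passes() else "FAIL")
```

Under this reading of the rule, an error rated High on only one of the two scales still passes; only the High/High combination marks the interface element as unsuitable for certification.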

assumption is that the shortfalls of each HEI technique and each analyst are compensated for by the other techniques and analysts used (i.e. any errors that method A misses, method B will highlight, and any errors that analyst A misses, analyst B may highlight, and so on) which should enhance error prediction sensitivity and accuracy. Kirwan (1998a,b) first proposed the concept of using a range or ‘toolkit’ of HEI methods to enhance error prediction sensitivity in complex systems. In conclusion to a review of 38 existing HRA/HEI techniques Kirwan (1998a) reported that, since none of the techniques available satisfied all of the 14 criteria against which they were evaluated, a framework or toolkit approach may be the most suitable approach for enhancing the comprehensiveness of the HEI analysis. Kirwan (1998b) suggested that practitioners utilise a framework type approach to HEI, whereby a mixture of independent HRA/HEI tools would be used under one framework.

Although due to its novelty there appears to be nothing within the academic literature stating the strengths and weaknesses of multiple methods and analysts approaches to HEI, it is apparent that, whilst potentially improving error prediction sensitivity, multiple methods and analysts approaches do have some potential weaknesses. First, the false alarm rate (i.e. errors predicted that do not in fact occur) can potentially be increased due to the pooled error data. However, in safety critical industries it may be acceptable to generate a high rate of false alarms in order to ensure that all potential errors are identified. Second, the use of additional methods can significantly increase the level of resources (e.g. time, training, etc.) required to undertake HEI analyses and also the increased data returned will ultimately increase the time required for data analysis.

The three methods, SHERPA, Human Error HAZOP and HEIST were chosen as a result of a literature review of existing HEI methods from which it was concluded that the SHERPA, Human Error HAZOP and HEIST methodologies were the most suited for use in the prediction of potential design induced error on the flight deck. A brief description of the three techniques is provided in the following sections. For a more exhaustive description of the latter three techniques, including example outputs, the reader is referred to Stanton et al. (2005).

START
1. Take the first/next bottom level task step from the HTA.
2. Enter scenario and task step details in error pro-forma.
3. Take first HET error mode and consider potential occurrence.
4. Is the error credible? If no, go to step 6.
5. For credible errors, provide: description of the error; consequences of the error; error likelihood (L, M, H); error criticality (L, M, H); PASS/FAIL rating.
6. Are there any more error modes? If yes, take the next error mode and return to step 4.
7. Are there any more task steps? If yes, return to step 1; if no, STOP.
Fig. 1. HET Flowchart.

5. Systematic Human Error Reduction and Prediction Approach (SHERPA)

SHERPA (Embrey, 1986) was originally developed for use in the nuclear reprocessing industry and is probably the most commonly used HEI approach, with applications in a number of domains, including ticket machines (Baber and Stanton, 1996), vending machines (Stanton and Stevenage, 1998), and in-car radio-cassette machines (Stanton and Young, 1999). SHERPA uses a behavioural
taxonomy linked to an error mode taxonomy and is applied to an HTA of the task under analysis. The behavioural and EEM taxonomies are used to identify credible errors that are likely to occur during each step in the HTA. For each credible error identified the analyst provides a description of the form that the error would take, such as, ‘pilot dials in wrong airspeed’ and identifies any consequences associated with the error and also any recovery steps that would need to be taken in event of the error being made. Finally, ordinal probability (Low, Medium or High), criticality (Low, Medium or High) and potential design remedies are recorded.

6. Human Error Hazard and Operability Study (HAZOP)

HAZOP (Kletz, 1974; cited in Swann and Preston, 1995) is a well-established engineering approach that was developed in the late 1960s by ICI (Swann and Preston, 1995) for use in process design audit and engineering risk assessment (Kirwan, 1992a). Typically undertaken as a group approach, HAZOP involves analysts applying guidewords, such as Not done, More than or Later than, to each task step in order to identify potential errors that may occur. Many variations on the HAZOP approach exist, and the Human Error HAZOP approach was developed for dealing with human error issues (Kirwan and Ainsworth, 1992). In the present study, a set of Human Error HAZOP guidewords (Whalley, 1988; cited in Kirwan and Ainsworth, 1992) were used. Each guideword is applied to each task step to identify any credible errors. Once a description of the error is provided, the consequences, cause and recovery path of the error are described. Finally, redesign suggestions are made to either prevent the error from occurring or mitigate its consequences.

7. Human Error Identification in Systems Tool (HEIST)

The HEIST technique (Kirwan, 1994) is a component of the HERA methodology (Kirwan, 1998b) and uses error identifier questions (e.g. ‘‘Could the operator fail to carry out the act in time?’’) linked to behaviour tables and an external error mode taxonomy that are designed to prompt the analyst to identify potential errors. The task step in question is first classified into one of the HEIST behavioural categories and then the associated HEIST behaviour table and error identifier prompts are used to encourage the analyst to identify any errors that could potentially occur during performance of the task in question. For each credible error identified, the system cause or psychological error mechanism and error reduction strategy (both of which are provided in the HEIST behaviour tables) is recorded and the consequences associated with the error are described.

The main differences between the approaches compared relate to the type of approach that they represent, the taxonomies of error modes that they use, how the analyst goes about predicting the errors with the technique and also what additional information is provided once an error has been identified. In terms of the type of HEI approach, the HET, SHERPA and Human Error HAZOP approaches are examples of taxonomy-based HEI techniques, which are characterised by their use of EEM taxonomies to identify potential errors. Typically EEMs are considered for each component step in a particular task or scenario in order to determine credible errors that may arise during the man–machine interaction. Taxonomic approaches to HEI are typically the most successful in terms of sensitivity and are also the cheapest, quickest and easiest to use. However, these techniques depend greatly on the judgement of the analyst and their reliability and validity may at times be questionable. HEIST, on the other hand, is an example of an error identifier prompt-based technique. These approaches use prompts or questions to aid the analyst in identifying potential errors. The prompts are typically linked to a set of error modes and reduction strategies. HEIST is also different in this case since it also considers performance-shaping factors. Whilst these techniques attempt to remove the reliability problems associated with taxonomy-based approaches, they add considerable time to the analysis because each prompt must be considered.

As stated, the purpose of this study was to compare the performance of HET against three contemporary Human Error Identification (HEI) approaches when used to predict pilot errors for an approach and landing task and also to compare, in terms of error prediction sensitivity, the multiple methods and analysts approach with multiple analyst predictions for each method.

8. Methodology

8.1. Participants

A convenience sample of 37 Brunel undergraduate students aged between 19 and 21 years old was used for the study. Our justification for using undergraduate participants with no previous experience of HEI, civil aviation flight tasks and only limited experience of human factors in general stems from the original requirement for the methodology developed to be usable by non-human factors specialists during the design and certification of flight deck technology. For example, Marshall et al. (2003, p. 6) stated that ‘‘the method should also be capable of being used by non-human factors experts within the certification authorities’’. Further, the capture of potential errors during the early system design phase requires that designers (with limited or no human factors experience) are able to use HEI approaches. The participants in this case therefore represent a ‘worst case’ population since they have no experience of HEI or piloting; acceptable performance by these participants would provide evidence that experienced analysts would achieve acceptable results using the same approach.

The participants were allocated into four groups based upon the HEI methodology that they used during their study (four separate error prediction studies were conducted, one for each HEI methodology). Group one consisted of eight male undergraduate students. These participants formed the HET group and received training in the HET methodology. Group two consisted of nine undergraduate students. Of these, six were male and three were female. These participants formed the SHERPA group and received training in the SHERPA methodology. Group three consisted of a further nine undergraduate students. Of these, seven were male and two were female. These participants formed the Human Error HAZOP group and received training in the Human Error HAZOP methodology. The fourth and final group consisted of 11 undergraduate students. Of these, eight were male and three were female. These participants formed the HEIST group and received training in the HEIST methodology. None of the participants had previous experience of any of the HEI methodologies used or of flying an aeroplane.

8.2. Flight task

The study focussed on the aircraft-landing task using ‘Land aircraft X at New Orleans Airport using the autoland system’. This task was part of the approach phase of a flight in Aircraft X (a modern, highly automated, ‘glass cockpit’, medium capacity airliner). This task was chosen as it was deemed to be representative of a typical civil aviation-landing task in an automated glass cockpit airliner. An HTA was constructed for the flight task based on an observation of a video recording of a similar landing task and consultation with subject matter experts. An extract of the HTA is presented in Fig. 2.

8.3. Materials

All participants were supplied with a training package for the methodology in question. The training packages consisted of
3. Prepare the aircraft for landing
    3.1 Check the distance (m) from runway
    3.2 Reduce airspeed to 190 Knots
        3.2.1 Check current airspeed
        3.2.2 Dial the ‘Speed/MACH’ knob to enter 190 on the IAS/MACH display
    3.3 Set flaps to level 1
        3.3.1 Check current flap setting
        3.3.2 Move ‘flap’ lever to 1
    3.4 Reduce airspeed to 150 Knots
        3.4.1 Check current airspeed
        3.4.2 Dial the ‘Speed/MACH’ knob to enter 150 on the IAS/MACH display
    3.5 Set flaps to level 2
        3.5.1 Check current flap setting
        3.5.2 Move flap lever to 2
    3.6 Set flap to level 3
        3.6.1 Check current flap setting
        3.6.2 Move ‘flap’ lever to 3
    3.7 Reduce airspeed to 140 Knots
        3.7.1 Check current airspeed
        3.7.2 Dial the ‘Speed/MACH’ knob to enter 140 on the IAS/MACH display
    3.8 Put the landing gear down
    3.9 Check altitude
    3.10 Set flaps to ‘full’
        3.10.1 Check current flap setting
        3.10.2 Move flap lever to F

Fig. 2. Extract of landing task HTA (source: Marshall et al., 2003).
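
The HTA extract in Fig. 2 is a tree whose bottom level task steps are the units to which each HEI method is applied. The sketch below shows one possible way to hold such a tree and list its bottom level steps; the `Task` class and `bottom_level_steps` function are hypothetical names introduced for illustration, and only the first few branches of Fig. 2 are encoded.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Task:
    """A node in a Hierarchical Task Analysis (hypothetical structure)."""
    number: str
    name: str
    subtasks: list[Task] = field(default_factory=list)


def bottom_level_steps(task: Task) -> list[Task]:
    """Return the leaf tasks of the HTA in their listed order."""
    if not task.subtasks:
        return [task]
    steps: list[Task] = []
    for sub in task.subtasks:
        steps.extend(bottom_level_steps(sub))
    return steps


# A subset of the tree in Fig. 2.
hta = Task("3", "Prepare the aircraft for landing", [
    Task("3.1", "Check the distance (m) from runway"),
    Task("3.2", "Reduce airspeed to 190 Knots", [
        Task("3.2.1", "Check current airspeed"),
        Task("3.2.2", "Dial the 'Speed/MACH' knob to enter 190 on the IAS/MACH display"),
    ]),
    Task("3.3", "Set flaps to level 1", [
        Task("3.3.1", "Check current flap setting"),
        Task("3.3.2", "Move 'flap' lever to 1"),
    ]),
])

for step in bottom_level_steps(hta):
    print(step.number, step.name)
```

Note that a task without subtasks, such as 3.1, counts as a bottom level step in its own right, so an analyst would consider error modes for it alongside the numbered sub-steps.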

a description of the method in question, a copy of the taxonomy associated with the error prediction method; a flowchart showing how to conduct an analysis using the method; an example output of the method and also an example of an analysis carried out using the method in question. Participants were also given an HTA describing the action stages involved when using a vending machine as part of the training and also an HTA describing the action stages involved when landing aircraft X at New Orleans using the Auto-land system for the main study. The participants were also provided with photographs of all flight deck instrumentation used in the flight task, i.e. flap lever, throttle lever, auto-pilot panel, Captain’s primary flight display (in the appropriate mode), landing gear lever and the Captain’s navigation display. All participants were also provided with suitable pro-formae for recording their error predictions. Microsoft Flight Simulator 2000™ Professional Edition was also used to give the participants a demonstration and walkthrough of the flight task under analysis.

8.4. Design

A between-subjects design was used in this study. The independent variable was the participant group: the HET group, the HAZOP group, the HEIST group and the SHERPA group. The dependent variables were the errors predicted by each participant and the time taken by each participant to conduct the HEI exercise.

8.5. Procedure

Participants were recruited via e-mail advertisement and the respondents were divided into four separate groups, based upon the four HEI techniques used. For each group, participants were initially given a short briefing on the purpose of the experiment. Following this a lecture-based introduction to the areas of Human Error and HEI was given. Next, participants were given a short training session on the method that their particular group was being tested on. This included a short introduction to the method and a step-by-step walkthrough of a worked example of an HEI analysis using the method in question. The analysis used for each of the methods was an HEI analysis of a Ford in-car radio cassette system (Stanton and Young, 1999).

Once familiar with their HEI method, participants were given an HTA of a vending machine task (Stanton and Stevenage, 1998) along with A3 photographs of the vending machine and its user interface on which to undertake practice HEI analysis. After a demonstration of the task and a walkthrough of the HTA, participants used their allocated method to make error predictions for the vending machine task. At this stage, participants were permitted to confer with other participants and also to ask the experimenter questions regarding the analysis. Once the error predictions were complete, participants were provided with an ‘expert’ analysis (undertaken by a human factors researcher with considerable experience in HEI) for the vending machine task so that they could compare their error predictions with an expert’s error predictions for the same task. The experimenter then discussed each of the errors predicted and answered any questions regarding the vending machine error prediction task.

After a short break, participants were then given the HTA for the task, ‘Land aircraft X at New Orleans using the Auto-land system’, as the experimental condition, along with colour photographs of all of the relevant flight deck equipment. After an initial walkthrough of the task, participants were given a step-by-step demonstration of the landing task using Microsoft Flight Simulator 2000 Professional Edition. Participants were then asked to predict any potential design induced pilot errors for the flight task independently from other participants. For reliability purposes, participants returned 4 weeks later to carry out a repetition of the analysis (hereafter
referred to as Trial 2) employing the same HEI technique that they had used during the first error prediction exercise (hereafter referred to as Trial 1).

Fig. 3. SI scores (Trial 1 and Trial 2). [Bar chart: sensitivity index score by HEI method (SHERPA, HAZOP, HEIST, HET, Multiple Methods) for Trial 1 and Trial 2.]

8.6. Data analysis

To compute validity statistics, the error predictions made by each participant were compared with actual error incidence data reported by pilots using the autoland system for the flight task under analysis (which was obtained via questionnaire survey). In this survey pilots type-rated on the same aircraft were asked to report any errors that either they had made or they had seen being made by a co-pilot, for each of the task steps in the HTA, ‘Land aircraft X at New Orleans airport using the Auto-Land system’. A total of 46 pilots (45% Captains, 37% First Officers, 13.3% Trainee Captains, 4.7% who declined to state their position) with experience ranging from less than 2000 h to over 16,000 h (mean = 6,832 h, SD = 4,524 h) responded to the survey. Fifty-seven different error types were reported in the survey. A detailed description of these errors can be found in Marshall et al. (2003).
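The comparison of predicted against reported errors described above can be sketched in code. This is a minimal illustration and not the authors' analysis procedure. It assumes that each error is represented by a string label and that the error mode taxonomy of a method is a finite set of candidate errors, and it uses the hit, miss, false alarm and correct rejection categories and the sensitivity index of Eq. (1) that this section adopts from Stanton and Stevenage (1998). All names (`Outcome`, `classify_predictions`, `sensitivity_index`) and the example labels are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Outcome:
    """Counts for one analyst in the signal detection grid."""
    hits: int
    misses: int
    false_alarms: int
    correct_rejections: int


def classify_predictions(predicted, reported, taxonomy):
    """Compare an analyst's predicted errors with survey-reported errors.

    predicted: errors the analyst predicted
    reported:  errors reported by the survey respondents
    taxonomy:  every candidate error the method could have produced
               (assumption: a finite set of labels)
    """
    predicted, reported, taxonomy = set(predicted), set(reported), set(taxonomy)
    return Outcome(
        hits=len(predicted & reported),            # predicted and reported
        misses=len(reported - predicted),          # reported but not predicted
        false_alarms=len(predicted - reported),    # predicted but not reported
        correct_rejections=len(taxonomy - predicted - reported),
    )


def hit_rate(o):
    # Assumes at least one reported error (hits + misses > 0).
    return o.hits / (o.hits + o.misses)


def false_alarm_rate(o):
    # Assumes at least one unreported candidate error.
    return o.false_alarms / (o.false_alarms + o.correct_rejections)


def sensitivity_index(o):
    """SI = (hit rate + 1 - false alarm rate) / 2, a value between 0 and 1."""
    return (hit_rate(o) + 1 - false_alarm_rate(o)) / 2


# Hypothetical example: eight candidate errors labelled A to H.
taxonomy = set("ABCDEFGH")
predicted = {"A", "B", "C", "D"}
reported = {"A", "B", "E"}

outcome = classify_predictions(predicted, reported, taxonomy)
print(outcome)
print(round(sensitivity_index(outcome), 2))  # 0.63
```

In this example the analyst scores 2 hits, 1 miss, 2 false alarms and 3 correct rejections, giving a hit rate of 2/3, a false alarm rate of 0.4 and an SI of about 0.63. Representing the predictions as sets keeps the four categories mutually exclusive by construction.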
The sensitivity of each participant’s error predictions was calculated using the signal detection paradigm. The signal detection paradigm was used as it has been found to provide a useful framework for testing the power of HEI techniques and has been used effectively for this purpose in the past (e.g. Stanton and Stevenage, 1998; Harris et al., 2005). The signal detection paradigm sorts the data into the following mutually exclusive categories:

(1) Hit – an error predicted by the analyst that was also reported by the survey respondents.
(2) Miss – the failure to predict an error that was reported by the survey respondents.
(3) False alarm – an error predicted by the analyst but that was not reported by the survey respondents.
(4) Correct rejection – a correctly rejected error that was not reported by the pilots. This represents the number of errors contained in the HEI method’s error mode taxonomy that were correctly rejected by the analyst and also not reported by the survey respondents.

These four categories were entered into the signal detection grid for each subject. The signal detection paradigm was then used to calculate the sensitivity index (SI). This returns a value between 0 and 1; the closer SI is to 1, the more accurate the technique’s predictions are. The formula used to calculate SI is given in Eq. (1) (from Stanton and Stevenage, 1998):

\[
\mathrm{SI} = \frac{\dfrac{\text{Hit}}{\text{Hit} + \text{Miss}} + 1 - \dfrac{\text{False alarm}}{\text{FA} + \text{Correct rejection}}}{2} \qquad (1)
\]

9. Results

Treatment of data: The data obtained had to be grouped first so that the multiple methods and analysts approach sensitivity could be calculated. For the individual HEI methods comparison the mean scores (SI, hit rate and false alarm rate) from each methods group (i.e. HET group, SHERPA group, HEIST group and Human Error HAZOP group) were calculated. For the multiple methods and analysts approach, six participants’ error predictions from each method group were pooled together. In order to be consistent in the comparison of the individual methods with the pool of multiple methods and analysts some cases had to be discounted, as not all participants turned up to all of the trials. A core pool of six participants in each of the groups was formed, whose data was then used to form the multiple methods and analysts group. The sensitivity of these error predictions (each method in isolation and the multiple methods and analysts approach data) was then assessed using the sensitivity index formula described above.

The mean Trial 1 and Trial 2 SI scores for each method and also the multiple methods and analysts approach SI score are presented in Fig. 3.

The SI score results show that the multiple methods and analysts approach achieved the greatest SI scores (Trial 1 = 0.69, Trial 2 = 0.69), followed by analysts using the HET approach (Trial 1 = 0.66, Trial 2 = 0.65). As sensitivity is made up of hit rate and false alarm rate, each of these was considered separately. The mean Trial 1 and Trial 2 hit rate scores for each method and the multiple methods and analysts approach hit rate score are presented in Fig. 4.

The hit rate score results show that the analysts using the HET approach achieved the greatest hit rate scores (Trial 1 = 0.88, Trial 2 = 0.89).

Fig. 4. Hit rate scores (Trial 1 and Trial 2). [Bar chart: hit rate by HEI method (SHERPA, HAZOP, HEIST, HET, Multiple Methods) for Trial 1 and Trial 2.]

The mean Trial 1 and Trial 2 false alarm rate scores for each method and the false alarm rate score for the multiple methods and analysts approach are presented in Fig. 5.

The false alarm rate scores show that, at Trial 1, the analysts using the Human error HAZOP approach achieved the lowest false


alarm rate score (0.35) whilst the multiple methods approach achieved the lowest false alarm rate score at Trial 2 (0.34).

Fig. 5. False alarm rate scores (Trial 1 and Trial 2). [Bar chart: false alarm rate by HEI method (SHERPA, HAZOP, HEIST, HET, Multiple Methods) for Trial 1 and Trial 2.]

Mann–Whitney ‘U’ statistical tests were performed to establish if the observed differences between the sensitivity index, hit rate and false alarm rate scores for the different approaches were significant. The results are presented in Table 2.

Table 2
Comparison significance table.

Multiple methods and analysts compared with analysis using:

Signal detection criteria   HET              HAZOP            SHERPA           HEIST
Hit rate Trial 1            n.s.             0.0038 (<0.005)  0.0101 (<0.05)   n.s.
Hit rate Trial 2            0.0154 (<0.05)   0.0161 (<0.05)   0.0245 (<0.05)   0.0247 (<0.05)
False alarm rate Trial 1    n.s.             0.0039 (<0.005)  0.0159 (<0.05)   0.0062 (<0.0)
False alarm rate Trial 2    0.0163 (<0.05)   n.s.             0.0064 (<0.01)   n.s.
Sensitivity index Trial 1   n.s.             0.0039 (<0.05)   0.0039 (<0.005)  0.0039 (<0.005)
Sensitivity index Trial 2   n.s.             n.s.             0.0039 (<0.005)  0.0065 (<0.01)

n.s., not significant; italic values, multiple methods and analysis significantly greater; bold values, analysts using one method significantly greater.

The results presented in Table 2 demonstrate that at Trial 1, the multiple methods and analysts approach SI scores were significantly better than the analyst mean SI scores using HAZOP, SHERPA or HEIST, but not significantly better than the HET mean SI score, and that at Trial 2 the multiple methods and analysts approach SI scores were significantly greater than the analyst scores using SHERPA and HEIST, but not the HET and HAZOP scores.

For hit rate at Trial 1, the multiple methods and analysts approach score was significantly higher than the HAZOP and SHERPA scores, but was not significantly greater than the HET and HEIST hit rate scores. For hit rate at Trial 2, the multiple methods and analysts approach score was significantly greater than the SHERPA and HAZOP scores; however, the analysts using HET and HEIST achieved significantly greater hit rate scores at Trial 2.

For false alarm rate, at Trial 1 the statistical analysis indicates that the multiple methods and analysts approach scored lower (and therefore better) than the analysts using HAZOP, SHERPA and HEIST, but that the difference between the multiple methods and analysts approach and the analysts using HET was not significant. For false alarm rate at Trial 2, the multiple methods and analysts approach scores were significantly lower than those of analysts using HET and SHERPA, but were not significantly lower than the analyst scores for HAZOP and HEIST.

10. Discussion

10.1. Error prediction sensitivity

This study had two main objectives. The first objective of the study was to compare the accuracy of the HET approach when used to predict design induced pilot error against three other contemporary HEI approaches developed in other domains. In conclusion, participants using the HET methodology were the most accurate in their predictions for the flight task under analysis, both at Trial 1 and Trial 2. This study therefore provides validation evidence in support of the HET approach for predicting pilot error on civil flight decks. Previous studies have also demonstrated that the HET approach is more accurate than HAZOP, HEIST and SHERPA when used to predict errors for the same flight task (Stanton et al., 2006). The superior performance of the HET approach in this case was most probably attributable to the fact that the HET error mode taxonomy was developed, in part, based on a review of civil aviation incidents. On the other hand, SHERPA, Human Error HAZOP and HEIST were developed for the nuclear power and process control domains. This meant that analysts were somewhat constrained in terms of the errors that they could predict, since the taxonomies used in these methods may not have contained errors of the type that might occur on civil flight decks.

The second objective of this study was to test an approach to enhancing the accuracy of error predictions, the multiple methods and analysts approach. In terms of the overall accuracy of the error predictions, it was observed that the multiple methods and analysts approach was significantly more accurate than the multiple analyst approach using HAZOP (at Trial 1 only), SHERPA and HEIST at predicting errors for the flight task, but not significantly more accurate than the error predictions offered by the multiple analysts using the HET approach (nor HAZOP at Trial 2). It is concluded from this that the multiple methods and analysts approach is not significantly more accurate than the HET approach when used to predict pilot errors for the landing task in question; however, this finding does have implications for the prediction of human error in complex systems. For example, in some cases, it suggests that it may be more appropriate to use multiple methods and multiple analysts to predict errors, rather than only one method in isolation. On the basis of the findings derived from this study, using a group of analysts and HEI methods to predict error can enhance the sensitivity of error predictions, whereas using only one method could potentially lead to critical errors being missed during the error prediction process. This certainly appears to be the case when attempting to predict error in domains for which no HEI approaches have been specifically developed. One way of enhancing the sensitivity of the error predictions made would therefore be to use a combined toolkit of a range of HEI methods from other domains.

This research lends support to Kirwan’s (1998a,b) argument for the use of a comprehensive multiple methods (i.e. toolkit) approach. The differences in the taxonomies of different methods may ensure greater capture of the types of error that are likely to occur. Alternatively, methods that have been developed specifically for the domain in question appear to perform equally well when multiple analysts are utilised. It is assumed that the superior accuracy of the multiple methods and analysts approach over and above the three other HEI methods was due to the error mode taxonomy being more comprehensive, as in this case it was effectively four error taxonomies combined. This comprehensiveness of the error taxonomy is likely to lead to an increase in the numbers of


errors correctly identified (i.e. hits) and thus reduce the number of errors that are missed (i.e. misses). On the downside, the increased number of error modes could potentially increase the number of wrongly identified errors (i.e. false alarms) and decrease the numbers of errors correctly discarded (i.e. correct rejections). This was not the case in this study, however, with the multiple methods and analysts group performing better in terms of SI and false alarm rate scores. Further, as pointed out earlier, in some circumstances (i.e. in safety critical systems analysis) it may be acceptable to generate a high false alarm rate if it contributes to the detection of more errors. It is recommended, however, that when using a multiple methods and analysts approach, appropriate subject matter experts with a sufficient level of experience in HEI are used, or a combination of subject matter experts and methods experts working together (Stanton and Stevenage, 1998). It should be noted that the analysts in this case were neither experts in the domain of civil aviation nor were they experts in the application of HEI techniques. It would be expected that the signal detection theory statistics should be higher (i.e. error predictions more accurate) if this were the case.

It is acknowledged that the use of a convenience sample of participants with no experience of HEI methods or piloting in general was a significant limitation of this study. As stated earlier, the sample used in this case represented a worst case sample (i.e. no domain or HEI method experience). Further, this limitation was tempered by the relatively good performance of the participants involved and suggests, provided that an appropriate methodology and task description is used, that domain expertise might not be as critical as is assumed.

In closing, this study has demonstrated that the HET approach is a viable tool for identifying design induced pilot errors within the civil aviation domain. The level of accuracy attained by inexperienced analysts when using the HET approach to identify such errors is encouraging and suggests that the HET approach, when used by domain experts with significant experience in HEI analysis, can potentially be a very powerful tool for accurately identifying design induced pilot error. Further, this study seems to suggest that error prediction sensitivity can potentially be enhanced through the use of a multiple methods and analysts approach, which indicates that future HEI analysis efforts should utilise teams of HEI analysts with access to a toolkit of different HEI approaches.

It is recommended that further research into means of enhancing error prediction accuracy be undertaken. Also, further applications of the HET approach within the aviation domain are encouraged. Further, whilst this study has demonstrated that using multiple analysts and methods may enhance error prediction sensitivity, it is clear that further investigation in other domains in which error prediction is dominant is required, such as the process control and air traffic control domains. The usefulness of HEI techniques is already assured. However, enhancing the accuracy of the error predictions offered by such techniques can only make them more powerful tools within system design and analysis efforts.

Acknowledgements

We wish to acknowledge that this research was made possible through funding from the Department of Trade and Industry as part of the European EUREKA! Programme. Our thanks go to John Brumwell & Gillian Richards of the CARAD Advanced Systems Programme at the DTI and Richard Harrison (now at QinetiQ) who have provided invaluable support to this research and also to Air2000, British Midland and JMC airlines and their pilots for taking the time to complete the questionnaire.

References

Baber, C., Stanton, N.A., 1996. Human error identification techniques applied to public technology: predictions compared with observed use. Applied Ergonomics 27 (2), 119–131.
Civil Aviation Authority, 1998. Global Fatal Accident Review 1980–1996 (CAP 681). Civil Aviation Authority, London.
Embrey, D.E., 1986. SHERPA: a systematic human error reduction and prediction approach. Paper presented at the International Meeting on Advances in Nuclear Power Systems, Knoxville, Tennessee.
Federal Aviation Administration, 1996. Report on the Interfaces Between Flightcrews and Modern Flight Deck Systems. Federal Aviation Administration, Washington DC, USA.
Harris, D., Stanton, N.A., Marshall, A., Young, M.S., Demagalski, J., Salmon, P.M., 2005. Using SHERPA to predict design induced error on the flight deck. Aerospace Science and Technology 9, 525–532.
Kennedy, R.J., 1995. Can human reliability assessment (HRA) predict real accidents? A case study analysis of HRA. In: Glendon, A.I., Stanton, N.A. (Eds.), Proceedings of the Risk Assessment and Risk Reduction Conference. Aston University, Birmingham, UK.
Kirwan, B., Ainsworth, L.K., 1992. A Guide to Task Analysis. Taylor and Francis, London.
Kirwan, B., 1992a. Human error identification in human reliability assessment. Part 1: overview of approaches. Applied Ergonomics 23, 299–318.
Kirwan, B., 1992b. Human error identification in human reliability assessment. Part 2: detailed comparison of techniques. Applied Ergonomics 23, 371–381.
Kirwan, B., 1994. A Guide to Practical Human Reliability Assessment. Taylor and Francis, London.
Kirwan, B., 1996. The validation of three Human Reliability Quantification techniques – THERP, HEART and JHEDI: part 1 – technique descriptions and validation issues. Applied Ergonomics 27 (6), 359–373.
Kirwan, B., 1998a. Human error identification techniques for risk assessment of high risk systems – part 1: review and evaluation of techniques. Applied Ergonomics 29 (3), 157–177.
Kirwan, B., 1998b. Human error identification techniques for risk assessment of high risk systems – part 2: towards a framework approach. Applied Ergonomics 29 (5), 299–318.
Lane, R., Stanton, N.A., Harrison, D., 2006. Applying hierarchical task analysis to medication administration errors. Applied Ergonomics 37 (5), 669–679.
Marshall, A., Stanton, N.A., Young, M., Salmon, P.M., Harris, D., Demagalski, J., Waldmann, T., Dekker, S.W., 2003. Development of the Human Error Template – a New Methodology for Assessing Design Induced Errors on Aircraft Flight Decks. Final Report of the ERRORPRED Project E! 1970 (August 2003). Department of Trade and Industry, London.
McFadden, K.L., Towell, E.R., 1999. Aviation human factors: a framework for the new millennium. Journal of Air Transport Management 5, 177–184.
Nelson, W.R., Haney, L.N., Ostrom, L.T., Richards, R.E., 1998. Structured methods for identifying and correcting potential human errors in space operations. Acta Astronautica 43, 211–222.
Shorrock, S.T., Kirwan, B., 2002. Development and application of a human error identification tool for air traffic control. Applied Ergonomics 33, 319–336.
Stanton, N.A., 2006. Hierarchical task analysis: developments, applications and extensions. Applied Ergonomics 37, 55–79.
Stanton, N.A., Baber, C., 2002. Error by design: methods for predicting device usability. Design Studies 23 (4), 363–384.
Stanton, N.A., Stevenage, S.V., 1998. Learning to predict human error: issues of reliability, validity and acceptability. Ergonomics 41, 1737–1756.
Stanton, N.A., Young, M.S., 1999. A Guide to Methodology in Ergonomics: Designing for Human Use. Taylor and Francis, London.
Stanton, N.A., Salmon, P., Walker, G.H., Baber, C., Jenkins, D., 2005. Human Factors Methods: A Practical Guide for Engineering and Design. Ashgate, Aldershot.
Stanton, N., Harris, D., Salmon, P.M., Demagalski, J.M., Marshall, A., Young, M.S., Dekker, S.W.A., Waldmann, T., 2006. Predicting design induced pilot error using HET (Human Error Template) – a new formal human error identification method for flight decks. Journal of Aeronautical Sciences, February, 107–115.
Swann, C.D., Preston, M.L., 1995. Twenty five years of HAZOPs. Journal of Loss Prevention in the Process Industries 8 (6), 349–353.
Whalley, S.J., Kirwan, B., 1989. An evaluation of five human error identification techniques. Paper presented at the 5th International Loss Prevention Symposium. Oslo.
Williams, J.C., 1989. Validation of human reliability assessment techniques. Reliability Engineering 11, 149–162.
