TECHNOLOGICAL FORECASTING AND SOCIAL CHANGE 39, 235-251 (1991)

Delphi
A Reevaluation of Research and Theory

GENE ROWE, GEORGE WRIGHT, and FERGUS BOLGER

ABSTRACT

This paper examines critically the Delphi technique to determine whether it succeeds in alleviating the
“process loss” typical of interacting groups. After briefly reviewing the technique, we consider problems with
Delphi from two perspectives. First, we examine methodological and technical difficulties and the problems
these have brought about in experimental applications. We suggest that important differences exist between
the typical laboratory Delphi and the original concept of Delphi. These differences, reflecting a lack of control
of important group characteristics/factors (such as the relative level of panelist expertise), make comparisons
between Delphi studies unrealistic, as are generalizations from laboratory studies to the ideal of Delphi. This
conclusion diminishes the power of those former Delphi critiques that have largely dismissed the procedure
because of the variability of laboratory study results. Second, having noted the limited usefulness of the majority of studies for answering questions on the effectiveness of Delphi, we look at the technique from a theoretical/mechanical perspective. That is, by drawing upon ideas/findings from other areas of research, we attempt
to discern whether the structure of the Delphi procedure itself might reasonably be expected to function as
intended. We conclude that inadequacies in the nature of feedback typically supplied in applications of Delphi
tend to ensure that any small gains in the resolution of “process loss” are offset by the removal of any opportunity
for group “process gain.” Some solutions to this dilemma are advocated; they are based on an analysis of the
process of judgment change within groups and a consideration of factors that increase the validity of statistical/nominal groups over their constituent individual components.

The Impetus for a New Approach to Aggregating Judgments


The group committee meeting is a common strategy for the resolution of differences and the advocacy of refined opinion, whether that opinion be related to the forecasting of future events, the estimation of current status, or the expression of present intentions or decisions. Underlying this process is the expectation that “n + 1” heads will be better than one [1] and that the potential sum of useful information available to the group will be at least as great as, and more usually greater than, that of any particular individual within that set. Lock has further noted that groups may serve to enhance individual commitment, help in resolving ambiguous and conflicting knowledge, and facilitate creativity along with a watchfulness for errors [2]. Consequently, combining individual judgments may lead to “process gain” [3, 4], where groups may perform better than their best member.

GENE ROWE is a researcher at the Bristol Business School, Bristol, England. GEORGE WRIGHT is
reader in business at the Bristol Business School. FERGUS BOLGER is a research fellow at the Bristol Business
School, where he is conducting research on professional judgment.
Address reprint requests to Dr. Gene Rowe, Bristol Business School, Coldharbour Lane, Frenchay, Bristol BS16 1QY, England.

© 1991 by Elsevier Science Publishing Co., Inc. 0040-1625/91/$3.50



However, while evidence does exist for the advantage of committee-like groups over individual judgments over a variety of domains, in terms of qualitative and quantitative performance criteria [1, 5-7], the advantages of such groups over mathematical aggregations of individual judgments into “statisticized” groups are less consistent [8-10]. Further, studies have generally shown group judgment to be mainly inferior to the judgment of the group’s best member [1, 11]. Thus, it is clear that in many circumstances interacting groups do not perform up to their optimal level or potential.
Various explanations have been proposed to account for this “process loss” [12], which is often seen to occur within interacting groups. For example, Steiner has suggested that a misweighting of judgments from group members may come about through the mismatch between members’ status and the quality of their contributions, through the lack of contribution from proficient yet under-confident members, through the difficulty of evaluating the quality of individual participants, or through the social pressures that may be exerted by an incompetent majority on a competent minority [12]. Further, various authors have noted how groups may take on lives of their own, where motives change and the purpose of achieving the best possible judgment may be supplanted by the goal of simply reaching agreement. Hence, premature closure and satisficing may dominate optimizing, with the consensus settling on the first solution that greatly offends no one, even though no one may agree with that solution wholeheartedly [13-15]. Add to these factors the vested interests of group members, with their needs to “win” (or at least not to lose face), and it becomes obvious that interacting groups possess a good many potential stumbling blocks and that scope exists for an alternative aggregation technique.

The Delphi Technique and Its Potential for Ameliorating “Process Loss”

HISTORY
The Delphi technique was developed during the 1950s by workers at the RAND Corporation as a procedure to “obtain the most reliable consensus of opinion of a group of experts . . . by a series of intensive questionnaires interspersed with controlled opinion feedback” [16, p. 458]. Initially, it was used in the realm of long-term forecasts of change, particularly in science and technology, though in later years it has come to be used in a wide variety of judgmental settings [15, 17-19]. The main criterion for Delphi’s employment is the indispensability of judgmental information, which may arise in cases (such as forecasting) where no historical data exist, or when such data are inappropriate (that is, when new influencing factors are expected that are not incorporated in the past data). Delphi may also find a use in situations where moral or ethical (that is, subjective) considerations dominate economic or technical ones, or perhaps even in situations where historical/economic/technical data are just too costly to obtain. Thus, according to Coates, Delphi is a technique of “last resort,” to be used when no adequate models exist upon which some statistical prediction or judgment might be based [20]. We should also note that Delphi may be used for other purposes, such as in policy formation, where “validity” of judgment is a less salient concept and “last resort” may then be an inappropriate assessment. Our concern, however, shall focus upon Delphi as a judgment aiding/enhancing tool, in which case Coates’s assessment as to its limitation seems sound.

CHARACTERISTICS
Delphi aims to make use of the positive attributes of interacting groups (that is, knowledge from a variety of judges) while removing the negative aspects largely attributed to the social difficulties within such groups. To this end, four necessary features characterize a Delphi procedure: anonymity, iteration, controlled feedback, and statistical aggregation of group response.

1. “Anonymity” is achieved through the use of questionnaires. By allowing group members to make their responses privately, undue social pressures should be avoided. Theoretically, this should allow group members the freedom to express their own beliefs without feeling pressured by dominant individuals, providing them with the opportunity to consider each idea on the basis of merit alone (as opposed to the basis of spurious and invalid criteria, for example, “status”), while enabling them to change their minds without fearing loss of face (in the eyes of the group).
2. “Iteration” occurs by means of presenting the constructed questionnaire over a number of rounds, allowing members to change their opinions.
3. “Controlled feedback” takes place between rounds, during which each group member is informed of the opinions of the other group members. Often this feedback is presented in the form of a simple statistical summary of the group response, such as the mean or median (in quantitative assessments, such as when an event might occur, the likelihood of a given event occurring, and so on), though sometimes actual arguments may be presented. Thus, all members are allowed an input into the process, not just the most vocal.
4. “Statistical group response” is obtained at the end of the procedure, where group judgment is expressed as a median (usually), and the extent of the spread of members’ opinions may be used as an indication of the strength of the consensus. Therefore, more information is available than just a simple consensual judgment.

The first round of the “classical” Delphi procedure is unstructured, allowing the
experts relatively free scope to elaborate on those issues/forecasts they see as being
important in the selected area of interest. These individual factors are then consolidated
into a single set by the monitor team, producing questionnaires for subsequent rounds
where quantitative assessments or predictions are required. After each of these subsequent
rounds, responses are analyzed and statistically summarized (usually into medians plus
upper and lower quartiles), which are presented to the “panelists” for further consideration.
Thus, in the third round and thereafter, panelists have the opportunity to alter prior
estimates on the basis of the group feedback. Further, if panelists’ assessments fall within
the upper or lower quartile, then they may be asked to give reasons why their selections
are correct against the majority opinion (with all such arguments remaining of anonymous
origin). This procedure continues until a certain stability in panelists’ responses is achieved,
and is left to the discretion of the monitor team. The forecast or assessment for each
event is then represented by the median on the final round, with the degree of disagreement
indicated by differences in the quartile figures. Hence, consensus is not entirely forced,
as an indicator of discontent may be appended to the ultimate response.
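To make this feedback step concrete, the following minimal sketch in Python (our own illustration, not part of the original procedure; the panelist labels and estimates are hypothetical) computes the summary a monitor team might circulate after a round:

    import statistics

    def summarize_round(estimates):
        # Classical Delphi feedback: the median plus lower and upper quartiles.
        ordered = sorted(estimates.values())
        n = len(ordered)
        med = statistics.median(ordered)
        lower = statistics.median(ordered[: n // 2])        # lower quartile
        upper = statistics.median(ordered[(n + 1) // 2 :])  # upper quartile
        return med, lower, upper

    def outliers(estimates, lower, upper):
        # Panelists outside the interquartile range: in the classical procedure
        # these would be asked to supply (anonymous) supporting arguments.
        return [p for p, e in estimates.items() if e < lower or e > upper]

    # Hypothetical second-round estimates (say, forecast year for some event)
    round2 = {"A": 1995, "B": 1998, "C": 2000, "D": 2001, "E": 2004, "F": 2015}
    med, lo, hi = summarize_round(round2)
    print(f"median={med}, quartiles=({lo}, {hi}), argue: {outliers(round2, lo, hi)}")
    # -> median=2000.5, quartiles=(1998, 2004), argue: ['A', 'F']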
Variations of Delphi do exist from this original ideal [15, 17]. Most commonly, round one is structured in order to make things simpler for the monitor team and panelists; the number of rounds is variable, though seldom goes beyond one or two iterations (when most change in panelists’ responses is expected [21]); and often, panelists may be asked for just a single statistic, such as the date by when an event has a 50% likelihood of occurring, rather than for multiple figures or dates representing degrees of confidence or likelihood (for example, the 10% and 90% likelihood dates). These simplifications are particularly common in laboratory studies of Delphi, an important point to which we will return later.

[Figure: the central tendencies of the “swingers,” of the group as a whole, and of the “holdouts,” shown relative to the area containing the true answer.]
Fig. 1. Theoretical change in group response over rounds.

THE THEORY BEHIND DELPHI


Simply stated, and as implied earlier, Delphi aims to maintain the advantages of the interacting group while removing the (largely social) hindrances leading to process loss. The axiomatic explanation as to how enhanced judgment should be attained has been expounded by Dalkey [22] and reviewed by Parente and Anderson-Parente [19]. The so-called “theory of errors” assumes that the aggregate of a group will provide a judgment/forecast that is generally superior to that of most of the individuals within the group: when the range of individual estimates excludes the true answer (T), then the median (M) should be at least as close to the true answer as one half of the group, but when the range of estimates includes T, then M should be more accurate than more than half of the group. This indicates the advantage of taking a statistical aggregate of individual estimates (see Parente and Anderson-Parente [19], especially pp. 141-142). Of course, this does not necessarily mean that M will be more accurate than the most accurate panelist, though it is a possibility.
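The distance property underlying the theory of errors is easy to verify numerically. The following sketch (our own, with an arbitrary distributional assumption) confirms that the median’s error never exceeds that of at least half of the panel:

    import random
    import statistics

    def median_beats_fraction(estimates, truth):
        # Fraction of panelists whose error is at least as large as the median's.
        med_err = abs(statistics.median(estimates) - truth)
        return sum(abs(e - truth) >= med_err for e in estimates) / len(estimates)

    random.seed(1)
    truth = 100.0
    # Panels of nine judges whose estimates are biased and noisy (assumed model)
    trials = [[random.gauss(110, 20) for _ in range(9)] for _ in range(10_000)]
    # In every trial the median does at least as well as half of the panel.
    print(min(median_beats_fraction(t, truth) for t in trials))  # always >= 0.5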
The improvements with iteration were hypothesized by Dalkey [22] to arise through
the motion of the less knowledgeable panelists (known in RAND jargon as “swingers”),
accompanied by the relative intransigence of the more accurate panelists (known as
“holdouts”). It was suggested that the least accurate panelists will realize that this is the
case and hence be drawn toward the median, while the most knowledgeable panelists
will be more confident and so be less drawn toward the mean or remain where they are.
If so, then it can be shown that the median group response should move toward T over
rounds, as demonstrated in Figure 1.
Thus M and T are seen as the two forces that exert influence upon group members,
with feedback working to dampen the pull of M, which might otherwise bias the panelists’
responses away from the true answer.
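A toy dynamic of this account (our own sketch; the “knowledge” weights and the movement rule are assumptions, not from Dalkey, and the mean is used as the fed-back group statistic for simplicity) shows the group response drifting toward the true value when the holdouts sit near it:

    def delphi_round(estimates, knowledge, pull=0.8):
        # One feedback round: each panelist moves toward the group response by a
        # fraction scaled by (1 - knowledge), so low-knowledge 'swingers' move a
        # lot while knowledgeable 'holdouts' barely move.
        group = sum(estimates) / len(estimates)
        return [e + (1 - k) * pull * (group - e) for e, k in zip(estimates, knowledge)]

    TRUTH = 50.0
    estimates = [20.0, 25.0, 30.0, 47.0, 49.0]   # holdouts near the truth
    knowledge = [0.1, 0.1, 0.1, 0.9, 0.95]
    for r in range(1, 5):
        group = sum(estimates) / len(estimates)
        print(f"round {r}: group response = {group:.1f}")  # drifts toward 50
        estimates = delphi_round(estimates, knowledge)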

EXPERIMENTAL FINDINGS
Generally, Delphi studies have used material of such a nature as to ensure rapid
assessment of cross-round changes in judgmental accuracy. Hence, question scenarios
have tended to be of either the “almanac” or short-range forecasting type, the former
involving items whose values are already known by the experimenters and about which
the subjects are presumed capable of making educated guesses. Results confirm that
convergence does take place over rounds [23-26], though the results are not so unequivocal in demonstrating that such convergence moves toward more valid responses. For example, Boje and Murnighan [27] demonstrated a decrease in accuracy over rounds, while other studies have shown improvements to be only slight [28]. Nevertheless, the general trend is toward more valid judgments over iterations (see Parente and Anderson-Parente [19] for a review).
In comparison to other techniques aimed at enhancing judgmental accuracy, again
Delphi’s worth has not been convincingly demonstrated. For example, Brockhoff found
that a Delphi procedure produced more accurate short-term forecasts than a face-to-face
committee meeting, while the situation was reversed for almanac questions [21]. Riggs
[29] found Delphi to produce significantly more accurate predictions than a normal interacting group, as did Larreche and Moinpour [30].
Gustafson et al. compared Delphi to a variety of other techniques, including an “estimate-talk-estimate” procedure, essentially the nominal group technique (NGT) [28]. NGT basically incorporates features of nominal and interacting groups, where discussion of ideas takes place in a structured group environment, with decisions made in private, and followed by statistical combination. Using almanac items, Gustafson et al. found a slight advantage for the NGT-like procedure over Delphi, with Delphi providing results no better than the prior average individual estimates or than the interacting group condition.
Other studies have provided evidence for a superiority of NGT over Delphi [31, 32], though Boje and Murnighan found NGT to be less effective than Delphi (both techniques performed
poorly) [27]. Fischer, in another multiprocedure comparison study, found that Delphi,
NGT, and interacting groups each outperformed the statistical average of individual
judgments, though these procedures were virtually indistinguishable from one another in
terms of performance [33]. Sniezek, in a further study of group procedures, likewise
found each approach (including Delphi) better than the average of individual judgments,
though nonsignificantly different from one another [34].

Critique of Delphi
For Delphi to be a useful technique it should provide more accurate assessments/judgments/forecasts than those which might be attained by interacting groups or by individuals. The fact that Delphi has not consistently been shown to achieve this, and has been shown to be generally inferior (to some small extent) to another technique (NGT), gives us cause for concern. Fischer has pointed out that, when no method is significantly better than any other at providing enhanced judgmental performance, then practical economic factors become important [33]. From this perspective, Delphi’s utility becomes suspect because, though it may prove less costly than other group techniques such as the committee meeting or the NGT (where the individuals are required to gather together at the same time and place), it nevertheless suffers in comparison to a straightforward mathematical aggregation approach, where individual estimates are elicited and combined with little further process.
Indeed, it was largely the lack of clear demonstrations of Delphi’s validity or reliability that formed the main basis for Sackman’s well-known critique [35], along with the methodological inadequacy of many of the studies of Delphi. Experimental findings since then do not allow us to refute the main thrust of Sackman’s arguments; without proven validity, Delphi’s usefulness is questionable. Certain authors and proponents of the technique have struck back at Sackman, balking at such a purely performance-based assessment of Delphi. For example, Linstone and Turoff have argued against the view of Delphi as a scientific tool answerable to those criteria that need to be fulfilled in order for a test/procedure to be deemed useful (for instance, reliability) [18]. Instead, they have characterized Delphi as a method for structuring a group communication such that it is effective in allowing the group to deal with complex problems. Likewise, Linstone has warned against “overselling” the technique, suggesting that its key advantage lies in producing useful guidance or planning for decision makers [17], though this viewpoint, perhaps unnecessarily, circumscribes the usefulness of Delphi to a limited number of situations (such as for policy formation), when the technique may have greater potential than that.
These latter arguments seem weak: without demonstrable improvements in performance (which should reflect beneficial manipulations of group process), the technique cannot justify its preeminence over simpler, less costly, and less time-consuming procedures. If the issue boils down to merely one of “economics” or “confidence” (in the output, from the judges or organization), then the technique should have no place in the interests of researchers. Clearly there is more to Delphi than that: if Delphi’s opponents are to be answered, then their criticisms will need to be addressed in a more realistic and immediate manner than simply declaring that they have “missed the point” [20].
Stronger rejoinders to Sackman’s critique have been put forward [17, 20, 36]. For example, Linstone cites Sackman’s concentration on “poorly executed applications of Delphi” and claims that he “ignores significant supportive research” [17, p. 297]. Others have also noted the “sloppy execution” of many experimental studies of Delphi [37]. If those studies considered by Sackman, and experimental demonstrations since that time, were and have been inadequate, then it may be inappropriate to dismiss the Delphi technique per se on the basis of their results.
It is our aim, in the following two sections, to reevaluate the empirical and theoretical
basis of Delphi. We will argue that although Sackman’s assertions were correct on the
basis of evidence available to him, the evidence itself is problematic. In the next section
we will thus look at methodological and technical problems with Delphi, concentrating
on the issue of the generalizability of experimental findings. We will argue that the ideal
of Delphi has been overlooked by the experimenters, and hence “Delphi” per se has largely escaped examination and is thus still to be studied. Reasons for such oversights
will be considered, as well as general implications as to the way in which we should
study Delphi.
The second of our “critical” sections will then look at theoretical reasons (based on
past research in the area of judgment and decision making, and in other fields of study)
as to why we might expect the ideal of Delphi to succeed or fail. In doing this, we will
elucidate several problems that might be expected to hinder Delphi efficacy, and suggest
areas of potential improvement. An emphasis of this section is thus on the potential of
Delphi to achieve its ostensible aim of ameliorating process loss, while maintaining the
benefits of interacting groups.
Having divided our reevaluation into two, it must be stressed that each section deals
with issues that might be said to overlap: thus are “theory” and “practice” ever linked.
Therefore, the sections should not be considered mutually exclusive in their coverage of
issues.

Methodological/Technical Problems, and the Issue of Generalizability


While “sloppy execution,” as a general evaluation of Delphi studies, seems a touch harsh, it cannot be disputed that most of the (far from numerous) scientific studies have used versions of Delphi somewhat removed from the “classical” prototype of the technique. To begin with, the vast majority of studies use structured first rounds, preventing the involvement of experts in expressing their beliefs as to what may be important in relation to the issue of interest, and hence denying the construction of coherent scenarios for assessment. Indeed, much of the material used for assessment tends to be in the form of isolated questions, either almanac-type questions or short-range forecasts (a step removed from the long-range forecasts for which the technique was largely used at its inception). While this simplification is reasonable in principle (after all, there is no reason why Delphi should not be used for aiding forecasts of the near future or assessing present trends for which suitable data may be lacking), the actual questions used are often highly suspect. This seems to be especially the case with almanac items: it is difficult to see how Delphi might fulfill its objectives when being used as an aid for helping groups of unknowledgeable subjects, who are largely guessing the order of magnitude of some quantitatively large and unfamiliar statistic (such as the tonnage of a certain material shipped from New York in a certain year), in achieving some reasoned consideration of others’ additional knowledge, and so producing improved estimates. In such circumstances it is likely a matter of chance as to whether Delphi produces improvements in accuracy or not.
Similarly, the pertinence of question scenarios must also depend upon the relevance of the expertise of the panelists: “sensible” questions are only sensible to panelists if they lie within their general realm of knowledge. Thus, not only are the assessment scenarios simplified (and often meaningless), but the nature of the panelists is often such as to make the scenarios meaningless. That is, Delphi was designed for use by experts making meaningful judgments or forecasts, particularly in cases where a variety of factors (economic, technical, and so on) might be such as to ensure the limited knowledge of each individual expert, thus allowing each expert the opportunity to derive benefits from communicating with others possessed of different information. This use of diverse experts is rarely (if ever) found in laboratory situations: for the sake of simplification, most studies employ homogeneous samples of undergraduate or graduate students, making judgments about scenarios in which they can by no means be considered expert and about which they are liable to possess near identical knowledge in any case (indeed, certain studies have even started off by trying to standardize knowledge before a Delphi manipulation, for example, Riggs [29]). This point must be important: if varied information is not available to be shared, then what is the benefit of any grouplike aggregation?
Part of the problem here seems to stem from the inconclusiveness of those studies on expertise in Delphi, in demonstrating, clearly and unequivocally, advantages in using “expert” subjects as panelists, rather than “inexpert” ones. That is, while certain studies have revealed that greater increases in accuracy over Delphi rounds may be obtained through the selection of more “expert” panelists [24, 26, 29, 30, 38, 39], other studies have failed to find such benefits [21, 40-42]. With such ambiguity of findings, perhaps it is not surprising that researchers have tended to use the latter set of studies as justification for their usage of inexpert students, hence saving themselves the difficulties of finding truly expert panelists. However, these variable findings do not justify the disregard of informed panelists: most of such studies have obtained their “experts” merely on the basis of the self-rating of student subjects. The first point that arises here is that any real differences in expertise between such groups of high and low raters are liable to be very small, because all panelists come from the same basic group. The second point concerns the appropriateness of self-rating as a true reflector of actual expertise: it is likely that what such studies are identifying are not experts per se, merely those who believe themselves to be expert, a distinction that may bear some parallel to that separating the truly knowledgeable judges from the dogmatic ones whose dominating behaviors Delphi is supposed to guard against. Obviously, before we can examine the role of expertise, we must have a means for identifying the experts appropriately in the first place: self-rating has not been shown to be a consistent or particularly useful tool in this respect.
With the above said, we might recommend that the best approach to selecting Delphi panelists should involve some premeditated selection procedure, the specifics of which could be tailored to the particular problem environment, and should rely upon objective tests of relative expertise, with a key aspect being the choice of experts from varied backgrounds to guarantee a wide base of knowledge [17, 35].
The divergence of laboratory studies from the ideal of Delphi on this issue of expertise has important implications for the generalizability of Delphi research. In other fields of study various authors have pointed out, theoretically, how the worth of any aggregating technique must depend upon a variety of factors, including the extent of expertise of group members. For example, Hogarth developed an analytical model to determine how many and which experts should be included in a statistical group [43]. Hogarth’s model yielded group validity as a function of the number of experts, their mean individual validity, and the mean intercorrelation of their judgments (essentially, the extent to which the experts possessed similar or different knowledge). Interestingly, he went on to demonstrate how the addition of a more knowledgeable expert (with higher than average individual validity) to a group might actually lead to a decrease in validity of the statistically produced group response, if that expert were to also increase the intercorrelation of the group’s judgment by a certain amount. Consequently, the greatest improvement in group accuracy might actually be obtained by adding a less valid expert, who acted to decrease interindividual correlation within the group (see also Einhorn et al. [44] and Ashton [45], who found some evidence that the model provided good approximations to the actual validities of statistical groups in a “real-world” study).
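Hogarth’s factors can be made concrete with the standard formula for the validity of an equally weighted composite of k judges, which depends on exactly the quantities he identifies; the sketch below (our own illustration, not Hogarth’s published parameterization, and with arbitrary numbers) reproduces the counterintuitive result that adding a more valid judge can lower group validity if intercorrelation rises:

    from math import sqrt

    def group_validity(k, mean_validity, mean_intercorrelation):
        # Validity of an equally weighted average of k judges, given their
        # mean individual validity and mean pairwise intercorrelation
        # (the classic composite-validity relationship).
        return mean_validity * sqrt(k / (1 + (k - 1) * mean_intercorrelation))

    # Baseline panel: 6 judges, mean validity .40, intercorrelation .30
    print(round(group_validity(6, 0.40, 0.30), 3))   # ~0.62

    # Add a seventh, more valid judge who also raises the panel's mean
    # intercorrelation: group validity can actually fall.
    print(round(group_validity(7, 0.42, 0.45), 3))   # ~0.578, despite the better judge

The design point is that diversity (low intercorrelation) can matter more than raw individual validity, which is exactly the property uncontrolled laboratory panels fail to reproduce.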
A major issue arises from this work. Though Hogarth’s model concerns itself with statistical groups (that is, nominal groups with responses derived through the aggregation of individual responses), Hogarth has suggested that the model might still provide some guidelines for interacting groups: it does seem reasonable that his identified factors are liable to play a role in normal group processes. Regardless, it is readily apparent that Delphi does entail characteristics of the statistical group. (Ferrell has classified it as a “mixed” approach, incorporating aspects of behavioral and statistical groups [5].) It is possible to interpret the technique as a two-stage process in which a first “interacting” stage seeks to debias individual judgments, while the following stage then equally weights the individual judgments and statistically averages them. We represent this conceptualization in Figure 2.

[Figure: in Stage 1, iteration, feedback, and preservation of anonymity lead to increased individual expertise (reduced bias) and increased convergence of expert opinion; in Stage 2, statistical aggregation removes random error.]
Fig. 2. Delphi as a two-stage process.
Hence, the factors identified by Hogarth [43] and Einhorn et al. [44] that affect the performance of statistical groups must play a role within the Delphi process since, after iteration, we are left with a statistical group. The effectiveness of any individual Delphi study is going to be influenced by aspects such as the number of panelists and their relative validity or expertise. Since these factors have not been controlled for in existing Delphi studies, we are left with a host of studies telling us very little about the validity of the Delphi technique per se, or in comparison to other techniques. The following example illustrates this point: consider two studies of Delphi, the first employing six low-knowledge subjects with virtually identical knowledge, the second employing an equal number of moderately knowledgeable subjects with widely different expertise. From our perspective on Delphi, we might see the procedure as functioning through the transformation of an initial nominal group into a final-round nominal group with different characteristics (the same number of group members, yet members with new attributes in terms of expertise and relative knowledge). The extent of improvements over rounds must then reflect the degree to which such transformations take place in a favorable direction. In our example, we might expect the first study to show a convergence of opinions over rounds, yet the Delphi “yield” will not necessarily show much in the way of enhanced validity over the simple average of first-round opinions. In the second study, we might likewise expect an increase in convergence of opinions (though perhaps not by much, the more knowledgeable members perhaps being more confident in their knowledge and hence feeling less temptation to move to the norm), while we might expect individual validity of judgments to increase by a greater extent than in the first study (more knowledge being available for sharing and subsequent opinion revision). Such conditions, as defined, might then result in only little improvement in accuracy for the first group (or even a decrease in accuracy), yet produce a substantially greater improvement in accuracy for the second group. So: two studies of Delphi, one showing improvement (and outperforming the first-round statistical aggregation), and one showing no improvement (and maybe leading to a decrement in accuracy from the first-round statistical average). The question that arises from our hypothetical example then becomes: What exactly can we conclude about the validity of Delphi? The key point to be made here is that, without adequate control of key influencing factors, such as the degree of panelist expertise, it becomes difficult for us to compare one laboratory study to the next. Further, since the ideal of Delphi recommends the use of experts with differing realms of knowledge, and the typical laboratory study employs nonexperts with essentially similar knowledge, we cannot even generalize from the average experimental study (or from any?) to the ideal of Delphi. One conclusion that emerges from this is that the recommended manner of composing Delphi groups, which is commonly utilized in real-world practice (for instance, Martino [15]), will tend to produce greater benefits from the actual Delphi “transformation” than the typical laboratory panel. In this case, we might expect that potential exists for Delphi (conducted properly) to produce results far superior to those thus far demonstrated by research.
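As a rough numerical rendering of this hypothetical (our own sketch; the shared/private error model and all parameters are our own assumptions), one can model each panelist’s estimate as the truth plus an error common to the whole panel (knowledge they hold in common) plus an independent private error, and compare the two studies:

    import random
    import statistics

    random.seed(7)
    TRUTH = 100.0

    def panel_error(shared_sd, private_sd, k=6, trials=20_000):
        # Mean absolute error of a k-member panel's median when each estimate
        # is truth + an error shared by all panelists + a private error.
        total = 0.0
        for _ in range(trials):
            shared = random.gauss(0, shared_sd)
            panel = [TRUTH + shared + random.gauss(0, private_sd) for _ in range(k)]
            total += abs(statistics.median(panel) - TRUTH)
        return total / trials

    # "Study 1": low knowledge, near-identical panelists (errors mostly shared)
    print(round(panel_error(shared_sd=20, private_sd=3), 1))   # roughly 16
    # "Study 2": moderately knowledgeable, diverse panelists (errors mostly private)
    print(round(panel_error(shared_sd=5, private_sd=15), 1))   # roughly 7

The homogeneous panel’s shared error cannot be averaged away by any aggregation, whereas the diverse panel’s independent errors partially cancel, which is the asymmetry the hypothetical example trades on.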
Indeed, the influence of Hogarth’s factors is liable to affect the performance of other aggregation techniques in addition to Delphi (such as normal interacting groups or NGT), and similar arguments to those expressed above may then be leveled at those techniques and studies. In which case, it becomes difficult to make definite conclusions from technique-comparison studies as to “what’s best?”, a question for which we’d be naive to expect a simple answer in any case.
Another major point of departure of the laboratory studies from the ideal of Delphi lies in the nature of feedback given to the panelists. As has already been noted, feedback should be presented in the form of medians plus arguments from those panelists whose estimates fall outside certain limits of the average (for example, quartiles; see Linstone and Turoff [18]). However, in the majority of experimental examinations, simple averages (medians or means) are usually given, and that is all. This has to be a crucial issue, since the feedback is the intended means of conveying information from one panelist to the next: by limiting the scope of feedback one limits the scope for improvements. Thus, even if a Delphi panel is composed appropriately, with panelists who do have information to share, unless they are allowed to exchange that information in an effective way, their valuable knowledge becomes useless. This issue of feedback utility will be considered more fully in the next section: suffice it to say here that, again, we find a case where laboratory simplifications have adversely influenced perceptions of Delphi effectiveness.
From our discussion so far, we suggest that the majority of experimental investigations do little to enlighten us as to the usefulness of Delphi or its overall validity. However, this does not absolve the proponents of the technique as such from all blame: a comprehensive and useful tool should have more exact definitions as to its limits and boundaries, thus enabling a standardization of the procedure for proper empirical study. The fact that atypical exemplars are used in the laboratory reflects difficulties in the development of analogues of “real-world” applications. That is, because long-term forecasts are difficult to validate, experimenters have tended to use alternative, more immediately verifiable items; since ecologically valid experiments are difficult and costly to run, experimenters have used artificial situations and student subjects. Because language is often ambiguous, making verification tricky (for instance, how does one define “quality of life,” or what are the limits defining “outer space” developments?), experimenters have replaced complex and meaningful (to an expert) scenario statements with simple questions about uncontroversial outcomes. Such questions have often required panelists to make guesses about quantities, such as the amount, in tonnes, of annual shipments from New York harbor, about which the generation of guesses can be seen to require little in the way of a causal theory that can be subsequently shared among panelists. While the technique should not be written off because of these unavoidable difficulties of implementation, work needs to be done to overcome the lack of distinct specifications on aspects such as the type of feedback that constitutes Delphi feedback, the criteria that should be met for convergence of opinion to signal the cessation of polling, the selection procedures that should be used to determine the number and type of panelists, and so on. Suggestions in the literature are made on these issues, yet not made explicitly or strongly enough; else why the disregard by experimenters and researchers? Granted, the above are issues upon which it is difficult to be terribly precise, but some greater precision is required to prevent the technique from being misrepresented in the laboratory. Much of the problem must lie in a lack of recognition of the importance of such issues (panel composition and so forth) for the success of the Delphi approach.
To summarize: in this section we have largely confined ourselves to an examination of the application of Delphi. We have noted ambiguities within procedural definitions of the “classic” technique which have encouraged simplified versions to be used in the laboratory: we place some of the blame for oversimplification on the lack of a comprehensive definition of the limits of the procedure. This lack of definition, we have argued, has led to a host of experimental studies that vary on major dimensions, allowing little knowledge about the validity of Delphi, as practiced, to be gained and, indeed, allowing only limited comparisons to be drawn between one laboratory study and another. Generalizations from experimental studies to the ideal of Delphi, as initially expressed by the technique’s originators, would seem misplaced. Experimental evidence that Delphi has sometimes led, or not led, to improved judgment over other procedures is, by itself, of little value.
This is not to say that some useful experimental studies have not been done: studies such as those by Parente et al. [46] and Hample and Hilpert [47], which have attempted a kind of decomposition of Delphi, have great potential for telling us about the mechanics of change in Delphi, and hence something of the technique’s potential (for example, Parente et al. seemed to demonstrate iteration, rather than feedback, to be responsible for the majority of accuracy improvement over rounds). Nevertheless, uncontrolled comparisons of procedures are futile: after all, the same factors that are liable to influence Delphi effectiveness (for example, initial correlation of judges’ knowledge) are liable to influence the effectiveness of those other procedures. So, does Delphi rectify group process loss, overcoming the hurdles of interacting groups? Surely, we cannot tell, and it is because we cannot tell that critiques (such as Sackman’s) which place great emphasis on those inadequate laboratory studies are weakened. If we want to examine Delphi per se we need a change of direction: since existing comparisons to other techniques contain the difficulties in interpretation we have just outlined, our best hope of validating (or exposing) Delphi lies in evaluating those principles upon which Delphi is based. We need to demonstrate or refute such theoretical assertions, and to do this we need to understand the process of judgment change brought about by Delphi. It is these issues that are addressed in the next section.

The Theory Behind Delphi, and the Mechanics of Change


Earlier, we used Hogarth’s [43] analytical model to explain why comparisons across Delphi studies are difficult. We now consider that model once more, this time from the perspective of explaining the changes brought about through the implementation of the Delphi procedure.
Figure 2 incorporates our view of Delphi as a two-stage process involving both behavioral
and statistical group components.
Of Hogarth’s three main factors affecting the efficiency of a statistical group (namely, the number of group members, their individual validity, and the extent to which their judgments are independent), the number of experts, intuitively, should play little role in determining the improvement of a Delphi output over a statistical average of those same individuals before manipulation. (It could, however, have some effect; for example, a larger number of panelists could improve the probability that at least one member knows the answer or could figure it out, and sharing knowledge through feedback might influence others. Regardless, it should play a role in determining the overall accuracy of the panel.) We can certainly see that feedback should play a definite role in altering the validity of the individual panelists through information exchange, while also increasing the convergence of the individual judgments. Hopefully, other aspects of Delphi, such as the necessity of “anonymity,” should ensure that convergence occurs in line with the improved validity of the panelists (i.e., change in validity for the better). Initially, the role of “iteration” would seem to be merely as the medium through which change may come about.
Thus, depending upon the influences provoked by Delphi, the final statistical aggregation may be better or worse than the initial average according to the new characteristics of the panel after final polling. As noted earlier, we may visualize this process as changing one nominal group of individuals for another: the second group need not be better than the first, particularly when the individual validity of each of the panelists is not increased, or not increased enough, to compensate for the higher convergence of individual responses.
By framing an explanation of Delphi change that takes into account Hogarth’s work, we provide a consistent basis for explaining why laboratory studies of Delphi have produced such variable results. Most studies, however, tell us very little about whether the above propositions are accurate descriptions of the mechanisms of Delphi transformations: as we have argued, studies have tended to concentrate overly on gross performance measures of the technique as a whole, rather than considering underlying processes. Nevertheless, there have been a few studies conducted in this latter vein which have, surprisingly, supported a view of change far different from that theorized. These studies have generally concentrated on the role of the feedback and iteration aspects [27, 46, 47]. Parente et al. used student subjects to forecast “if” and “when” certain general newsworthy events would occur [46]. Finding, in their second experiment, that group accuracy was better than 95% of individual panelists’ judgments (for “if” and “when” criteria), they went on, in their third experiment, to separate out the iteration and feedback components. Doing this, they found that iterated polling and consensus feedback had little effect upon the accuracy of group “if” predictions, while iteration did result in error reduction for when a predicted event would occur, yet the feedback alone did not encourage accuracy improvement. Similarly, Boje and Murnighan found a decrease in accuracy over Delphi rounds, and yet an individual iteration procedure, without feedback of group consensus, actually resulted in increased accuracy [27]. Hample and Hilpert also found individual iteration to be responsible for the majority of improvements over rounds, with added feedback providing only small additional gains [47].
These results seem most peculiar, for they indicate that panelists may produce more valid judgments over rounds in the absence of the additional information of the other group members’ estimates. If this is the usual process in Delphi, then the whole procedure could be made far simpler (and easier to implement) by simply having individuals reassess initial estimates and then aggregating such latter estimates from a number of judges. For an explanation of the results, we need to address the specific nature of the Delphis conducted in these studies, for they tend to follow the same trend of oversimplification noted in the earlier section (though here they have produced some interesting results).
Two factors particularly need to be considered: first, the unexpected effectiveness of iteration on its own, and, second, the lack of impact of provided feedback. Taking the first point, the results may well reflect the type of quantitative judgments required in the typical laboratory study. For example, various biases have been shown to manifest themselves in both expert and nonexpert judges (in a wide variety of judgmental domains), one type going under the name of anchoring and adjustment [48, 49], which involves judges taking some initial reference point (either supplied intentionally or unintentionally by the experimenter, or derived internally) and then adjusting their responses in the desired direction by an amount which, usually, is insufficient. Thus, we might suggest, the additional pollings that take place in Delphi allow panelists the opportunity to anchor upon their first-round estimates and later adjust their responses further in the light of deliberation induced by the second polling. As long as panelists have some idea as to where the true answer should lie, this mechanism would explain improvements in accuracy over rounds, at least in the “decomposition” studies mentioned here. In the full-blown Delphi procedure, the median values supplied are liable to act as surrogate anchor points for each of the judges, supplanting their own first-round estimates. This proposed mechanism is, of course, only a hypothesis, and yet one which, if empirically demonstrated to be true, could point to a very easy and effective means for enhancing judgmental performance.
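A toy version of this anchor-and-adjust account (our own sketch; the adjustment fraction and starting values are arbitrary assumptions) shows how iteration alone could yield the observed improvement over rounds:

    def reconsider(estimate, believed_truth, adjustment=0.4):
        # One round of reconsideration: adjust from the current anchor part of
        # the way toward where the judge believes the truth lies (insufficient
        # adjustment, per the anchoring bias).
        return estimate + adjustment * (believed_truth - estimate)

    truth_belief = 100.0   # the judge's (roughly correct) sense of the answer
    estimate = 40.0        # first-round response, anchored too low
    for round_no in range(2, 5):
        estimate = reconsider(estimate, truth_belief)
        print(f"round {round_no}: {estimate:.1f}")   # 64.0, 78.4, 87.0

Each repolling replaces the old anchor with the last response, so accuracy improves across rounds even though no new information from other panelists is supplied.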
This brings us on to the second point, for, if it is the iteration process itself that accounts for judgmental improvement over rounds, then we must ask ourselves: What is the role of feedback? Intuitively, one would expect feedback to be the main force of judgment change, and the fact that studies reveal convergence does take place toward the median shows that it does exert influence, yet influence not necessarily in the right direction. This relates to an aforementioned point: Parente and Anderson-Parente observed that, theoretically, two forces should exert pull on the panelists’ responses: M (the median) and T (the true value, a somewhat curious concept, for how can a “true” but unknown value exert influence?) [19]. The influence of M is easily seen; the variable influence of T is revealed by the inconstant nature of improvement (which may be largely attributable to iteration in any case).

Perhaps these results are hardly surprising for a simple reason: the great majority of laboratory studies feed back only medians (or means), that is, a solitary statistic. But what informative content does such feedback possess? How can ideas be transferred, or positions stated and clarified, when all that is available is an isolated number, or perhaps some indication of the spread of panelists’ opinions? “Classical” Delphi recommends feedback of arguments by those panelists who fall outside certain limits [17], but it is questionable, again, as to how much knowledge may be circulated when the majority of panelists have so little to say or to communicate with their fellows. Indeed, in the field of social psychology we might see a parallel with the M and T aspects of Delphi, where a similar dichotomy is suggested to account for the influence for change within a group. Deutsch and Gerard stated such a dichotomy in terms of “normative” vs “informative” influences [50], though recent authors have expressed essentially the same distinction in terms of “social comparative” vs “persuasive argument” influences [51]. The former factor relates to a movement in position inspired by the need to be accepted, or at least not rejected, by the group, while the latter factor relates to change inspired by the content and persuasiveness of others’ arguments. It may be interesting to note that several studies have found “mere exposure” effects where subjects’ positions have been shown to shift solely on the basis of being presented with the average opinion of a group of others [52] and that, further, such effects have been interpreted as evidence in favor of normative or social comparative influences [51]. These “mere exposure” effects bear an uncanny resemblance to the effects of Delphi manipulations.
Further consideration of Delphi makes it clear how those very same unwanted factors causing process loss in interacting groups may still exert their influences within the nominal groups of Delphi (irrespective of the anonymity of the panelists). Hill and Fowles have noted how bandwagon and fatigue effects might lead to concurrence with group opinion for reasons other than the reasoned consideration of arguments (again, what arguments?) [37]. Also, as already stated, the theory of Delphi presumes that the “holdouts,” who draw the “swingers” toward their estimates, are those panelists most sure of their position through their superior knowledge; yet does it not seem more likely that the chief criterion for such intransigence is dogmatism, rather than knowledge? Sniezek has demonstrated a relationship between confidence in opinion and amount of change over Delphi rounds, corresponding with improvements in accuracy over rounds [34]. In this case it appears that confidence was correlated with knowledge, but there is no reason to suppose that this will always, or even mostly, be the case. (Indeed, the misplaced confidence of human judges is a frequently reported bias in studies on judgmental performance; see, for example, Lichtenstein et al. [53].)
It would seem, then, that Delphi does not remove the influence of the dominant individual after all: social pressures are probably still felt even though they are less immediate and threatening than in a normal interacting group. The worrying aspect about all this is that, though we can see reason for the amelioration of the negative aspects of interacting groups (to some extent), it is also apparent that the beneficial informative function of the group must suffer as well, and probably to a greater degree. Removing sources of process loss is all very well, but extinguishing the opportunity for process gain must negate any improvements, and so prevent Delphi from achieving its objectives. Therefore, though we have suggested that, empirically, it is difficult to draw conclusions about the question “Does Delphi alleviate process loss in groups?”, theoretically it would seem doubtful whether this should be so, at least doubtful whether Delphi should outperform consensus groups as a matter of course. This is not to say that the technique is worthless (far from it), but rather that, in its presently defined format, it is difficult to see how it might produce vast improvements in judgment (over the individual judge) and improvements over other approaches (such as consensus groups). With a few, perhaps minor, alterations all this might change, but in order to determine this, more studies need to be done on the mechanics of the technique in order to determine the best means of composing a Delphi-like approach.

Conclusions
This paper set out to critically examine the Delphi technique in order to determine whether it fulfills its intention of alleviating the process loss which is typical of interacting groups. In composing this evaluation, we have examined “methodological and technical” problems arising in the experimental studies, and concluded that these often diverge considerably from the original concept of Delphi, making comparisons difficult, both between the studies and the ideal of Delphi and between the studies themselves. This lack of ecological validity of conducted experiments (telling us very little of the validity of Delphi as an ideal or as practiced) makes it difficult to draw overall conclusions about Delphi or about the effectiveness of Delphi in comparison with other nominal and interacting group techniques.

We went on to consider, theoretically, why Delphi might or might not work, and concluded that inadequacies in the nature of feedback typically given mean that small gains in the resolution of process loss are liable to be more than offset by the removal of the chance for process gain. Hill and Fowles [37] have advocated that we keep the premise of Delphi, yet revise its methodology in such a way that agreement may arise from the consideration of evidence, not as an “artifact of the method.” This seems a sensible suggestion: after all, as Parente and Anderson-Parente have noted, it would be premature to “throw out the oracle with the holy water” [19, p. 140]. The idea of removing components leading to process loss is sound; anonymity seems sensible (at least anonymity at the ultimate judgmental stage); iteration itself may be promising for improving judgment through induced deliberation; and feedback can widen knowledge and stimulate new ideas.
Further research also suggests itself in the realm of “expertise,” particularly in consideration of the relationship between confidence-in-assessment and actual expertise, and in the search for more appropriate criteria for determining the existence of (relative) expertise. Successful selection of expert panelists should occur before a Delphi procedure is undertaken, though selection may also be applied a posteriori in order to determine which panelists should be included in the statistical aggregation and which might be dropped [30].
Greater care should be taken in technique-comparison studies if these are to continue; but for such studies to be meaningful, we need greater understanding of the influences of group factors on the output and effectiveness of the different techniques (Hogarth’s factors, for instance). First priority, thus, should be given to more intense analysis of the mechanics of change in nominal and interacting groups, which should subsequently allow us to develop stronger theoretical frameworks on which to construct techniques for improving judgment and forecasting.
We conclude that Delphi does have potential as a judgment-aiding technique, but improvements are needed, and the basis of these improvements can only be determined by a more thorough consideration of the mechanics of judgment change within groups and a greater understanding of the factors that influence the validity of statistical/nominal groups.

References
1. Hill, G. W., Group versus Individual Performance: Are N + 1 Heads Better than One? Psychological
Bulletin 91(3), 517-539 (1982).
2. Lock, A., Integrating Group Judgments in Subjective Forecasts, in Judgmental Forecasting. G. Wright
and P. Ayton, eds., Wiley, Chichester, 1987.
3. Hackman, J. R., and Morris, C. G., Group Tasks, Group Interaction Process, and Group Performance
Effectiveness: A Review and Proposed Integration, Advances in Experimental Social Psychology 8, 45-
99 (1975).
4. Sniezek, J. A., and Henry, R. A., Accuracy and Confidence in Group Judgment, Organizational Behavior
and Human Decision Processes 43, 1-28 (1989).
5. Ferrell, W. R., Combining Individual Judgments, in Behavioral Decision Making. G. Wright, ed., Plenum,
New York, 1985.
6. Flores, B. E., and White, E. M., Subjective vs Objective Combining of Forecasts: An Experiment, Journal
of Forecasting 8, 331-341 (1989).
7. Nisbett, R., and Ross, L., Human Inference: Strategies and Shortcomings of Social Judgment, Prentice-
Hall, Englewood Cliffs, NJ, 1980.
8. Lorge, I., Fox, D., Davitz, J., and Brenner, M., A Survey of Studies Contrasting the Quality of Group
Performance and Individual Performance, Psychological Bulletin 55, 337-372 (1958).
9. Rowse, G. L., Gustafson, D. H., and Ludke, R. L., Comparison of Rules of Aggregating Subjective
Likelihood Ratios, Organizational Behavior and Human Performance 12, 274-285 (1974).
10. Uecker, W. L., The Quality of Group Performance in Simplified Information Evaluation, Journal of
Accounting Research 20, 388-402 (1982).
11. Hastie, R., Experimental Evidence on Group Accuracy, in Decision Research, vol. 2. B. Grofman and
G. Owen, eds., JAI Press, Greenwich, CT, 1986.
12. Steiner, I. D., Group Process and Productivity, Academic Press, New York, 1972.
13. Hoffman, L. R., Group Problem Solving, in Advances in Experimental Social Psychology, vol. 2. L.
Berkowitz, ed., Academic Press, New York, 1965.
14. Janis, I., Victims of Groupthink, Houghton Mifflin, Boston, 1972.
15. Martino, J., Technological Forecasting for Decision-Making, 2nd ed., Elsevier, New York, 1983.
16. Dalkey, N. C., and Helmer, O., An Experimental Application of the Delphi Method to the Use of Experts,
Management Science 9, 458-467 (1963).
17. Linstone, H. A., The Delphi Technique, in Handbook of Futures Research. J. Fowles, ed., Greenwood
Press, Westport, CT, 1978.
18. Linstone, H., and Turoff, M., The Delphi Method: Techniques and Applications, Addison-Wesley, London,
1975.
19. Parente, F. J., and Anderson-Parente, J. K., Delphi Inquiry Systems, in Judgmental Forecasting. G.
Wright and P. Ayton, eds., Wiley, Chichester, 1987.
20. Coates, J. F., In Defense of Delphi: A Review of Delphi Assessment, Expert Opinion, Forecasting and
Group Process by H. Sackman, Technological Forecasting and Social Change 7, 193-194 (1975).
21. Brockhoff, K., The Performance of Forecasting Groups in Computer Dialogue and Face to Face Discussions,
in The Delphi Method: Techniques and Applications. H. Linstone and M. Turoff, eds., Addison-Wesley,
London, 1975.
22. Dalkey, N. C., Towards a Theory of Group Estimation, in The Delphi Method: Techniques and Applications.
H. Linstone and M. Turoff, eds., Addison-Wesley, London, 1975.
23. Brown, B., and Helmer, O., Improving the Reliability of Estimates Obtained from the Consensus of
Experts. The RAND Corporation, P-2986, 1964.
24. Dalkey, N. C., and Brown, B., Comparison of Group Judgement Techniques with Short-Range Predictions
and Almanac Questions. The RAND Corporation, R-678-ARPA, 1971.
25. Helmer, O., Convergence of Expert Consensus through Feedback. The RAND Corporation, P-2973, 1964.
26. Jolson, M. A., and Rossow, G., The Delphi Process in Marketing Decision Making, Journal of Marketing
Research 8, 443-448 (1971).
27. Boje, D. M., and Murnighan, J. K., Group Confidence Pressures in Iterative Decisions, Management
Science 28, 1187-1196 (1982).
28. Gustafson, D. H., Shukla, R. K., Delbecq, A., and Walster, G. W., A Comparison Study of Differences
in Subjective Likelihood Estimates Made by Individuals, Interacting Groups, Delphi Groups, and Nominal
Groups, Organizational Behavior and Human Performance 9, 280-291 (1973).
29. Riggs, W. E., The Delphi Method: An Experimental Evaluation, Technological Forecasting and Social
Change 23, 89-94 (1983).
30. Larreche, J. C., and Moinpour, R., Managerial Judgment in Marketing: The Concept of Expertise, Journal
of Marketing Research 20, 110-121 (1983).
31. Gough, R., The Effect of Group Format on Aggregate Subjective Probability Distributions, in Utility,
Probability, and Human Decision Making. D. Wendt and C. Vlek, eds., Reidel, Dordrecht, Holland,
1975.
32. Seaver, D. A., Assessing Probability with Multiple Individuals. Unpublished doctoral dissertation,
University of Southern California, Los Angeles, 1979.
33. Fischer, G. W., When Oracles Fail: A Comparison of Four Procedures for Aggregating Subjective
Probability Forecasts, Organizational Behavior and Human Performance 28, 96-110 (1981).
34. Sniezek, J. A., An Examination of Group Process in Judgmental Forecasting, International Journal of
Forecasting 5, 171-178 (1989).
35. Sackman, H., Delphi Critique. Lexington Books, Lexington, MA, 1974.
36. Scheele, D. S., Consumerism Comes to Delphi: Comments on Delphi Assessment, Expert Opinion,
Forecasting and the Group Process by H. Sackman, Technological Forecasting and Social Change 7, 215-
219 (1975).
37. Hill, K. Q., and Fowles, J., The Methodological Worth of the Delphi Forecasting Technique, Technological
Forecasting and Social Change 7, 179-192 (1975).
38. Best, R. J., An Experiment in Delphi Estimation in Marketing Decision Making, Journal of Marketing
Research 11, 448-452 (1974).
39. Dalkey, N. C., Brown, B., and Cochran, S. W., The Delphi Method III: Use of Self-Ratings to Improve
Group Estimates, Technological Forecasting 1, 283-291 (1970).
40. Bender, A. D., Strack, A. E., Ebright, G. W., and von Haunalter, G., Delphi Study Examines
Developments in Medicine, Futures 1, 289-303 (1969).
41. Salancik, J. R., Wenger, W., and Helfer, E., The Construction of Delphi Event Statements, Technological
Forecasting and Social Change 3, 65-73 (1971).
42. Winkler, R. L., Probabilistic Prediction: Some Experimental Results, Journal of the American Statistical
Association 66, 675-685 (1971).
43. Hogarth, R. M., A Note on Aggregating Opinions, Organizational Behavior and Human Performance
21, 40-46 (1978).
44. Einhorn, H. J., Hogarth, R. M., and Klempner, E., Quality of Group Judgment, Psychological Bulletin
84, 158-172 (1977).
45. Ashton, R. H., Combining the Judgments of Experts: How Many and Which Ones? Organizational Behavior
and Human Decision Processes 38, 405-414 (1986).
46. Parente, F. J., Anderson, J. K., Myers, P., and O'Brien, T., An Examination of Factors Contributing to
Delphi Accuracy, Journal of Forecasting 3(2), 173-182 (1984).
47. Hample, D. J., and Hilpert, F. P., A Symmetry Effect in Delphi Feedback. Paper presented at the
International Communication Association Convention, Chicago, 1975.
48. Northcraft, G. B., and Neale, M. A., Experts, Amateurs, and Real Estate: An Anchoring-and-Adjustment
Perspective on Property Pricing Decisions, Organizational Behavior and Human Decision Processes 39,
84-97 (1987).
49. Tversky, A., and Kahneman, D., Judgment under Uncertainty: Heuristics and Biases, Science 185, 1124-
1131 (1974).
50. Deutsch, M., and Gerard, H. B., A Study of Normative and Informational Social Influences upon Individual
Judgment, Journal of Abnormal and Social Psychology 51, 629-636 (1955).
51. Isenberg, D. J., Group Polarization: A Critical Review and Meta-analysis, Journal of Personality and
Social Psychology 50, 1141-1151 (1986).
52. Myers, D. G., Polarizing Effects of Social Comparisons, Journal of Experimental Social Psychology 14,
554-563 (1978).
53. Lichtenstein, S., Fischhoff, B., and Phillips, L. D., Calibration of Probabilities: The State of the Art to
1980, in Judgment under Uncertainty: Heuristics and Biases. D. Kahneman, P. Slovic, and A. Tversky,
eds., Cambridge University Press, Cambridge, 1982.

Received 1 May 1990
