International Journal of Forecasting: Emre Soyer Robin M. Hogarth
evidence demonstrating that such presentation effects do occur. Many studies have shown, for example, the way in which subtle changes in questions designed to elicit preferences are subject to contextual influences (see, e.g., Kahneman & Tversky, 1979). Moreover, these have been reported in both controlled laboratory conditions and field studies involving appropriately motivated experts (Camerer, 2000; Thaler & Sunstein, 2008). The human information processing capacity is limited, and the manner in which attention is allocated has important implications for both revealed preferences and inferences (Simon, 1978).

Recently, Gigerenzer and his colleagues (Gigerenzer, Gaissmaier, Kurz-Milcke, Schwartz, & Woloshin, 2007) reviewed research on how probabilities and statistical information are presented, and consequently perceived, by individuals or specific groups that use them frequently in their decisions. They show that mistakes in probabilistic reasoning and the miscommunication of statistical information are common. Their work focuses mainly on the fields of medicine and law, where doctors, lawyers and judges fail to communicate crucial statistical information appropriately in particular situations, thereby leading to biased judgments that have a negative impact on others. One such example is the failure of gynecologists to infer the probability of cancer correctly, given the way in which mammography results are communicated.

We examine the way in which economists communicate statistical information. Specifically, we note that much of the work in empirical economics involves the estimation of average causal effects through the technique of regression analysis. However, when we asked a large sample of economists to use the standard reported outputs of the simplest form of regression analysis to make probabilistic forecasts for decision making purposes, nearly 70% of them experienced difficulties. The reason for this, we believe, is that current reporting practices focus attention on the uncertainty surrounding the model parameter estimates, and fail to highlight the uncertainty concerning outcomes of the dependent variable conditional on the model identified. On the other hand, when attention was directed appropriately – by graphical as opposed to tabular means – over 90% of our respondents made accurate inferences.

In the next section, we provide some background on the practice and evolution of reporting empirical results in economics journals. In Section 3 we provide information concerning the survey we conducted with economists, which involved them answering four decision-oriented questions based on a standard format for reporting the results of regression analyses. We employed six different conditions designed to assess the differential effects due to model fit (R²) and different forms of graphical presentation (with and without accompanying statistics). In Section 4, we present our results. In brief, our study shows that the typical presentation format of econometric models and results – one based mainly on regression coefficients and their standard errors – leads economists to ignore the level of predictive uncertainty implied by the model and captured by the standard deviation of the estimated residuals. As a consequence, there is a considerable illusion of predictability. Adding graphs to the standard presentation of coefficients and standard errors does little to improve inferences. However, presenting the results in graphical fashion alone improved the accuracy. The implications of our findings, including suggested ways of improving statistical reporting, are discussed in Section 5.

2. Current practice

There are many sources of empirical analyses in economics. In order to obtain a representative sample of current practice, we selected all of the articles published in the 3rd issues (of each year) of four leading journals between 1998 and 2007 (441 articles). The journals were American Economic Review (AER), Quarterly Journal of Economics (QJE), Review of Economic Studies (RES) and Journal of Political Economy (JPE). Among these articles, we excluded those with time series analyses, and only included those with cross-sectional analyses where the authors identify one or more independent variables as statistically significant causes of relevant economic and social outcomes. Our aim is to determine how the consumers of this literature translate the findings about average causal effects into perceptions of predictability.

Many of the articles published in these journals are empirical. Over 70% of the empirical analyses use variations of regression analysis, of which 75% have linear specifications. Regression analysis is clearly the most prominent tool used by economists to test hypotheses and identify relationships among economic and social variables.

In economics journals, empirical studies follow a common procedure for displaying and evaluating results. Typically, authors provide a table that displays the descriptive statistics of the sample used in the analysis. Either before or after this display, they describe the specification of the model on which the analysis is based, then provide the regression results in detailed tables. In most cases, these results include the coefficient estimates and their standard errors, along with other frequently reported statistics, such as the number of observations and the R² values.

Table 1 summarizes these details for the sample of studies referred to above. It shows that, apart from the regression coefficients and their standard errors (or t-statistics), there is not much agreement as to what else should be reported. The data therefore suggest that economists probably understand the inferences that can be made about regression coefficients or the average impact of manipulating an independent variable quite well; however, their ability to make inferences about other probabilistic implications may be less well developed (e.g., predicting individual outcomes conditional on specific inputs).

It is not clear when, how, or why the above manner of presenting regression results in publications emerged. No procedure is ever explicitly stated in the submission guidelines for the highly ranked journals. Moreover, popular econometric textbooks, such as those of Greene (2003), Gujarati and Porter (2009) and Judge, Griffiths, Hill, and Lee (1985), do not explain specifically how to present results or how to use them for decision making. Hendry and Nielsen (2007) address issues regarding prediction in more detail than other similar textbooks. Another
E. Soyer, R.M. Hogarth / International Journal of Forecasting 28 (2012) 695–711 697
Table 1
Distribution of types of statistics provided by studies in our sample of economics journals.
Studies that: Journals % of total
AER QJE JPE RES Total
exception is Wooldridge (2008), who dedicates several sections to presentation issues. His outline suggests that a good summary consists of a table with selected coefficient estimates and their standard errors, R² statistics, a constant, and the numbers of observations. Indeed, this is consistent with today’s practice. More than 60% of the articles in Table 1 follow a similar procedure.

Zellner (1984) conducted a survey of statistical practice based on articles published in 1978 in the AER, JPE, International Economic Review, Journal of Econometrics and Econometrica. He documented confusion as to the meaning of tests of significance, and proposed Bayesian methods for overcoming theoretical and practical problems. Similarly, McCloskey and Ziliak (1996) provided an illuminating study of statistical practice based on articles published in AER in the 1980s. They demonstrated that there was widespread confusion in the interpretation of statistical results, due to a confounding of the concepts of statistical and economic or substantive significance. Too many results depended on whether the t- or other statistics exceeded arbitrarily defined limits. In follow-up studies, Ziliak and McCloskey (2004, 2008) report that, if anything, this situation worsened in the 1990s (see also Zellner, 2004).

Empirical finance has developed an illuminating way of determining the significance of findings. In this field, once statistical analysis has identified a variable as being “important” in affecting, say, stock returns, it is standard to assess “how important” it is by evaluating the performance of simulated stock portfolios that use the variable (see, e.g., Carhart, 1997, and Jensen, 1968).

In psychology, augmenting significance tests with the effect size became common practice in the 1980s. For example, in its submission guidelines, Psychological Science, the flagship journal of the Association for Psychological Science, explicitly states, “effect sizes should accompany major results. When relevant, bar and line graphs should include distributional information, usually confidence intervals or standard errors of the mean”.

In forecasting, Armstrong (2007) initiated a discussion on not only the necessity of using effect size measures when identifying relationships among variables, but also the fact that significance tests should be avoided when doing so. He argues that the results of significance tests are often misinterpreted, and even when presented and interpreted correctly, they do not contribute to the decision making process. Schwab and Starbuck (2009) make an analogous argument for management science.

In interpreting the results of linear regression analysis from a decision making and predictive perspective, two statistics can convey a message to readers about the level of uncertainty in the results. These are R² and the Standard Error of the Regression (SER).¹ As a bounded and standardized quantity, R² describes the fit of a model. SER, on the other hand, provides information on the degree of predictability in the metric of the dependent variable.

Table 1 shows that SER is practically never given in the presentation of results: fewer than 10% of the studies with linear specifications provide it. R² is the prevalent statistic reported to give an indication of model fit. This is the case for 80% of published articles with a linear specification. Table 1 also shows that more than 40% of the publications in our sample that utilize a linear regression analysis (excluding studies that base their main results on an IV regression) provide no information on either R² or the standard deviation of the dependent variable. Hence, a decision maker consulting the results of these studies cannot infer much about either the unexplained variance within the dependent variable or the cloud of data points to which the regression line is fitted. Alternatively, a scatter plot would be essential in order to indicate the degree of uncertainty. However, fewer than 40% of the publications in our sample provide a graph with actual observations.

Given the prevalence of empirical analyses and their potential use for decision making and prediction, debates about how to present results are important. However, it is important that such debates be informed by evidence as to the way in which knowledgeable individuals use currently available tools for making probabilistic inferences, and the way in which different presentation formats affect judgment. Our goal is to provide such evidence.

¹ Some sources refer to SER as the Standard Error of Estimates, or SEE (see RATS), while others refer to it as the root Mean Squared Error or root-MSE (see STATA). Wooldridge (2008) uses the term Standard Error of the Regression (SER), defining it as “an estimator of the standard deviation of the error term”.

3. The survey

3.1. Goal and design

How do knowledgeable individuals (economists) interpret specific decision making implications of the standard output of a regression analysis? To find out, we used the following criteria to select the survey questions. First, we provided information about a well-specified model that strictly met the underlying assumptions of linear regression analysis. Second, the model was straightforward, in
that it had only one independent variable. Third, all of the information necessary for solving the problems posed was available from the output provided. Fourth, although sufficient information was available, respondents had to apply knowledge about statistical inference in order to make the calculations necessary for answering the questions.

This last criterion is the most demanding, because whereas economists may be used to interpreting the statistical significance of regression coefficients, they typically do not assess the uncertainties involved in prediction when an independent variable is changed or manipulated (apart from making “on average” statements that give no hint as to the distribution around the average).

Our study required respondents to answer four decision making questions, after being provided with information about a correctly specified regression analysis. There were six different conditions, which varied in the overall fit of the regression model (Conditions 1, 3, and 5 with R² = 0.50, the others with R² = 0.25), as well as in the amount and type of information provided. Figs. 1 and 2 report the information provided to the respondents for Conditions 1 and 2, which is similar in form and content to the outputs of many reports in the economic literature (and consistent with Wooldridge, 2008). Conditions 3 and 4 used the same tables, but provided the bivariate scatter plots of the dependent and independent variables in addition to the standard deviation of the estimated residuals (see Figs. 3 and 4). In Conditions 5 and 6, the statistical outputs of the regression analyses were not provided, but the bivariate graphs of the dependent and independent variables were, as in Figs. 3 and 4.² In other words, for these two conditions we were intrigued by what would happen if respondents were limited to only consulting graphs.

Similarly to our survey on current practice in Section 2, we again restrict our attention to cross-sectional analyses in our experimental conditions. We are primarily concerned with determining the way in which findings on average causal effects are used for predictions and decision making. Our variations over different conditions would not be valid for time series studies, where the R² statistic does not provide information on the model fit. It is important to add that results are also discussed in the text in published papers. These discussions, which are mostly confined to certain coefficient estimates and their statistical significance levels, might distract decision makers from the uncertainties about outcomes. None of our conditions involve such discussions.

3.2. Questions

For Conditions 1, 3, and 5, we asked the following questions:

1. What would be the minimum value of X that an individual would need to make sure that s/he obtains a positive outcome (Y > 0) with 95% probability?
2. What minimum, positive value of X would make sure, with 95% probability, that the individual obtains more Y than a person who has X = 0?
3. Given that the 95% confidence interval for β is (0.936, 1.067), if an individual has X = 1, what would be the probability that s/he gets Y > 0.936?
4. If an individual has X = 1, what would be the probability that s/he gets Y > 1.001 (i.e. the point estimate)?

The questions for Conditions 2, 4, and 6 were the same, except that the confidence interval for β was (0.911, 1.130), and we asked about the probabilities of obtaining Y > 0.911 and Y > 1.02, given X = 1, in questions 3 and 4 respectively. All four questions are reasonable, in that they seek answers to questions that would be of interest to decision makers. However, they are not the types of questions that reports in economics journals usually lead readers to pose, and thus, they test a respondent’s ability to reason in a correct statistical manner given the information provided. In Appendix A, we provide the rationale behind the questions and the correct answers.

3.3. Respondents and method

We sent web-based surveys to faculty members in economics departments at leading universities worldwide. From the top 150 departments, ranked by numbers of econometric publications between 1989 and 2005 (Baltagi, 2007, Table 3), we randomly selected 113.³ Within each department, we randomly selected up to 36 faculty members. We ordered them alphabetically by their names and assigned Condition 1 to the first person, Condition 2 to the second person, ..., Condition 6 to the sixth person, then again Condition 1 to the seventh person, and so on.

We conducted the survey online by personally sending a link for the survey, along with a short explanation, to the professional email address of each prospective participant. In this way, we managed to keep the survey strictly anonymous. We do know the large pool of institutions to which the participants belong, but have no means of identifying the individual sources of the answers. The participants answered the survey voluntarily. They had no time constraints and were allowed to use calculators or computers if they wished. We told all prospective participants that, at the completion of the research, the study along with the feedback on questions and answers would be posted on the web and that they would be notified,⁴ but did not offer them any economic incentives for participation.

As can be seen from Table 2, we dispatched a total of 3013 requests to participate. About one-quarter of potential respondents (26%) opened the survey and, we presume,
Fig. 1. Presentation of Condition 1. This mimics the methodology of 60% of the publications that were surveyed, and also the suggestions of Wooldridge
(2008).
looked at the set-ups and questions. About a third of these (or 9% of all potential respondents) actually completed the survey. The proportion of potential respondents who opened the surveys and responded was highest for Conditions 5 and 6 (40%), as opposed to the 30% and 32% in Conditions 1 and 2, and 3 and 4, respectively. The average time taken to complete the survey was also lowest for Conditions 5 and 6 (see the notes to Table 2). We consider these outcomes again when we discuss the results below.

Table 2 documents characteristics of our respondents. In terms of position, the majority (59%) are at the rank of Associate Professor or higher. They also work in a wide variety of fields within the economics profession. Thirteen percent of respondents classified themselves as econometricians, and more than two-thirds (77%) used regression analysis in their work (41% “often” or “always”).

4. Results

4.1. Condition 1

The respondents’ answers to Condition 1 are summarized in Fig. 5. Three answers were removed from the data, being only “I don’t know”, or “?”. For the first two questions, responses within ±5 of the correct amount were considered correct. For questions 3 and 4, we considered correct any responses that were within ±5% of the answer.
Fig. 3. Bivariate scatter plot of Condition 1 and information on SER. Both were provided to participants in Condition 3, along with the estimation results.
Only the graph was provided in Condition 5.
Fig. 4. Bivariate scatter plot of Condition 2 and information on SER. Both were provided to participants in Condition 4, along with estimation results. Only
the graph was provided in Condition 6.
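To illustrate what the scatter plots in Figs. 3 and 4 convey that a coefficient table does not, the following minimal sketch simulates two data sets that share the same slope but differ in fit (R² = 0.50 and R² = 0.25) and then recovers the usual regression summary. It is not the authors' data-generating process: the uniform distribution of X on (0, 100), the sample size of 1000, the zero intercept, the seed and the function names `simulate` and `ols_summary` are all illustrative assumptions.

```python
import math
import random


def simulate(r2, n=1000, beta=1.0, seed=0):
    """Simulate Y = beta * X + e with X ~ Uniform(0, 100) (an assumption).

    The error standard deviation sigma is chosen so that the population
    R-squared equals r2. Returns (x, y, sigma).
    """
    rng = random.Random(seed)
    x = [rng.uniform(0.0, 100.0) for _ in range(n)]
    var_x = 100.0 ** 2 / 12.0  # variance of Uniform(0, 100)
    # R2 = beta^2 var_x / (beta^2 var_x + sigma^2), solved for sigma
    sigma = math.sqrt(beta ** 2 * var_x * (1.0 - r2) / r2)
    y = [beta * xi + rng.gauss(0.0, sigma) for xi in x]
    return x, y, sigma


def ols_summary(x, y):
    """Ordinary least squares for one regressor plus a constant."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    tss = sum((yi - mean_y) ** 2 for yi in y)
    ser = math.sqrt(rss / (n - 2))  # standard error of the regression
    return {
        "slope": slope,
        "intercept": intercept,
        "se_slope": ser / math.sqrt(sxx),  # sampling error of the slope
        "r2": 1.0 - rss / tss,
        "ser": ser,
    }


if __name__ == "__main__":
    for r2 in (0.50, 0.25):
        x, y, sigma = simulate(r2)
        print(r2, round(sigma, 1), ols_summary(x, y))
```

Under these assumptions the error standard deviation is about 28.9 for R² = 0.50 and 50.0 for R² = 0.25, while the standard error of the slope stays well below 0.1 in both cases. A reader who looks only at the slope and its standard error therefore sees very little of the spread that a scatter plot of the same data makes obvious.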
Fig. 5. Histograms for the responses to Condition 1. The top-left figure shows answers to question 1, the one on the top-right shows answers to question
2, the one on the bottom-left those to question 3, and the one on the bottom-right those to question 4. Each histogram also displays the question and the
approximate correct answer. The dark column identifies the responses that we considered correct. Above each column is the number of participants who
gave that particular answer. There were 39, 35, 45 and 44 responses to questions 1–4, respectively.
person with X = 0 will also be subject to a random shock, the value of X needed to ensure this condition is approximately 67.
3. 60% of the participants suggest that, given X = 1, the probability of obtaining an outcome that is above the lower bound of the estimated coefficient’s 95% confidence interval is very high (greater than 80%). Instead, the correct probability is approximately 51%, as the uncertainty around the coefficient estimates in this case is small compared to the uncertainty due to the error term.
4. 84% of participants gave an approximately correct answer of 50% to question 4.

The participants’ answers to the first two questions suggest that the uncertainty affecting Y is not directly visible in the presentation of the results. The answers to question 3, on the other hand, shed light on what the majority of our sample sees as being the main source of fluctuation in the dependent variable. The results suggest that it is the uncertainty concerning the estimated coefficients that is seen to be important, not the magnitude of the SER. In the jargon of popular econometrics texts, whereas respondents were sensitive to one of the two sources of prediction error, namely the sampling error, they ignored the error term of the regression equation. The apparent invisibility of the random component in the presentation lures respondents into disregarding the error term, and into confusing an outcome with its estimated expected value.

In their answers to questions 3 and 4, the majority of participants claim that if someone chooses X = 1, there is a 50% probability of obtaining Y > 1.001, but that obtaining Y > 0.936 is almost certain. Incidentally, the high rate of correct answers to question 4 suggests that the failure to respond accurately to questions 1–3 was not because participants failed to pay attention to the task (i.e., they were not responding “randomly”).

Our findings echo those of Lawrence and Makridakis (1989), who showed in an experiment that decision makers tend to construct confidence intervals of forecasts using estimated coefficients, and fail to correctly take into account the randomness inherent in the process they are evaluating. Our results are also consistent with those of Goldstein and Taleb (2007), who showed that failing to interpret a statistic appropriately can lead to incorrect assessments of risk.

In summary, the results of Condition 1 show that the most common way of displaying results in the empirical economics literature leads to an illusion of predictability, in that part of the uncertainty is invisible to the respondents. In Condition 2, we test this interpretation by seeing whether the answers to Condition 1 are robust to different levels of uncertainty.

4.2. Conditions 2–4

If the presentation of the results causes the error term to be ignored, then the answers of the decision makers should not change in different set-ups, regardless of the variance of the error term, provided that its expectation is zero. To test this, we change only the variance of the error term in Condition 2 (see Fig. 2). Conditions 3 and 4 replicate Conditions 1 and 2, except that we add scatter plots and SER statistics (see Figs. 3 and 4).
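The distinction between the two sources of prediction error (the sampling error of the estimated coefficients and the error term) can be made explicit with the textbook prediction-variance formula for a regression with one regressor. The notation below is an assumption introduced for illustration: c is the constant, e the error term with standard deviation σ (estimated by the SER), n the sample size and x₀ the value of X at which an individual outcome is predicted.

```latex
% Assumed notation: c = constant, e = error term, \sigma = its standard
% deviation (estimated by the SER), n = sample size, x_0 = value of X.
Y = c + \beta X + e, \qquad e \sim N(0, \sigma^2)

\operatorname{Var}\!\left(Y - \hat{Y} \,\middle|\, X = x_0\right)
  = \underbrace{\sigma^2}_{\text{error term}}
  + \underbrace{\sigma^2\left[\frac{1}{n}
      + \frac{(x_0 - \bar{x})^2}{\sum_{i}(x_i - \bar{x})^2}\right]}_{\text{sampling error of } \hat{c} \text{ and } \hat{\beta}}
```

The first term does not shrink as the sample grows, whereas the second does. As a rough check, the reported interval (0.936, 1.067) implies a standard error for the slope of about (1.067 − 0.936)/(2 × 1.96) ≈ 0.033, so an assessment of uncertainty that looks only at the coefficient's interval captures the second term and leaves out the first.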
Table 2
Characteristics of respondents.
Condition 1 2 3 4 5 6 Total %
Table 3
Comparison of results for Conditions 1 to 6.
Condition                                    1     2     3     4     5     6
R²                                        0.50  0.25  0.50  0.25  0.50  0.25
Scatter plot                                No    No   Yes   Yes   Yes   Yes
Estimation results                         Yes   Yes   Yes   Yes    No    No
Percentage of participants whose answer to:
Question (1) was X < 10 (Incorrect)         72    67    61    41     3     7
Question (2) was X < 10 (Incorrect)         71    70    67    47     3    15
Question (3) was above 80% (Incorrect)      60    64    63    32     9     7
Question (4) was approx. 50% (Correct)      84    88    76    84    91    93
Approximate correct answers are
Question 1                                  47    82    47    82    47    82
Question 2                                  67   116    67   116    67   116
Question 3 (%)                              51    51    51    51    51    51
Question 4 (%)                              50    50    50    50    50    50
Number of participants
Question 1                                  39    36    44    32    31    41
Question 2                                  35    30    39    32    30    39
Question 3                                  45    42    49    37    32    43
Question 4                                  44    41    49    37    32    43
Notes:
Question (1) What would be the minimum value of X that an individual would need to make sure that s/he obtains a positive outcome (Y > 0) with 95%
probability?
Question (2) What minimum, positive value of X would make sure, with 95% probability, that the individual obtains more Y than a person who has X = 0?
Question (3) Given that the 95% confidence interval for β is (a, b), if an individual has X = 1, what would be the probability that s/he gets Y > a?
Question (4) If an individual has X = 1, what would be the probability that s/he gets Y > β̂ ?
In Conditions 1, 3 and 5, a = 0.936, b = 1.067 and β̂ = 1.001; in Conditions 2, 4 and 6, a = 0.911, b = 1.13 and β̂ = 1.02.
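The approximate correct answers in Table 3 can be cross-checked with a short calculation. The sketch below is not the derivation in Appendix A; it assumes normally distributed errors, a constant of zero, and no uncertainty in the estimated coefficient, and it backs out the SER from the tabulated answer to Question 1 rather than taking it from the survey materials. The function names `implied_ser` and `answers` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()        # standard normal distribution
Z95 = STD_NORMAL.inv_cdf(0.95)   # one-sided 95% quantile, about 1.645


def implied_ser(q1_answer, beta_hat):
    """SER implied by the tabulated answer to Question 1.

    Assumes a zero constant, so Question 1 solves
    beta_hat * X - Z95 * SER = 0 for X.
    """
    return q1_answer * beta_hat / Z95


def answers(beta_hat, ci_lower, ser):
    """Answers to Questions 1-4 under the stated assumptions."""
    # Q1: smallest X with P(Y > 0) = 0.95.
    q1 = Z95 * ser / beta_hat
    # Q2: both individuals receive independent shocks, so the difference
    # in outcomes has standard deviation ser * sqrt(2).
    q2 = Z95 * sqrt(2.0) * ser / beta_hat
    # Q3 and Q4: at X = 1 the expected outcome is beta_hat.
    q3 = 1.0 - STD_NORMAL.cdf((ci_lower - beta_hat) / ser)
    q4 = 1.0 - STD_NORMAL.cdf((beta_hat - beta_hat) / ser)
    return q1, q2, q3, q4


if __name__ == "__main__":
    for label, q1_tab, beta_hat, a in (("R2 = 0.50", 47, 1.001, 0.936),
                                       ("R2 = 0.25", 82, 1.02, 0.911)):
        ser = implied_ser(q1_tab, beta_hat)
        print(label, round(ser, 1), answers(beta_hat, a, ser))
```

Under these assumptions the answer to Question 2 is simply √2 times the answer to Question 1, which gives roughly 66.5 and 116.0 and agrees with the tabulated 67 and 116 to within rounding. For Question 3 the sketch returns a probability just above 50%, close to but not identical to the tabulated 51%; the gap may reflect the constant and rounding, both of which the sketch ignores.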
The histograms of the responses to the four questions in Conditions 2–4 are remarkably similar to those of Condition 1 (see Appendix B). These similarities are displayed in Table 3.

The similarities between the responses in Conditions 1 and 2 show that – under the influence of the current methodology – economists are led to overestimate the effects of explanatory factors on economic outcomes. The misperceptions demonstrated in the respondents’ answers suggest that the way in which regression results are presented in publications can prevent even knowledgeable individuals from differentiating among different clouds of data points and uncertainties. At an early stage of our investigation, we also conducted the same survey (using Conditions 1 and 2) with a group of 50 graduate students in economics at Universitat Pompeu Fabra who had recently
Fig. 6. Histograms for the responses to Condition 5. The top-left figure shows answers to question 1, the one on the top-right shows answers to question
2, the one on the bottom-left those to question 3, and the one on the bottom-right those to question 4. Each histogram also displays the question and the
approximate correct answer. The dark column identifies the responses that we considered correct. Above each column is the number of participants who
gave that particular answer. There were 31, 30, 32 and 32 responses to questions 1–4, respectively.
taken an advanced econometrics course, as well as with 30 academic social scientists (recruited through the European Association for Decision Making). The results (not reported here) were similar to those of our sample of economists, and suggest that the origins of the misperceptions can be traced back to the methodology, as opposed to professional backgrounds.

Table 3 indicates that when the representation is augmented with a graph of actual observations and with statistical information on the magnitude of the error term (SER), the perceptions of the relevant uncertainty, and consequently the predictions, improve. However, around half of the participants still fail to take the error term into account when making predictions, and give answers similar to those in Conditions 1 and 2 (see Appendix B for histograms of responses to Conditions 3 and 4). This suggests that respondents still rely mainly on the table showing the estimated coefficients and their standard errors as the main tool for assessing uncertainty. Since the information provided in Conditions 3 and 4 is rarely provided in published papers, this does not provide much hope for improvement. Possibly more drastic changes are necessary. Conditions 5 and 6 were designed to test this suggestion.

4.3. Conditions 5 and 6

Our results so far suggest that, when making predictions using regression analysis, economists pay an excessive amount of attention to coefficient estimates and their standard errors, and fail to consider the uncertainty inherent in the relationships between the dependent and independent variables. What happens, therefore, when they cannot see estimates of coefficients and related statistics, but have only a bivariate scatter plot? This is the essence of Conditions 5 and 6 (see the graphs in Figs. 3 and 4).

Fig. 6 displays the histograms for the responses to the four questions in Condition 5. The responses to Condition 6 were similar, and the histograms are displayed in Appendix B. These histograms show that the participants are much more accurate in their assessments of uncertainty now than in the previous conditions (see also Table 3). In fact, when the coefficient estimates are not available, they are forced to attend solely to the graph, which depicts the uncertainty within the dependent variable adequately. This further suggests that scant attention was paid to the graphs when the coefficient estimates were present. Despite the unrealistic manner of presenting the results, Conditions 5 and 6 show that a simple graph can be better suited to assessing the predictability of an outcome than a table with coefficient estimates, or even than a presentation that includes both a graph and a table.

In Conditions 5 and 6, most of the participants, including some of those who made the most accurate predictions, protested in their comments about the insufficiency of the information provided for the task. They claimed that it was impossible to determine the answers without the coefficient estimates, and that all they did was to “guess” the outcomes approximately. Yet their guesses were more
accurate than the predictions from the previous conditions, which were the result of a careful investigation of the coefficient estimates and time-consuming computations. Indeed, as Table 2 indicates, the respondents in Conditions 5 and 6 spent significantly less time on the task than those in Conditions 1 and 2 (t(40) = 2.71 and t(40) = 2.38, p = 0.01 and 0.02, respectively).

4.4. Effects of training and experience

Table 2 shows that our sample of 257 economists varied widely in terms of professorial rank and the use of regression analysis in their work. We failed to find any relationship between the numbers of correct answers and either professorial rank or frequency of using regression analysis. A higher percentage of statisticians, financial economists and econometricians performed well relative to the average respondent (with 64%, 56%, and 51% providing correct answers, respectively, compared to the overall average of 35%). When the answers were more accurate, the average time spent was also slightly greater (10.2 min versus 9.3). Appendix C shows in detail the characteristics and proportions of respondents who gave accurate answers in Conditions 1–4.

5. Discussion

We conducted a survey of the probabilistic predictions made by economists on the basis of regression outputs similar to those published in leading economics journals. When given only the regression statistics which are typically reported in such journals, many respondents made inappropriate inferences. In particular, they seemed to locate the uncertainty of prediction in estimates of the regression coefficients, but not in the standard error of the regression (SER). Indeed, the responses hardly differed depending on whether the fit of the estimated model was 0.25 or 0.50.

We also provided some respondents with scatter plots of the regression, together with explicit information on the SER. However, this had only a small ameliorative effect, suggesting that respondents relied principally on the

not typically address explicit decision making questions, the models can be used to estimate, say, the probability of reaching a given level of output for a specific level of input, as well as the economic significance of the findings. It is also important to understand that a policy that achieves a significantly positive effect “on average” might still be undesirable, because it leaves a large fraction of the population worse off. Hence, the questions are essential but “tricky” only in the sense that they are not the sorts of questions which economists typically ask.

Second, as was noted earlier, 26% of potential respondents took the time to open (and look at?) our survey questions, and 9% answered. Does this mean that our respondents were biased, and if so, in what direction were they biased? We clearly cannot answer this question, but we can state that our sample contained a substantial number of respondents (257), who represent various different characteristics of academic economists. Moreover, they were relevant respondents, in that they were recruited worldwide from leading departments of economics, as judged by publications in econometrics (Baltagi, 2007).

Third, by maintaining anonymity in the responses, we were unable to offer incentives to our respondents. However, would incentives have made a difference? Clearly, we cannot say without conducting a specific study. However, the consensus from previous results in experimental economics is that incentives increase effort and reduce the variance in the responses, but do not necessarily increase the average accuracy (Camerer & Hogarth, 1999). We also note that when professionals are asked questions which relate to their level of competence, there is little incentive to provide casual answers. Interestingly, our survey is a good simulation of the circumstances under which many economists read journal articles: there are no explicit monetary incentives; readers do not wish to make additional computations or to do work to fill in gaps left by the authors; and time is precious. Thus, the presentation of results is crucial.

Since our investigation concerns the way in which statistical results are presented in academic journals, it is important to ask what specific audience authors have in mind. The goal in leading economics journals
regression statistics (e.g., coefficients and their standard is scientific: to identify which variables have an impact
errors) when making their judgments. Finally, we forced on some economic output and to assess the strength of
other respondents to rely on a graphical representation by the relationship. Indeed, the discussion of results often
providing only a scatter plot, with no regression statistics. involves terms such as a ‘‘strong’’ effect, where the rhetoric
Members of this group complained that they did not have reflects the size of t-statistics and the like. Moreover, the
sufficient information, but – most importantly – were more strength of a relationship is often described only from the
accurate in their responses than the other groups, and also perspective of an average effect, e.g., that a unit increase
took less time to answer. in an independent variable implies a δ increase in the
Several issues could be raised about our study, in rela- dependent variable, on average.
tion to the nature of the questions asked, the specific re- As preliminary statements of the relevance of specific
spondents recruited, and their motivations for answering economic variables, this practice is acceptable. Indeed, al-
our questions. We now address these issues. though authors undoubtedly want to emphasize the sci-
First, we deliberately asked questions that are usually entific importance of their findings, we see no evidence of
not posed in journal articles because we sought to deliberate attempts to mislead readers into believing that
illuminate economists’ appreciations of the predictability the results imply a greater control over the dependent vari-
of economic relationships, as opposed to the assessment able than is, in fact, the case. In addition, the papers have
of the ‘‘significance’’ of certain variables (McCloskey & been reviewed by peers who are typically not shy about ex-
Ziliak, 1996; Ziliak & McCloskey, 2004, 2008). This is pressing their reservations. However, from a decision mak-
important. For example, even though economics articles do ing perspective, the typical form of presentation can lead to
E. Soyer, R.M. Hogarth / International Journal of Forecasting 28 (2012) 695–711 705
an illusion of predictability of the outcomes, given the underlying regression model. Specifically, there can be a considerable degree of variability around the expectations of effects, which needs to be calibrated in the interpretation of results. Thus, readers who don’t “go beyond the information given” and take the trouble to calculate, say, the implications of some decision-oriented questions, may gain an inaccurate view of the results obtained.

At one level, it could be argued that the principle of caveat emptor should apply. That is, consumers of economic research should know how to use the information provided, and it is their responsibility to assess the uncertainty appropriately. It is not the fault of either the authors or the journals if they cannot. However, we make two arguments against the caveat emptor principle, as applied here.

First, as has been demonstrated by our survey, even knowledgeable economists experience difficulty in going beyond the information provided in typical outputs of regression analysis. If one wants to make the argument that people “ought” to do something, then it should also be clearly demonstrated that they “can”.

Second, given the vast numbers of economic reports available, it is unlikely that most readers will take the necessary steps to go beyond the information provided. As a consequence, by reading journals in economics they will necessarily acquire a false impression of what the knowledge gained from economic research allows one to say. In short, they will believe that economic outputs are far more predictable than is actually the case.

We make all of the above statements under the assumption that econometric models describe empirical phenomena appropriately. In reality, such models may suffer from a variety of problems associated with the omission of key variables, measurement errors, multicollinearity, or estimating the future values of predictors. It can only be shown that model assumptions are, at best, approximately satisfied (they are not “rejected” by the data). Moreover, whereas the model-data fit is maximized within the particular sample observed, there is no guarantee that the estimated relationships will be maintained in other samples. Indeed, the R² value estimated on a fitting sample inevitably “shrinks” when predicting to a new sample, and estimating the amount of shrinkage a priori is problematic. There is also evidence that statistical significance is often wrongly associated with replicability (Tversky & Kahneman, 1971; see also Hubbard & Armstrong, 1994). Possibly, if authors discussed these issues further, people’s perceptions of the predictability of outcomes would improve. However, these considerations are beyond the scope of the present study.

Furthermore, because our aim was to isolate the impact of the presentation mode on predictions, we made many simplifying assumptions. For instance, errors that are heteroskedastic and non-normally distributed, or the presence of fewer observations at the more extreme values of the dependent variable would also increase prediction error. Even though many estimation procedures do not require assumptions, such as that of normally distributed random disturbances, in order to obtain consistent estimates, the explanations which they provide through coefficient estimates and average values would be less accurate if the law of large numbers did not hold. Hence, in more realistic scenarios, where our assumptions are not valid, decisions that are weighted towards expected values and coefficient estimates would be even less accurate than our results indicate.

How then can current practice be improved? Our results show that providing graphs alone led to the most accurate inferences. However, since this excludes the actual statistical analysis evaluating the relationships between different variables, we do not deem it a practical solution. Nevertheless, we do believe that it is appropriate to present graphs together with summary statistics, as we did in Conditions 3 and 4, although this methodology does not eliminate the problem.

We seriously doubt that any substantial modification of current practice will be accepted. We therefore suggest augmenting reports by requiring the authors to provide internet links to simulation tools. These could explore different implications of the analysis, as well as let readers pose different probabilistic questions. In short, we propose that tools be provided which allow readers to experience the uncertainty in the outcomes of the regression.⁵

In fact, we recently embarked on a test of the effectiveness of simulations in facilitating probabilistic inferences (Hogarth & Soyer, 2011). In two experiments, conducted with participants at varying levels of statistical sophistication, respondents were provided with an interface where they sequentially sampled the outcomes predicted by an underlying model. In the first, we tested responses to seven well-known probabilistic puzzles. The second involved simulating the predictions of an estimated regression model, given one’s choices, in order to make investment decisions. The results of both experiments are unequivocal. Experience obtained through simulations led to far more accurate inferences than attempts at analysis. Also, the participants preferred using the experiential methodology over analysis. Moreover, when aided by simulation, participants who were naïve with respect to probabilistic reasoning performed as well as those with university training in statistical inference. The results support our suggestion that the authors of empirical papers supplement the outputs of their analyses with simulation models that allow decision makers to “go beyond the information given” and “experience” the outcomes of the model given their inputs.

Although our suggestion would impose an additional burden on authors, it would reduce both effort and misinterpretation on the part of readers, and would make any empirical article a more accessible scientific product. Moreover, it has the potential to correct other statistical misinterpretations that were not identified by our study. As such, we believe that our suggestion goes a long way toward increasing our understanding of economic phenomena. At the same time, it also calls for additional research into understanding when and why different presentation formats lead to misinterpretation.

⁵ For example, by following the link http://www.econ.upf.edu/~soyer/Emre_Soyer/Econometrics_Project.html, the reader can investigate many questions concerning the two regression set-ups that we examined in this paper, and can also experience simulated outcomes.
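As a minimal sketch of what "experiencing" the outcomes of a regression might look like, the following Python fragment draws simulated outcomes from an estimated model. It is not the tool behind the link in footnote 5. The function names `simulate_outcomes` and `share_above` are hypothetical, the default estimates (intercept 0.32, slope 1.001, SER 29) are the Condition 1 values used in Appendix A, and the sketch assumes normally distributed errors and ignores uncertainty in the estimated coefficients.

```python
import random


def simulate_outcomes(x, intercept=0.32, slope=1.001, ser=29.0, n=10_000, seed=0):
    """Return n simulated outcomes Y = intercept + slope * x + e, with e ~ N(0, ser^2).

    Defaults are the Condition 1 values used in Appendix A (assumed here for
    illustration). Uncertainty in the estimated coefficients is ignored.
    """
    rng = random.Random(seed)  # fixed seed so that runs are reproducible
    return [intercept + slope * x + rng.gauss(0.0, ser) for _ in range(n)]


def share_above(outcomes, threshold):
    """Fraction of simulated outcomes strictly greater than threshold."""
    return sum(y > threshold for y in outcomes) / len(outcomes)


if __name__ == "__main__":
    for x in (1, 10, 50):
        ys = simulate_outcomes(x)
        print(f"X = {x:2d}: predicted mean {0.32 + 1.001 * x:6.2f}, "
              f"share of simulated Y > 0: {share_above(ys, 0.0):.2f}")
```

Sampling outcomes one at a time, rather than reading only the coefficient and its standard error, makes the spread implied by the SER visible: with these values, a unit increase in X shifts the mean by about 1 while individual outcomes vary with a standard deviation of 29.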
Thus, the answer to question 2 is:

\[ X = \frac{1.645 \times 41 - 0.32}{1.001} \approx 67. \tag{A.6} \]

Similar reasoning is involved for Condition 2 (and thus also Conditions 4 and 6). For these conditions, the equivalent of Eq. (A.1) is

\[ \mathrm{SER} = \mathrm{se}(\hat{e}) = \sqrt{\mathrm{Var}(Y)(1 - R^2)} = \sqrt{(59.25^2)(0.75)} \approx 51, \tag{A.7} \]

such that the answer to question 1 is:

\[ X = \frac{1.645 \times 51 - 0.62}{1.02} \approx 82. \tag{A.8} \]

As for question 2, we need to find out about Eq. (A.4) in this condition:

\[ \sqrt{\mathrm{Var}(Y_i \mid X_i = x_i) + \mathrm{Var}(Y_i \mid X_i = 0)} = \sqrt{51^2 + 51^2} \approx 72, \tag{A.9} \]

so that the answer to question 2 in Condition 2 becomes:

\[ X = \frac{1.645 \times 72 - 0.62}{1.02} \approx 116. \tag{A.10} \]

A.3. Answers to questions 3 and 4

Here, we inquire about the way in which decision makers weight the different sources of uncertainty within the dependent variable. The answers to these questions provide insights as to whether or not the typical presentation of the results leads the participants to consider that the fluctuation around the estimated coefficient is a larger source of uncertainty in the realization of Y than it really is.

Question 3 asks about the probability of obtaining an outcome above the lower-bound of the 95% confidence interval of the estimated coefficient, given a value of X = 1.

In Conditions 1, 3 and 5, the lower-bound is 0.936. We can find an approximate answer to this question using the estimated model and the SER from Eq. (A.1), that is [...]

Question 4 asks about the probability of obtaining an outcome above the point estimate, given a value of X = 1. In Conditions 1, 3 and 5, the point estimate is 1.001. We can use similar calculations in order to obtain an answer.

\[
\begin{aligned}
\Pr(Y_i > 1.001 \mid X_i = 1)
&= \Pr(\hat{C} + \hat{\beta} X_i + \hat{e} > 1.001 \mid X_i = 1) \\
&= \Pr(\hat{e} > 1.001 - \hat{C} - \hat{\beta} X_i \mid X_i = 1) \\
&= \Pr\left( \frac{\hat{e}}{\mathrm{se}(\hat{e})} > \frac{1.001 - \hat{C} - \hat{\beta} X_i}{\mathrm{se}(\hat{e})} \,\middle|\, X_i = 1 \right) \\
&= 1 - \Phi\left( \frac{1.001 - 0.32 - 1.001}{29} \right) \\
&= 1 - \Phi(-0.01) \approx 0.5.
\end{aligned}
\tag{A.12}
\]

For questions 3 and 4 of Condition 2 (and thus also 4 and 6), we follow a similar line of reasoning, using the appropriate estimates. Thus, for question 3,

\[
\begin{aligned}
\Pr(Y_i > 0.911 \mid X_i = 1)
&= \Pr(\hat{C} + \hat{\beta} X_i + \hat{e} > 0.911 \mid X_i = 1) \\
&= \Pr(\hat{e} > 0.911 - \hat{C} - \hat{\beta} X_i \mid X_i = 1) \\
&= \Pr\left( \frac{\hat{e}}{\mathrm{se}(\hat{e})} > \frac{0.911 - \hat{C} - \hat{\beta} X_i}{\mathrm{se}(\hat{e})} \,\middle|\, X_i = 1 \right) \\
&= 1 - \Phi\left( \frac{0.911 - 0.61 - 1.02}{51} \right) \\
&= 1 - \Phi(-0.015) \approx 0.51,
\end{aligned}
\tag{A.13}
\]

and for question 4,

\[
\begin{aligned}
\Pr(Y_i > 1.02 \mid X_i = 1)
&= \Pr(\hat{C} + \hat{\beta} X_i + \hat{e} > 1.02 \mid X_i = 1) \\
&= \Pr(\hat{e} > 1.02 - \hat{C} - \hat{\beta} X_i \mid X_i = 1) \\
&= \Pr\left( \frac{\hat{e}}{\mathrm{se}(\hat{e})} > \frac{1.02 - \hat{C} - \hat{\beta} X_i}{\mathrm{se}(\hat{e})} \,\middle|\, X_i = 1 \right) \\
&= 1 - \Phi\left( \frac{1.02 - 0.61 - 1.02}{51} \right) \\
&= 1 - \Phi(-0.01) \approx 0.5.
\end{aligned}
\tag{A.14}
\]
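The arithmetic in this appendix can be checked numerically. The sketch below is an illustration only: `min_x` and `prob_above` are hypothetical helper names, the normal approximation is assumed throughout, and the estimates are taken as printed (the intercept for Condition 2 appears as 0.62 in Eqs. (A.8) and (A.10) but as 0.61 in Eqs. (A.13) and (A.14); each value is used where the appendix uses it). The question 3 probability for Condition 1 is computed from the stated lower bound of 0.936 in the same way.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution, Phi = STD_NORMAL.cdf
Z_95 = 1.645               # critical value used in the appendix


def min_x(sd, intercept, slope, z=Z_95):
    """Value of X given by (z * sd - intercept) / slope, as in Eqs. (A.6), (A.8), (A.10)."""
    return (z * sd - intercept) / slope


def prob_above(threshold, x, intercept, slope, ser):
    """Pr(Y > threshold | X = x) = 1 - Phi((threshold - intercept - slope * x) / ser)."""
    return 1.0 - STD_NORMAL.cdf((threshold - intercept - slope * x) / ser)


if __name__ == "__main__":
    # Condition 2: SER from Var(Y) and R^2, Eq. (A.7); the appendix rounds this to 51.
    print("SER, Condition 2:", round(sqrt(59.25 ** 2 * 0.75), 2))
    # Combined standard deviation for question 2, Eq. (A.9); the appendix rounds this to 72.
    print("sd for question 2:", round(sqrt(51 ** 2 + 51 ** 2), 2))

    print("A.6 :", round(min_x(41, 0.32, 1.001), 2))   # appendix reports about 67
    print("A.8 :", round(min_x(51, 0.62, 1.02), 2))    # appendix reports about 82
    print("A.10:", round(min_x(72, 0.62, 1.02), 2))    # appendix reports about 116

    # Question 3, Condition 1: threshold is the lower bound 0.936 given in the text.
    print("Q3, Condition 1:", round(prob_above(0.936, 1, 0.32, 1.001, 29), 3))
    print("A.12:", round(prob_above(1.001, 1, 0.32, 1.001, 29), 3))
    print("A.13:", round(prob_above(0.911, 1, 0.61, 1.02, 51), 3))
    print("A.14:", round(prob_above(1.02, 1, 0.61, 1.02, 51), 3))
```

All four probabilities come out close to one half, which is the point of questions 3 and 4: at X = 1 the spread of the error term dominates, so whether the threshold is the point estimate or the lower bound of its confidence interval makes almost no difference.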
Fig. B.1. Histograms for the responses to Condition 2. The top-left figure shows answers to question 1, the one on the top-right shows answers to question
2, the one on the bottom-left those to question 3, and the one on the bottom-right those to question 4. Each histogram also displays the question and the
approximate correct answer. The dark column identifies the responses that we considered correct. Above each column is the number of participants who
gave that particular answer. There were 36, 30, 42 and 41 responses to questions 1–4, respectively.
Fig. B.2. Histograms for the responses to Condition 3. The top-left figure shows answers to question 1, the one on the top-right shows answers to question
2, the one on the bottom-left those to question 3, and the one on the bottom-right those to question 4. Each histogram also displays the question and the
approximate correct answer. The dark column identifies the responses that we considered correct. Above each column is the number of participants who
gave that particular answer. There were 44, 39, 49 and 49 responses to questions 1–4, respectively.
Fig. B.3. Histograms for the responses to Condition 4. The top-left figure shows answers to question 1, the one on the top-right shows answers to question
2, the one on the bottom-left those to question 3 and the one on the bottom-right those to question 4. Each histogram also displays the question and the
approximate correct answer. The dark column identifies the responses that we considered correct. Above each column is the number of participants who
gave that particular answer. There were 32, 32, 37 and 37 responses to questions 1–4, respectively.
Fig. B.4. Histograms for the responses to Condition 6. The top-left figure shows answers to question 1, the one on the top-right shows answers to question
2, the one on the bottom-left those to question 3 and the one on the bottom-right those to question 4. Each histogram also displays the question and the
approximate correct answer. The dark column identifies the responses that we considered correct. Above each column is the number of participants who
gave that particular answer. There were 41, 39, 43 and 43 responses to questions 1–4, respectively.
Table C.1
Relationships between training, experience and responses in Conditions 1–4 (the number of respondents with correct answers is given in parentheses).

                            Condition 1  Condition 2  Condition 3  Condition 4  Total      % correct
Position
  Professor                 17 (4)       14 (5)       19 (6)       18 (11)      68 (26)    38
  Associate Professor       8 (2)        7 (3)        12 (4)       10 (8)       37 (17)    46
  Assistant Professor       12 (5)       18 (4)       16 (6)       9 (2)        55 (17)    31
  Senior Lecturer           0 (0)        2 (1)        1 (0)        0 (0)        3 (1)      33
  Lecturer                  6 (1)        4 (0)        1 (0)        0 (0)        12 (1)     8
  Post-Doctoral Researcher  2 (0)        0 (0)        0 (0)        0 (0)        2 (0)      0
  Total                     45 (12)      45 (13)      49 (13)      38 (21)      177 (62)   35
Research fields
  Econometrics              14 (6)       11 (6)       10 (5)       14 (8)       49 (25)    51
  Labor economics           12 (5)       11 (2)       14 (3)       10 (7)       47 (17)    36
  Monetary economics        5 (1)        2 (0)        5 (2)        2 (0)        14 (3)     21
  Financial economics       4 (1)        5 (3)        4 (3)        3 (2)        16 (9)     56
  Behavioral economics      3 (1)        7 (2)        2 (1)        3 (0)        15 (4)     27
  Developmental economics   8 (1)        2 (1)        9 (3)        5 (1)        24 (6)     25
  Health economics          4 (0)        3 (0)        5 (1)        1 (1)        13 (2)     15
  Political economy         3 (1)        5 (1)        7 (3)        4 (2)        19 (7)     37
  Public economics          9 (1)        6 (1)        10 (4)       8 (6)        33 (12)    36
  Environmental economics   1 (0)        2 (1)        3 (0)        2 (1)        8 (2)      25
  Industrial organization   2 (1)        6 (1)        6 (1)        2 (1)        16 (3)     19
  Game theory               4 (1)        1 (1)        4 (1)        5 (2)        14 (5)     36
  International economics   6 (2)        6 (0)        7 (1)        2 (1)        21 (4)     19
  Macroeconomics            9 (2)        9 (2)        13 (2)       6 (5)        37 (11)    30
  Microeconomics            11 (2)       4 (2)        11 (5)       7 (4)        33 (13)    39
  Economic history          2 (0)        2 (0)        6 (3)        2 (1)        12 (4)     33
  Statistics                3 (1)        4 (4)        1 (1)        1 (1)        11 (7)     64
  Other                     0 (0)        0 (0)        1 (1)        0 (0)        1 (1)      100
Use of regression analysis
  Never                     7 (1)        5 (0)        11 (7)       11 (5)       34 (13)    38
  Some                      11 (4)       16 (6)       17 (0)       10 (5)       54 (15)    28
  Often                     16 (4)       14 (5)       7 (2)        7 (6)        44 (17)    39
  Always                    5 (3)        5 (1)        8 (4)        6 (2)        24 (10)    42
  Total                     39 (12)      40 (12)      43 (13)      34 (18)      156 (55)   35
  Average minutes spent     12 (10.9)    10.6 (12.6)  7.4 (11.2)   7.5 (7.4)    8.1 (10.2) 8.1
  Std. dev.                 12 (9.4)     7.8 (9)      7.1 (12.3)   5.3 (5.2)    7.7 (9)    7.7
Thaler, R. H., & Sunstein, C. R. (2008). Nudge: improving decisions about health, wealth, and happiness. New Haven, CT: Yale University Press.
Tversky, A., & Kahneman, D. (1971). Belief in the law of small numbers. Psychological Bulletin, 76, 105–110.
Wooldridge, J. M. (2008). Introductory econometrics: a modern approach (3rd ed.). International Student Edition, Thomson, South Western.
Zellner, A. (1984). Posterior odds ratios for regression hypotheses: general considerations and some specific results. In A. Zellner (Ed.), Basic issues in econometrics (pp. 275–305). Chicago, IL: The University of Chicago Press.
Zellner, A. (2004). To test or not to test and if so, how? Comments on “size matters”. Journal of Socio-Economics, 33, 581–586.
Ziliak, S. T., & McCloskey, D. N. (2004). Size matters: the standard error of regressions in the American Economic Review. Journal of Socio-Economics, 33, 527–546.
Ziliak, S. T., & McCloskey, D. N. (2008). The cult of statistical significance: how the standard error costs us jobs, justice, and lives. Ann Arbor: University of Michigan Press.

Emre Soyer is a Ph.D. student in the Department of Economics & Business at Universitat Pompeu Fabra, Barcelona. A graduate of Koc University (Istanbul, Turkey) and the University of Nottingham (U.K), he is interested in ways of structuring situations so as to help unleash human potential across a wide range of areas, ranging from simple decision problems to the content of educational programs.

Robin M. Hogarth is an ICREA Research Professor in the Department of Economics & Business at Universitat Pompeu Fabra, Barcelona. He has previously held appointments at INSEAD, London Business School, and the University of Chicago. He has published several books (most recently Dance with Chance with Spyros Makridakis and Anil Gaba) and many articles in psychology, management, and economics on topics related to human decision making. He is a past President of both the Society for Judgment and Decision Making and the European Association for Decision Making.