
Perspective

Integrating explanation and prediction in computational social science

Jake M. Hofman1,17 ✉, Duncan J. Watts2,3,4,17 ✉, Susan Athey5, Filiz Garip6, Thomas L. Griffiths7,8, Jon Kleinberg9,10, Helen Margetts11,12, Sendhil Mullainathan13, Matthew J. Salganik6, Simine Vazire14, Alessandro Vespignani15 & Tal Yarkoni16

https://doi.org/10.1038/s41586-021-03659-0

Received: 23 February 2021
Accepted: 20 May 2021
Published online: 30 June 2021


Computational social science is more than just large repositories of digital data and the computational methods needed to construct and analyse them. It also represents a convergence of different fields with different ways of thinking about and doing science. The goal of this Perspective is to provide some clarity around how these approaches differ from one another and to propose how they might be productively integrated. Towards this end we make two contributions. The first is a schema for thinking about research activities along two dimensions—the extent to which work is explanatory, focusing on identifying and estimating causal effects, and the degree of consideration given to testing predictions of outcomes—and how these two priorities can complement, rather than compete with, one another. Our second contribution is to advocate that computational social scientists devote more attention to combining prediction and explanation, which we call integrative modelling, and to outline some practical suggestions for realizing this goal.

1Microsoft Research, New York, NY, USA. 2Department of Computer and Information Science, University of Pennsylvania, Philadelphia, PA, USA. 3The Annenberg School of Communication, University of Pennsylvania, Philadelphia, PA, USA. 4Operations, Information, and Decisions Department, University of Pennsylvania, Philadelphia, PA, USA. 5Graduate School of Business, Stanford University, Stanford, CA, USA. 6Department of Sociology, Princeton University, Princeton, NJ, USA. 7Department of Psychology, Princeton University, Princeton, NJ, USA. 8Department of Computer Science, Princeton University, Princeton, NJ, USA. 9Department of Computer Science, Cornell University, Ithaca, NY, USA. 10Department of Information Science, Cornell University, Ithaca, NY, USA. 11Oxford Internet Institute, University of Oxford, Oxford, UK. 12Public Policy Programme, The Alan Turing Institute, London, UK. 13Booth School of Business, University of Chicago, Chicago, IL, USA. 14Melbourne School of Psychological Sciences, University of Melbourne, Melbourne, Victoria, Australia. 15Laboratory for the Modeling of Biological and Socio-technical Systems, Northeastern University, Boston, MA, USA. 16Department of Psychology, University of Texas at Austin, Austin, TX, USA. 17These authors contributed equally: Jake M. Hofman, Duncan J. Watts. ✉e-mail: [email protected]; [email protected]

In the past 15 years, social science has experienced the beginnings of a ‘computational revolution’ that is still unfolding1–4. In part this revolution has been driven by the technological revolution of the internet, which has effectively digitized the social, economic, political, and cultural activities of billions of people, generating vast repositories of digital data as a byproduct5. And in part it has been driven by an influx of methods and practices from computer science that were needed to deal with new classes of data—such as search and social media data—that have tended to be noisier, more unstructured, and less ‘designed’ than traditional social science data (for example, surveys and lab experiments). One obvious and important outcome of these dual processes has been the emergence of a new field, now called computational social science2,4, that has generated considerable interest among social scientists and computer scientists alike6.

What we argue in this paper, however, is that another outcome—less obvious but potentially even more important—has been the surfacing of a tension between the epistemic values of social and computer scientists. On the one hand, social scientists have traditionally prioritized the formulation of interpretatively satisfying explanations of individual and collective human behaviour, often invoking causal mechanisms derived from substantive theory7. On the other hand, computer scientists have traditionally been more concerned with developing accurate predictive models, whether or not they correspond to causal mechanisms or are even interpretable8.

In turn, these different values have led social and computer scientists to prefer different methods from one another, and to invoke different standards of evidence. For example, whereas quantitative methods in social science are designed to identify causal relationships or to obtain unbiased estimates of theoretically interesting parameters, machine learning methods are typically designed to minimize total error on as-yet unseen data9,10. As a result, it is standard practice for social scientists to fit their models entirely ‘in-sample’, on the grounds that they are seeking to explain social processes and not to predict outcomes, whereas for computer scientists evaluation on ‘held out’ data is considered obligatory11. Conversely, computer scientists often allow model complexity to increase as long as it continues to improve predictive performance, whereas for social scientists models should be grounded in, and therefore constrained by, substantive theory12.

We emphasize that both approaches are defensible on their own terms, and both have generated large, productive scientific literatures; however, both approaches have also been subjected to serious criticism. On the one hand, theory-driven empirical social science has been criticized for generating findings that fail to replicate13, fail to generalize14, fail to predict outcomes of interest15,16, and fail to offer solutions to real-world problems17,18. On the other hand, complex predictive models have also been criticized for failing to generalize19 as well as being uninterpretable20 and biased21. Meanwhile, extravagant claims that the ability to mine sufficiently large datasets will result in an ‘end of theory’ have been widely panned22. How might we continue
to benefit from the decades of thinking and methodological development that have been invested in these two canonical traditions while also acknowledging the legitimacy of these criticisms? Relatedly, how might social and computer scientists constructively reconcile their distinct epistemic values to produce new methods and standards of evidence that both can agree are desirable?

Our position is that each tradition, while continuing to advance its own goals, can benefit from taking seriously the goals of the other. Specifically, we make two related contributions. First, we argue that while the goals of prediction and explanation appear distinct in the abstract they can easily be conflated in practice, leading to confusion about what any particular method can accomplish. We introduce a conceptual framework for categorizing empirical methods in terms of their relative emphasis on prediction and explanation. In addition to clarifying the distinction between predictive and explanatory modelling, this framework reveals a currently rare class of methods that integrate the two. Second, we offer a series of suggestions that we hope will lead to more of what we call integrative modelling. In addition, we advocate for clearer labelling of the explanatory and predictive power of individual contributions and argue that open science practices should be standardized between the computational and social sciences. In summary, we conclude that while exclusively explanatory or predictive approaches can and do contribute to our understanding of a phenomenon, claims to have understood that phenomenon should be evaluated in terms of both. Considering the predictive power of explanatory models can help to prioritize the causal effects we investigate and quantify how much they actually explain, and may reveal limits to our understanding of phenomena. Conversely, an eye towards explanation can focus our attention on the prediction problems that matter most and encourage us to build more robust models that generalize better under interventions and changes. Taking both explanation and prediction seriously will therefore be likely to require researchers to embrace epistemic modesty, but will advance work at the intersection of the computational and social sciences.

Prediction versus explanation

To illustrate how the goals of prediction and explanation can be conflated, consider the common practice of employing null hypothesis significance testing (NHST)23 to reject a null hypothesis23,24 that some theoretically motivated effect is absent (that is, is exactly zero) with a confidence that is controlled by a fixed false-positive rate, traditionally set to 5%. For example, a study might seek to reject the null hypothesis that a job applicant’s perceived race has no effect on their prospects of being hired25, or that ethnic or religious divisions within a country have no effect on the likelihood of civil war15.

As many previous authors have noted, NHST has been widely misapplied in numerous ways—underpowered experiments, multiple comparisons, inappropriate stopping rules, and so on—that tend to produce a surprisingly high rate of false-positive findings26,27, and have led to widely discussed replication problems28. From the perspective of integrating explanation and prediction, NHST is problematic for other, more fundamental reasons. NHST invokes the language of prediction; however, the prediction that is being made is often not directly about the outcome of interest, nor even about the magnitude of some theoretically interesting effect, but simply that the hypothesized effect is not zero. In other words, a common application of NHST is not so much to test predictions at all but instead to argue that a theory is not inconsistent with the data and then to use the theory as an explanatory tool. Furthermore, while there are circumstances under which it is useful to show that an effect is unlikely to be zero, in the complex world of human and social behaviour it is highly likely that many effects are non-zero29,30. Showing that one’s preferred theory cannot be ruled out by the data is therefore an exceptionally weak test of the theory31,32, and hence explains much less than it appears to.

Conversely, purely predictive exercises can also risk confusing prediction with explanation. Predictive models that exploit statistical associations to forecast outcomes, sometimes with seemingly impressive accuracy, can confer the feeling of having understood a phenomenon. But they often rely, sometimes implicitly, on the assumption that these predictions are to be evaluated exclusively in settings where the relationships between the predictors and the outcome of interest are stable33. As a result, model performance can change markedly under interventions that alter the associations in question19, or can otherwise result in biased or misleading interpretations34.

In fact, ‘predicting an outcome’ can refer to many different activities for which expectations of accuracy may vary widely. For example, the finding that the volume of influenza-related search queries in a particular geographic region is highly correlated (r = 0.9) with caseload data from the US Centers for Disease Control (CDC) reported two weeks later seems impressive, until it is revealed that the same correlation can be obtained directly from the CDC data alone simply by using case counts from previous weeks to forecast those for future weeks35. Whether a particular model is considered valuable or not therefore depends not only on its absolute performance, but also its comparison to the appropriate baseline(s).

In addition, the very same model estimated on the same data can yield qualitatively different conclusions regarding apparent predictive accuracy—ranging from ‘extremely accurate’ to ‘relatively poor’—simply by making different choices during the evaluation procedure36. By analogy with NHST, not only can predictive modelling appear to generate explanations when it does not; the predictions themselves may be much weaker than they appear.

A framework for integrative modelling

As these examples illustrate, the relationship between explanation and prediction is often blurry in practice and can lead to confusion about which goals are being satisfied by any particular research activity. To clarify our thinking, we shift from talking about explanation and prediction in the abstract to more specifically discussing the types of empirical modelling activity that are common throughout computational and social science.

We emphasize that our focus here is on empirical modelling activities, not theoretical modelling such as mathematical and agent-based modelling. Theoretical work, which includes modelling as well as substantive and qualitative theory, is an essential counterpart to empirical work—for example, theory is necessary in order to identify appropriate constructs to measure or predict, or to propose hypotheses to test. Here, however, we wish to focus on research activities whose aim is to test and validate models using empirical data. To further clarify the scope of our argument, by ‘models’ we mostly mean the types of statistical and algorithmic models that are widely used in quantitative social science, data science, and applied machine learning. However, our framework could also be applied to explanatory and predictive analyses more generally (for example, mechanistic models, small-n case studies or comparative studies, studies using prediction markets, and so on) as long as they somehow use empirical data to validate explanations or predictions.

Concretely, we propose the conceptual framework illustrated schematically in Table 1. The two dimensions of the Table represent differing levels of emphasis placed on explanation and prediction, respectively, where we have partitioned the space into four quadrants: descriptive modelling, explanatory modelling, predictive modelling, and integrative modelling.

Table 1 | A schematic for organizing empirical modelling along two dimensions, representing the different levels of emphasis placed on prediction and explanation

Focus on specific features or effects:
- No intervention or distributional changes: Quadrant 1, descriptive modelling. Describe situations in the past or present (but neither causal nor predictive).
- Under interventions or distributional changes: Quadrant 2, explanatory modelling. Estimate effects of changing a situation (but many effects are small).

Focus on predicting outcomes:
- No intervention or distributional changes: Quadrant 3, predictive modelling. Forecast outcomes for similar situations in the future (but can break under changes).
- Under interventions or distributional changes: Quadrant 4, integrative modelling. Predict outcomes and estimate effects in as yet unseen situations.

The rows highlight where we focus our attention (on either specific features that might affect an outcome of interest, or directly on the outcome itself), whereas the columns specify what types of situations we are modelling (a ‘fixed’ world in which no changes or interventions take place, or one in which features or inputs are actively manipulated or change owing to other uncontrolled forces).

Descriptive modelling (quadrant 1) refers to activities that are fundamental to any scientific endeavour: how to think about, define, measure, collect, and describe relationships between quantities of interest. Activities in this quadrant include traditional statistics and survey research as well as computational methods such as topic modelling and community detection in networks10. For example, much of what is known about public opinion, the state of the economy, and everyday human experience is derived from survey research, whether conducted by federal statistical agencies such as the Bureau of Labour Statistics or research organizations such as Pew Research Center. Statistical analyses of administrative data are also often descriptive in nature. For example, recent studies have documented important differences in mortality rates37, wealth gaps38 and intergenerational economic mobility39 across racial and ethnic groups. Qualitative and comparative methods that are popular in sociology, communications, and anthropology also fall into this quadrant. Finally, much of the progress in computational social science to date has been in using digital signals and platforms to investigate previously unmeasurable concepts5,40. Descriptive work, in other words, whether qualitative or quantitative, is useful and interesting in its own right and also foundational to the activities conducted in the other three quadrants.

Moving beyond description, explanatory modelling (quadrant 2) refers to activities whose goal is to identify and estimate causal effects, but that do not focus directly on predicting outcomes. Most of traditional empirical sociology, political science, economics, and psychology falls into this quadrant, which encompasses a wide range of methods, including statistical modelling of observational data, lab experiments, field experiments, and qualitative methods. Some methods (for example, in randomized or natural experiments, or non-experimental identification strategies such as instrumental variables and regression discontinuity designs) isolate causal effects by design, whereas others (for example, regression modelling, qualitative data) invoke causal interpretations based on theory. Regardless, methods in this quadrant tend to prioritize simplicity, considering one or only a handful of features that may affect an outcome of interest. We emphasize that these approaches can be very useful for understanding individual causal effects, shaping theoretical models, and even guiding policy. For example, field experiments that show that job applicants with characteristically ‘Black’ names are less likely to be interviewed than those with ‘white’ names25 reveal the presence of structural racism and inform public debates about discrimination with respect to gender, race, and other protected attributes. Relatedly, quantifying difficult-to-assess effects, such as the impact of gender and racial diversity on policing41, can motivate concrete policy interventions. Nonetheless, the emphasis on studying effects in isolation can lead to little, if any, attention being paid to predictive accuracy. As many effects are small, and simple models can fail to incorporate the broader set of features pertinent to the outcome being studied, these methods can suffer from relatively poor predictive performance.

In contrast with explanatory modelling, predictive modelling (quadrant 3) refers to activities that attempt to predict the outcome of interest directly but do not explicitly concern themselves with the identification of causal effects. ‘Prediction’ in this quadrant may or may not be about actual future events; however, in contrast with quadrants 1 and 2, it refers exclusively to ‘out of sample’ prediction42, meaning that the data on which the model is evaluated (the held-out or test data) are different from the data on which the model was estimated (the training data). Activities in this quadrant encompass time series modelling43, prediction contests44, and much of supervised machine learning10, ranging from simple linear regression to complex artificial neural networks. By evaluating performance on a held-out test set, these methods focus on producing predictions that generalize well to future observations. From a policy perspective, it can be helpful to have high-quality forecasts of future events even if those forecasts are not causal in nature9,45–47. For example, applications of machine learning to human behaviour abound in online advertising and recommendation systems, but can also detect potentially viral content on social media early in its trajectory48. Although these algorithms do not identify what is causing people to click or content to spread, they can still be useful inputs for decision-makers—for example, alerting human reviewers to check potentially large cascades for harmful misinformation. That said, there is often an implicit assumption that the data used to train and test the model come from the same data-generating process, akin to making forecasts in a static (albeit possibly noisy) world. As a result, while these methods often work well for a fixed data distribution, they may not generalize to settings in which features or inputs are actively manipulated (as in a controlled experiment or policy change) or change as a result of other, uncontrolled factors.

Combining the explanatory properties of quadrant 2 and the predictive properties of quadrant 3, integrative modelling (quadrant 4) refers to activities that attempt to predict as-yet unseen outcomes in terms of causal relationships. More specifically, whereas quadrant 3 concerns itself with data that are out of sample, but still from the same (statistical) distribution, here the focus is on generalizing ‘out of distribution’ to a situation that might change either naturally, owing to some factor out of our control, or because of some intentional intervention such as an experiment or change in policy. This category includes distributional changes for settings that we have observed before (that is, setting an input feature to a specific value, rather than simply observing it to be at that value) as well as the more extreme case of entirely new situations (that is, setting an input feature to an entirely new value that we have never seen before). Integrative modelling therefore requires attention to quadrant 2 concerns about estimating causal, rather than simply associational, effects49, while simultaneously considering the impact of all such effects to forecast outcomes as accurately as possible (that is, quadrant 3). Ideally work in this quadrant would generate high-quality predictions about future outcomes in a (potentially) changing world. However, forcing one’s explanations to make predictions can reveal that they explain less than one would like15,50, thereby motivating and guiding the search for more complete explanations51. Alternatively, such a search may reveal the presence of a fundamental limit to predictive accuracy that results from the presence of system complexity or intrinsic randomness52, in which case the conclusion may be that we can explain less than we would like, even in principle53.
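To make the contrast between the columns of Table 1 concrete, the following is a minimal, hypothetical sketch (not from the original paper; it uses synthetic data, plain NumPy, and illustrative variable names) of how the same linear model can look very different when scored in-sample, on held-out data from the same distribution, and under an intervention that changes the data-generating process, and of why even a trivial baseline is a useful point of comparison.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(n, intervene=False):
    """Toy data-generating process: y is caused by x1 only.
    x2 is a non-causal proxy that tracks x1 in observational data,
    but is set independently of x1 when we intervene on it."""
    x1 = rng.normal(size=n)
    x2 = rng.normal(size=n) if intervene else x1 + 0.1 * rng.normal(size=n)
    y = 2.0 * x1 + rng.normal(scale=0.5, size=n)
    # The model only sees an intercept and the proxy x2, not the cause x1
    return np.column_stack([np.ones(n), x2]), y

def fit_ols(X, y):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def r2(y, yhat):
    # Coefficient of determination; can be negative out of distribution
    return 1 - np.sum((y - yhat) ** 2) / np.sum((y - y.mean()) ** 2)

# Quadrants 1 and 2 style evaluation: describe the association in-sample
X_train, y_train = simulate(1000)
beta = fit_ols(X_train, y_train)
print("in-sample R^2:", round(r2(y_train, X_train @ beta), 3))

# Quadrant 3 style evaluation: held-out data from the same distribution
X_test, y_test = simulate(1000)
print("held-out R^2 (same distribution):", round(r2(y_test, X_test @ beta), 3))

# Quadrant 4 style evaluation: the proxy x2 is intervened on, breaking
# the association the model exploited
X_int, y_int = simulate(1000, intervene=True)
print("R^2 under intervention on x2:", round(r2(y_int, X_int @ beta), 3))

# Baseline check (cf. the influenza-forecasting example): always compare
# against a trivial alternative, here the training-set mean of the outcome
baseline = np.full_like(y_test, y_train.mean())
print("baseline R^2 (predict the mean):", round(r2(y_test, baseline), 3))
```

In this toy setup the model leans on a non-causal proxy, so it appears accurate both in-sample and on held-out data, yet degrades sharply once the proxy is intervened on, which is exactly the failure mode that quadrant 4 evaluation is intended to surface.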


In addition to clarifying the distinction in practice between predictive and explanatory research activities, Table 1 illustrates our second main point: that whereas quadrants 1, 2, and 3 are all amply populated both with traditional and computational social science research, quadrant 4 is—with a handful of possible exceptions that we discuss in detail below—relatively empty. To an extent, the sparsity of quadrant 4 is not surprising. Models that carefully synthesize the causal relationships between different relevant factors to make high-quality predictions of future outcomes are inherently more difficult to formulate and evaluate than models that aim only for explanatory or predictive power in isolation. Nonetheless, we also believe that quadrant 4 activities are rare because they require one to embrace epistemic values that have historically been regarded as standing in opposition to one another; that is, that explanatory insight necessarily comes at the cost of predictive accuracy and vice versa. If this is true, then viewing them instead as complements, wherein each can reinforce the other, repositions quadrant 4 not as a painful tradeoff but rather as an exciting opportunity for new and impactful research.

To be clear, the opportunity highlighted by Table 1 is not that researchers, computational or otherwise, should focus only or even mostly on quadrant 4. To the contrary, an enormous amount of interesting, high-quality social science exists in the other quadrants, and we see no reason for that not to continue. Indeed, even if one’s goal is to end up in quadrant 4, it is arguably impossible to get there without spending a good deal of time in quadrants 1, 2, and 3. Nonetheless, as we will argue in the next section, quadrant 4 research activities that explicitly integrate explanatory and predictive thinking are likely to add value over and above what can be achieved in quadrants 1–3 alone; thus quadrant 4 deserves more attention than it has received so far.

Suggestions

The opportunity that we have just highlighted in turn provokes three related suggestions for methodological innovation in computational social science. First, we make our call for the integration of explanatory and predictive modelling more concrete by sketching out some specific approaches to quadrant 4 research. Second, we advocate for an explicit labelling system that can be used to more clearly characterize individual research contributions, identifying both the quadrant to which it belongs and the level of granularity offered by it. Third, we note that open science practices that have been developed within the explanatory modelling community can be adapted to benefit the predictive modelling community, and vice versa.

Integrate modelling approaches

Our first suggestion is to encourage more work in quadrant 4 by identifying concrete ways of integrating predictive and explanatory modelling. At the highest level, simply thinking explicitly about which quadrants our current models sit in can motivate integrative research designs. Take the example of understanding how information spreads through a social network, a question that has received a great deal of attention with the recent availability of data from online social networks that makes it possible to track with high fidelity how content spreads from one person to the next. At this point there have been hundreds, if not thousands, of studies that explore this question54. Some sit squarely in quadrant 1, as purely descriptive studies that measure the size and structure of large and representative sets of online information cascades48,55. These efforts have provided insights into how content spreads, some of which align with ideas put forth several decades ago56 and others that challenge them57.

Other studies lie in quadrants 2 and 3. For instance, there is work in quadrant 2 that aims to identify features of online content that have a causal effect on the spread of information58. Here regression models are used to estimate the extent to which a handful of high-level sentiment features (for example, awe, anger, sadness) affect how far content spreads. This work proposes a theory in which content that reflects positive sentiments spreads further than negative content. Conversely, in quadrant 3 is research that uses as much information as possible to passively forecast content popularity48,59,60. Here machine learning techniques are used with an eye towards maximizing predictive accuracy, resulting in statistical models that exploit many features without necessarily focusing on which of these relationships are causal as opposed to merely correlational.

As yet, little, if any, work on this problem would fall in quadrant 4; however, such studies are easy to imagine. For example, one might attempt to explicitly predict the spread of content that has been experimentally manipulated, say by changing content that an individual plans to post to affect its emotional valence or by studying how the same piece of content spreads when exogenously seeded to different individuals. Experiments of this sort would immediately reinforce or challenge results from the other quadrants and would also help to formulate predictively accurate causal explanations.

Orienting our attention towards integrative modelling can also inspire new ways of evaluating the robustness of our findings in other quadrants. Specifically, we can ask how well our estimates and predictions generalize under the types of interventions or changes considered in quadrant 4. In practice, this would mean more cross-domain or out-of-distribution model testing: how well does a causal estimate made in one domain transfer to another domain, or how well does a predictive model fit to one data distribution generalize to another? While informal acknowledgements are often made regarding limitations to generalizability, it is currently rare to see explicit tests of this type in published research. Many of our models are likely to fail at these tasks, but it would be better to clearly recognize and quantify the progress yet to be made than to lose sight of developing high-quality, integrative models that would succeed at them.

Methods from one quadrant can also be leveraged to benefit work in another. In quadrant 2 there are recent examples of using methods from machine learning to improve the causal estimates made with existing explanatory techniques, such as matching and instrumental variables61, as well as to develop new techniques such as adaptive experimentation to more efficiently learn the effects of deploying different policies62 and ‘causal tree’ models for estimating heterogeneous treatment effects63. Predictive models have also been used here as a benchmark to assess the ‘completeness’ of explanatory models51. Conversely, in quadrant 3 there are prominent examples in which structural causal models have been leveraged to improve the generalizability of predictive models49,64. We can also imagine methods that truly sit in quadrant 4. For example, structural modelling in economics and marketing aspires to “identify mechanisms that determine outcomes and are designed to analyse counterfactual policies, quantifying impacts on specific outcomes as well as effects in the short and longer run.”65. An example entails using estimated models of consumer preferences derived from historical choice data to analyse the effect of a proposed merger. While it is rare to find studies that directly assess the predictive power of such models, as they often concern not-yet-implemented changes, such an extension is clearly possible. For example, Athey et al.66 used data from sealed-bid auctions to estimate bidder values and make predictions about open ascending auctions, and the predictions were then compared to outcomes in those auctions.

Another method that we believe is particularly promising for making progress in quadrant 4 is akin to a ‘coordinate ascent’ algorithm, wherein researchers iteratively alternate between predictive and explanatory modelling. Agrawal et al.12 provide an example of this kind of approach, combining the methods of psychology and machine learning. Their starting point was the Moral Machine dataset, a large-scale experiment that collected tens of millions of judgments from participants all over the world solving ‘trolley car’ moral reasoning problems67. The original study was focused on estimating causal effects, manipulating variables related to the identity of the members of different groups who could be hit by an out-of-control vehicle and measuring the changes in participants’ judgements of the moral acceptability of different outcomes. Agrawal et al.12 used this dataset as the basis for building a predictive model, using a black box machine learning method (an artificial neural network) to predict people’s decisions. This predictive model was used to critique a more traditional cognitive model and to identify potential causal factors that might have influenced people’s decisions. The cognitive model was then evaluated in a new round of experiments that tested its predictions about the consequences of manipulating those causal factors.

Clearly label contributions

Our second suggestion is deceptively simple: researchers should clearly label their research activities according to the type of contributions they make. Simply adding labels to published research sounds trivial, but checklists68, badges69, and other labelling schemes are already a central component of efforts to improve the transparency, openness, and reproducibility of science70. Inspired by these efforts, we argue that encouraging researchers to clearly identify the nature of their contribution would be clarifying both for ourselves and for others, and propose the labelling scheme in Table 2 for this purpose. We anticipate that many other labelling schemes could be proposed, each of which would have advantages and disadvantages. At a minimum, however, we advocate for a scheme that satisfies two very general properties: first, it should differentiate as cleanly as possible between contributions in the four quadrants of Table 1; and second, within each quadrant it should identify the level of granularity (for example, high, medium or low) that is exhibited by the result.

Table 2 | A label scheme for clarifying the nature and granularity of research contributions according to the four quadrants discussed above

Quadrant 1 (describes something):
- Low granularity: reports stylized facts
- Medium granularity: reports population averages
- High granularity: reports individual outcomes

Quadrant 2 (tests a causal claim):
- Low granularity: tests for a non-zero effect
- Medium granularity: tests for a directional effect
- High granularity: estimates the magnitude and direction of an effect

Quadrant 3 (tests a (passive) predictive claim):
- Low granularity: predicts directional or aggregate outcomes
- Medium granularity: predicts magnitude and direction of aggregate outcomes
- High granularity: predicts magnitude and direction of individual outcomes

Quadrant 4 (tests a claim both for causality and predictive accuracy):
- Low granularity: predicts directional or aggregate outcomes under changes or interventions
- Medium granularity: predicts magnitude and direction of aggregate outcomes under changes or interventions
- High granularity: predicts magnitude and direction of individual outcomes under changes or interventions

The rows distinguish between different levels of granularity in each quadrant. By ‘directional’, we mean results that report only whether a given association or effect is positive or negative in sign, whereas by ‘magnitude and direction’ we mean not only the sign of a relationship but also the numerical size of the correlation or effect.

Focusing first on the columns of Table 2, we recognize that the boundaries of the quadrants will, in reality, be blurry, and that individual papers will sometimes comprise a blend of contributions across quadrants or granularity levels; however, we believe that surfacing these ambiguities and making them explicit would itself be a useful exercise. If, for example, it is unclear whether a particular claim is merely descriptive (for example, there exists a difference in outcome variable y between two groups A and B) or is intended as a causal claim (for example, that the difference exists because A and B differ on some other variable x), requiring us to attest that our model tests a causal claim in order to place it in quadrant 2 should cause us to reflect on our choice of language and possibly to clarify it. Such a clarification would also help to avoid confusion that can arise from any given research method falling into more than one quadrant, depending on the objectives of the researcher (see example in Box 1).

Focusing next on the rows, Table 2 is also intended to clarify that it is possible to engage in activities that reveal widely different amounts of information while remaining within a given quadrant. In quadrant 1, for example, a description that specifies the association between individual-level attributes and outcomes tells us more about a phenomenon than one that does the same things at the level of population averages or ‘stylized facts’ (that is, the sort of qualitative statements that are often used in summaries of scientific work, such as “income rises with education”). In quadrant 2, estimating the magnitude of an effect is more informative than determining only its sign (positive or negative), which is in turn more informative than simply establishing that it is unlikely to be zero. Likewise, estimates of effect sizes made across a range of conditions are more informative than those that are made for only one set of conditions (for example, the particular settings chosen for a lab experiment14). In quadrant 3, predictions about outcomes can also be subjected to tests at widely different levels, depending on numerous, often benign-seeming, details of the test36. For example: (a) predictions about distributional properties (for example, population averages) are less informative than predictions of individual outcomes; (b) predictions about which ‘bucket’ an observation falls into (for example, above or below some threshold, as in most classification tasks) tell us less than predictions of specific outcome values (as in regression); (c) ex-ante predictions made immediately before an event are less difficult than those made far in advance; and (d) predictions that are evaluated against poor or inappropriate baseline models—or where a baseline is absent—are less informative than those that are compared against a strong baseline35. The same distinctions apply to quadrant 4, with the key difference being that claims made in this quadrant are evaluated under some change in the data-generating process, whether through intentional experimentation or changes that result from other external factors. Requiring researchers to state explicitly the level of granularity at which a particular claim is made will, we hope, lead to more accurate interpretations of our findings.

Standardize open science practices

Our third suggestion is to standardize open science practices between those engaged in predictive and explanatory modelling. Over the last several years, scientists working in each tradition have promoted best practices to facilitate transparent, reproducible, and cumulative science; specifically, pre-registration in the explanatory modelling community71, and the common task framework in the predictive modelling community72. Here we highlight how each community can learn from and leverage best practices developed in the other.

Pre-registration. Pre-registration is the act of publicly declaring one’s plans for how any given research activity will be done before it is actually carried out and is designed with a simple goal in mind: to make it easier for readers and reviewers to tell the difference between planned and unplanned analyses. This procedure can help to calibrate expectations about the reliability of reported findings and, in turn, reduce the incidence of unreliable, false-positive results in research that tests a given hypothesis or prediction27,71. Specifically, pre-registration reduces the risk of making undisclosed post hoc, data-dependent decisions (for example, which of many possible statistical tests to run) that can lead to non-replicable findings.
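One concrete way to carry this discipline over to predictive modelling, sketched below purely as an illustration (synthetic data and hypothetical names; the paper prescribes no specific code), is to declare the data split, the candidate models, and the single evaluation metric in advance, tune only against a validation set, and score a locked test set exactly once, as discussed further in the subsection that follows.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical dataset: 1,000 observations, 5 features, one continuous outcome
X = rng.normal(size=(1000, 5))
y = X @ rng.normal(size=5) + rng.normal(scale=1.0, size=1000)

# Declared up front (for example, in a pre-registration): the split sizes,
# the random seed, and the error metric reported on the final test set
idx = rng.permutation(len(y))
train, val, test = idx[:600], idx[600:800], idx[800:]

def fit_ridge(X, y, lam):
    """Ridge regression via the normal equations (illustrative only)."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def mse(X, y, beta):
    return float(np.mean((y - X @ beta) ** 2))

# Model development: tune freely, but only against the validation split
candidate_lams = [0.01, 0.1, 1.0, 10.0]
scores = {lam: mse(X[val], y[val], fit_ridge(X[train], y[train], lam))
          for lam in candidate_lams}
best_lam = min(scores, key=scores.get)

# Final, one-shot evaluation: refit on train plus validation, then score the
# locked test split exactly once and report that number as pre-specified
dev = np.concatenate([train, val])
beta = fit_ridge(X[dev], y[dev], best_lam)
print("pre-specified test MSE:", round(mse(X[test], y[test], beta), 3))
```

Whether this is done through a formal registry or simply by committing the split and evaluation script before the test data are examined, the aim is the same: keeping the planned, one-time test clearly separate from the exploratory work used to build the model.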


Box 1
How to label a contribution

A regression model of the form y = βx can equally appear in all four quadrants, depending on how the equation is applied and interpreted. In quadrant 1, the association between the outcome and the predictor(s) x is simply described without any causal interpretation or claim about predictive accuracy. In quadrant 2, the same model can be estimated but the focus is on the sign, statistical significance, and sometimes size of the estimated coefficient β, often tied to a causal interpretation derived from substantive theory. In quadrant 3, the same equation can again be estimated, but now the focus is on measuring the error (for example, R²) associated with predicted values of y by comparing them with previously unseen observations45. Finally, the same model could fall into quadrant 4 if the goal is to compare the predictive accuracy of different theories51, and potentially to guide the development of new theories12,84 that are either more predictively accurate or that generalize to a broader set of circumstances.

Until now, pre-registration has been applied almost entirely in the context of what we call explanatory modelling (quadrant 2), where small sample sizes (for example, in randomized controlled trials) combined with undisclosed flexibility in the data analysis and modelling process led to a high incidence of researchers being unable to replicate published results. However, we believe that it could also be valuable for predictive modelling (quadrant 3) where, in spite of much larger sample sizes, researchers still have many degrees of freedom73 in their analytical choices. Furthermore, pre-registration can offer a cleaner delineation between the data used to train and validate a model (also known as postdiction71) compared to the data used to test it (prediction). The former should be used to develop a model, while the latter should be used only once, at the point when all aspects of a model (including its complexity, hyperparameters, and so on) have been determined and it is ready to be evaluated. While this distinction is clear in theory, in practice research can suffer from confusion about validation versus test datasets, or from multiple uses of test sets within the modelling process74,75.

In practice, pre-registration suffers from a number of limitations that reduce its value and complicate the interpretation of pre-registered findings71. On its own, in other words, it is not a panacea. Nonetheless, the increased use of pre-registration in both explanatory and predictive modelling activities would be likely to reduce the incidence of unreliable results and to improve the transparency and replicability of scientific workflows. Reinforcing pre-registration is the related practice of registered reports76,77, wherein researchers submit their pre-registered research and analysis plan for peer review before carrying out the study. While registered reports also have their implementation challenges, their adoption would place more emphasis on the quality of the questions being asked and the methods used to answer them than on the answers themselves.

Common task framework. A second practice that could be standardized across communities is use of the common task framework72 to centralize the collective efforts of many researchers in a given field. In this paradigm there is agreement upon a question of interest, a dataset that pertains to it, and a specific modelling task to be undertaken with that dataset to address the motivating question. An organizer then makes some of the data available to participants and declares the criteria by which research efforts will be evaluated. Participating researchers can then iterate between developing their models and submitting them for evaluation. Importantly, this evaluation happens on a separate, hidden test set that is accessible to the organizer but not the participants, which helps to guard against overfitting to a particular subset of the data.

The common task framework originated in the predictive modelling community where it is often used for ‘prediction contests’ such as the prominent Netflix Prize Challenge78. However, the common task framework has benefits beyond simply increasing predictive performance, and both the predictive and explanatory modelling communities could benefit from adopting it more broadly. In terms of predictive modelling, increased use of the common task framework would result in easier comparison and synthesis between what are currently disparate research efforts. Recalling the task of predicting how information spreads discussed earlier, there are currently many such efforts that are quite difficult to compare because although they claim to tackle the same problem, they each use different datasets, define different modelling tasks, or use different metrics to quantify success36. Centralizing these efforts under the common task framework would force a diverse set of researchers to find common ground in deciding on what the real problems of interest are. It would also standardize the evaluation of progress and make it easy to combine insights across studies.

Likewise, the common task framework could be useful for explanatory modelling. In fact, the common task framework can be thought of as a way of scaling up pre-registration and registered reports from individual researchers to collections of research teams or even entire fields. One example is the recent Fragile Families Challenge50, which tasked researchers with the problem of forecasting different life outcomes for disadvantaged children and families. This use of the common task framework not only centralized efforts on a prediction problem that is important in its own right, but also generated novel questions about the predictability of different life outcomes for the social science community. Another example is the Universal Causal Evaluation Engine80, which facilitates collective progress on causal inference through the common task framework79. The organizers create synthetic data (for which they know the true causal effects) and make it available to participants who can submit estimates of those effects using their preferred methods. This procedure allows for unbiased evaluation of different inference methods across a range of researchers and research problems.

Outlook

The goal of this Perspective is to advocate for advancing research in the computational and social sciences by integrating predictive and explanatory approaches to scientific inquiry. Our suggestions for doing so, discussed in detail above and summarized in Box 2, are intended to clarify existing styles of work as well as providing useful and actionable advice for researchers interested in integrative modelling. At the same time, we note that the suggestions that we make here are not exhaustive, comprehensive, or without challenges: integrative modelling as we have described it is, on its own, neither necessary nor sufficient for our collective success as a field.

Notably, the issue of model interpretability is missing from the framework and suggestions presented above. Specifically, in discussing explanatory modelling, we have focused on the estimation of causal effects, regardless of whether those effects are explicitly tied to theoretically motivated mechanisms that are interpretable as “the cogs and wheels of the causal process”7. This is not because we do not find value in uncovering and understanding causal mechanisms, but rather because it is our view that interpretability is logically independent of both the causal and predictive properties of a model. That is, in principle a model can accurately predict outcomes under interventions or previously unseen circumstances (out of distribution), thereby demonstrating that it captures the relevant causal relationships, and still be resistant to human intuition (for example, quantum mechanics in the 1920s). Conversely, a theory can create
facilitate more replicable, more cumulative, and ultimately more use-
Box 2 ful social science.

Summary of suggestions
1. Watts, D. J. A twenty-first century science. Nature 445, 489 (2007).
● Integrate predictive and explanatory modelling 2. Lazer, D. et al. Computational social science. Science 323, 721–723 (2009).
○ Look to sparsely populated quadrants for new research 3. Salganik, M. J. Bit by Bit: Social Research in the Digital Age (Princeton Univ. Press, 2018).
4. Lazer, D. M. J. et al. Computational social science: obstacles and opportunities. Science
opportunities 369, 1060–1062 (2020).
○ Test existing methods to see how they generalize under 5. Lazer, D. et al. Meaningful measures of human society in the twenty-first century. Nature
interventions or distributional changes https://doi.org/10.1038/s41586-021-03660-7 (2021).
6. Wing, J. M. Computational thinking. Commun. ACM 49, 33–35 (2006).
○ Develop new methods that iterate between predictive and 7. Hedström, P. & Ylikoski, P. Causal mechanisms in the social sciences. Annu. Rev. Sociol.
explanatory modelling 36, 49–67 (2010).
8. Breiman, L. Statistical modeling: the two cultures (with comments and a rejoinder by the
● Clearly label contributions according to the quadrant in which
author). Stat. Sci. 16, 199–231 (2001).
they make a claim, and the granularity of that claim We view our paper as an extension of Brieman’s dichotomy (the ‘algorithmic’ and ‘data
● Standardize open science practices across the social and modelling’ cultures), arguing that these approaches should be integrated.
9. Mullainathan, S. & Spiess, J. Machine learning: an applied econometric approach. J. Econ.
computer sciences, encouraging, for instance, pre-registration
Perspect. 31, 87–106 (2017).
for predictive models and the common task framework for This paper explores the relationships between predictive models and causal inference.
explanatory modelling 10. Molina, M. & Garip, F. Machine learning for sociology. Annu. Rev. Sociol. 45, 27–45 (2019).
11. Shmueli, G. To explain or to predict? Stat. Sci. 25, 289–310 (2010).
We build on Schmueli’s distinction between prediction and explanation and propose a
framework for integrating the two approaches.
the subjective experience of having made sense of many diverse phe- 12. Agrawal, M., Peterson, J. C. & Griffiths, T. L. Scaling up psychology via Scientific Regret
Minimization. Proc. Natl Acad. Sci. USA 117, 8825–8835 (2020).
nomena without being either predictively accurate or demonstrably
This paper exemplifies what we call integrative modelling.
causal81 (for example, conspiracy theories). 13. Munafò, M. R. et al. A manifesto for reproducible science. Nat. Hum. Behav. 1, 0021 (2017).
Interpretable explanations, of course, can be valued for other 14. Yarkoni, T. The generalizability crisis. Behav. Brain Sci. https://doi.org/10.1017/
S0140525X20001685 (2020).
reasons. For example, interpretability allows scientists to ‘mentally
15. Ward, M. D., Greenhill, B. D. & Bakke, K. M. The perils of policy by p-value: predicting civil
simulate’ their models, thereby generating plausible hypotheses for conflicts. J. Peace Res. 47, 363–375 (2010).
subsequent testing. Clearly this ability is helpful to theory develop- 16. Yarkoni, T. & Westfall, J. Choosing prediction over explanation in psychology: lessons
from machine learning. Perspect. Psychol. Sci. 12, 1100–1122 (2017).
ment, especially when data are sparse or noisy, which is often the case 17. Watts, D. J. Should social science be more solution-oriented? Nat. Hum. Behav. 1, 0015
for social phenomena. Equally important, interpretable models are (2017).
often easier to communicate and discuss (verbally or in text), thereby 18. Berkman, E. T. & Wilson, S. M. So useful as a good theory? The practicality crisis in (social)
psychological theory. Perspect. Psychol. Sci. https://doi.org/10.1177/1745691620969650
increasing the likelihood that others will pay attention to them, use (2021).
them, or improve upon them. In other words, interpretability is a per- 19. Athey, S. Beyond prediction: Using big data for policy problems. Science 355, 483–485
fectly valid property to desire of an explanation, and can be very useful (2017).
20. Lipton, Z. C. The mythos of model interpretability. Queue 16, 31–57 (2018).
pragmatically. It is our opinion, however, that it should be valued on its 21. Kleinberg, J., Ludwig, J., Mullainathan, S. & Sunstein, C. R. Discrimination in the age of
own merits, not on the grounds that it directly improves the predictive algorithms. J. Legal Anal. 10, 113–174 (2018).
or causal properties of a model. 22. Coveney, P. V., Dougherty, E. R. & Highfield, R. R. Big data need big theory too. Philos.
Trans. R. Soc. A 374, 20160153 (2016).
We also acknowledge that there are costs associated with adopting 23. Gigerenzer, G. Mindless statistics. J. Socio-Econ. 33, 587–606 (2004).
the integrative modelling practices that we have described. As men- 24. Cohen, J. The earth is round (p < .05). Am. Psychol. 49, 997–1003 (1994).
tioned earlier, evaluating explanations in terms of their predictive 25. Bertrand, M. & Mullainathan, S. Are Emily and Greg more employable than Lakisha and
Jamal? A field experiment on labor market discrimination. Am. Econ. Rev. 94, 991–1013
accuracy may reveal that our existing theories explain less than we (2004).
would like53. Likewise, clearly labelling contributions as descriptive, 26. Ioannidis, J. P. A. Why most published research findings are false. PLoS Med. 2, e124
explanatory, predictive and so on may cast our findings in a less flat- (2005).
27. Simmons, J. P., Nelson, L. D. & Simonsohn, U. False-positive psychology: undisclosed
tering light than if they are described in vague or ambiguous language. flexibility in data collection and analysis allows presenting anything as significant.
Pre-registration requires additional time and effort from individual Psychol. Sci. 22, 1359–1366 (2011).
researchers, and some have criticized it as de-emphasizing important 28. Open Science Collaboration. Estimating the reproducibility of psychological science.
Science 349, aac4716 (2015).
exploratory work. Increased adoption of registered reports requires 29. Meehl, P. E. Why summaries of research on psychological theories are often
changes to editorial and review processes, and therefore the coordi- uninterpretable. Psychol. Rep. 66, 195–244 (1990).
nation of many individuals with potentially disparate interests. The 30. Gelman, A. Causality and statistical learning. Am. J. Sociol. 117, 955–966 (2011).
31. Dienes, Z. Understanding Psychology as a Science: An Introduction to Scientific and
common task framework demands a great deal of effort on the part Statistical Inference (Macmillan, 2008).
of those organizing an instance of it82, as well as adoption by others in 32. Schrodt, P. A. Seven deadly sins of contemporary quantitative political analysis. J. Peace
the field once a task is created. It is also subject to what has been called Res. 51, 287–300 (2014).
33. Lazer, D., Kennedy, R., King, G. & Vespignani, A. The parable of Google flu: traps in big
Goodhardt’s law83: “When a measure becomes a target, it ceases to be data analysis. Science 343, 1203–1205 (2014).
a good measure.” 34. Obermeyer, Z., Powers, B., Vogeli, C. & Mullainathan, S. Dissecting racial bias in an
That said, it is our view that wider adoption of these practices would be a net benefit for the field of computational social science. Exploratory work is important and should be encouraged, but pre-registration is crucial in that it helps to distinguish the act of testing models from the process of building them. Registered reports help us to focus on the informativeness of inquiries being conducted without biasing our attention based on the outcomes of those tests. And the common task framework provides a way of uniting sub-fields and disciplines to accelerate collective progress. Most importantly, thinking clearly about the epistemic values of explanation and prediction not only helps us to recognize their distinct contributions but also reveals new ways to integrate them in empirical research. Doing so will, we believe,

21. Kleinberg, J., Ludwig, J., Mullainathan, S. & Sunstein, C. R. Discrimination in the age of algorithms. J. Legal Anal. 10, 113–174 (2018).
22. Coveney, P. V., Dougherty, E. R. & Highfield, R. R. Big data need big theory too. Philos. Trans. R. Soc. A 374, 20160153 (2016).
23. Gigerenzer, G. Mindless statistics. J. Socio-Econ. 33, 587–606 (2004).
24. Cohen, J. The earth is round (p < .05). Am. Psychol. 49, 997–1003 (1994).
25. Bertrand, M. & Mullainathan, S. Are Emily and Greg more employable than Lakisha and Jamal? A field experiment on labor market discrimination. Am. Econ. Rev. 94, 991–1013 (2004).
26. Ioannidis, J. P. A. Why most published research findings are false. PLoS Med. 2, e124 (2005).
27. Simmons, J. P., Nelson, L. D. & Simonsohn, U. False-positive psychology: undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychol. Sci. 22, 1359–1366 (2011).
28. Open Science Collaboration. Estimating the reproducibility of psychological science. Science 349, aac4716 (2015).
29. Meehl, P. E. Why summaries of research on psychological theories are often uninterpretable. Psychol. Rep. 66, 195–244 (1990).
30. Gelman, A. Causality and statistical learning. Am. J. Sociol. 117, 955–966 (2011).
31. Dienes, Z. Understanding Psychology as a Science: An Introduction to Scientific and Statistical Inference (Macmillan, 2008).
32. Schrodt, P. A. Seven deadly sins of contemporary quantitative political analysis. J. Peace Res. 51, 287–300 (2014).
33. Lazer, D., Kennedy, R., King, G. & Vespignani, A. The parable of Google flu: traps in big data analysis. Science 343, 1203–1205 (2014).
34. Obermeyer, Z., Powers, B., Vogeli, C. & Mullainathan, S. Dissecting racial bias in an algorithm used to manage the health of populations. Science 366, 447–453 (2019).
35. Goel, S., Hofman, J. M., Lahaie, S., Pennock, D. M. & Watts, D. J. Predicting consumer behavior with web search. Proc. Natl Acad. Sci. USA 107, 17486–17490 (2010).
36. Hofman, J. M., Sharma, A. & Watts, D. J. Prediction and explanation in social systems. Science 355, 486–488 (2017).
37. Case, A. & Deaton, A. Rising morbidity and mortality in midlife among white non-Hispanic Americans in the 21st century. Proc. Natl Acad. Sci. USA 112, 15078–15083 (2015).
38. Oliver, M. L., Shapiro, T. M. & Shapiro, T. Black Wealth, White Wealth: A New Perspective on Racial Inequality (Taylor & Francis, 2006).
39. Chetty, R., Hendren, N., Kline, P. & Saez, E. Where is the land of opportunity? The geography of intergenerational mobility in the United States. Q. J. Econ. 129, 1553–1623 (2014).
40. Wagner, C. et al. Measuring algorithmically infused societies. Nature https://doi.org/10.1038/s41586-021-03666-1 (2021).
41. Ba, B. A., Knox, D., Mummolo, J. & Rivera, R. The role of officer race and gender in police–civilian interactions in Chicago. Science 371, 696–702 (2021).
42. Provost, F. & Fawcett, T. Data Science for Business: What You Need to Know about Data Mining and Data-Analytic Thinking (O’Reilly Media, 2013).
43. Makridakis, S., Wheelwright, S. C. & Hyndman, R. J. Forecasting Methods and Applications (Wiley, 1998).
44. Tetlock, P. E. Expert Political Judgment: How Good Is It? How Can We Know? (Princeton Univ. Press, 2005).
45. Kleinberg, J., Ludwig, J., Mullainathan, S. & Obermeyer, Z. Prediction policy problems. Am. Econ. Rev. 105, 491–495 (2015).
46. Dowding, K. & Miller, C. On prediction in political science. Eur. J. Polit. Res. 58, 1001–1018 (2019).
47. Galesic, M. et al. Human social sensing is an untapped resource for computational social science. Nature https://doi.org/10.1038/s41586-021-03649-2 (2021).
48. Cheng, J., Adamic, L., Dow, P. A., Kleinberg, J. M. & Leskovec, J. Can cascades be predicted? In WWW ’14: Proc. 23rd International Conference on World Wide Web 925–936 (2014).
49. Pearl, J. The seven tools of causal inference, with reflections on machine learning. Commun. ACM 62, 54–60 (2019).
This paper outlines the need for causal thinking in building predictive models.
50. Salganik, M. J. et al. Measuring the predictability of life outcomes with a scientific mass collaboration. Proc. Natl Acad. Sci. USA 117, 8398–8403 (2020).
51. Fudenberg, D., Kleinberg, J., Liang, A. & Mullainathan, S. Measuring the completeness of theories. SSRN https://doi.org/10.2139/ssrn.3018785 (2019).
52. Martin, T., Hofman, J. M., Sharma, A., Anderson, A. & Watts, D. J. Exploring limits to prediction in complex social systems. In WWW ’16: Proc. 25th International Conference on World Wide Web 683–694 (2016).
53. Watts, D. J. Common sense and sociological explanations. Am. J. Sociol. 120, 313–351 (2014).
This paper argues that sociologists should pay more attention to prediction versus interpretability when evaluating their explanations.
54. Zhou, F., Xu, X., Trajcevski, G. & Zhang, K. A survey of information cascade analysis: models, predictions, and recent advances. ACM Comput. Surv. 54, 1–36 (2021).
55. Goel, S., Watts, D. J. & Goldstein, D. G. The structure of online diffusion networks. In EC ’12: Proc. 13th ACM Conference on Electronic Commerce (2012).
56. Wu, S., Hofman, J. M., Mason, W. A. & Watts, D. J. Who says what to whom on Twitter. In WWW ’11: Proc. 20th International Conference on World Wide Web 705–714 (2011).
57. Goel, S., Anderson, A., Hofman, J. & Watts, D. J. The structural virality of online diffusion. Manage. Sci. 62, 180–196 (2015).
58. Berger, J. & Milkman, K. L. What makes online content viral? J. Mark. Res. 49, 192–205 (2012).
59. Bakshy, E., Hofman, J. M., Mason, W. A. & Watts, D. J. Everyone’s an influencer: quantifying influence on Twitter. In WSDM ’11: Proc. Fourth ACM International Conference on Web Search and Data Mining 65–74 (2011).
60. Tan, C., Lee, L. & Pang, B. The effect of wording on message propagation: topic- and author-controlled natural experiments on Twitter. In Proc. 52nd Annual Meeting of the Association for Computational Linguistics 175–185 (2014).
61. Liu, T., Ungar, L. & Kording, K. Quantifying causality in data science with quasi-experiments. Nat. Comput. Sci. 1, 24–32 (2021).
62. Hochberg, I. et al. Encouraging physical activity in patients with diabetes through automatic personalized feedback via reinforcement learning improves glycemic control. Diabetes Care 39, e59–e60 (2016).
63. Athey, S. & Imbens, G. Recursive partitioning for heterogeneous causal effects. Proc. Natl Acad. Sci. USA 113, 7353–7360 (2016).
64. Charles, D., Chickering, M. & Simard, P. Counterfactual reasoning and learning systems: the example of computational advertising. J. Mach. Learn. Res. 14, 3207–3260 (2013).
65. Low, H. & Meghir, C. The use of structural models in econometrics. J. Econ. Perspect. 31, 33–58 (2017).
66. Athey, S., Levin, J. & Seira, E. Comparing open and sealed bid auctions: evidence from timber auctions. Q. J. Econ. 126, 207–257 (2011).
67. Awad, E. et al. The Moral Machine experiment. Nature 563, 59–64 (2018).
68. Aczel, B. et al. A consensus-based transparency checklist. Nat. Hum. Behav. 4, 4–6 (2020).
69. Kidwell, M. C. et al. Badges to acknowledge open practices: a simple, low-cost, effective method for increasing transparency. PLoS Biol. 14, e1002456 (2016).
70. Nosek, B. A. et al. Promoting an open research culture. Science 348, 1422–1425 (2015).
71. Nosek, B. A., Ebersole, C. R., DeHaven, A. C. & Mellor, D. T. The preregistration revolution. Proc. Natl Acad. Sci. USA 115, 2600–2606 (2018).
72. Donoho, D. 50 years of data science. J. Comput. Graph. Stat. 26, 745–766 (2017).
73. Gelman, A. & Loken, E. The statistical crisis in science. Am. Sci. 102, 460 (2014).
74. Rao, R. B., Fung, G. & Rosales, R. On the dangers of cross-validation. An experimental evaluation. In Proc. 2008 SIAM International Conference on Data Mining 588–596 (Society for Industrial and Applied Mathematics, 2008).
75. Dwork, C. et al. The reusable holdout: preserving validity in adaptive data analysis. Science 349, 636–638 (2015).
76. Chambers, C. D. Registered reports: a new publishing initiative at Cortex. Cortex 49, 609–610 (2013).
77. Nosek, B. A. & Lakens, D. Registered reports: a method to increase the credibility of published reports. Soc. Psychol. 45, 137–141 (2014).
78. Bennett, J. & Lanning, S. The Netflix Prize. In Proc. KDD Cup and Workshop 2007 (2007).
79. Dorie, V., Hill, J., Shalit, U., Scott, M. & Cervone, D. Automated versus do-it-yourself methods for causal inference: lessons learned from a data analysis competition. Stat. Sci. 34, 43–68 (2019).
80. Lin, A., Merchant, A., Sarkar, S. K. & D’Amour, A. Universal causal evaluation engine: an API for empirically evaluating causal inference models. In Proc. Machine Learning Research (eds Le, T. D. et al.) Vol. 104, 50–58 (PMLR, 2019).
81. Craver, C. F. Explaining the Brain: Mechanisms and the Mosaic Unity of Neuroscience (Clarendon, 2007).
82. Salganik, M. J., Lundberg, I., Kindel, A. T. & McLanahan, S. Introduction to the special collection on the Fragile Families Challenge. Socius https://doi.org/10.1177/2378023119871580 (2019).
83. Strathern, M. ‘Improving ratings’: audit in the British university system. Eur. Rev. 5, 305–321 (1997).
84. Peterson, J. C., Bourgin, D. D., Agrawal, M., Reichman, D. & Griffiths, T. L. Using large-scale experiments and machine learning to discover new theories of human decision-making. Science 372, 1209–1214 (2021).

Author contributions J.M.H. and D.J.W. conceptualized and helped to write and prepare the manuscript. They contributed equally to these efforts. All authors were involved in and discussed the structure of the manuscript at various stages of its development.

Competing interests The authors declare no competing interests.

Additional information
Correspondence and requests for materials should be addressed to J.M.H. or D.J.W.
Peer review information Nature thanks Noortje Marres, Melanie Mitchell and Scott Page for their contribution to the peer review of this work.
Reprints and permissions information is available at http://www.nature.com/reprints.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© Springer Nature Limited 2021
