Although probabilistic models of cognitive development have become increasingly prevalent, one challenge is to account for how children might cope with a potentially vast number of possible hypotheses. We propose that children might address this problem by ‘sampling’ hypotheses from a probability distribution. We discuss empirical results demonstrating signatures of sampling, which offer an explanation for the variability of children’s responses. The sampling hypothesis provides an algorithmic account of how children might address computationally intractable problems and suggests a way to make sense of their ‘noisy’ behavior.

Corresponding author: Bonawitz, E. ([email protected])
Keywords: cognitive development; sampling hypothesis; approximate Bayesian inference; causal learning.
1364-6613/© 2014 Elsevier Ltd. All rights reserved. http://dx.doi.org/10.1016/j.tics.2014.06.006
Glossary
Exact analytical solution (exact learning): in mathematics, the mechanical steps used to carry out a computation lead to a precise single numerical result. This is contrasted with approximate solutions (approximate inference), which provide guesses about likely numerical results. Approximate solutions are employed when computing the analytical solution is intractable or would simply take longer than desirable. Approximate solutions often trade time for accuracy – the longer an algorithm runs, the closer to correct the answer will be.
Monte Carlo methods: algorithms that depend on repeated stochastic (random) sampling to produce a numerical estimate of the result. The name derives from the fact that casinos are likewise based on sampling from particular probability distributions with every roll of the dice or spin of a wheel.
Posterior probability: the conditional probability that a hypothesis is true, after the evidence is taken into account. The posterior distribution is the probability distribution over hypotheses defined by these probabilities.
Probabilistic approaches to cognitive development: approaches that assume that, at a computational level, processes of inference and learning in cognitive development can be characterized in terms of rationally updating a probability distribution over hypotheses in accordance with Bayes’ rule. Strict binary rule-based models of learning are a contrasting example; learners might follow a heuristic that allows them to identify a deterministic outcome (e.g., a yes/no decision on whether an object falls into a particular category given some threshold).
Probability matching in reinforcement learning: in contrast to always producing an action that will most likely bring about the reward, probability matching in reinforcement learning is when the learner matches the proportion of his or her responses to the relative rates of reward. This entails sometimes producing a response that is not the most likely to be rewarded, and hence not maximizing possible rewards.
Probabilistic models in development
In the course of development, children’s beliefs about the world undergo substantial revision. Probabilistic models of cognitive development (see Glossary) provide a potential account of some aspects of this remarkable learning process [1]. These models can rigorously characterize the structure of early representations and their revision. On this account, children’s beliefs, such as their ‘intuitive theories’, can be formally described as generative models, for example, as causal graphs, grammars, or taxonomies. A generative model predicts some patterns of evidence and not others. For example, a particular graphical model of a causal system will predict that some patterns of contingency between events are more likely than others; a grammar will predict that certain sentences will be more likely to be acceptable than others.
If theories are expressed as probabilistic generative models, then the process of revising those theories can be formally described as Bayesian inference. Different generative models systematically generate some patterns of data rather than others, so a learner can start with the data and infer which model was most likely to have generated those data, guided by prior knowledge. Formally, prior knowledge is expressed in a ‘prior’ probability distribution over hypotheses, and Bayes’ rule indicates how to compute a ‘posterior’ distribution that incorporates the data. This approach can thus provide a desirably precise account of how prior knowledge and new evidence may be combined to update a set of beliefs.
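For concreteness, writing h for a hypothesis and d for the observed data (notation we have added; it does not appear in the original text), Bayes’ rule takes the standard form

\[ P(h \mid d) = \frac{P(d \mid h)\,P(h)}{\sum_{h'} P(d \mid h')\,P(h')} \]

where P(h) is the prior, P(d | h) is the likelihood supplied by the generative model, and the sum ranges over all candidate hypotheses h'. That denominator is exactly what becomes burdensome when the hypothesis space is large.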
The probabilistic modeling approach is not without critique [2]. Many of the critiques stem from the fact that the Bayesian view, just by itself, is extremely flexible and can accommodate a very wide range of data patterns (just as connectionist or production system models can, in principle, accommodate any data pattern). To be informative, probabilistic model accounts must specify the nature of the generative models and the likelihood functions in detail. Indeed, an advantage of this approach is that it requires the theorist to make this specification in a precise and detailed way, and therefore generates precise quantitative predictions about that particular probabilistic model.

Most probabilistic models operate at what Marr [3] called the ‘computational’ level of analysis. Computational-level models provide clear descriptions of the problems the learner faces and describe ideal solutions for those problems. Probabilistic models at this computational level can characterize how children infer beliefs from evidence in at least some cases, such as causal learning tasks [4–7].
In these studies, researchers assess the state of children’s prior beliefs, carefully control the evidence they receive, and then examine which hypotheses they endorse. Children tend to choose the hypotheses that have the greatest posterior probability according to a Bayesian analysis.

Children’s responses on these tasks on average look like the posterior distributions predicted by these computational-level models. However, that does not necessarily imply that learners are working through the calculations prescribed by Bayes’ rule at the ‘algorithmic’ level. A very large number of hypotheses may be compatible with any pattern of evidence, and it would be impossible to assess each of these hypotheses individually. This problem might be particularly challenging for young children, who might have more restricted memory and information-processing capacities than adults.

So how do learners behave in a way that is apparently consistent with probabilistic models when it is unlikely that they are actually assessing all the possible hypotheses in practice?

Approximating probabilistic models: the sampling hypothesis
Applications of probabilistic models in computer science must also tackle the problem of evaluating large spaces of hypotheses. They often do so by randomly but systematically sampling a few hypotheses rather than exhaustively considering all possibilities. These calculations use ‘Monte Carlo’ methods. They obtain the equivalent of samples from the posterior distribution without computing the whole posterior distribution itself. A system that uses this sort of sampling will be variable, because it will entertain different hypotheses apparently at random.
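As a minimal illustration (a sketch of our own, not a model from the studies discussed here), the following Python fragment scores a tiny hypothesis space against some evidence and then draws guesses from the resulting posterior instead of always reporting the single best hypothesis:

```python
import random
from math import comb

# Toy hypothesis space: each hypothesis fixes a rate at which a machine
# activates. Names and numbers are invented for illustration only.
hypotheses = {'high-power block': 0.9, 'medium-power block': 0.5, 'inert block': 0.1}
prior = {h: 1 / len(hypotheses) for h in hypotheses}

# Evidence: the machine activated on 3 of 4 trials.
k, n = 3, 4

def likelihood(rate, k, n):
    """Probability of k activations in n trials at a given activation rate."""
    return comb(n, k) * rate**k * (1 - rate)**(n - k)

# Exact posterior: tractable only because this hypothesis space is tiny.
scores = {h: prior[h] * likelihood(r, k, n) for h, r in hypotheses.items()}
total = sum(scores.values())
posterior = {h: s / total for h, s in scores.items()}

# A sampling responder: each guess is one draw from the posterior, so
# individual answers vary, but long-run frequencies match the posterior.
guesses = random.choices(list(posterior), weights=list(posterior.values()), k=10)
print(posterior)
print(guesses)
```

Run repeatedly, the printed guesses differ from run to run, but their long-run frequencies track the posterior, which is the systematic signature discussed next.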
However, importantly, this variability will also be systematic. The system will sample more probable hypotheses with greater frequency than less probable ones, so the distribution of responses will reflect the probability of the hypotheses. And most importantly, such a system will be efficient, because it trades approximation error for computing time. The success of Monte Carlo algorithms in computer science and statistics suggests an exciting hypothesis for cognitive development: the algorithms children use to perform inductive inference might also involve sampling. We call this the sampling hypothesis.

Some recent work supports the idea that adults may sometimes approximate Bayesian inference through psychological processes that are equivalent to sampling. Participants in a simple judgment task provided responses that suggested they sampled their judgments from an internal distribution, rather than providing a single best guess [8]. It is often observed that people produce responses proportional to the Bayesian posterior probabilities [9]. Although producing just a few samples may lead to behavior that appears suboptimal [10], it is a rational strategy for compromising between the cost of errors and the opportunity cost of taking more samples [11].

The sampling hypothesis is especially interesting from a developmental perspective, because it might explain at least some of the variability in children’s behavior. Developmental studies have pointed to the extensive variability in children’s responses, hypotheses, and solutions to problems [12]. Often, this variability is assumed to be the result of external ‘noise’ such as attention or memory failures, and this may often be true. But it is also possible that at least sometimes the variability in children’s responding is systematic, as the sampling hypothesis would predict.

For example, in causal learning tasks children tend to pick the hypothesis that is most probable. But not all children pick the most likely response, and an individual child may change responses from trial to trial [5–7]. The proportion of times that children select a hypothesis increases as the hypothesis receives more support, but children still sometimes produce alternative hypotheses. That might mean that children are sampling their responses from a posterior distribution.

Alternatively, it might be that children aim to produce a best guess all the time, and that the variability in their responses is simply a reflection of stochastically produced errors and ‘noise’ caused by other factors. Children might be ‘noisy maximizers’, producing an error-laden attempt at the most likely answer.

Yet another alternative is that children’s behavior in these tasks does reflect the probability of hypotheses, but does so through a simpler process than hypothesis sampling. Children, similar to adults and even non-human animals, frequently produce a pattern called probability matching in reinforcement learning [13]. This ‘naïve frequency matching’ alternative suggests that learners may simply match the frequency of responses to those of rewards.

So the question is: is the variability in children’s answers the result of sampling, is it an error-laden attempt at maximization, or does it involve naïve frequency matching?
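These three accounts come apart quantitatively. As a rough sketch (our own illustration, with an invented error rate), one can simulate the proportion of ‘red’ guesses the first two strategies predict as the posterior probability of ‘red’ varies:

```python
import random

def sampler(p_red):
    """Sampling: guess 'red' with probability equal to its posterior probability."""
    return 'red' if random.random() < p_red else 'blue'

def noisy_maximizer(p_red, error_rate=0.1):
    """Noisy maximizing: always aim for the most probable answer, with
    occasional unsystematic errors (the error rate here is invented)."""
    best = 'red' if p_red >= 0.5 else 'blue'
    flipped = 'blue' if best == 'red' else 'red'
    return flipped if random.random() < error_rate else best

def proportion_red(strategy, p, trials=10_000):
    return sum(strategy(p) == 'red' for _ in range(trials)) / trials

# Sampling predicts graded responding (roughly 95% vs 75% 'red'), whereas
# noisy maximizing predicts near-ceiling 'red' at both probabilities.
for p_red in (0.95, 0.75):
    print(p_red, proportion_red(sampler, p_red), proportion_red(noisy_maximizer, p_red))
```

A naïve frequency matcher would look just like the sampler in this setup; the two only come apart when raw chip frequencies and posterior probabilities differ, which is exactly the contrast exploited in the final experiment described below.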
Empirical support for the sampling hypothesis
A recent set of developmental studies presents the first test of the sampling hypothesis, distinguishing sampling from both the noisy maximizing and naïve frequency matching alternatives. Denison et al. [14] explored the degree to which children match posterior probabilities in a causal inference task (Figure 1). Children saw a bin with a varying number of blue and red chips. In the first experiment, children had several chances to guess whether a chip that randomly fell out of the bin was red or blue. The probability that each chip had fallen out of the bin was directly related to the proportion of red and blue chips. When the bin was 80% full of red chips, there was an 80% chance that the randomly selected chip was red. Children’s behavior showed signatures of sampling: they guessed ‘red’ or ‘blue’ in proportion to the probability that a red or blue chip had fallen out of the bin.

In a second and third experiment, the proportion of colored chips was systematically varied. Children provided responses that matched the posterior probability of hypotheses; when the probability of a hypothesis decreased, children’s selection of that hypothesis also decreased. Noisy maximizing would instead predict that children would favor the most likely hypothesis at constant rates across varying probabilities (i.e., whether the probability of a red chip is 95% or 75%, children should guess ‘red’ at near ceiling levels).
[Figure 1 appears here: task panels (‘Chips make the machine go’, counting the red and blue chips, transfer to an opaque bag, and the test question ‘What color do you think it was?’), the chip proportions per condition, and a plot of % frequency of children’s response against % probability of chip.]
Figure 1. Example methods, proportion of chips per condition, and results from Denison et al. [14].
In a final experiment, the probability did not directly reflect the frequencies of the chips, providing a way to distinguish sampling from naïve frequency matching. Children saw two blue chips in one bin, and 14 red and six blue chips in the other bin. Then the bins were obscured and one unknown bin was randomly selected, so that the probability of the blue chip was 65%, whereas the frequency of blue chips was only 36%. Children did not appear to naively match frequencies as they would in simple probability learning through reinforcement. Consistent with the sampling hypothesis, children’s guesses matched the posterior distribution of hypotheses rather than the simple frequencies of the red and blue chips.
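The arithmetic behind those two numbers (spelled out here for clarity; the values come from the experiment just described) is: with the two bins equally likely to be chosen,

\[ P(\text{blue}) = \tfrac{1}{2}\cdot\tfrac{2}{2} + \tfrac{1}{2}\cdot\tfrac{6}{20} = 0.50 + 0.15 = 0.65, \]

whereas the raw frequency of blue chips across both bins is \( (2+6)/(2+14+6) = 8/22 \approx 0.36 \). A frequency matcher should therefore say ‘blue’ about 36% of the time; a posterior sampler, about 65% of the time.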
These experiments showed that children were behaving in a way that was consistent with sampling and probabilistic models. But they did not determine which type of sampling algorithm children might use. A first challenge in exploring sampling algorithms is to demonstrate that there are psychologically plausible strategies that can produce behavior that is generally consistent with the predictions of probabilistic models.
Bonawitz et al. [15] mathematically demonstrated that a surprisingly simple version of a single-sample algorithm (based on a win-stay, lose-shift strategy) will produce behavior that is consistent with the exact analytical solution when aggregated across multiple participants. In this algorithm, the learner initially chooses a guess at random and then tends to stay with that guess unless it is contradicted by the evidence. As the evidence against the initial guess grows stronger, the learner will be increasingly likely to resample from the distribution and try another guess.
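A minimal sketch of such a strategy (our paraphrase in code; Bonawitz et al. [15] give the exact algorithm and its formal guarantees, and the hypothesis space, likelihoods, and evidence below are invented) might look like this:

```python
import random
from math import prod

# Two candidate causes; likelihood(h, obs) is the probability of an
# observation if h were the true cause (numbers invented for illustration).
HYPOTHESES = ['A', 'B']
PRIOR = {'A': 0.5, 'B': 0.5}

def likelihood(h, obs):
    block, activated = obs          # obs: which block was placed, and the outcome
    p = 0.9 if block == h else 0.1  # the true cause usually activates the machine
    return p if activated else 1 - p

def posterior_draw(data):
    """One sample from the posterior implied by all evidence seen so far."""
    weights = [PRIOR[h] * prod(likelihood(h, obs) for obs in data) for h in HYPOTHESES]
    return random.choices(HYPOTHESES, weights=weights)[0]

def win_stay_lose_sample(evidence):
    """Keep the current guess with probability equal to how well it predicts
    the newest observation; otherwise resample from the posterior."""
    seen, guess = [], random.choice(HYPOTHESES)  # initial guess chosen at random
    for obs in evidence:
        seen.append(obs)
        if random.random() > likelihood(guess, obs):  # 'lose' w.p. 1 - P(obs|guess)
            guess = posterior_draw(seen)
        yield guess

print(list(win_stay_lose_sample([('A', True), ('B', False), ('A', True)])))
```

A single run veers between guesses, but averaging many simulated learners reproduces the posterior, which is the aggregate property the mathematical result establishes.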
Identifying the sampling algorithm
A second challenge is empirically identifying which sampling algorithm learners might be using on a particular task. An algorithm that samples a new guess from an updated posterior after each observation of data (independent sampling) will behave differently from a ‘win-stay, lose-shift’ algorithm, similar to the one proposed by Bonawitz et al. [15]. Although both are sampling approaches and would produce behavior consistent with probabilistic models in the long run, they will have different consequences for short-term behavior. The win-stay, lose-shift strategy will lead to a distinctive pattern of dependencies: a learner’s initial guess will shape their immediate subsequent guesses, even if the initial guess was chosen at random.
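For contrast with the win-stay, lose-shift sketch above (and reusing its toy HYPOTHESES, likelihood, and posterior_draw definitions), an independent sampler simply redraws from the updated posterior after every observation, so a guess carries no trace of the guess before it:

```python
def independent_sampler(evidence):
    """Redraw from the updated posterior after each observation; unlike
    win-stay, lose-shift, the previous guess has no influence on the next."""
    seen = []  # assumes posterior_draw() from the earlier sketch is in scope
    for obs in evidence:
        seen.append(obs)
        yield posterior_draw(seen)
```

Comparing how often consecutive guesses repeat under the two simulations is one way to see the kind of trial-by-trial dependency signature that the empirical work looks for.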
Bonawitz et al. [15] found that preschool-aged children and adults produce a characteristic pattern of dependencies in their responses to a causal learning task that was consistent with their particular win-stay, lose-shift algorithm. They were able to demonstrate this by using a mini-microgenetic method. They presented children with initial evidence that was compatible with several different hypotheses and asked them to guess which hypothesis was correct. Then on each trial they added evidence that tended to confirm or disconfirm that guess and asked the children to guess again. Even though individual learners might seem to be randomly veering from one hypothesis to the next, on aggregate their responses approximated the exact analytical Bayesian solution. The win-stay, lose-shift algorithm predicted this approximate Bayesian response on aggregate and best captured the trial-by-trial data in the individual responses.

Concluding remarks and future perspectives
These studies are just a starting point for asking what algorithms best capture early learning. Sampling algorithms like these may provide a balance between ‘explore’ and ‘exploit’ strategies in learning. They allow the learner to consider potentially unlikely hypotheses on occasion – hypotheses that may prove to be correct later. In aggregate and over time, they converge on the hypothesis that is most likely.

So far, these algorithms have been explored in causal learning tasks. It is not yet known how general such strategies may be. Different types of learning, such as syntactic inference, might employ different approaches. Furthermore, the particular algorithms children employ may depend on task demands, development, or even individual preference.

We have suggested that children may revise their causal beliefs by randomly sampling from a probability distribution. Sampling is an efficient way to search through a space of possibilities while still acting in a way that is consistent with probabilistic inference, and so it can be an algorithmic instantiation of Bayesian inference. The sampling hypothesis also suggests that the variability of children’s responses may sometimes reflect the use of this type of algorithm rather than being just noise.

Acknowledgments
This research was supported in part by the James S. McDonnell Causal Learning collaborative and by grant IIS-0845410 from the National Science Foundation.

References
1 Gopnik, A. and Wellman, H.M. (2012) Reconstructing constructivism: causal models, Bayesian learning mechanisms and the theory theory. Psychol. Bull. 138, 1085–1108
2 Marcus, G.F. and Davis, E. (2013) How robust are probabilistic models of higher-level cognition? Psychol. Sci. 24, 2351–2360
3 Marr, D. (1982) Vision, Freeman Publishers
4 Bonawitz, E.B. et al. (2012) Balancing theories and evidence in children’s exploration, explanations, and learning. Cogn. Psychol. 64, 215–234
5 Kushnir, T. and Gopnik, A. (2005) Children infer causal strength from probabilities and interventions. Psychol. Sci. 16, 678–683
6 Schulz, L.E. et al. (2007) Can being scared make your tummy ache? Naive theories, ambiguous evidence and preschoolers’ causal inferences. Dev. Psychol. 43, 1124–1139
7 Sobel, D.M. et al. (2004) Children’s causal inferences from indirect evidence: backwards blocking and Bayesian reasoning in preschoolers. Cogn. Sci. 28, 303–333
8 Vul, E. and Pashler, H. (2008) Measuring the crowd within: probabilistic representations within individuals. Psychol. Sci. 19, 645–647
9 Goodman, N.D. et al. (2008) A rational analysis of rule-based concept learning. Cogn. Sci. 32, 108–154
10 Mozer, M.C. et al. (2008) Optimal predictions in everyday cognition: the wisdom of individuals or crowds? Cogn. Sci. 32, 1133–1147
11 Vul, E. et al. (2014) One and done? Optimal decisions from very few samples. Cogn. Sci. 38, 599–637
12 Siegler, R.S. (1998) Emerging Minds: The Process of Change in Children’s Thinking, Oxford University Press
13 Jones, M.H. and Liverant, S. (1960) Effects of age differences on choice behavior. Child Dev. 31, 673–680
14 Denison, S. et al. (2013) Rational variability in children’s causal inferences: the sampling hypothesis. Cognition 126, 285–300
15 Bonawitz, E. et al. (2014) Win-stay, lose-shift: a simple sequential algorithm for approximating Bayesian inference. Cogn. Psychol. (in press)