Bayesian Models of the Mind

Michael Rescorla
University of California, Los Angeles
Shaftesbury Road, Cambridge CB2 8EA, United Kingdom
One Liberty Plaza, 20th Floor, New York, NY 10006, USA
477 Williamstown Road, Port Melbourne, VIC 3207, Australia
314–321, 3rd Floor, Plot 3, Splendor Forum, Jasola District Centre,
New Delhi – 110025, India
103 Penang Road, #05–06/07, Visioncrest Commercial, Singapore 238467
www.cambridge.org
Information on this title: www.cambridge.org/9781009517805
DOI: 10.1017/9781108955973
© Michael Rescorla 2024
This publication is in copyright. Subject to statutory exception and to the provisions
of relevant collective licensing agreements, no reproduction of any part may take
place without the written permission of Cambridge University Press & Assessment.
When citing this work, please include a reference to the DOI 10.1017/9781108955973
First published 2024
A catalogue record for this publication is available from the British Library
ISBN 978-1-009-51780-5 Hardback
ISBN 978-1-108-95829-5 Paperback
ISSN 2633-9080 (online)
ISSN 2633-9072 (print)
Cambridge University Press & Assessment has no responsibility for the persistence
or accuracy of URLs for external or third-party internet websites referred to in this
publication and does not guarantee that any content on such websites is, or will
remain, accurate or appropriate.
Bayesian Models of the Mind
DOI: 10.1017/9781108955973
First published online: December 2024
Michael Rescorla
University of California, Los Angeles
Author for correspondence: Michael Rescorla, [email protected]
Contents
1 Introduction
6 Mental Representation
7 Anti-representationalism
8 Conclusion
References
1 Introduction
Thomas Bayes was an eighteenth-century minister and mathematician who passed
his life in relative obscurity. Upon his death in 1761, his friend Richard Price found
among his papers a document entitled “An Essay Towards Solving a Problem in the
Doctrine of Chances.” Price, recognizing the essay’s immense significance, saw to
its posthumous publication (Bayes, 1763). Bayes’s insights gave birth to what is
now known as Bayesian decision theory: a mathematical framework that models
reasoning and decision-making under uncertain conditions. Named after Bayes due
to his founding insights, the framework was first systematically articulated by
Pierre-Simon Laplace (1814/1902). Despite frequent vicissitudes in development,
reception, and application, the framework attracted increasingly many adherents
beginning in the early twentieth century, with adoption accelerating as the century progressed
(McGrayne, 2011). It currently enjoys great popularity, finding widespread use
within statistics (Berger, 1985; Gelman et al., 2014), philosophy (Earman, 1992),
machine learning (Murphy, 2023), robotics (Thrun, Burgard & Fox, 2005), physics
(Trotta, 2008), medical science (Ashby, 2006), and myriad other disciplines.
Bayesian decision theory originated as a theory of how people should oper-
ate, not a theory of how they actually operate. Nevertheless, cognitive scientists
increasingly use it to describe the actual workings of the human mind. Over the
past few decades, cognitive science has produced impressive Bayesian models
of mental activity. The models postulate that certain mental processes conform,
or approximately conform, to Bayesian norms. Bayesian models offered within
cognitive science have illuminated numerous mental phenomena, such as
perception, motor control, and navigation.
This Element has a two-fold purpose. First, it provides a self-contained intro-
duction to the foundations of Bayesian cognitive science. Second, it explores
what we can learn about the mind from Bayesian models offered by cognitive
scientists.
On the second front, my main concern is how Bayesian cognitive science
relates to mental representation. Just as the heart serves to pump blood and the
stomach serves to digest food, one of the mind’s principal functions is to
represent the world. For instance, I have various beliefs about Napoleon: that
he was born in Corsica, that he was an emperor, and so on. Thus, the mind
somehow reaches beyond itself to represent external reality. In that sense, the
mind is a representational organ. Historically, most philosophers have agreed
that the mind’s representational capacity is among its key features. However,
prominent scientists and philosophers throughout the past century have ques-
tioned whether representation deserves any place in the science of the mind. As
a result, controversy continues to fester over the explanatory value of mental representation.
ω ∈ A.

Ω = {1, 2, 3, 4, 5, 6},
1 Readers seeking a more leisurely introduction to Bayesian decision theory have many options pitched at varying levels of difficulty. Hacking (2001) is aimed at philosophers and makes relatively modest mathematical demands. Stone (2013) occupies an intermediate level of difficulty. Berger (1985) and Gelman et al. (2014) are standard statistical references and are more mathematically demanding.
that is, the set containing elements 1, 2, 3, 4, 5, and 6. The hypothesis that the
player rolls an even number corresponds to the set
{2, 4, 6},
that is, the set of outcomes in which Seabiscuit finishes before every other horse.
Philosophers commonly assume that probabilities attach to propositions. In
the scientific and mathematical literature, one rarely finds any appeal to proposi-
tions. Instead, researchers follow Kolmogorov in attaching probabilities to sets.
Under certain assumptions, one can recapture talk about “propositions” within
Kolmogorov’s setting. One can treat Ω as containing possible worlds, and one
can analyze propositions as sets of possible worlds (Stalnaker, 1984). These
assumptions are not mandated by Kolmogorov’s axiomatization. For example,
the simple outcome space {1, 2, 3, 4, 5, 6} is allowed by Kolmogorov's axiomatization, even though its elements are not possible worlds.
When probabilities attach to sets of outcomes, elementary set-theoretic oper-
ations mimic the propositional operations negation, conjunction, and disjunction:
while the hypothesis that the player does not roll 1 is its complement

{2, 3, 4, 5, 6}.
If we intersect disjoint sets (i.e. sets that share no members), then the result is the empty set ∅, which contains no members. The hypothesis that the player rolls an even number and the player rolls an odd number is
{2, 4, 6} ∩ {1, 3, 5} = ∅.
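The correspondence between set-theoretic and propositional operations can be checked directly in code. Here is a minimal sketch in Python replaying the die-roll examples; the variable names are my own, not from the text:

```python
# Sketch: set operations mimic negation, conjunction, and disjunction.
omega = {1, 2, 3, 4, 5, 6}          # outcome space: the six die rolls
even = {2, 4, 6}                    # "the player rolls an even number"
odd = {1, 3, 5}                     # "the player rolls an odd number"

# Negation <-> complement relative to the outcome space
not_one = omega - {1}               # "the player does not roll 1"

# Conjunction <-> intersection; disjoint hypotheses intersect to the empty set
assert even & odd == set()

# Disjunction <-> union
assert even | odd == omega
```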
Axiom 2: P(Ω) = 1.
Axiom 3: Additivity.
To elucidate additivity, suppose that H1 and H2 are disjoint events. For example,
let H1 be the hypothesis that Seabiscuit wins the race and H2 the hypothesis that
War Admiral wins the race. Consider the union H1 ∪ H2: the hypothesis that Seabiscuit wins the race or War Admiral wins the race. Additivity requires that:

P(H1 ∪ H2) = P(H1) + P(H2).
Figure 4 H1 and H2 are disjoint events. Additivity requires that their union (the
total shaded area) receive a probability equal to the sum of the probabilities
assigned to them individually.
In general, the probability that either of two disjoint events occurs is found by
adding together the probabilities assigned to the individual events. See Figure 4.
As discussed in Section A2, Kolmogorov ultimately uses a somewhat stronger
version of additivity than I have articulated here.
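The axioms can be verified concretely for a uniform distribution over the die-roll outcome space. A small sketch; the uniform probability function `P` and the particular events are illustrative assumptions, not from the text:

```python
from fractions import Fraction

# Toy check of the probability calculus axioms for a fair die.
omega = {1, 2, 3, 4, 5, 6}

def P(event):
    """Uniform probability: |event| / |omega|."""
    return Fraction(len(event & omega), len(omega))

# Axiom 1: probabilities are nonnegative
assert P({2, 4, 6}) >= 0
# Axiom 2: the whole outcome space receives probability 1
assert P(omega) == 1
# Axiom 3 (additivity): for disjoint events, P(H1 ∪ H2) = P(H1) + P(H2)
H1, H2 = {1, 2}, {5, 6}
assert P(H1 | H2) == P(H1) + P(H2)
```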
Axioms 1–3 can be applied to objective probabilities or to subjective prob-
abilities. Applied to objective probabilities, they are construed as constraints
that probabilities do in fact satisfy. Applied to subjective probabilities, they are
construed as constraints that probabilities should satisfy: an agent is rational to
the extent that her credences satisfy the axioms.2
The core tenet of Bayesian decision theory is that credences should conform
to the probability calculus axioms. Since Bayesians advance the probability
calculus axioms as normative constraints, we may ask why these particular
axioms are supposed to be rationally privileged. Why is someone who conforms
2 An alternative formulation of the probability calculus centers on sentences rather than sets. Whereas Kolmogorov assigns probabilities to sets of outcomes, the alternative formulation assigns them to sentences drawn from a suitable language. One can develop probability theory on this alternative sentential basis (e.g. Gaifman & Snir, 1982). Some Bayesian models found in cognitive science, especially models of high-level cognition, use sentential rather than set-theoretic axiomatization (Piantadosi & Jacobs, 2016). For example, sentential models have been successfully applied to causality (Goodman, Ullman, & Tenenbaum, 2011), kinship (Katz et al., 2008), and analogical reasoning (Cheyette & Piantadosi, 2017). However, set-theoretic axiomatization underlies the vast bulk of research in Bayesian cognitive science, including all or virtually all research into relatively low-level processes such as perception, motor control, and navigation. This Element focuses exclusively on models that use Kolmogorov's set-theoretic axiomatization. Much of what I say about those models would apply, in suitably modified form, to models that use sentential axiomatization.
X(ω) = x
means that the asteroid has speed x in world ω as it enters our solar system. X is
a function from Ω (a set of possible worlds) to ℝ (the set of real numbers). More
generally, suppose we have an outcome space Ω. A random variable is
a function that carries each outcome ω to a real number x:
X(ω) = x.
{ω : a ≤ X(ω) ≤ b}.
3 These locutions are extensionally equivalent, although they have somewhat different connotations. See Fristedt & Gray (1997, p. 12) for helpful discussion.
This set is notated as X⁻¹[a, b]. It contains those possible worlds where the asteroid's speed falls between a and b, so it codifies the hypothesis that the asteroid's speed falls between a and b. More generally, given a random variable X defined on outcome space Ω, X⁻¹[a, b] codifies the hypothesis that
X’s value falls between a and b. See Figure 6.
As a second illustration, consider the asteroid’s position when it hits
the earth’s surface. We can describe asteroid position using an ordered
pair (x, y) drawn from a canonical coordinate system (e.g. longitude and
latitude). We now want a function X that maps each possible world ω to
an x-coordinate and a second function Y that maps ω to a y-coordinate.
The conjunction

X(ω) = x and Y(ω) = y

means that the asteroid lands at location (x, y) in possible world ω. Taken
together, X and Y map Ω (a set of possible worlds) into ℝ² (the set of ordered
pairs of real numbers). We may use X and Y to define various events of interest.
For example, consider the rectangle depicted in Figure 7. Call this rectangle
REC. We would like to codify the hypothesis that the asteroid lands within
REC. To do so, we collect together all the possible worlds where the asteroid
lands within REC. In other words, we consider the set of possible worlds ω such that (X(ω), Y(ω)) belongs to REC:

{ω : (X(ω), Y(ω)) ∈ REC}.
This set contains exactly those possible worlds where the asteroid lands within
REC, so it codifies the hypothesis that the asteroid lands within REC. See
Figure 8.
Random variables are tremendously useful in probability theory. The under-
lying outcome space Ω is often hard to describe or otherwise resistant to direct
mathematical analysis. In particular, it is not easy to define probabilities directly
over sets of possible worlds. A random variable shifts attention from Ω to
a friendlier outcome space, such as ℝ or ℝ², greatly augmenting our expressive
and analytic power. I will illustrate in the next section.
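As a sketch of how a random variable shifts attention from Ω to ℝ, the following Python fragment models a toy outcome space and the preimage event X⁻¹[a, b]; the three worlds and their speed values are invented for illustration:

```python
# Sketch: a random variable as a function from outcomes to real numbers,
# and the event X^{-1}[a, b] as a preimage. Worlds and speeds are made up.
omega = {"w1": 10.0, "w2": 22.5, "w3": 31.4}   # world -> asteroid speed

def X(w):
    """The random variable: carries each outcome to a real number."""
    return omega[w]

def preimage(a, b):
    """X^{-1}[a, b]: the set of outcomes whose X-value falls in [a, b]."""
    return {w for w in omega if a <= X(w) <= b}

# The hypothesis "the asteroid's speed falls between 20 and 40"
assert preimage(20, 40) == {"w2", "w3"}
```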
Figure 9 The curve is the pdf. The area under the curve between a and b is the
probability assigned to [a, b].
Figure 9, it is vital to remember that the numbers on the vertical axis are not
probabilities. They are probability densities. Probabilities are determined by
probability densities as follows: the probability assigned to an interval [a, b] is the area under p(x) stretching from a to b. In this manner, the pdf (a function
p(s) > 0.
Figure 11 Three Gaussian pdfs. The orange pdf has mean a. The blue and green
pdfs have mean b. The blue pdf has smaller variance than the green pdf. The
orange pdf has intermediate variance.
What about the probability assigned to {s}, that is, the set whose sole member is s? It is not hard to show that

P({s}) = 0.

Intuitively: the probability assigned to {s} is the area under p(x) stretching from s to s, and that area is simply 0. Thus, the probability density p(s) assigned to an individual point s differs from the probability P({s}) assigned to the event {s}.
Note that, even though each individual event {s} receives probability 0, we nevertheless have

P([a, b]) > 0

when a ≠ b. This may at first seem surprising, but it does not violate the probability calculus axioms. The axioms allow each event {s} to receive probability 0 even while [a, b] receives positive probability.
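This point can be checked numerically. The sketch below uses a standard Gaussian (the mean and variance are chosen arbitrarily) and computes interval probabilities from the closed-form Gaussian cumulative distribution via `math.erf`:

```python
import math

# Sketch contrasting probability density with probability for a Gaussian.
mu, sigma = 0.0, 1.0   # illustrative choice of mean and standard deviation

def pdf(x):
    """Gaussian probability density at the point x."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))

def cdf(x):
    """Gaussian cumulative distribution, computed with the error function."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

def P(a, b):
    """Probability of [a, b]: area under the pdf from a to b."""
    return cdf(b) - cdf(a)

s = 0.0
assert pdf(s) > 0      # the density at a point can be positive...
assert P(s, s) == 0    # ...while the event {s} receives probability 0
assert P(-1, 1) > 0    # yet intervals receive positive probability
```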
The notion of pdf generalizes to ℝ². In the two-dimensional case, a probability distribution assigns probabilities to sets containing ordered pairs (x, y). For example, suppose we are modeling the asteroid's position (x, y) when it hits the earth's surface. The probability distribution assigns a probability to each rectangle: this is the probability that the asteroid's position falls within that rectangle. In the two-dimensional case, a pdf is a nonnegative function p(x, y)
Figure 13 An alternative depiction of the pdf from Figure 12. Lighter shading signifies higher probability density assigned to point (x, y).
over ℝ² such that the total volume under the curve is 1. The probability assigned to a region is the volume under the curve in that region. See Figures 12, 13, and 14. A famous example is the class of two-dimensional Gaussian distributions, which generalize one-dimensional Gaussians to ℝ². See Figures 15 and 16.
Once again, it is crucial to distinguish between probability and probability
density. Probability densities attach to ordered pairs (x, y). Probabilities attach
to sets of ordered pairs.
Figure 14 The pdf from Figure 12, restricted to the portion lying over
a rectangle in the (x, y) plane. The volume under this portion of the pdf is the
probability assigned to the rectangle.
P(A|B) =df P(A ∩ B) / P(B).
Figure 16 An alternative depiction of the pdf from Figure 15. Lighter shading signifies higher probability density assigned to point (x, y).
Figure 17 To compute P(A|B) using the ratio formula, divide P(A ∩ B) by P(B). For heuristic purposes, assume that the probability assigned to a region is proportional to the region's area. Then Figure 17 depicts a case where P(A) is much smaller than P(A|B).
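The ratio formula is easy to exercise on the die-roll example. A minimal sketch; the particular events chosen below are my own illustration:

```python
from fractions import Fraction

# Sketch of the ratio formula P(A|B) = P(A ∩ B) / P(B) on a fair die.
omega = {1, 2, 3, 4, 5, 6}

def P(event):
    """Uniform probability over the die-roll outcome space."""
    return Fraction(len(event), len(omega))

def cond(A, B):
    """P(A|B), defined whenever P(B) > 0."""
    return P(A & B) / P(B)

A = {2}            # "the player rolls 2"
B = {2, 4, 6}      # "the player rolls an even number"
assert P(A) == Fraction(1, 6)
assert cond(A, B) == Fraction(1, 3)   # conditioning on B raises A's probability
```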
reach Earth given that the asteroid has speed s when it enters the solar system.
Suppose that our probability distribution P over asteroid speed has a pdf p(x). As indicated in Section 2.4, the probability assigned to the event {s} is 0:

P({s}) = 0.
Figure 18 Conditional densities for the pdf from Figure 12. To compute p(y|a), we fix X's value at a and consider the resulting cross-section curve. The area under the cross-section curve may not be 1. Thus, we divide by a normalization constant to ensure that the area under the curve is 1. The result of dividing by the normalization constant is depicted in Figure 19. Similarly for p(y|b) and p(y|c).
Figure 19 Three one-dimensional pdfs over Y induced by Figure 18. The blue pdf is p(y|a), the orange pdf is p(y|b), and the green pdf is p(y|c). These three curves are normalized versions of the three cross-section curves from Figure 18.
Figure 20 Conditional densities for the Gaussian pdf from Figure 15. This figure depicts the unnormalized cross-section curves. To convert the cross-section curves into pdfs, we divide by a normalization constant. The normalized curves are depicted in Figure 21.
for two other possible values b and c of X. Figures 20 and 21 depict the same
procedure, this time applied to the pdf from Figure 15. See Section A6 for full
mathematical details.
Figure 21 The one-dimensional pdfs over Y induced by Figure 20. The blue pdf is p(y|a), the orange pdf is p(y|b), and the green pdf is p(y|c).
3.1 Conditionalization
Credences evolve. If I learn that Seabiscuit is sick, then I should lower my
credence that he will win the race. Intuitively, this is because I have a relatively
low credence that Seabiscuit will win the race given that he is sick. More
generally, suppose that I begin with credence P(H) and then learn E. To conditionalize on E is to replace my former credence P(H) with P(H|E). My old conditional credence P(H|E) becomes my new unconditional credence in H. P(H) is called the prior probability and P(H|E) is called the posterior probability. We may write

Pnew(H) = Pold(H|E).
P(H|E) = P(H)P(E|H) / P(E).   (1)
Equation (1) expresses the posterior probability P(H|E) in terms of the prior probability P(H) and the prior likelihood P(E|H). The denominator P(E)
P(H|E) ∝ P(H)P(E|H),

which highlights that the posterior is proportional to the prior times the prior likelihood.
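This proportionality underlies the standard discrete computation of a posterior: multiply prior by likelihood, then normalize by P(E). A minimal sketch; all numbers are invented for illustration:

```python
# Sketch of Conditionalization via Bayes's theorem for two hypotheses.
prior = {"H": 0.2, "not_H": 0.8}
likelihood = {"H": 0.9, "not_H": 0.3}      # P(E | hypothesis)

# P(E) by the law of total probability
p_E = sum(prior[h] * likelihood[h] for h in prior)

# Posterior: P(H|E) = P(H) P(E|H) / P(E)
posterior = {h: prior[h] * likelihood[h] / p_E for h in prior}

assert abs(sum(posterior.values()) - 1) < 1e-12
assert posterior["H"] > prior["H"]         # evidence favoring H raises its credence
```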
4 In the scientific literature, the phrase "Bayes's Rule" is used sometimes to denote Conditionalization, sometimes to denote Bayes's theorem, and sometimes to denote an admixture of the two.
This is the form of Bayes’s theorem most commonly used in scientific applica-
tions, including within Bayesian cognitive science.
We obtain a helpful visualization of (4) by holding y fixed and regarding p(y|x) as a function solely of x. Viewed in this way, p(y|x) is called the likelihood function or sometimes just the likelihood. Intuitively, the likelihood is an initial attempt at forming a probability density over x. The initial attempt takes into account evidence y but not the prior information encoded by p(x).5 Bayes's theorem tells us how to combine the initial attempt p(y|x) with the prior p(x), yielding the posterior density p(x|y). Figures 22 and 23
illustrate. In both figures, the posterior is a compromise between the prior
and the likelihood. In Figure 22, the likelihood is wide, so the posterior
remains fairly close to the prior. In Figure 23, the likelihood is narrow, so
it pulls the posterior far from the prior. For example, suppose that p(y|x) is the conditional density of measuring speed y given that the asteroid has speed x. Assuming noisy but unbiased measurement, the likelihood peaks
at y. If measurements are very noisy, then the likelihood is wide
(Figure 22), and the prior over asteroid speed exerts more influence on
the posterior. If measurements are less noisy, then the likelihood is narrow
(Figure 23), and the prior exerts less influence.
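This prior-likelihood compromise can be made concrete with the standard conjugate-Gaussian update (a textbook formula, not specific to this Element); all numerical values below are invented:

```python
# Sketch of the Gaussian prior-likelihood compromise via the standard
# precision-weighted conjugate update.

def gaussian_posterior(prior_mean, prior_var, y, noise_var):
    """Posterior over x given a noisy unbiased measurement y of x."""
    prior_precision = 1 / prior_var
    noise_precision = 1 / noise_var
    post_var = 1 / (prior_precision + noise_precision)
    post_mean = post_var * (prior_precision * prior_mean + noise_precision * y)
    return post_mean, post_var

prior_mean, prior_var, y = 10.0, 4.0, 20.0

# Wide likelihood (very noisy measurement): posterior stays near the prior
m_wide, _ = gaussian_posterior(prior_mean, prior_var, y, noise_var=16.0)
# Narrow likelihood (less noisy measurement): posterior is pulled toward y
m_narrow, _ = gaussian_posterior(prior_mean, prior_var, y, noise_var=1.0)

assert prior_mean < m_wide < m_narrow < y
```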
5 The likelihood is not generally a pdf because the area under the curve need not be 1. However, one can always normalize and convert the likelihood into a pdf, so we can regard the likelihood as an "unnormalized" pdf.
Figure 22 The likelihood peaks at the measured value y. The posterior mean is
intermediate between the prior mean and y. Intuitively, the posterior is
a compromise between the likelihood and the prior.
Figure 23 The prior is the same as in Figure 22. The likelihood once again peaks
at y but is narrower. As a result, the posterior is narrower and is pulled closer to
the likelihood. In the asteroid example, a narrower likelihood corresponds to
a case where speed measurements are less noisy. It makes intuitive sense that
less noisy measurements would exert more influence.
Figure 24 A pdf where most probability mass lies far from the mode.
3.4 Implementation
Suppose we want a physical system (such as a computer or a robot) to implement
Bayesian inference. Our first task is to decide how the system will encode cre-
dences. A major hurdle is that infinitely many distinct probabilities must often be
encoded. For example, a pdf determines the probability assigned to each interval [a, b]. There are infinitely many such intervals. A finite physical system cannot explicitly enumerate the credence assigned to each interval. In other words, it cannot explicitly list each individual probability P([a, b]). After all, a finite physical
system cannot explicitly list infinitely many distinct pieces of information. When
credences cannot be explicitly enumerated, they must instead be implicitly encoded.
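One way to implement implicit encoding is to store only a distribution's parameters and compute interval probabilities on demand. A minimal sketch, assuming Gaussian credences; the class and method names are illustrative:

```python
import math

# Sketch of implicit encoding: the system stores just two numbers (mean and
# variance) yet can answer P([a, b]) for any of infinitely many intervals.

class GaussianCredence:
    def __init__(self, mean, variance):
        self.mean = mean          # the entire stored state:
        self.variance = variance  # two numbers, not a list of probabilities

    def prob(self, a, b):
        """Compute P([a, b]) on demand rather than retrieving it from a list."""
        z = lambda x: 0.5 * (1 + math.erf((x - self.mean) / math.sqrt(2 * self.variance)))
        return z(b) - z(a)

credence = GaussianCredence(mean=0.0, variance=1.0)
assert credence.prob(-1, 1) > credence.prob(1, 3)   # any interval can be queried
```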
To illustrate implicit encoding, consider the class of Gaussian distributions.
Look again at Figure 10. As noted in Section 2.4, a Gaussian distribution is
completely described by two numbers: its mean and its variance. For that
Ch(A),

where Ch(A) is the objective chance that the physical system draws an outcome belonging to A. The key idea behind sampling encoding is that these objective chances can encode subjective probabilities (Icard, 2016). The subjective probability assigned to A is simply the objective chance that a sample belongs to A:

P(A) = Ch(A).
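A minimal sketch of sampling encoding: the system's only representation is a sampler, and probabilities are read off as long-run frequencies. The fair-die sampler below is an invented example:

```python
import random

# Sketch: subjective probabilities read off from the chance that a
# sample lands in an event. The sampler is a fair die.
random.seed(0)

def sample():
    return random.randint(1, 6)    # each outcome has objective chance 1/6

A = {2, 4, 6}
n = 100_000
freq = sum(sample() in A for _ in range(n)) / n

# The frequency of samples landing in A approximates P(A) = Ch(A) = 1/2
assert abs(freq - 0.5) < 0.01
```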
6 Let H1, …, Hn be a collection of disjoint events whose union is Ω. The law of total probability, a theorem of the probability calculus, states that P(E) = Σn P(E|Hn)P(Hn).
principle computable from p(x) and p(y|x), the computation may be impossible
in practice.
A computation is tractable when it can be executed by a physical system
with limited time and memory at its disposal. A computation is intractable
when it is not tractable. These definitions can be made mathematically precise,
but the present level of precision suffices for our purposes. The previous
paragraph may be summarized as follows: computation of the posterior is
not always tractable.7
The standard solution in Bayesian statistics is to find tractable algorithms that
approximately implement Bayesian inference. Even when we cannot exactly
compute the posterior, we can often come quite close—close enough for
practical purposes. Even when we cannot conform to the normative ideal
enshrined by Bayesian decision theory, we can often tractably approximate
the normative ideal.
One popular approximation strategy is called Markov chain Monte Carlo (MCMC) (Murphy, 2023, pp. 493–536). MCMC algorithms use sampling to encode
a credal assignment that approximates the posterior. An MCMC algorithm for
approximating the posterior proceeds in discrete time stages:
t1, t2, t3, …, tn, …
Ch1(A) is the objective chance at time t1 of sampling an outcome that belongs to A. Ch2(A) is the objective chance at time t2 of sampling an outcome that belongs to A. Chn(A) is the objective chance at time tn of sampling an outcome that belongs to A. Objective chances evolve as the algorithm proceeds, converging asymptotically to the posterior: as the algorithm proceeds, Chn(A) grows ever closer to the posterior probability assigned to A. After enough time has passed,
the system’s sampling behavior approximates the posterior quite well. See
Figures 25 and 26. There are general convergence results ensuring that, in
a wide range of cases, objective chances fairly quickly approach posterior
probabilities (Brooks et al., 2011).
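A minimal Metropolis-Hastings sampler (one simple member of the MCMC family) illustrates the idea; the standard-Gaussian target and all tuning choices below are invented for illustration:

```python
import random, math

# Metropolis-Hastings sketch: draw samples whose long-run frequencies
# approximate a target posterior, here a standard Gaussian density.
random.seed(1)

def target(x):
    """Unnormalized posterior density (normalization is never needed)."""
    return math.exp(-x * x / 2)

x, samples = 0.0, []
for t in range(50_000):
    proposal = x + random.gauss(0, 1)           # propose a local move
    # Accept with probability min(1, target(proposal) / target(x))
    if random.random() < target(proposal) / target(x):
        x = proposal
    samples.append(x)

burned = samples[5_000:]                        # discard early, pre-convergence samples
freq = sum(-1 <= s <= 1 for s in burned) / len(burned)
assert abs(freq - 0.683) < 0.03                 # chance of [-1, 1] nears the posterior's
```

The sampler only ever evaluates ratios of the target density, which is why MCMC works even when the posterior's normalizing constant P(E) is intractable.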
7 Computational complexity theory studies the distinction between tractable and intractable computation. See van Rooij et al. (2019) for general discussion of computational complexity theory in relation to cognitive science.
Figure 25 An illustration of MCMC approximation, for the pdf from Figure 12.
The orange dots are samples in the (x, y) plane. Samples cluster in regions of
high probability.
4.1 Perception
How does the perceptual system estimate distal conditions based upon proximal
sensory input? For example, how does it estimate the shapes, sizes, and
locations of nearby objects based upon retinal stimulations? Proximal sensory
stimulations underdetermine distal conditions: numerous possible distal
conditions can cause the same proximal stimulations. Moreover, sensory input is
corrupted by noise during both transduction and transmission to the brain.
Despite ambiguous and noisy sensory input, the perceptual system typically
forms highly accurate estimates of distal conditions.
Helmholtz (1867/1925) proposed that the perceptual system estimates distal
conditions through an unconscious inference. Bayesian perceptual psychology
develops Helmholtz’s proposal, postulating unconscious Bayesian inferences
executed by the perceptual system (Knill & Richards, 1996; Vilares & Kording,
2011; Rescorla, 2015a). A typical Bayesian model estimates a specific variable
(e.g. shape) based on one or more proximal sensory cues (e.g. shading). The
perceptual system starts with a prior probability over the distal variable and
a prior likelihood that relates the distal variable to proximal sensory input. Upon
receiving sensory input, the perceptual system computes the posterior (or an
approximation to the posterior) over the distal variable. On that basis, the
perceptual system forms a privileged estimate of distal conditions. In most
Bayesian models, the estimate is chosen through expected utility maximization.
In other models, the privileged estimate is chosen not deterministically but
stochastically. For example, the model of Mamassian, Landy, and Maloney
(2002) implements probability matching: estimates are chosen stochastically,
with objective probability matching the posterior.
A simple example of the Bayesian approach concerns perceptual estimation
of shape from shading. As Figure 27 illustrates, shading is an ambiguous cue to
shape. In principle, the stimulus on the left could result from a convex object lit
from overhead or a concave object lit from below. Despite the ambiguity, we
perceive the stimulus on the left as convex and the stimulus on the right as
concave. How does the perceptual system estimate shape based upon the
ambiguous evidence provided by shading? The dominant theory in perceptual
psychology has long been that the perceptual system somehow “assumes” that
light comes from overhead rather than below (Rittenhouse, 1786).

8 Readers seeking a more comprehensive overview of the empirical literature might consult Griffiths, Kemp, and Tenenbaum (2008), Chater and Oaksford (2008), or Ma, Kording, and Goldreich (2023).

Figure 27 Shading is an ambiguous cue to shape. The stimulus on the left could result from a convex object lit from overhead or a concave object lit from below. The perceptual system “assumes” that light comes from overhead, so we perceive the stimulus on the left as convex and the stimulus on the right as concave. Reprinted from https://commons.wikimedia.org/wiki/File:%27Light-from-above%27_prior.jpg, under Creative Commons Attribution-Share Alike 4.0 International license.

This
theory translates naturally into a Bayesian setting. On a Bayesian approach,
the perceptual system estimates shape based on a prior over shapes, a prior
over lighting directions, and a prior likelihood that assigns a probability to
a given shading pattern conditional on the stimulus having a given shape
and the light coming from a given direction (Stone, 2011). The prior over
lighting directions favors overhead lighting directions. Consequently, the
posterior favors the convex interpretation of the left-hand stimulus from
Figure 27.
Bayesian models often posit that, when the perceptual system estimates the
value of distal variable X, the prior over X has a pdf p(x). Models often also posit
that the prior likelihood for sensory variable Y given X has a conditional density
p(y|x). Upon receiving sensory input y, the perceptual system forms new
credences determined by a density p_new(x). In some models, new credences
are given by the posterior density:

p_new(x) = p(x|y) = p(x)p(y|x) / ∫ p(x′)p(y|x′) dx′
Based on p_new(x), the perceptual system selects an estimate x* of X’s value.
See Figure 28.
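This update can be sketched numerically by discretizing X on a grid; the Gaussian prior, likelihood width, and observed input y = 1.2 are illustrative assumptions.

```python
import math

def gaussian(x, mu, sigma):
    """Gaussian pdf evaluated at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

# Discretize the distal variable X on a fine grid.
grid = [i * 0.01 for i in range(-500, 501)]

prior = [gaussian(x, 0.0, 1.0) for x in grid]       # p(x): prior over X
likelihood = [gaussian(1.2, x, 0.5) for x in grid]  # p(y|x), with observed y = 1.2

# Bayes' theorem: posterior is proportional to prior times likelihood.
unnorm = [p * l for p, l in zip(prior, likelihood)]
z = sum(unnorm)
p_new = [u / z for u in unnorm]

# One deterministic choice of privileged estimate: the posterior mean.
x_star = sum(x * p for x, p in zip(grid, p_new))
```

A probability-matching model would instead draw x* stochastically from p_new rather than selecting the posterior mean. For these Gaussian choices the posterior mean works out to about 0.96, between the prior mean (0) and the input (1.2), weighted toward the more precise likelihood.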
The motion estimation model given by Weiss, Simoncelli, and Adelson
(2002) is a good example of Bayesian perceptual psychology’s explanatory
power. The model estimates the velocity of a moving stimulus. The model posits
a prior density p(v) over velocities. Crucially, the prior favors slow speeds. This
reflects the environmental regularity that objects usually move fairly slowly.
The model also posits a likelihood p(I|v), where I measures light intensity over
the retina. Upon receiving input I, the perceptual system computes the posterior
p(v|I) and on that basis forms a privileged velocity estimate v*. The model
explains an array of illusions that had previously resisted unified explanation.
For example, it explains why low contrast stimuli seem to move more slowly
than high contrast stimuli: low contrast stimuli yield a wide likelihood, so the
“slow speed” prior exerts more influence over the posterior. See Figure 29. As this
example illustrates, Bayesian perceptual models can often explain perceptual
phenomena that otherwise elude satisfying explanation.
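For Gaussian priors and likelihoods this contrast effect reduces to a two-line calculation: the posterior mean is a precision-weighted average, so widening the likelihood (lower contrast) pulls the velocity estimate toward the “slow speed” prior’s mean of zero. The numerical values are illustrative assumptions, not parameters from the Weiss–Simoncelli–Adelson model.

```python
def posterior_mean(prior_mu, prior_sd, like_mu, like_sd):
    """Mean of the product of two Gaussians (conjugate Bayesian update)."""
    w_prior = 1.0 / prior_sd ** 2  # precision of the prior
    w_like = 1.0 / like_sd ** 2    # precision of the likelihood
    return (w_prior * prior_mu + w_like * like_mu) / (w_prior + w_like)

measured = 10.0  # velocity suggested by the retinal input (deg/s)

# "Slow speed" prior centered on zero velocity.
# High contrast yields a narrow likelihood; low contrast a wide one.
v_high = posterior_mean(0.0, 5.0, measured, like_sd=1.0)
v_low = posterior_mean(0.0, 5.0, measured, like_sd=4.0)
```

Both estimates fall below the measured velocity, and the low contrast estimate falls further, which is the predicted illusion: low contrast stimuli seem to move more slowly.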
Subsequent research has further illuminated the “slow speed” prior and its
crucial role in motion perception (e.g. Stocker & Simoncelli, 2006). In
a particularly notable contribution, Kwon, Tadin, and Knill (2015) generalized
the “slow speed” prior to construct a highly successful model of object-tracking.
For further discussion of the motion estimation model, see Rescorla (2015a);
Rescorla (2018b). For further discussion of the object-tracking model, see
Rescorla (2020c).
Figure 29 Illustration of how the “slow speed” prior influences motion estimation. When the stimulus has high contrast, the likelihood is narrow and the “slow speed” prior exerts relatively little influence on the posterior. When the stimulus has low contrast, the likelihood is wide and the prior exerts relatively more influence on the posterior. v̂, the posterior mean, is smaller in the low contrast condition (b) than in the high contrast condition (a). Reprinted with permission from Springer Nature Customer Service Center GmbH: Springer Nature, Nature, “Noise Characteristics and Prior Expectations in Human Visual Perception” (Stocker & Simoncelli, 2006).

Another successful application of Bayesian perceptual modeling is cue combination. The perceptual system typically estimates a single distal variable based on multiple cues, such as visual and haptic cues to size. Due to sensory noise, estimates based on distinct sensory cues will typically differ at least to a small degree. The perceptual system must combine distinct sensory cues into a single unified estimate of the distal variable. Ernst and Banks (2002) showed that the Bayesian framework can successfully model combination of visual and haptic cues to size. Researchers have subsequently generalized this finding to numerous other cases of cue combination within and across modalities (Trommershäuser, Kording & Landy, 2011). See Rescorla (2020b) for further discussion of cue combination in a Bayesian setting.
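For two Gaussian cues, the Bayes-optimal combination is a reliability-weighted average whose variance is lower than either cue’s alone. The size estimates and variances below are illustrative assumptions, not data from Ernst and Banks (2002).

```python
def combine_cues(mu_v, var_v, mu_h, var_h):
    """Minimum-variance fusion of two Gaussian cue estimates.

    Each cue is weighted by its reliability (inverse variance); the
    fused estimate is more reliable than either cue on its own."""
    w_v = 1.0 / var_v
    w_h = 1.0 / var_h
    mu = (w_v * mu_v + w_h * mu_h) / (w_v + w_h)
    var = 1.0 / (w_v + w_h)
    return mu, var

# Illustrative visual and haptic size estimates (cm): vision is the
# more reliable cue here, so it dominates the combined estimate.
mu, var = combine_cues(mu_v=5.2, var_v=0.04, mu_h=4.8, var_h=0.16)
```

The combined variance is smaller than the best single cue’s variance, which is the signature of optimal fusion that Ernst and Banks tested psychophysically.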
Bayesian perceptual inference is subpersonal and inaccessible to conscious
introspection or control. These inferences are executed by the perceptual
system, not by the perceiver. A typical perceiver is not aware that her perceptual
system uses a “slow speed” prior. The perceptual system, not the perceiver,
encodes and deploys the prior. The perceiver is not consciously aware of any
inference based on the prior.
Perceptual priors are highly mutable, changing rapidly in response to altered
environmental statistics. Adams, Graf, and Ernst (2004) exposed subjects to
deviant visual-haptic input indicating an altered lighting direction. In response,
shape perception and lightness perception rapidly changed, reflecting a change
in the “light from overhead” prior. Similarly, the “slow speed” prior rapidly
changes in response to fast-moving stimuli (Sotiropoulos, Seitz & Seriès, 2011).
There is also evidence that prior likelihoods are mutable (Sato & Kording, 2014;
Sato, Toyoizumi & Aihara, 2007; Seydell, Knill & Trommershäuser, 2010).
Changing priors can themselves be modeled in Bayesian terms (Kwon & Knill,
2013).
As a final illustration of Bayesian perceptual psychology’s explanatory power,
consider central tendency bias: perceptual estimates of a magnitude are biased
towards the mean of the sample distribution (Hollingworth, 1910). Relatively large
magnitudes tend to be underestimated, while relatively small magnitudes tend to be
overestimated. Depending on the case, the sample distribution may arise naturally
or may be experimentally imposed. Central tendency bias is a ubiquitous effect,
arising when subjects estimate line length (Duffy et al., 2010), interval duration
(Jazayeri & Shadlen, 2010), color (Olkkonen, McCarthy & Allred, 2014), and
many other magnitudes. It is readily explicable from a Bayesian perspective. The
key posit is that the prior adapts to match environmental statistics. For example,
when the subject encounters stimuli drawn from an experimentally imposed sample
distribution, the prior shifts to match that distribution. The shifted prior pulls
estimates towards the prior mean. See Figure 30. Researchers have elaborated
this intuitive idea into models that successfully explain central tendency bias for
a number of perceptual tasks (Glasauer, 2019; Glasauer & Shi, 2022; Petzschner,
Glasauer & Stephan, 2015). The models achieve a close fit with psychophysical
data, including detailed patterns governing the extent to which central tendency bias
occurs in different situations.
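The key posit, a prior that adapts to the sample distribution, can be sketched as a trial-by-trial update. This is a deliberately simplified, deterministic caricature; the drift rate, variances, and stimulus list are illustrative assumptions rather than parameters from the models cited above.

```python
def run_session(stimuli, prior_mu=0.0, prior_var=100.0, obs_var=1.0, drift=0.5):
    """Trial-by-trial magnitude estimation with an adaptive prior.

    Each estimate is the precision-weighted average of the current prior
    mean and the presented stimulus (measurement noise is omitted for
    determinism, but its assumed variance obs_var still sets the weight).
    After each trial the prior drifts toward the sample, so estimates
    are increasingly pulled toward the sample mean."""
    estimates = []
    for s in stimuli:
        w_prior = 1.0 / prior_var
        w_obs = 1.0 / obs_var
        estimates.append((w_prior * prior_mu + w_obs * s) / (w_prior + w_obs))
        prior_mu += drift * (s - prior_mu)             # prior mean tracks the sample
        prior_var = max(1.0, prior_var * (1 - drift))  # prior sharpens over trials
    return estimates

# Illustrative stimuli drawn around a sample mean of 10.
ests = run_session([8.0, 12.0, 9.0, 11.0, 8.0, 12.0, 9.0, 11.0])
```

Late in the session, stimuli above the sample mean are underestimated and stimuli below it are overestimated, which is central tendency bias.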
hand velocity, and the target location. Typically, the cost function has two
components. The first component, which is task-dependent, rewards
achievement of the task goal (e.g. reaching the target). The second component,
which is task-independent, penalizes energetic expenditure. At every stage, the
controller selects a motor command that minimizes expected costs. See Figure 31.
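The two-component cost can be sketched for a single one-dimensional reach, with signal-dependent motor noise and a quadratic effort penalty, minimized by grid search over candidate commands. All weights are illustrative assumptions, and real OFC models optimize feedback control laws over time rather than a single open-loop command.

```python
def expected_cost(u, target, noise_scale=0.1, effort_weight=0.05):
    """Expected cost of motor command u.

    Task component: expected squared distance from the target, where the
    landing position is u plus zero-mean noise whose standard deviation
    grows with the command (noise_scale * u).
    Effort component: penalizes energetically expensive commands."""
    task = (u - target) ** 2 + (noise_scale * u) ** 2
    effort = effort_weight * u ** 2
    return task + effort

target = 10.0
candidates = [i * 0.001 for i in range(20001)]  # commands from 0 to 20
u_star = min(candidates, key=lambda u: expected_cost(u, target))
```

The optimal command falls slightly short of the target: aiming exactly at it is not worth the extra noise and effort that a larger command incurs. (Analytically, this quadratic cost is minimized at u = 2·target/2.12 ≈ 9.43.)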
OFC models of motor control have achieved great empirical success
(McNamee & Wolpert, 2019). Most notably, OFC explains patterns in
repeated performance of a task. When a subject repeatedly executes a task,
the movement details vary across trials. As Bernstein (1967) first showed,
and as subsequent research has amply confirmed, movement details vary
more along task-irrelevant dimensions than task-relevant dimensions. The
discrepancy between task-relevant variation and task-irrelevant variation is
9 For further discussion of Bayesian sensorimotor psychology, with an emphasis on OFC, see Rescorla (2016) and Rescorla (2019). See also Burge (2022, pp. 502–530).
4.3 Navigation
Animal navigation has been intensively studied for many decades across several
disciplines, including psychology, ethology, and neuroscience. At present,
Bayesian modeling does not figure as prominently in the study of navigation
as it does in perceptual psychology and sensorimotor psychology. Nevertheless,
recent studies provide strong evidence that Bayesian inference plays a crucial
role in human navigation.
I focus on a navigational strategy called dead reckoning. During dead
reckoning, the navigator exploits self-motion cues to maintain a running estimate of her
own position. Self-motion cues include optic flow, efference copy, vestibular
signals, and so on. Dead reckoning is sometimes called “path integration,”
because position is the integral of velocity. Dead reckoning pervades the animal
kingdom (Gallistel, 1990, pp. 57–102), from the desert ant to humans.
A key fact about human dead reckoning is that, in many experimental
conditions, subjects overshoot the target destination. Traditionally, overshooting was
explained through a “leaky integrator” model (Lappe et al., 2011). The basic idea
is that subjects imperfectly integrate velocity to compute position: rather than
computing the true integral, subjects compute a slightly smaller quantity. As the
distance traveled increases, “leaks” accumulate and the discrepancy between
estimated position and true position increases. Lakshminarasimhan et al. (2018)
offer an alternative Bayesian explanation. They posit a “slow speed” prior over
self-motion. The “slow speed” prior biases estimated velocity below the true
velocity, which leads the subject to underestimate distance traveled. See
Figure 33.
Figure 33 Comparison of the “slow speed” prior model and a “leaky integrator” model. (For heuristic purposes, the comparison only depicts one-dimensional linear velocity. The actual model also considers angular velocity.) The panel on the left shows the subject’s true velocity over time. The top row schematizes the “slow speed” prior model. At a given moment, the “slow speed” prior (in green) combines with the likelihood to yield a posterior over possible velocities. The resulting velocity estimates are consistently smaller than actual velocity, due to the influence of the “slow speed” prior. When velocity estimates are integrated to form position estimates, the position estimates are biased. The bottom panel shows a Bayesian version of the “leaky integrator” approach. (One can also develop the “leaky integrator” approach in a non-Bayesian setting.) The prior is uniform. As a result, velocity estimates are not biased towards smaller velocities. The bias in position estimates stems from leaky integration, not from biased velocity estimates. Reprinted from Lakshminarasimhan et al. (2018) with permission from Elsevier.

The “slow speed” model explains several phenomena that the “leaky integrator” model does not. For example, Lakshminarasimhan et al. (2018) studied dead reckoning in a virtual reality setup. They manipulated the optic flow cue by altering the density of plane elements: greater density entails a more reliable cue. Decreased cue reliability corresponds to a relatively wide likelihood. The “slow speed” model predicts that, when the likelihood is wide, the posterior will
be more strongly affected by the “slow speed” prior, causing even more
overshooting. In contrast, the “leaky integrator” model does not predict that
a degraded optic flow cue causes increased overshooting. See Figure 34. The
human data exhibited more overshooting in response to the degraded optic flow
cue, conforming closely to the “slow speed” model’s predictions.
Another striking phenomenon explained by the model: when the target is
relatively distant, overshooting gives way to undershooting. The farther the
subject travels, the greater the uncertainty regarding her position, so the wider
her pdf over possible positions. When the pdf becomes quite wide, its area of
overlap with the target decreases. As a result, expected utility peaks before the
target when the target is relatively far away. For sufficiently large distances, this
bias towards undershooting swamps the bias induced by the “slow speed” prior.
The Bayesian model, by analyzing how these two biases interact with each other
and with optic flow reliability, achieves a good match with actual human
performance.
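The contrast between the two explanations can be sketched on a synthetic velocity profile. The prior and likelihood widths, time step, and leak rate are illustrative assumptions; the point is only that the Bayesian model’s bias grows as the likelihood widens, while the leaky integrator’s does not.

```python
def slow_speed_distance(velocities, dt, prior_sd, like_sd):
    """Integrate Bayesian velocity estimates under a zero-mean
    "slow speed" prior.  A wider likelihood (a less reliable optic flow
    cue) shrinks each velocity estimate further toward zero."""
    w_p = 1.0 / prior_sd ** 2
    w_l = 1.0 / like_sd ** 2
    return sum((w_l * v) / (w_p + w_l) * dt for v in velocities)

def leaky_distance(velocities, dt, leak=0.9):
    """Leaky integration: each velocity contributes only a fraction,
    independently of how reliable the sensory cue is."""
    return sum(leak * v * dt for v in velocities)

vel = [2.0] * 100  # constant 2 m/s for 100 steps of 0.1 s; true distance 20 m

d_reliable = slow_speed_distance(vel, 0.1, prior_sd=3.0, like_sd=1.0)
d_degraded = slow_speed_distance(vel, 0.1, prior_sd=3.0, like_sd=2.0)
d_leaky = leaky_distance(vel, 0.1)
```

Because an underestimated distance leads the navigator to keep going, a smaller distance estimate corresponds to more overshooting; degrading the cue worsens the Bayesian model’s underestimate but leaves the leaky integrator’s untouched.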
Figure 34 Differing predictions of the “slow speed” model and the “leaky
integrator” model. The red likelihood function is narrow, corresponding to
a reliable optic flow cue. The blue likelihood function is wide, corresponding to
a degraded optic flow cue. In the “slow speed” prior model, the prior exerts
more influence in degraded cases, causing a more biased position estimate. The
“leaky integrator” model predicts that a change in cue reliability does not affect
the position estimate. The “leaky integrator” approach is illustrated here in
a Bayesian format, but the same prediction prevails for non-Bayesian versions.
Reprinted from Lakshminarasimhan et al. (2018) with permission from Elsevier.
Dead reckoning is only one navigation strategy found in the animal kingdom.
Equally important is piloting, during which the creature uses landmarks to
estimate its own position (Gallistel, 1990, p. 41, pp. 88–93, pp. 120–123).
Even relatively primitive creatures, such as rats and bats, engage in piloting.
Of course, humans routinely do so. There is strong evidence that human piloting
relies on Bayesian inference (Jetzschke et al., 2017), as does human
combination of self-motion cues and landmark cues (Chen et al., 2017).
Dead reckoning and piloting are key to navigation, but they are just the
beginning. Piloting presupposes mapping: estimation of landmark locations.
Mapping also figures prominently in robotics, where the standard solution centers
upon approximate Bayesian inference (Thrun, Burgard & Fox, 2005). Several
researchers have conjectured that some mammals likewise implement Bayesian
mapping (Gallistel, 2008; Rescorla, 2009). The conjecture fits well with
everything we know about mammalian navigation (Savelli & Knierim, 2019;
Shikauchi et al., 2021). Moreover, it can explain within a single theoretical
framework disparate navigational phenomena that otherwise resist unified
explanation (Kessler, Frankenstein & Rothkopf, 2024). The topic merits, and
will surely receive, further investigation.
10 Many supposedly anti-Bayesian phenomena documented by Kahneman and Tversky (such as
the conjunction fallacy) involve explicit probability judgments: researchers ask subjects to judge
relative probabilities of various possibilities; elicited judgments violate the probability calculus
axioms. Poor performance in an explicit probabilistic task is hardly evidence that the subject
does not execute Bayesian inference, any more than poor performance in a symbolic logic class
is evidence that a student does not execute deductive inference. Bayesian cognitive science does
not claim that ordinary people are good at probability theory. It claims that ordinary people (or
their psychological subsystems) assign subjective probabilities and execute Bayesian operations
over the assigned probabilities. Typically, the probability assignments are not explicit but are
instead implicitly encoded.
mandates risk aversion in rich conditions and risk proneness in poor conditions.
Block writes (2023, pp. 209–210):
[T]he pea plant behaves as if it represented mean levels of nutrients and their
degree of uncertainty. Since the pea plant lacks a nervous system, we can be
pretty sure that there are no such representations. Somehow, natural selection
has found a way for plants to behave according to some of the norms of
Bayesian rationality without those representations. The challenge to
Rescorla’s reasoning is that we have to allow for the possibility that the
same is true of our perceptual systems.
may emerge. To undermine the argument from altered priors, one must provide
a specific alternative explanation and show that it is at least as satisfying as the
realist explanation.
In this connection, consider a system trained through reinforcement learning
to simulate Bayesian inference given certain priors. By varying the rewards, we
can train the system to simulate Bayesian inference given another set of priors.
Accordingly, instrumentalists might hope that reinforcement learning can
explain changes to the input‒output mapping. In many cases, though, subjects
receive either no feedback or extremely limited feedback on their performance.
To illustrate, consider the Petzschner and Glasauer (2011) dead reckoning study.
Participants received no feedback on their performance during each session,
aside from a few initial training trials to ensure familiarity with the virtual
reality setup. How, then, can reinforcement learning explain why subjects
displayed central tendency bias? There was no “reward” to drive the ongoing
change in learned responses. This study provides evidence that subjects
iteratively update a distance prior in response to accumulated evidence.
Perhaps instrumentalist theories will eventually emerge that explain
changing input‒output mappings without an appeal to changing priors. We would
then need to compare those instrumentalist theories with realist Bayesian
theories. Until that time, we do well to develop the realist perspective and see
where it leads.11
11 See Rescorla (2020c) for more on the argument from altered priors and for general defense of
realism.
There are infinitely many of these intervals. The brain is a finite physical system
and hence, as discussed in Section 3.4, cannot explicitly list each individual
probability P([a, b]). Since the brain cannot enumerate the probability assigned
to each interval, probabilities must be implicitly encoded by neural activity. The
two main implicit encoding schemes under active consideration were
mentioned in Section 3.4:
• Parametric encoding: the brain encodes parameters for the pdf. One possibility
is that parameters are encoded by spike counts in a neural population (Ma et al.,
2006). Each neuron is associated with a preferred stimulus value, and each
neuron’s spike count is interpreted as the strength of its “vote” for that stimulus
value. “Votes” across the neural population determine parameters of a pdf, e.g.,
the mean and variance of a Gaussian. See Figures 35 and 36.
• Sampling encoding: the brain encodes a probability distribution via sampling
propensities. For example, a neuron’s membrane potential might encode a sample
(Orbán et al., 2016). The objective chance distribution governing membrane
potentials encodes the subjective probability distribution for the variable.
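Parametric decoding can be sketched in the spirit of Ma et al. (2006): spike counts act as weighted “votes” for preferred stimulus values, and the total spike count (the gain) sets the certainty. The population, tuning width, and simple decoding rule below are illustrative assumptions.

```python
def decode_gaussian(preferred, spikes, tuning_sd=1.0):
    """Decode a Gaussian posterior from a population of spike counts.

    Each neuron "votes" for its preferred stimulus value with strength
    equal to its spike count.  For independent Poisson neurons with
    Gaussian tuning curves, the implied posterior mean is approximately
    the spike-weighted average of preferred values, and the posterior
    variance shrinks as the total spike count grows."""
    total = sum(spikes)
    mean = sum(p * s for p, s in zip(preferred, spikes)) / total
    var = tuning_sd ** 2 / total
    return mean, var

preferred = [-2.0, -1.0, 0.0, 1.0, 2.0]  # each neuron's preferred value
low_gain = [0, 2, 4, 2, 0]               # few spikes: an uncertain code
high_gain = [0, 10, 20, 10, 0]           # many spikes: a confident code

m_low, var_low = decode_gaussian(preferred, low_gain)
m_high, var_high = decode_gaussian(preferred, high_gain)
```

Both populations vote for the same mean, but the high-gain population encodes a sharper (lower-variance) distribution, so certainty is carried by the overall firing rate.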
Figure 35 The tuning curve for a neuron summarizes the neuron’s average
response to a stimulus value. Figure 35 depicts tuning curves for a hypothetical
neural population tuned to a one-dimensional continuous distal stimulus. The
horizontal axis contains possible stimulus values. Each tuning curve depicts the
average response (measured in spikes per second) of the corresponding neuron
to possible stimulus values. Each tuning curve peaks at a preferred value of the
stimulus. The black tuning curve has preferred stimulus value a.
6 Mental Representation
The previous section advanced a realist perspective on the credal states posited
by Bayesian models. I now want to probe more deeply into the nature of the
posited credal states. I will explore how they relate to the mind’s
representational nature.
The phrase “mental representation” is used in many different ways in
contemporary philosophy and psychology. My own usage reflects a tradition that traces
back to Frege (1892/1997) and continues through contemporary figures such as
Burge (2010) and Fodor (1975; 1987; 2008). According to this tradition, mental
representation is connected with veridicality-conditions: conditions for
veridically representing the world. Examples:
• Beliefs are the sorts of things that can be true or false. My belief that Napoleon
was born in Corsica is true if Napoleon was born in Corsica, false if he was not.
• Intentions are the sorts of things that can be fulfilled or thwarted. My intention
to eat lentils for lunch is fulfilled if I eat lentils for lunch, thwarted if I do not.
• Perceptual states are the sorts of things that can be accurate or inaccurate.
Suppose I perceive object o as being a green cube. Then my perceptual state is
accurate only if o is green and cubical.
My belief that Napoleon was born in Corsica depends for its truth or falsity on how things are with Napoleon (rather than some other person). That
my belief is about Napoleon helps determine the belief’s truth-condition. So
being about Napoleon is a representational property of my belief. Similarly,
suppose I perceive some object as a green cube. The mere fact that my percep-
tual state represents green cubicality does not determine whether the state is
accurate—accuracy also depends on which cube I am perceptually representing.
Nevertheless, my perceptual state depends for its accuracy on whether the
perceptually represented object is a green cube. That my perceptual state repre-
sents green cubicality helps determine the state’s accuracy-condition. So repre-
senting green cubicality is a representational property of my perceptual state.
I will argue that credal states posited within Bayesian cognitive science have
representational properties, and I will elucidate the explanatory role played by
these representational properties.
A speed of 10 meters/sec may equally be specified as 3.28084 × 10 = 32.8084 feet/sec. We cite a different number to specify the same speed. Speeds are distinct from the numbers through which we measure speeds.12
When we specify a credal state through a pdf, our choice of pdf depends upon
a canonical choice of measurement units. A change in measurement units neces-
sitates a change in the pdf we use to specify the credal state. Figure 37 illustrates.
The blue pdf corresponds to meters/sec. The orange pdf corresponds to feet/sec.
The pdfs are different, but they specify the same underlying probability assign-
ment over possible speeds. They specify the same “slow speed” prior. Full
technical details are given in Section A5, but the point should be intuitively
clear even absent any technical details. A pdf is defined over real numbers, so it
can describe a credal allocation over possible speeds only relative to measurement
units that map speeds to real numbers. If we change the measurement units, then
we must use a different pdf to model the same credal allocation over speeds. The
different pdf will induce a different probability distribution over sets of real
numbers, even while the underlying credal allocation remains fixed.
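The invariance can be checked numerically. The sketch below is illustrative only: it assumes, purely for concreteness, a Gaussian "slow speed" prior with hypothetical parameters (mean 2 m/s, standard deviation 1 m/s), which the text does not specify. The same credal allocation is described by one pdf relative to meters/sec and a different pdf relative to feet/sec, yet corresponding intervals receive the same probability.

```python
import math

K = 3.28084  # feet per meter

def gauss_pdf(x, mu, sigma):
    """Gaussian probability density at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def prob(pdf, a, b, n=100000):
    """Probability of the interval [a, b]: area under the pdf (trapezoid rule)."""
    h = (b - a) / n
    return h * (0.5 * (pdf(a) + pdf(b)) + sum(pdf(a + i * h) for i in range(1, n)))

# Hypothetical "slow speed" prior, specified relative to meters/sec.
def p_meters(x):
    return gauss_pdf(x, 2.0, 1.0)

# The same prior specified relative to feet/sec: a different pdf.
def p_feet(y):
    return gauss_pdf(y, 2.0 * K, 1.0 * K)

# One hypothesis, "the speed lies between 1 and 3 meters/sec", measured
# against each pdf over the correspondingly converted interval:
prob_m = prob(p_meters, 1.0, 3.0)
prob_f = prob(p_feet, 1.0 * K, 3.0 * K)
print(prob_m, prob_f)  # equal: two pdfs, one credal assignment
```

The two numbers printed agree, illustrating that the pdfs are notational variants of a single underlying probability assignment over speeds.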
Our choice of measurement units reflects our societal conventions, not
inherent features of the credal state itself. There is no reason to suspect that pre-
theoretic human navigation employs our conventional measurement units.
12 See Peacocke (2019) for a general account of physical magnitudes, including argumentation that we should add these items to our ontology.
Indeed, it may not use any measurement units at all. (Cf. Peacocke, 2019, p. 48.)
The same credal state could just as well be specified by a different pdf. For
example, there is no reason to regard the blue pdf from Figure 37 as privileged
over the orange pdf. Neither pdf has more psychological reality than the other.
Psychological reality resides in the underlying credal state—a credal allocation
over hypotheses regarding possible speeds—rather than the pdf.
The “slow speed” prior is a credal state that allocates credences over hypoth-
eses, where the hypotheses are individuated through the specific speeds that
they represent. The pdf is a purely mathematical function that reflects
a conventional choice of measurement units. The prior does not reflect any
such conventional choice. The pdf is a useful tool for specifying the under-
lying credal state, but its mathematical elegance should not dazzle us into
ascribing psychological reality to it. The credal state is psychologically real.
The pdf is not psychologically real, and neither is the induced probability
distribution over sets of real numbers.
Recall from Section 2.3 that a random variable X maps an outcome space Ω to
the real numbers ℝ. For example, suppose the outcome space Ω contains
possible speeds of an asteroid. Each outcome ω in Ω is a speed that the asteroid
might have. Speeds are physical magnitudes, not real numbers. Assuming
a canonical choice of measurement units, we can measure magnitudes using
real numbers. Let X be a random variable that maps each speed to the corres-
ponding real number, using meters/sec as canonical units. Thus,
X(ω) = x
when real number x specifies speed ω in meters/sec.
Figure 38 illustrates the relation between P and μ: μ maps [a, b] to the same probability as that to which P maps X⁻¹[a, b].
sets of speeds. More generally, and as discussed more rigorously in Section A3,
we can always use a random variable to transfer a probability distribution over
sets of real numbers into a probability distribution over sets with members
drawn from the underlying outcome space.
Now consider a different random variable Y that maps each magnitude to the
corresponding real number using feet/sec. Thus,
Y(ω) = y
when real number y specifies speed ω in feet/sec. Using the standard conversion
from meters/sec to feet/sec, we obtain the following relation between X and Y:
Y(ω) = 3.28084 X(ω).
More precisely: the credal allocation assigns credences to sets whose members are
drawn from Ω. The pdf depends upon an arbitrary choice of measurement units.
The credal allocation does not. Psychological reality resides in the credal
allocation, not the pdf.
13 Mahtani (2024) conducts a detailed investigation into the objects of credence, focused primarily on the intersection of formal epistemology with philosophy of language rather than on Bayesian cognitive science.
(i) The model posits credal states (a prior and a posterior) that assign probabilities to events X⁻¹[a, b]. There are uncountably many events X⁻¹[a, b], so the model posits a credal assignment over uncountably many events.
(ii) The model posits credal states drawn from among uncountably many possible
options. This remains so even if we demand that credal assignments belong to
a fixed parametric family, such as the family of Gaussian distributions.
(iii) The model posits a privileged estimate x* of X’s value, as in Figure 28.
There are uncountably many possible values x*, so the model posits
a privileged estimate selected from among uncountably many options.
I agree that there are finitary limits of some sort on human representational and
computational capacities. For example, we do not have infinite memory storage
capacity: the mind cannot explicitly list infinitely many distinct pieces of infor-
mation. Yet I wonder whether (i)‒(iii) flout any genuine finitary limits on human
mental activity. As discussed in Section 5.4, computational neuroscience offers
various theories of how the brain could, in principle, implement or approximately
implement Bayesian inference. The theories are biologically plausible, and they
fit well with diverse neurophysiological data. Several theories describe the brain
as implementing a Bayesian model that satisfies (i)‒(iii). Those theories feature
nondiscrete neural variables (e.g. membrane potential), which are taken to pro-
vide a substrate for credal states. Thus, (i)‒(iii) look compatible with lots of work
in contemporary computational neuroscience.
The classical computational theory of mind (CTM) holds that mental activity
is digital computation (Fodor, 1975; 1987; 2008; Gallistel & King, 2009;
Pylyshyn, 1984; Rescorla, 2020). A digital computing system has at most
countably many possible computational states. Hence, CTM is incompatible
with (ii) and (iii). However, CTM is compatible with (i). There is a well-
developed framework—computable probability theory—that studies how digi-
tal computing systems can encode and compute over probability distributions
(Ackerman, Freer & Roy, 2019). In this framework, the computing system often
satisfies (i) but not (ii) or (iii). The system encodes a credal assignment over
uncountably many events, but there are only countably many possible credal
assignments and privileged estimates x* available to the system. For example,
the system may encode a Gaussian distribution, but there are only countably
many distinct Gaussian distributions that it could have instead encoded (it can
only encode a Gaussian whose mean and variance are drawn from a fixed
countable set). In more practical terms, computer scientists and roboticists
frequently program digital systems to compute over nondiscrete random vari-
ables (e.g. Thrun, Burgard & Fox, 2005). These systems encode a wide range of
probability distributions, including Gaussian distributions and many others
besides. Their computations satisfy (i) though not (ii) and (iii). Thus, propon-
ents of CTM can happily allow that the mind assigns credences to uncountably
many events.
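A toy sketch of this point (my own illustration, not an example taken from Ackerman, Freer & Roy): a digital system whose credal states are Gaussians with rational mean and standard deviation has only countably many possible states, yet each individual state assigns a credence to every interval [a, b], and hence to uncountably many events.

```python
from fractions import Fraction
import math

class RationalGaussian:
    """A credal state encoded digitally by two rational parameters.
    Since the rationals are countable, only countably many such states exist."""
    def __init__(self, mean: Fraction, sd: Fraction):
        self.mean = mean
        self.sd = sd

    def credence(self, a: float, b: float) -> float:
        """Credence assigned to the event that the variable falls in [a, b].
        Defined for every real interval, hence for uncountably many events."""
        def cdf(t):
            return 0.5 * (1 + math.erf((t - self.mean) / (self.sd * math.sqrt(2))))
        return cdf(b) - cdf(a)

prior = RationalGaussian(Fraction(2), Fraction(1))  # one finitely encoded state
c = prior.credence(1.0, 3.0)                        # yet it covers any interval
print(c)
```

The system satisfies (i) without satisfying (ii) or (iii): the encoded state answers queries about uncountably many events, but the space of encodable states is countable.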
Infinitary Bayesian models raise thorny questions at the intersection of phil-
osophy, psychology, computation theory, and neuroscience.14 I cannot hope to
settle these questions here. For present purposes, the key point is that a realist
representationalist perspective on Bayesian cognitive science admits several
14 In particular, they engage longstanding debates over whether the mind executes digital versus analog computation. See Rescorla (2020a) for an introductory discussion.
Each position is compatible with realism, which commits to credal states and
transitions approximately like the ones posited by the model but does not insist
that the model is literally true. Each position is compatible with representation-
alism, which champions the representational nature of credal states but does not
mandate infinitary representational capacities.
7 Anti-representationalism
Anti-representationalists hold that we should expunge mental representation from
rigorous scientific theorizing. They seek to explain mental and behavioral phe-
nomena in strictly nonrepresentational terms. Different anti-representationalists
favor different nonrepresentational paradigms:
(a) Computational models of the mind mention inputs and outputs, but they do
not mention internal states that mediate between inputs and outputs.
(b) Computational models describe inputs and outputs in purely mathematical
terms, without mentioning any representational properties of the inputs or
outputs.
(c) Representational discourse plays a purely heuristic role in cognitive science
theorizing. It makes no genuine explanatory contribution.
I respond as follows:
transitions, and a PPC model, which specifies how the Bayesian model is
neurally implemented by a predictive coding mechanism. I agree with Hutto
and Myin that there is no reason to think that the system represents its own
inputs or own neural activity. Nevertheless, we have strong reason to think that
the system represents the environment. We have strong reason to describe the
system’s credal states in representational terms, as allocating credences over
representationally-individuated hypotheses. Only then can we preserve explan-
ations that rely on representationally-characterized credal states. For example,
how can we explain overshooting in dead reckoning unless we posit a prior that
favors slower speeds? I have no idea how enactivists would interpret the “slow
speed” prior in nonrepresentational terms, let alone how the ensuing explan-
ations would work.
Hutto and Myin (pp. 151–155) express skepticism about my representation-
alist interpretation of Bayesian models. They do not provide a developed
alternative interpretation. They do not indicate how to gloss credal states and
transitions in nonrepresentationalist enactivist terms. In fact, they barely discuss
credal states: they mention priors a mere handful of times, and they do not
mention posteriors at all. They do not analyze a single specific Bayesian model
of mental activity, even in the most schematic way. Their treatment gives no hint
how enactivists might eschew representational vocabulary while preserving the
explanatory power of Bayesian modeling.
8 Conclusion
The mind operates amid constant uncertainty stemming from multiple sources,
including noise, ambiguous input, and conflicting sensory cues. Bayesian
cognitive science postulates that the mind grapples with uncertainty by impli-
citly encoding credal assignments over hypotheses. The encoded credences
influence inference and decision-making, roughly in accord with Bayesian
norms. The Bayesian program draws support from strong empirical evidence
across a range of psychological domains.
I have analyzed Bayesian modeling from a realist representationalist per-
spective that takes seriously the postulation of credal states and transitions.
Realists hold that, when a Bayesian model is explanatorily successful, we have
good reason to accept the existence of credal states and transitions roughly like
those posited by the model. Representationalists hold that the posited credal
states assign credences to hypotheses individuated through their representa-
tional properties. The realist representationalist interpretation fits much better
with scientific practice than do rival instrumentalist or anti-representationalist
interpretations.
Throughout my discussion, I have highlighted foundational questions raised
by the Bayesian paradigm. Which mental processes approximately conform to
Bayesian norms, and which do not? How do nature and nurture jointly influence
priors employed by the mind? How are credal states neurally implemented?
How does the brain transition from one credal state to another? What computa-
tional strategies does it use to approximate intractable Bayesian inferences?
What is it to attach a credence to a hypothesis? Given that hypotheses are sets of
outcomes, what exactly are the outcomes? How literally should we construe an
infinitary Bayesian model built atop an uncountable outcome space? Ongoing
research into these and other foundational questions promises to illuminate
how the representational mind, by approximating rational norms, copes with
perpetual uncertainty.
Appendix: Foundations of Probability Theory
This appendix presents some key probabilistic concepts as they relate to
Bayesian modeling. It serves as a more mathematically rigorous complement
to the informal exposition from Sections 2 and 3.
A few preliminary definitions are in order. Let A and B be sets. A function f from A to B is an injection iff f(a) and f(b) are distinct whenever a and b are distinct. A function f from A to B is a surjection iff, for each b ∈ B, there exists a ∈ A such that f(a) = b. A bijection is a function that is an injection and a surjection. ℕ is the set of natural numbers: {0, 1, 2, 3, . . .}. A is infinite iff there exists an injection from ℕ to A. A is countably infinite iff there exists a bijection from ℕ to A. A is countable iff it is finite or countably infinite. A is uncountable iff it is infinite but not countably infinite. ℝ is the set of real numbers. [a, b] is {x ∈ ℝ : a ≤ x ≤ b}. ℝⁿ is the set of n-tuples drawn from ℝ, that is, {(x₁, . . ., xₙ) : each xᵢ ∈ ℝ}.
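For finite sets, these definitions can be checked mechanically. A minimal sketch (the helper functions are my own, purely illustrative, with functions given as Python dicts):

```python
def is_injection(f: dict) -> bool:
    """f (a dict from A to B) is an injection iff distinct arguments
    always receive distinct values."""
    return len(set(f.values())) == len(f)

def is_surjection(f: dict, B: set) -> bool:
    """f is a surjection onto B iff every member of B is some f(a)."""
    return set(f.values()) == B

def is_bijection(f: dict, B: set) -> bool:
    """A bijection is an injection that is also a surjection."""
    return is_injection(f) and is_surjection(f, B)

f = {1: "a", 2: "b", 3: "c"}  # a bijection from {1, 2, 3} to {"a", "b", "c"}
g = {1: "a", 2: "a", 3: "c"}  # not an injection: 1 and 2 share a value
bij = is_bijection(f, {"a", "b", "c"})
inj = is_injection(g)
print(bij, inj)
```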
A1 Measurable Spaces
In Kolmogorov’s axiomatization, probabilities attach to sets whose members are drawn from an outcome space Ω. The powerset of Ω is the set containing all subsets of Ω. We notate it as 𝒫(Ω). When Ω is finite, we can assign probabilities to all members of 𝒫(Ω). When Ω is uncountable, it is often impossible to assign intuitively plausible probabilities to all members of the powerset (Proschan & Shaw, 2016, pp. 17–35). Instead, probability theorists assign probabilities to certain privileged members of 𝒫(Ω). The privileged members, called events, form a σ-field over Ω. A σ-field over Ω is a subset F of 𝒫(Ω) such that:
Ω belongs to F.
If H belongs to F, then Hᶜ belongs to F.
If H₁, H₂, . . ., Hₙ, . . . belong to F, then their union ∪ₙ Hₙ also belongs to F.
The union ∪ₙ Hₙ is the set containing all elements that belong to at least one of the sets Hₙ. There may be a countable infinity of sets Hₙ.
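For a finite outcome space, these axioms can be verified by brute force. A minimal sketch (illustrative; when Ω is finite, closure under pairwise union suffices in place of countable union):

```python
from itertools import chain, combinations

def powerset(omega):
    """All subsets of omega, each as a frozenset."""
    s = list(omega)
    return {frozenset(c) for c in
            chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))}

def is_sigma_field(F, omega):
    """Check the axioms over a finite outcome space: contains omega, closed
    under complement, closed under union (pairwise union suffices here)."""
    omega = frozenset(omega)
    if omega not in F:
        return False
    if any(omega - H not in F for H in F):
        return False
    return all(H1 | H2 in F for H1 in F for H2 in F)

omega = {1, 2, 3, 4}
full = powerset(omega)  # the powerset: always a sigma-field
coarse = {frozenset(), frozenset({1, 2}), frozenset({3, 4}), frozenset(omega)}
bad = {frozenset(), frozenset({1}), frozenset(omega)}  # lacks the complement of {1}

ok_full = is_sigma_field(full, omega)
ok_coarse = is_sigma_field(coarse, omega)
ok_bad = is_sigma_field(bad, omega)
print(ok_full, ok_coarse, ok_bad)
```

The coarse collection shows that a σ-field need not contain every subset of Ω; the bad collection fails because it omits the complement {2, 3, 4}.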
We typically choose a σ-field that arises organically from our interests. For
example, suppose we are modeling an asteroid’s speed using outcome space ℝ.
A natural question is whether the asteroid’s speed x falls in the interval [a, b]. We would like to assign probabilities to all these intervals. At the very least, then, our σ-field should contain every interval [a, b]. Consider the minimal σ-field containing all intervals [a, b]. Call it B. Intuitively: we throw just enough sets into B to ensure that B contains each interval [a, b] and is closed under complementation and countable union. B’s members are called the Borel sets. B usually serves as the most natural σ-field when the outcome space is ℝ. Similarly, suppose that the outcome space is ℝ², i.e., the set of ordered pairs of real numbers. Consider the minimal σ-field containing all rectangles. Elements of this σ-field are again called Borel sets. The same construction generalizes to ℝⁿ, for arbitrary n.
An outcome space Ω together with a σ-field F forms a measurable space, typically notated as (Ω, F).
A2 Probability Measures
We now consider a function P that assigns probabilities to events belonging to F. For each H ∈ F, P(H) is the probability assigned to H. As indicated in Section 2.2, Kolmogorov places three axiomatic constraints on P. Here are the first two axioms:
0 ≤ P(H) ≤ 1.
P(Ω) = 1.
As for the third axiom (additivity), recall my formulation from Section 2.2: when events H₁, H₂, . . . are mutually exclusive, the probability of their union is the sum of their individual probabilities:
P(H₁ ∪ H₂ ∪ . . .) = P(H₁) + P(H₂) + . . .
A random variable X from Ω to ℝ is required to satisfy the condition that X⁻¹(B) belongs to F for every Borel set B. This condition ensures that, for each Borel set B, F contains the hypothesis that X’s value falls within B. For example, F includes each event X⁻¹[a, b]. For any real number x, the event
{ω ∈ Ω : X(ω) = x}
is typically notated as
X = x.
We may write P(X = x) for the probability that X has value x (e.g. the probability that the asteroid has speed x). Similarly, the event
{ω ∈ Ω : X(ω) ≠ x}
is typically notated as
X ≠ x.
We may write P(X ≠ x) for the probability that X does not have value x.
Given a probability space (Ω, F, P) and a random variable X, we can define a probability measure μ over the measurable space (ℝ, B):
μ(B) =df P(X⁻¹(B)), for every B ∈ B.
Figure 38 illustrates, for the special case where B = [a, b]. μ is called X’s distribution. It is often easier to work with probability measures over (ℝ, B) than with probability measures over (Ω, F), especially when Ω is complicated.
These definitions generalize from ℝ to ℝⁿ. The definitions are the same, except that we consider Borel sets over ℝⁿ rather than ℝ.
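A toy sketch of the construction (illustrative; the outcomes are labeled objects rather than numbers, to keep the outcome space Ω distinct from ℝ):

```python
# Toy outcome space: three possible asteroid speeds, treated as labeled
# outcomes rather than numbers. Probabilities of the singletons fix P.
omega = ["slow", "medium", "fast"]
P_point = {"slow": 0.5, "medium": 0.3, "fast": 0.2}

# Random variable X: each outcome's speed in meters/sec (illustrative values).
X = {"slow": 5.0, "medium": 10.0, "fast": 20.0}

def P(event):
    """Probability of an event, i.e. a set of outcomes."""
    return sum(P_point[w] for w in event)

def preimage(a, b):
    """X^-1[a, b]: the set of outcomes whose X-value lies in [a, b]."""
    return {w for w in omega if a <= X[w] <= b}

def mu(a, b):
    """X's distribution: mu([a, b]) = P(X^-1[a, b])."""
    return P(preimage(a, b))

pre = preimage(4.0, 12.0)   # the outcomes mapped into [4, 12]
m = mu(4.0, 12.0)           # probability that the speed, in meters/sec, is in [4, 12]
print(pre, m)
```

P lives on sets of outcomes, μ on sets of real numbers; the random variable X carries the one to the other.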
Given a function X from Ω to ℝ and a probability measure μ over (ℝ, B), we can use X and μ to define a probability space with Ω as the outcome space. Define σ(X), the σ-field generated by X, by
σ(X) =df {X⁻¹(B) : B ∈ B}.
One can show that there are at most countably many values x such that
P(X = x) > 0.
To prove this statement, let us for each natural number n > 0 define Dₙ as follows:
Dₙ =df {x ∈ ℝ : P(X = x) > 1/n}.
Suppose that Dₙ contains n distinct members x₁, x₂, . . ., xₙ. The events X = x₁, X = x₂, . . ., X = xₙ are mutually exclusive, so additivity yields
P(X = x₁ ∪ X = x₂ ∪ . . . ∪ X = xₙ) =
P(X = x₁) + P(X = x₂) + . . . + P(X = xₙ).
Each individual term P(X = xᵢ) is greater than 1/n, so the sum on the right is greater than
n × (1/n) = 1,
which is impossible, since no probability exceeds 1. Hence each Dₙ has fewer than n members and so is finite. The set of values x with P(X = x) > 0 is the union of the countably many finite sets Dₙ, so it is countable.
A pdf induces a probability measure μ over (ℝ, B). The probability assigned by μ to [a, b] is the area under p(x) stretching from a to b:
μ([a, b]) = ∫ₐᵇ p(x) dx.
Figure 41 This pdf alters the pdf from Figure 9 at a single point c. The alteration
does not affect the area under the curve, so the two pdfs determine the same
probability distribution.
Suppose that random variable Y results from multiplying random variable X by a positive constant k:
Y(ω) = kX(ω).
Suppose that X’s distribution has a pdf p(x). One can show that Y’s distribution has a pdf q(y) given by
q(y) = p(y/k) / k.     (6)
See Ma, Kording & Goldreich (2023, pp. 333–336). The change in variable
(from X to Y) necessitates a change in pdf.
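Equation (6) can be spot-checked numerically: if p and q are related as in (6), then the interval [a, b] under p and the interval [ka, kb] under q describe the same event and must receive the same probability. A sketch (assuming, purely for illustration, an exponential pdf for X and k = 3.28084):

```python
import math

k = 3.28084  # feet per meter, so Y = kX re-expresses a speed in feet/sec

def p(x):
    """pdf of X: exponential with rate 1 (an illustrative choice)."""
    return math.exp(-x) if x >= 0 else 0.0

def q(y):
    """pdf of Y = kX, per equation (6)."""
    return p(y / k) / k

def area(pdf, a, b, n=100000):
    """Area under the pdf over [a, b], by the trapezoid rule."""
    h = (b - a) / n
    return h * (0.5 * (pdf(a) + pdf(b)) + sum(pdf(a + i * h) for i in range(1, n)))

# "X lies in [0.5, 2]" and "Y lies in [0.5k, 2k]" are the same event,
# so the two pdfs must assign it the same probability.
px = area(p, 0.5, 2.0)
py = area(q, 0.5 * k, 2.0 * k)
print(px, py)
```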
p(y | X = a).
p(a, y).
One might hope to set the conditional density p(y | X = a) equal to p(a, y), where we hold a fixed and allow y to vary. The only hitch is that p(a, y), viewed as a function of y, may not be a pdf: the area under the curve may not be 1. We must settle for proportionality rather than equality:
p(y | X = a) ∝ p(a, y).
To convert p(a, y) into a genuine pdf, we multiply it by a constant to ensure that the area under the curve is 1. This constant is called a normalization constant.
More formally, we may define conditional density as follows. Take p(x, y) as given and define p(x), the marginal pdf for X:
p(x) =df ∫ p(x, y) dy,
with the integral taken over all y from −∞ to ∞. p(x) is computed by holding x fixed and integrating p(x, y) over all possible values of y. Assuming that p(x) > 0, we may define the conditional density of Y given X = x by the equation
p(y | X = x) =df p(x, y) / p(x).     (7)
Abbreviating p(y | X = x) as p(y | x), equation (7) becomes
p(y | x) = p(x, y) / p(x).     (8)
pðy j xÞ results from pðx; yÞ by holding x fixed and then normalizing. See
Figures 18, 19, 20, and 21. These definitions generalize to higher dimensions.16
It is often most natural to regard p(x) and p(y | x) as primitive rather than defined. For example, p(x) might be a pdf for asteroid speed and p(y | x) might be the conditional density of measuring speed y given that the asteroid has speed x. Taken together, p(x) and p(y | x) determine a joint density p(x, y): we simply view equation (8) as a definition of p(x, y) rather than of p(y | x). In practice, we need not usually consider the joint density. It lies in the background of our theorizing, but we only explicitly consider p(x), p(y | x), and p(x | y).
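Read right-to-left, equation (8) builds the joint from the primitives. Here is a sketch of the asteroid example; the specific speed prior, the Gaussian measurement noise, and all numbers are hypothetical, since the text supplies only the setup.

```python
import numpy as np

xs = np.linspace(0.0, 60.0, 601)    # hypothetical asteroid speeds (km/s)
ys = np.linspace(0.0, 60.0, 601)    # possible measured speeds
dy = ys[1] - ys[0]

def gauss(z, mu, sigma):
    return np.exp(-((z - mu) ** 2) / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))

prior = gauss(xs, 25.0, 5.0)                        # p(x): pdf for asteroid speed
likelihood = gauss(ys[None, :], xs[:, None], 2.0)   # p(y | x): noisy measurement of x

# Equation (8) read as a definition of the joint: p(x, y) = p(y | x) p(x).
joint = prior[:, None] * likelihood

# The joint stays in the background: marginalizing it over y recovers p(x).
recovered_prior = joint.sum(axis=1) * dy
print(np.allclose(recovered_prior, prior, atol=1e-6))   # → True
```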
The ratio formula and the theory of conditional densities suffice for most
applications of Bayesian decision theory. However, there are situations where
we would like to define conditional probabilities yet neither the ratio formula
nor the theory of conditional densities applies. To illustrate with a cognitive
16 This paragraph glosses over some major philosophical and mathematical complications. Due to a phenomenon known as the Borel‒Kolmogorov paradox, we cannot condition directly on X = x when P(X = x) = 0 (Kolmogorov, 1933/1956). We instead condition on X = x considered as embedded within the σ-field σ(X). If we were to consider X = x as embedded within a different σ-field, then different conditional probabilities might result. The notation p(y | X = x) is rather misleading because it elides this relativity to an embedding σ-field. Similarly for the notation p(y | x). See Rescorla (2015c) for discussion of the Borel‒Kolmogorov paradox and its ramifications.
P(H | E) = P(H ∩ E) / P(E)

P(E | H) = P(E ∩ H) / P(H).
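These two instances of the ratio formula yield Bayes's theorem in one algebraic step: since H ∩ E = E ∩ H, the second formula lets us replace P(H ∩ E) in the first with P(E | H) P(H). Spelled out:

```latex
P(H \mid E) \;=\; \frac{P(H \cap E)}{P(E)}
           \;=\; \frac{P(E \cap H)}{P(E)}
           \;=\; \frac{P(E \mid H)\, P(H)}{P(E)}.
```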
P(H | E) = P(H) P(E | H) / P(E).   (9)
It is remarkable that this theorem follows almost trivially from the ratio formula
yet offers such profound insight into rational inference.
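A toy numerical instance of (9) may help. All of the numbers are invented for illustration: a rare hypothesis H and a moderately diagnostic piece of evidence E.

```python
# Prior and likelihoods (hypothetical values).
p_h = 0.01                  # P(H): prior probability of the hypothesis
p_e_given_h = 0.95          # P(E | H)
p_e_given_not_h = 0.05      # P(E | not-H)

# The denominator P(E), via the law of total probability.
p_e = p_e_given_h * p_h + p_e_given_not_h * (1 - p_h)

# Bayes's theorem (9): P(H | E) = P(H) P(E | H) / P(E).
p_h_given_e = p_h * p_e_given_h / p_e
print(round(p_h_given_e, 3))            # → 0.161
```

Even with a fairly diagnostic likelihood, the low prior keeps the posterior modest: the evidence multiplies the prior rather than replacing it.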
Now consider the case where we have a two-dimensional pdf p(x, y). Define conditional densities and marginals as in Section A6:

p(x) =df ∫_{−∞}^{∞} p(x, y) dy

p(y) =df ∫_{−∞}^{∞} p(x, y) dx

p(y | x) =df p(x, y) / p(x)

p(x | y) =df p(x, y) / p(y),

where the third definition presupposes p(x) > 0 and the fourth presupposes p(y) > 0. From these latter two definitions, p(y | x) p(x) = p(x, y) = p(x | y) p(y). By algebra,
p(x | y) = p(y | x) p(x) / p(y),   (10)
which is Bayes's theorem for pdfs. Note that 1/p(y) does not depend upon x. It figures solely as a normalization constant. Although (9) and (10) look similar and have similar proofs, they are distinct: (9) concerns conditional probabilities, while (10) concerns conditional densities.
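The role of 1/p(y) as a mere normalization constant can be made concrete on a grid. In this sketch (the Gaussian prior, the observation y = 2.5, and the noise level are stand-ins chosen purely for illustration), the posterior is computed by multiplying pointwise and then dividing once by p(y):

```python
import numpy as np

xs = np.linspace(-10.0, 10.0, 2001)
dx = xs[1] - xs[0]

prior = np.exp(-((xs - 1.0) ** 2) / 2.0)                  # p(x): Gaussian, mean 1, sd 1
prior /= prior.sum() * dx
y_obs = 2.5
likelihood = np.exp(-((y_obs - xs) ** 2) / (2 * 0.5**2))  # p(y_obs | x): sd 0.5

# Equation (10): p(x | y) = p(y | x) p(x) / p(y).
unnormalized = likelihood * prior        # depends on x
p_y = unnormalized.sum() * dx            # p(y_obs); does not depend on x
posterior = unnormalized / p_y           # dividing by p(y) just renormalizes

posterior_mean = (xs * posterior).sum() * dx
print(round(posterior_mean, 3))          # → 2.2
```

For these Gaussians the posterior mean lands at the precision-weighted compromise (1·1 + 4·2.5)/(1 + 4) = 2.2 between the prior mean and the observation, which the grid computation reproduces.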
Bayes’s theorem generalizes beyond the formulations given here, using
Kolmogorov’s theory of conditional probability (Ghosal & van der Vaart, 2017,
p. 7). There are also some situations where no analogue to Bayes’s theorem is
available (Ghosal & van der Vaart, 2017, pp. 7–8). In those situations, one can still
conform to Conditionalization: one can respond to new evidence by replacing the
prior with the posterior. Unfortunately, one can no longer use anything like (9) or
(10) to compute the posterior.
References
Abend, O., Kwiatkowski, T., Smith, N., Goldwater, S. & Steedman, M. (2017). Bootstrapping language acquisition. Cognition, 164, 116–143.
Ackerman, N., Freer, C. & Roy, D. (2019). On the computability of conditional
probability. Journal of the ACM, 66, 1–40.
Adams, W., Graf, E. & Ernst, M. (2004). Experience can change the “light-
from-above” prior. Nature Neuroscience, 7, 1057–1058.
Aitchison, L. & Lengyel, M. (2017). With or without you: predictive coding
and Bayesian inference in the brain. Current Opinion in Neurobiology,
46, 219–227.
Ashby, D. (2006). Bayesian statistics in medicine: a 25 year review. Statistics in
Medicine, 25, 3589–3631.
Baker, C. & Tenenbaum, J. (2014). Modeling human plan recognition using
Bayesian theory of mind. In G. Sukthankar, R. P. Goldman, C. Geib,
D. Pynadath & H. Bui, eds., Plan, Activity, and Intent Recognition:
Theory and Practice. Waltham: Morgan Kaufmann, pp. 177–204.
Battaglia, P. W., Hamrick, J. & Tenenbaum, J. (2013). Simulation as an engine
of physical scene understanding. Proceedings of the National Academy of
Sciences, 110, 18327–18332.
Bayes, T. (1763). An essay towards solving a problem in the doctrine of chances. Philosophical Transactions of the Royal Society of London, 53, 370–418.
Beck, J., Ma, W. J., Latham, P. E. & Pouget, A. (2007). Probabilistic population
codes and the exponential family of distributions. In P. Cisek, T. Drew &
J. F. Kalaska, eds., Computational Neuroscience: Theoretical Insights into
Brain Function. New York: Elsevier.
Berger, J. (1985). Statistical Decision Theory: Foundations, Concepts, and
Methods, 2nd ed., New York: Springer.
Berniker, M., Voss, M. & Kording, K. (2010). Learning priors for Bayesian computations in the nervous system. PLoS ONE, 5, e12686.
Bernstein, N. (1967). The Coordination and Regulation of Movements. Oxford:
Pergamon.
Billingsley, P. (1995). Probability and Measure. 3rd ed. New York: Wiley.
Block, N. (2018). If perception is probabilistic, why does it not seem
probabilistic? Philosophical Transactions of the Royal Society B, 373,
20170341.
Block, N. (2023). The Border Between Perception and Cognition. Cambridge,
MA: MIT Press.
Field, H. (2001). Truth and the Absence of Fact. Oxford: Clarendon Press.
Fiser, J., Berkes, P., Orbán, G. & Lengyel, M. (2010). Statistically optimal
perception and learning: from behavior to neural representations. Trends in
Cognitive Sciences, 14, 119–130.
Fodor, J. (1975). The Language of Thought. New York: Thomas Y. Crowell.
Fodor, J. (1987). Psychosemantics. Cambridge, MA: MIT Press.
Fodor, J. (2008). LOT2. Oxford: Clarendon Press.
Fodor, J. & Pylyshyn, Z. (1981). How direct is visual perception? Some reflections on Gibson’s “ecological approach.” Cognition, 9, 139–196.
Frege, G. (1892/1997). On Sinn and Bedeutung. Rpt. in M. Beaney, ed., The
Frege Reader, trans. M. Black. Malden, MA: Blackwell.
Fristedt, B. & Gray, L. (1997). A Modern Approach to Probability Theory.
Boston: Birkhäuser.
Friston, K. (2010). The free-energy principle: a unified brain theory? Nature
Reviews Neuroscience, 11, 127–138.
Gaifman, H. & Snir, M. (1982). Probabilities over rich languages, testing, and
randomness. The Journal of Symbolic Logic, 47, 495–548.
Gallistel, C. R. (1990). The Organization of Learning. Cambridge, MA: MIT
Press.
Gallistel, C. R. (2008). Dead reckoning, cognitive maps, animal navigation, and the representation of space: an introduction. In M. Jefferies & W.-K. Yeap, eds., Robotics and Cognitive Approaches to Spatial Mapping. Berlin: Springer, pp. 137‒143.
Gallistel, C. R. & King, A. (2009). Memory and the Computational Brain.
Malden, MA: Wiley-Blackwell.
Ganguli, D. & Simoncelli, E. (2014). Efficient sensory encoding and Bayesian
inference with heterogeneous neural populations. Neural Computation,
26, 2103–2134.
Gardner, J. (2019). Optimality and heuristics in perceptual neuroscience. Nature
Neuroscience, 22, 514–523.
Gelman, A., Carlin, J., Stern, H., Dunson, D., Vehtari, A. & Rubin, D. (2014). Bayesian Data Analysis, 3rd ed. New York: CRC Press.
Ghosal, S. & van der Vaart, A. (2017). Fundamentals of Nonparametric
Bayesian Inference. Cambridge: Cambridge University Press.
Gibson, J. J. (1979). The Ecological Approach to Visual Perception. Boston:
Houghton Mifflin.
Glasauer, S. (2019). Sequential Bayesian updating as a model for human
perception. Progress in Brain Research, 249, 3–18.
Glasauer, S. & Shi, Z. (2022). Individual beliefs about temporal continuity
explain variation of perceptual biases. Scientific Reports, 12, 10746.
Nashed, J., Crevecoeur, F. & Scott, S. (2012). Influence of the behavioral goal
and environmental obstacles on rapid feedback responses. Journal of
Neurophysiology, 108, 999–1009.
Nielsen, M. (2021). A new argument for Kolmogorov Conditionalization. The
Review of Symbolic Logic, 14, 930–945.
Norris, D. (2006). The Bayesian reader: explaining word recognition as an
optimal Bayesian decision process. Psychological Review, 113, 327–357.
Oaksford, M. & Chater, N. (2007). Bayesian Rationality: The Probabilistic
Approach to Human Reasoning. Oxford: Oxford University Press.
Oaksford, M. & Chater, N. (2020). New paradigms in the psychology of
reasoning. Annual Review of Psychology, 71, 305–330.
Olkkonen, M., McCarthy, P. & Allred, S. (2014). The central tendency bias in
color perception: effects of internal and external noise. Journal of Vision,
14, 1–15.
Orbán, G., Berkes, P., Fiser, J. & Lengyel, M. (2016). Neural variability and
sampling-based probabilistic representations in the visual cortex. Neuron,
92, 530–542.
Orlandi, N. (2014). The Innocent Eye: Why Vision is not a Cognitive Process. Oxford: Oxford University Press.
Peacocke, C. (1994). Content, computation, and externalism. Mind and
Language, 9, 303–335.
Peacocke, C. (1999). Computation as involving content: a response to Egan.
Mind and Language, 14, 195–202.
Peacocke, C. (2019). The Primacy of Metaphysics. Oxford: Oxford University
Press.
Peters, M., Ma, W. J. & Shams, L. (2016). The size-weight illusion is not
anti-Bayesian after all. PeerJ, 4, e2124.
Pettigrew, R. (2019). Epistemic utility arguments for probabilism. In E. Zalta, ed., The Stanford Encyclopedia of Philosophy (Winter 2019). https://plato.stanford.edu/archives/win2019/entries/epistemic-utility.
Pettigrew, R. (2020). Dutch Book Arguments. Cambridge: Cambridge
University Press.
Petzschner, F. & Glasauer, S. (2011). Iterative Bayesian estimation as an
explanation for range and regression effects: a study on human path
integration. The Journal of Neuroscience, 31, 17220–17229.
Petzschner, F., Glasauer, S. & Stephan, K. (2015). A Bayesian perspective on
magnitude estimation. Trends in Cognitive Sciences, 19, 285–293.
Piantadosi, S. & Jacobs, R. (2016). Four problems solved by the probabilistic
language of thought. Current Directions in Psychological Science, 25, 54–59.
Pouget, A., Beck, J., Ma., W. J. & Latham, P. (2013). Probabilistic brains:
knowns and unknowns. Nature Neuroscience, 16, 1170–1178.
Proschan, M. & Shaw, P. (2016). Essentials of Probability Theory for
Statisticians. New York: CRC Press.
Putnam, H. (1975). Mathematics, Matter, and Method: Philosophical Papers,
vol. 1. Cambridge: Cambridge University Press.
Pylyshyn, Z. (1984). Computation and Cognition. Cambridge, MA: MIT Press.
Quine, W. V. (1960). Word and Object. Cambridge, MA: MIT Press.
Rahnev, D. & Denison, R. (2018). Suboptimality in perceptual decision making. Behavioral and Brain Sciences, 41, e223.
Ramsey, F. P. (1931). Truth and probability. In R. Braithwaite, ed., The Foundations of Mathematics and Other Logical Essays. London: Kegan, Paul, Trench, Trubner & Co., pp. 156‒198.
Ramsey, W. (2007). Representation Reconsidered. Cambridge: Cambridge
University Press.
Rao, R. & Ballard, D. (1999). Predictive coding in the visual cortex: a functional interpretation of some extra-classical receptive-field effects. Nature Neuroscience, 2, 79–87.
Rescorla, M. (2009). Cognitive maps and the language of thought. The British
Journal for the Philosophy of Science, 60, 377–407.
Rescorla, M. (2015a). Bayesian perceptual psychology. In M. Matthen, ed., The
Oxford Handbook of the Philosophy of Perception. Oxford: Oxford
University Press, pp. 694‒716.
Rescorla, M. (2015b). Review of Nico Orlandi’s The Innocent Eye: Why Vision
is not a Cognitive Process. Notre Dame Philosophical Reviews, January
2015.
Rescorla, M. (2015c). Some epistemological ramifications of the Borel‒
Kolmogorov paradox. Synthese, 192, 735–767.
Rescorla, M. (2016). Bayesian sensorimotor psychology. Mind and Language,
31, 3–36.
Rescorla, M. (2017). Review of Andy Clark’s Surfing Uncertainty. Notre Dame
Philosophical Reviews, January 2017.
Rescorla, M. (2018a). A Dutch book theorem and converse Dutch book theorem
for Kolmogorov Conditionalization. The Review of Symbolic Logic, 11,
705–735.
Rescorla, M. (2018b). An interventionist approach to psychological
explanation. Synthese, 195, 1909–1940.
Rescorla, M. (2019). Motor computation. In M. Colombo & M. Sprevak, eds.,
The Routledge Handbook of the Computational Mind. New York:
Routledge, pp. 424‒435.
Savelli, F. & Knierim, J. (2019). Origin and role of path integration in the
cognitive representations of the hippocampus: computational insights into
open questions. Journal of Experimental Biology, 222, 1–13.
Scott, S. (2012). The computational and neural basis of voluntary motor control
and planning. Trends in Cognitive Sciences, 16, 541–549.
Seydell, A., Knill, D. & Trommershäuser, J. (2010). Adapting internal statistical
models for interpreting visual cues to depth. Journal of Vision, 10, 1–27.
Shadmehr, R. & Mussa-Ivaldi, S. (2012). Biological Learning and Control.
Cambridge, MA: MIT Press.
Shea, N. (2018). Representation in Cognitive Science. Oxford: Oxford
University Press.
Shikauchi, Y., Miyakoshi, M., Makeig, S. & Iversen, J. (2021). Bayesian models
of human navigation behavior in an augmented reality audiomaze.
European Journal of Neuroscience, 54, 8308–8317.
Skyrms, B. (1980). Causal Necessity. New Haven, CT: Yale University Press.
Skyrms, B. (1987). Dynamic coherence and probability kinematics. Philosophy
of Science, 54, 1–20.
Skyrms, B. (1995). Strict coherence, sigma coherence, and the metaphysics of
quantity. Philosophical Studies, 77, 39–55.
Sotiropoulos, G., Seitz, A. & Seriès, P. (2011). Changing expectations about
speed alters perceived motion direction. Current Biology, 21, R883‒R884.
Stalnaker, R. (1970). Probabilities and conditionals. Philosophy of Science, 37,
64–80.
Stalnaker, R. (1984). Inquiry. Cambridge, MA: MIT Press.
Steele, K. & Stefánsson, H. (2016). Decision theory. In E. Zalta, ed., The Stanford Encyclopedia of Philosophy (Winter 2016). https://plato.stanford.edu/archives/win2016/entries/decision-theory.
Stich, S. (1983). From Folk Psychology to Cognitive Science. Cambridge, MA:
MIT Press.
Stocker, A. (2018). Credo for optimality. Behavioral and Brain Sciences, 41,
e244.
Stocker, A. & Simoncelli, E. (2006). Noise characteristics and prior expectations in human visual speed perception. Nature Neuroscience, 9, 578–585.
Stone, J. (2011). Footprints sticking out of the sand, part 2: Children’s Bayesian
priors for shape and lighting direction. Perception, 40, 175–190.
Stone, J. (2013). Bayes’s Rule: A Tutorial Introduction to Bayesian Analysis. Sheffield: Sebtel Press.
Temperley, D. (2007). Music and Probability. Cambridge, MA: MIT Press.
Thrun, S., Burgard, W. & Fox, D. (2005). Probabilistic Robotics. Cambridge,
MA: MIT Press.
Acknowledgments
I presented portions of this material at the 2019 Norwegian Summer Institute on
Language and Mind; a fall 2020 graduate seminar at UCLA; three sessions of
a spring 2022 graduate seminar led by Roberto Casati at the Institut Jean Nicod;
a spring 2024 Princeton University cognitive science colloquium; and a spring
2024 workshop on bounded rationality at the University of California, Berkeley.
I am grateful to all participants in these events, especially Tyler Brooke-Wilson,
Roberto Casati, Kenny Easwaran, Adam Elga, Verónica Gómez Sánchez,
Steven Gross, Elizabeth Harman, Geoffrey Lee, Sarah-Jane Leslie, John
MacFarlane, Alonso Molina, Nico Orlandi, Jiarui Qu, Georges Rey, Paul
Talma, David Thorstad, Alejandro Vesga, Francesca Zaffora Blando, and
Snow Zhang for their helpful feedback. I also thank Cosmo Grant, Thomas
Icard, Keith Frankish, and two anonymous referees for their comments on an
earlier draft of the manuscript. Finally, I thank Olivia Bollinger, who prepared
Figures 31 and 42, and Jiarui Qu, who prepared all the other original figures.
Philosophy of Mind
Keith Frankish
The University of Sheffield
Keith Frankish is a philosopher specializing in philosophy of mind, philosophy of
psychology, and philosophy of cognitive science. He is the author of Mind and Supermind
(Cambridge University Press, 2004) and Consciousness (2005), and has also edited or
coedited several collections of essays, including The Cambridge Handbook of Cognitive
Science (Cambridge University Press, 2012), The Cambridge Handbook of Artificial
Intelligence (Cambridge University Press, 2014) (both with William Ramsey), and Illusionism
as a Theory of Consciousness (2017).