Lecture 3 (9.66)

Class announcements

• Pset 2 due next Monday (Oct 24)

• Recitation this week


– Help with Pset 2
Plan for today

• Patterns of inference in causal models
  – Explaining away in perception

• Introduction to MCMC and sampling-based inference in human cognition
Explaining away in cognition

[Diagrams of four common-effect causal graphs:
 Reflectance, Illumination → Luminance;
 Busy, Doesn't like me → Ignoring me;
 Strong athlete, Strong academics → College admission;
 Easy exam, Good student → A on exam]


Explaining away in vision

[Figure: the colored Mach card]

Explaining away in vision

[Figure: the Mach illusion; S0: Lighting]
Explaining away in social inference

[Graph: Easy exam, Good student → A on exam]
Explaining away in social inference

[Graph: Good student and Easy exam i → A on exam i, for exams 1, 2, 3]


Explaining away in social inference

[Hierarchical graph: Easy subject → Easy exam 1, 2, 3; Good student s and Easy exam i → A for student s on exam i, for students 1, 2 and exams 1, 2, 3]
Explaining away in social inference

[Same hierarchical graph as above]

• Abstract principles and systematic biases in attribution?


Plan for today

• Towards a probabilistic language of thought
  – Bayesian networks
  – Probabilistic programs

• Patterns of inference in causal models
  – Explaining away

• Introduction to MCMC and sampling-based inference in human cognition
Varieties of Monte Carlo

• Rejection sampling
• MCMC: Metropolis-Hastings (MH)
– MH with prior kernel
– MH with drift kernel
– MH with drift along the posterior gradient
Generate samples, re-weight them to approximate
posterior… but don’t start from scratch each time as in
rejection!
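As a concrete reference point, here is a minimal Python sketch of rejection sampling (my own illustration, not course code), using the same coin-weight setup as the probmods script later in the lecture: propose a weight from the prior, simulate data, and keep only proposals whose simulated data exactly match the observations.

import random

observed = ['h', 't', 'h', 'h', 't']              # observed coin flips

def rejection_sample(num_samples):
    """Posterior samples of the coin weight, by rejection."""
    kept = []
    while len(kept) < num_samples:
        weight = random.random()                  # prior: Uniform(0, 1)
        simulated = ['h' if random.random() < weight else 't' for _ in observed]
        if simulated == observed:                 # keep only exact matches
            kept.append(weight)
    return kept

samples = rejection_sample(200)
print(sum(samples) / len(samples))                # ~ 4/7, the posterior mean

The cost of this scheme is that almost every proposal gets thrown away as the data set grows, which is exactly the motivation for MCMC methods that reuse the current state instead of starting from scratch.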
Markov Chain

[Diagram: a chain of states x(1) → x(2) → … → x(T)]

Transition matrix
T = P(x(t+1) | x(t))

Variable x(t+1) is independent of all previous variables given its immediate predecessor x(t).

Systematic relationship between the transition matrix and the stationary (asymptotic) distribution.
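To make that relationship concrete, here is a tiny Python sketch (an assumed two-state example, not from the slides): repeatedly applying a fixed transition matrix drives any starting distribution to the chain's stationary distribution.

import numpy as np

# Rows are current states, columns are next states; each row sums to 1.
T = np.array([[0.9, 0.1],
              [0.5, 0.5]])

p = np.array([1.0, 0.0])      # start with all probability mass on state 0
for _ in range(50):
    p = p @ T                 # one step of the chain
print(p)                      # ~ [0.833, 0.167], the stationary distribution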
Markov Chain Monte Carlo (MCMC)

[Diagram: Markov chain x(1) → x(2) → … → x(T)]

Transition matrix
T = P(x(t+1) | x(t))

• States of the chain = joint settings of the variables of interest (“possible worlds”).
• Transition matrix chosen to make some target conditional distribution (the Bayesian posterior) the stationary distribution.
When Metropolis-Hastings?

• Suppose we can compute P(data | h) and P(h), but not P(h | data):

  P(h | data) = P(data | h) P(h) / Σh' P(data | h') P(h')

• Or maybe we can only compute relative posteriors (or likelihood ratios and prior odds):

  P(hi | data) / P(hj | data) = [P(data | hi) P(hi)] / [P(data | hj) P(hj)]
Metropolis-Hastings algorithm

• Transitions have two parts:
  – proposal distribution: Q(h(t+1) | h(t))
  – acceptance: take proposals with probability

    A(h(t+1) | h(t)) = min{ 1, [P(h(t+1) | data) Q(h(t) | h(t+1))] / [P(h(t) | data) Q(h(t+1) | h(t))] }

  T(h(t+1) | h(t)) ∝ Q(h(t+1) | h(t)) A(h(t+1) | h(t))   (for h(t+1) ≠ h(t))
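A minimal Python sketch of these two parts (my illustration, with an assumed Gaussian target rather than anything from the course). With a symmetric Gaussian drift proposal the Q terms cancel, so only the ratio of unnormalized posterior values is needed:

import math
import random

def unnormalized_posterior(x):
    # assumed target: a standard normal prior times a likelihood peaked at 2
    return math.exp(-0.5 * x ** 2) * math.exp(-0.5 * (x - 2.0) ** 2)

def metropolis_hastings(num_steps, proposal_std=1.0, x0=0.0):
    samples, x = [], x0
    for _ in range(num_steps):
        proposal = x + random.gauss(0.0, proposal_std)   # symmetric drift proposal
        accept_prob = min(1.0, unnormalized_posterior(proposal)
                               / unnormalized_posterior(x))
        if random.random() < accept_prob:
            x = proposal                                  # accept the proposal
        samples.append(x)                                 # otherwise stay put
    return samples

chain = metropolis_hastings(20000)
kept = chain[2000:]                                       # drop burn-in
print(sum(kept) / len(kept))                              # ~ 1.0, the posterior mean

Shrinking or growing proposal_std here reproduces the mixing behavior shown on the next slides: tiny steps are almost always accepted but explore slowly, while huge steps are usually rejected.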
Metropolis-Hastings MCMC

https://www.youtube.com/watch?v=4I6TaYo9j_Y
Why does MH work?

• A Markov chain with transition probabilities Tij converges to the stationary distribution πi whenever detailed balance holds:

  Tij / Tji = πj / πi

  With Tij ∝ Qij Aij (Qij: proposal probability, Aij: acceptance probability), detailed balance requires

  (Qij Aij) / (Qji Aji) = πj / πi

  which is satisfied by choosing:

  Aij = min{ 1, (πj Qji) / (πi Qij) }
  Aji = min{ 1, (πi Qij) / (πj Qji) }

  (see Russell and Norvig)
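A quick numerical check of this argument (a made-up two-state example): with the MH acceptance rule, the resulting transition matrix satisfies detailed balance and leaves the target distribution invariant.

import numpy as np

pi = np.array([0.8, 0.2])            # target distribution
Q = np.array([[0.5, 0.5],            # arbitrary proposal probabilities
              [0.5, 0.5]])

T = np.zeros((2, 2))
for i in range(2):
    for j in range(2):
        if i != j:
            A = min(1.0, (pi[j] * Q[j, i]) / (pi[i] * Q[i, j]))   # MH acceptance
            T[i, j] = Q[i, j] * A
    T[i, i] = 1.0 - T[i].sum()       # rejected proposals stay put

print(pi[0] * T[0, 1], pi[1] * T[1, 0])   # equal, so detailed balance holds
print(pi @ T)                             # returns [0.8, 0.2]: pi is stationary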
Burn in

Early dependence on initial state, but chains very similar after enough samples…
Mixing
[Figure: chain traces for several values of s*, the standard deviation of the Gaussian proposal distribution]
MH-drift versus Hamiltonian Monte Carlo
• Hamiltonian Monte Carlo makes the kernel sensitive to the
gradient of the posterior, giving more of a directed search
dynamic towards regions of greatest probability

(Duvenaud
Broderick)

https://www.youtube.com/watch?v=Vv3f0QNWvWQ
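For reference, a compact sketch of the idea (my simplified 1D illustration with an assumed standard normal target, not production HMC): proposals follow the gradient of the log density via leapfrog steps, then are accepted or rejected using the change in total "energy".

import math
import random

def log_p(x):       return -0.5 * x * x      # assumed target: standard normal
def grad_log_p(x):  return -x

def hmc_step(x, step_size=0.1, num_leapfrog=20):
    p = random.gauss(0.0, 1.0)               # resample the momentum
    x_new, p_new = x, p
    for _ in range(num_leapfrog):            # leapfrog integration along the gradient
        p_new = p_new + 0.5 * step_size * grad_log_p(x_new)
        x_new = x_new + step_size * p_new
        p_new = p_new + 0.5 * step_size * grad_log_p(x_new)
    current_h = -log_p(x) + 0.5 * p * p      # total "energy" before and after
    proposed_h = -log_p(x_new) + 0.5 * p_new * p_new
    if random.random() < math.exp(current_h - proposed_h):
        return x_new                          # accept
    return x                                  # reject: stay where we are

x, samples = 0.0, []
for _ in range(5000):
    x = hmc_step(x)
    samples.append(x)
print(sum(s * s for s in samples) / len(samples))   # ~ 1.0, the target variance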
The magic of MCMC
• Since we only ever need to evaluate the relative
probabilities of two states, we can have very large state
spaces (much of which we rarely reach)
• In fact, our state spaces can be infinite
– common with nonparametric Bayesian models
• State spaces can be implicitly defined, with an infinite
number of states we’ve never seen or imagined …
– natural with probabilistic programs, and program induction
• But the guarantees it provides are asymptotic
– making algorithms that converge in practical amounts of time
is a significant challenge
MCMC and cognition

• MCMC provides one basis for “rational process models”: models of how the mind and brain implement Bayesian inference (Marr level 2, 3?) that can predict more detailed aspects of behavior than traditional rational analyses at the level of ideal computational theory (Marr level 1).
• MH may be a model for the dynamics of perception and
thinking.
• Sanborn, Griffiths, Shiffrin (2008; 2010) have shown how to
use MCMC algorithms as the basis for experiments with
people, mapping out mental representations.
• MH may be a metaphor for aspects of development.
Priors P(ttotal) based on empirically measured durations or magnitudes
for many real-world events in each class:

Median human judgments of the total duration or magnitude ttotal of events in each class, given one random observation at a duration or magnitude t, versus Bayesian predictions (median of P(ttotal|t)).
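A grid-based Python sketch of the Bayesian predictor being plotted here (my own example; the power-law prior and the numbers are assumptions for illustration, not the measured priors): compute the posterior median of ttotal given one observation t assumed to fall uniformly within [0, ttotal].

import numpy as np

def posterior_median(t, prior_exponent=1.5, grid_max=1000.0, n=20000):
    t_total = np.linspace(t, grid_max, n)     # the total must exceed what we saw
    prior = t_total ** (-prior_exponent)      # assumed power-law prior (illustration only)
    likelihood = 1.0 / t_total                # P(t | ttotal): t uniform in [0, ttotal]
    post = prior * likelihood
    post /= post.sum()
    cdf = np.cumsum(post)
    return t_total[np.searchsorted(cdf, 0.5)]

# e.g. something observed at t = 30 (arbitrary units):
print(posterior_median(30.0))                 # predicted total duration, ~ 47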
Individual judgments as samples from the posterior predictive
(Vul et al., Cog Sci 2009; Cognitive Science 2014)

[Figure: proportion of judgments below the predicted value vs. quantile of the Bayesian posterior distribution P(ttotal|tpast)]


Individual judgments as samples from the posterior predictive
(Vul et al., Cog Sci 2009; Cognitive Science 2014)

[Same plot of P(ttotal|tpast), averaged over all prediction tasks:]
• movie run times
• movie grosses
• poem lengths
• life spans
• terms in congress
• cake baking times
Posterior sampling in concept learning

These are Feps: [images of positive examples]

Is this a Fep? [test item image]

These are not Feps: [images of negative examples]
Posterior sampling in concept learning
Rational rules model
(Goodman, Tenenbaum, Feldman, Griffiths, 2008)

• Bayesian inference over disjunctions of conjunctions of features – prior favors simpler hypotheses (a toy sketch follows below):
  – “X is a Fep if X has round wings”
  – “… has round wings and a striped body”
  – “… has round wings or a striped body and pointy antenna”
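A toy Python sketch in the spirit of this model (a simplification I wrote for illustration: conjunctive rules only, made-up stimuli and noise level, not the published rational rules model): score each rule by a simplicity prior and a noisy-label likelihood, then average the rules' predictions for a test item, weighted by their posteriors.

import itertools
import math

features = ['round_wings', 'striped_body', 'pointy_antenna']
positives = [{'round_wings': 1, 'striped_body': 1, 'pointy_antenna': 0},
             {'round_wings': 1, 'striped_body': 0, 'pointy_antenna': 1}]
negatives = [{'round_wings': 0, 'striped_body': 1, 'pointy_antenna': 1}]
noise = 0.1                                    # probability a training label is flipped

def rule_predicts(rule, item):                 # rule = set of features that must be present
    return all(item[f] == 1 for f in rule)

score = {}
for size in range(1, len(features) + 1):
    for rule in itertools.combinations(features, size):
        prior = math.exp(-size)                # simplicity prior: shorter rules preferred
        likelihood = 1.0
        examples = [(x, True) for x in positives] + [(x, False) for x in negatives]
        for item, label in examples:
            match = rule_predicts(rule, item) == label
            likelihood *= (1 - noise) if match else noise
        score[rule] = prior * likelihood       # unnormalized posterior

total = sum(score.values())
test_item = {'round_wings': 1, 'striped_body': 1, 'pointy_antenna': 1}
p_fep = sum(w * rule_predicts(r, test_item) for r, w in score.items()) / total
print(round(p_fep, 3))                         # posterior probability the test item is a Fep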
Rational rules model
(Goodman, Tenenbaum, Feldman, Griffiths, 2008)

[Figure: training set and test set of Fep stimuli]
Rational rules model
(Goodman, Tenenbaum, Feldman, Griffiths, 2008)
Why sample?
• Cognition has to be extremely flexible.
  – Posterior samples are a good target for general-purpose probabilistic inference algorithms, maybe the only good target.

• Cognition has to be very fast.
  – Even just one or a few posterior samples are very useful in the settings that matter most for everyday cognition.
  – This is very different from the statistician's perspective on sampling, and inference more generally.

Try playing with this on probmods…


Probmods script

// Explore numSamples and numData; grain of judgment (7, 5, binary)

var observedData = ['h', 't', 'h', 'h', 't']
// var observedData = ['h', 't', 'h', 'h', 't', 't', 'h', 't', 't', 'h']

var maxScale = 7;      // 100, 10, 7, 5, 3, 1
var numSamples = 1000; // 100, 10, 5, 3, 1

var weightPosterior = Infer({method: 'rejection', samples: numSamples}, function() {
  var coinWeight = sample(Uniform({a: 0, b: 1}))
  var coin = Bernoulli({p: coinWeight})
  var obsFn = function(datum){ observe(coin, datum == 'h') }
  mapData({data: observedData}, obsFn)
  return coinWeight
})

viz(weightPosterior)
print('Expected heads weight: ' + Math.round(maxScale * expectation(weightPosterior)))
“One and done”: Optimal decisions from very few samples
(Vul, Goodman, Griffiths, Tenenbaum, 2014)

• How many samples to take?
  – Trade off increase in decision utility with opportunity cost of thinking more…

[Expected-utility setup: A: action to take; S: state of the world; D: data; U: utility function; P: Bayesian posterior]

How to choose the number of samples, k?

Assume two possible actions, each of which is correct in some states of the world but not others. Utility depends only on whether the action is “correct” or “incorrect”. A simple policy is to draw k samples from the posterior over world states, and choose the action that is best in more than half of the k samples (see the sketch below).
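A small simulation of that policy (my sketch, with an assumed posterior probability of 0.7 that action 1 is correct): accuracy grows with k, but slowly.

import random

def choose_action(p, k):
    """Sample k world states; pick action 1 if it is best in more than half."""
    votes = sum(random.random() < p for _ in range(k))
    return 1 if votes > k / 2 else 0

def prob_correct(p, k, trials=20000):
    best = 1 if p >= 0.5 else 0                          # truly best action
    hits = sum(choose_action(p, k) == best for _ in range(trials))
    return hits / trials

for k in (1, 3, 10, 100):
    print(k, round(prob_correct(0.7, k), 3))
# k = 1 already gives ~0.7 accuracy; k = 100 only raises it toward 1.0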
“One and done”: Optimal decisions from very
few samples
(Vul, Goodman, Griffiths, Tenenbaum, 2014)
How many samples to take?
– How does choice of k determine expected reward per unit time, when you have to make many small decisions over a lifetime (or an hour)?

Assume the cost of this policy is proportional to k, plus some fixed action cost. Assume the “correct” action (Max Exp Util, under true P(S|D)) succeeds with probability uniform in [0.5, 1].
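A rough simulation of this tradeoff (my reading of the setup, with assumed time constants): once thinking time is charged against the reward, the smallest k gives the best reward per unit time under these assumptions.

import random

def reward_rate(k, sample_cost=1.0, action_cost=10.0, trials=50000):
    """Average reward per unit time for the k-sample policy (assumed costs)."""
    total_reward, total_time = 0.0, 0.0
    for _ in range(trials):
        p = random.uniform(0.5, 1.0)          # prob. the truly best action succeeds
        votes = sum(random.random() < p for _ in range(k))
        chose_best = votes > k / 2
        success_prob = p if chose_best else 1 - p
        total_reward += 1.0 if random.random() < success_prob else 0.0
        total_time += action_cost + sample_cost * k
    return total_reward / total_time

for k in (1, 3, 10, 100):
    print(k, round(reward_rate(k), 4))
# with thinking time charged against reward, very small k wins per unit time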
MCMC and cognitive science

• MH may be a model for the dynamics of perception and thinking, e.g., bistability (Gershman, Vul, Tenenbaum 2009)
But maybe sometimes even one posterior sample is too costly?

Explaining anchoring and adjustment
(Lieder, Griffiths, Goodman 2012)

• Example: Half of you close your eyes, then the other half.
• Question 1
  – Is the population of Cleveland bigger or smaller than 200,000?
  – What do you think the population is?

Explaining anchoring and adjustment
(Lieder, Griffiths, Goodman 2012)

• Example: Half of you close your eyes, then the other half.
• Question 2
  – Is the population of Cleveland bigger or smaller than 5,000,000?
  – What do you think the population is?

Explaining anchoring and adjustment
(Lieder, Griffiths, Goodman 2012)

• Question 1
  – Is the population of Cleveland bigger or smaller than 200,000?
  – What do you think the population is?
• Question 2
  – Is the population of Cleveland bigger or smaller than 5,000,000?
  – What do you think the population is?

[Excerpts from Lieder, Griffiths & Goodman (2012) shown on these slides: in the Normal-Normal case, with error cost proportional to |a − x| and Markov chains initialized at the prior mean, the bias of the mean of the sampling approximation, |E[Xt] − E[X|Y = y]|, decays geometrically with the number of Metropolis-Hastings iterations (their Figure 1). The reduction in bias is largest in the first few iterations and shrinks quickly from there on, so an agent under time pressure may do well to stop after the initial reduction. Their Figure 2 shows the optimal number of iterations i* = arg max_i E[u(a_i, x, t0 + i/v) | Y = y], with a_i ~ Q_i, as a function of the ratio between the cost per iteration and the cost per unit error; the resulting bias b* = Bias[Q_i*; P] leaves estimates anchored toward the starting point.]
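A small simulation of the underlying idea (my sketch, not the paper's code, with an assumed Normal posterior): start a Metropolis-Hastings chain at an anchor far from the posterior mean and track how the bias of the chain's state shrinks with the number of iterations; stopping early leaves an anchoring-like bias.

import math
import random

posterior_mean, posterior_sd = 3.0, 1.0       # assumed Normal posterior
anchor = 0.0                                  # chains are initialized at the anchor

def log_post(x):
    return -0.5 * ((x - posterior_mean) / posterior_sd) ** 2

def mean_state(num_iters, chains=2000, proposal_sd=1.0):
    """Average chain state after num_iters MH steps, across many chains."""
    total = 0.0
    for _ in range(chains):
        x = anchor
        for _ in range(num_iters):
            prop = x + random.gauss(0.0, proposal_sd)
            if random.random() < min(1.0, math.exp(log_post(prop) - log_post(x))):
                x = prop
        total += x
    return total / chains

for iters in (0, 1, 2, 4, 8, 16, 32):
    print(iters, round(abs(mean_state(iters) - posterior_mean), 2))
# stopping after only a few iterations leaves the estimate biased toward the anchor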
MCMC and cognitive science

• Sanborn, Griffiths, Shiffrin (2008; 2010) have shown how to use MCMC algorithms as the basis for experiments with people, mapping out mental representations.
Measuring people’s priors
(Lewandowsky, Griffiths & Kalish, Cognitive Science 2009)
Iterated learning: (cf. human MCMC)
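A toy Python sketch of iterated learning as a Markov chain (my Beta-Bernoulli example, not the experiments in the paper): each simulated learner sees data generated from the previous learner's hypothesis and samples a new hypothesis from its posterior; the chain of hypotheses converges to the prior, which is what lets iterated learning be used to read out priors.

import random

ALPHA, BETA = 2.0, 5.0        # assumed Beta prior over the coin weight
N_FLIPS = 5                   # data passed from one learner to the next

def iterated_learning(num_generations):
    weight = 0.9              # the first teacher's hypothesis (arbitrary start)
    history = []
    for _ in range(num_generations):
        heads = sum(random.random() < weight for _ in range(N_FLIPS))
        # each Bayesian learner samples a hypothesis from its posterior
        weight = random.betavariate(ALPHA + heads, BETA + N_FLIPS - heads)
        history.append(weight)
    return history

chain = iterated_learning(20000)
print(sum(chain) / len(chain))     # ~ 2/7 ~ 0.29, the mean of the prior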
MCMC and cognitive science
• The Metropolis-Hastings algorithm seems like a good
metaphor for aspects of how children learn and reason
– an algorithm for what doesn’t seem very
“algorithmic”. (cf. “learning rule”, “learning algorithm”)
– Small, random, dumb, local steps
– Takes a long time
– Can get stuck in plateaus or stages
– “Two steps forward, one step back”
– Over time, intuitive theories get consistently better (more
veridical, more powerful, broader scope).
– Everyone reaches basically the same state (though some
take longer than others).
(Ullman & Tenenbaum, Annual Review of Dev Psych 2020)
