Probabilistic Programming Introduction
Thanks to Daniel M Roy (Cambridge)
CAN WE DO BETTER?
Example: Graphical Models
Application Papers
1. Write down a graphical model
2. Perform inference using general-purpose software
3. Apply to some new problem
Inference Papers
1. Identify common structures in graphical models (e.g. chains)
2. Develop efficient inference method
3. Implement in a general-purpose software package
EXPRESSIVITY
Probabilistic Programs
- You can specify any computable prior simply by writing down a probabilistic program (PP) that generates samples from it.
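As a hypothetical illustration of this point (not from the slides): the snippet below specifies a prior over the non-negative integers purely by giving a program that samples from it, here the number of fair-coin flips until two heads occur in a row; no density formula is ever written down.

% Hypothetical illustration (not from the slides): a prior over the
% non-negative integers defined only through a sampling procedure --
% the number of fair-coin flips until two heads appear in a row.
% The program that generates samples *is* the specification of the prior.
k = 0;                  % flips so far
heads_in_a_row = 0;
while heads_in_a_row < 2
    k = k + 1;
    if rand < 0.5
        heads_in_a_row = heads_in_a_row + 1;
    else
        heads_in_a_row = 0;
    end
end
% k now holds one sample from the prior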
[Figure: bar chart of p(flip) and density plot of p(x) over the range 0 to 10.]
N: the set of variables that we're not interested in (so we'll integrate them out).
Probabilistic Programming
[Figure: bar chart of p(flip).]
Example
[Figure: a six-line example program run forward several times; each run records the random choices it makes, yielding traces such as (True, 2.7), (True, 3.2), (True, 2.1), (False, -1.3), and (False, 2.3).]
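The slides' own six-line program is not shown here, so the sketch below is an assumed stand-in (the coin weight and the branch means 3 and -1 are assumptions, not from the slides): one Bernoulli choice followed by one Gaussian choice whose mean depends on the flip. Each forward run prints a (flip, value) trace of the same shape as the traces above.

% Assumed stand-in for the example program: one Bernoulli choice followed
% by one Gaussian choice whose mean depends on the flip. Each forward run
% yields a trace (flip, value), analogous to (True, 2.7), (False, -1.3), ...
for run = 1:5
    flip = rand < 0.5;          % random choice 1: Bernoulli(0.5)
    if flip
        value = 3 + randn;      % random choice 2a: Normal(3, 1)
    else
        value = -1 + randn;     % random choice 2b: Normal(-1, 1)
    end
    fprintf('trace %d: (%d, %.1f)\n', run, flip, value);
end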
CAN WE BE MORE EFFICIENT?
Metropolis-Hastings
1. Start with a trace, e.g. (True, 2.3)
- Propose a change to one random choice, e.g. the flip: (False, )
- Re-run the program to fill in the choices that follow: (False, -0.9)
Alternatively, resample the second choice of (True, 2.3) instead: (True, 2.9)
- Nothing to do: no later choices need to be re-run
- Accept, maybe
More generally, a trace is the full sequence of random choices (x_1, ..., x_K) made during one run of the program, with probability $\prod_{k=1}^{K} p(x_k \mid x_1, \dots, x_{k-1})$.
1. Start with a trace (x_1, ..., x_K)
2. Pick one choice and propose a new value for it, e.g. x_k -> x'_k
3. Run the program to determine all subsequent choices (x'_l : l > k), reusing current choices where possible
4. Propose moving from the state (x_1, ..., x_K) to (x_1, ..., x_{k-1}, x'_k, ..., x'_{K'}) (old choices followed by new choices)
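Continuing with the same stand-in program and a hypothetical observation y = 2.0 with noise 0.5 (both assumptions, not from the slides), the sketch below runs single-site Metropolis-Hastings over traces: each step resamples one random choice from its prior, re-runs whatever follows it, and accepts with the usual MH ratio, which here reduces to a likelihood ratio because the proposals come from the prior.

% Minimal sketch (not the slides' implementation) of single-site MH over
% traces of the stand-in program, conditioned on a hypothetical observation.
y = 2.0; sigma = 0.5;                     % assumed observation y ~ N(x, sigma^2)
loglik = @(x) -(y - x)^2 / (2 * sigma^2); % log N(y; x, sigma^2) up to a constant
mean_of = @(f) 3 * f + (-1) * (1 - f);    % branch mean: 3 if flip, -1 otherwise

flip = rand < 0.5;                        % initial trace: run the program forward
x = mean_of(flip) + randn;
samples = zeros(1000, 2);
for t = 1:1000
    if rand < 0.5                         % pick the flip choice
        flip_new = rand < 0.5;
        if flip_new == flip
            x_new = x;                    % nothing downstream changes: "nothing to do"
        else
            x_new = mean_of(flip_new) + randn;  % re-run the rest of the program
        end
    else                                  % pick the x choice; no later choices to re-run
        flip_new = flip;
        x_new = mean_of(flip) + randn;
    end
    if log(rand) < loglik(x_new) - loglik(x)    % accept, maybe
        flip = flip_new; x = x_new;
    end
    samples(t, :) = [flip, x];
end
fprintf('estimated P(flip | y) = %.2f\n', mean(samples(:, 1)));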
NONPARAMETRIC MODELS
If we can sample from the prior of a nonparametric model using finite resources with probability 1, then we can perform inference automatically using the techniques described thus far.
Generative model (finite mixture with K components):
  $\mu_k \overset{iid}{\sim} \mathcal{N}(0, 1)$, for $k = 1, \dots, K$
  $\pi \sim \mathrm{Dir}(\alpha/K, \dots, \alpha/K)$
  $G := \sum_{k=1}^{K} \pi_k \, \delta_{\mu_k}$
  $\theta_n \overset{iid}{\sim} G$, $\quad x_n \sim \mathcal{N}(\theta_n, 1)$, for $n = 1, \dots, N$
mu = randn(K, 1);                    % component means: mu_k ~ N(0,1)
pi = dirichlet(K, alpha/K);          % mixing weights (dirichlet primitive as in the slides)
for n = 1:N
    theta = mu(find(mnrnd(1, pi)));  % pick one component mean with probability pi
    x(n) = theta + randn;            % x_n ~ N(theta_n, 1)
end
$\mu_k \sim \mathcal{N}(0, 1)$, $\quad \pi \sim \mathrm{Dir}(\alpha/K, \dots, \alpha/K)$, $\quad G := \sum_{k=1}^{K} \pi_k \, \delta_{\mu_k}$
Avoiding infinity
- Instantiate components lazily: draw sticks and atoms only when a sample actually needs them, as in the stick-breaking code below.
sticks = [];                       % stick-breaking weights generated so far
atoms = [];                        % corresponding atom locations (component means)
for i = 1:n
    p = rand;
    while p > sum(sticks)          % lazily break off new sticks until p is covered
        sticks(end+1) = (1 - sum(sticks)) * betarnd(1, alpha);
        atoms(end+1) = randn;
    end
    theta(i) = atoms(find(cumsum(sticks) >= p, 1, 'first'));
end
x = theta(:) + randn(n, 1);        % observations: x_i ~ N(theta_i, 1)
Now that we have separated inference from model design, we can use any inference algorithm:
- Belief Propagation
- Pseudo-likelihood
- Mean-field Variational
- MCMC
For example, a truncated Dirichlet process mixture of Poissons written in BUGS, using the constructive stick-breaking representation:

model {
  for (i in 1:N) {
    S[i] ~ dcat(pi[])
    mu[i] <- theta[S[i]]
    x[i] ~ dpois(mu[i])
    for (j in 1:C) {
      SC[i, j] <- equals(j, S[i])
    }
  }
  # Precision parameter
  alpha ~ dgamma(0.1, 0.1)
  # Constructive DPP
  p[1] <- r[1]
  for (j in 2:C) {
    p[j] <- r[j] * (1 - r[j - 1]) * p[j - 1] / r[j - 1]
  }
  p.sum <- sum(p[])
  for (j in 1:C) {
    theta[j] ~ dgamma(A, B)
    r[j] ~ dbeta(1, alpha)
    # scaling to ensure sum to 1
    pi[j] <- p[j] / p.sum
  }
  # hierarchical prior on theta[j] or preset parameters
  A ~ dexp(0.1)
  B ~ dgamma(0.1, 0.1)
  # total clusters
  K <- sum(cl[])
  for (j in 1:C) {
    sumSC[j] <- sum(SC[, j])
    cl[j] <- step(sumSC[j] - 1)
  }
}

Data:
list(x=c(0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,
2,3,3,3,3,4,4,5,5,5,6,6,6,7,7,7,8,9,9,10,10,
- Bher, MIT-Church (Goodman, Mansinghka, Roy, Bonawitz and Tenenbaum, 2008)
- Stochastic Matlab
EP in graphical models:
Now works in the functional language F#:
AUTOMATED MODELING
THEORETICAL DIRECTIONS
Inference in stochastic programs opens up a new branch of computer science and new generalizations of computability.
Main takeaways:
COMPILER DEVELOPMENT
- 1970s: Novice programmers use high-level languages and let the compiler work out the details; experts still write assembly.
- 2000s: On most problems, even experts can't write faster assembly than optimizing compilers.
- 2015: Novice grad students use automatic inference engines and let the compiler work out the details; experts still write their own inference.
- 2020: On most problems, even experts can't write faster inference than mature automatic inference engines.