
INTERNATIONAL CENTRE FOR ECONOMIC RESEARCH

WORKING PAPER SERIES

Stephen G. Walker

SAMPLING THE DIRICHLET MIXTURE MODEL WITH SLICES

Working Paper No. 16/2006


SAMPLING THE DIRICHLET MIXTURE
MODEL WITH SLICES

Stephen G. Walker

Institute of Mathematics, Statistics and Actuarial Science
University of Kent, Canterbury, Kent, CT2 7NZ, U.K.

May 2006

Abstract: We provide a new approach to the sampling of the well known mixture of Dirichlet process model. Recent attention has focused on retaining the random distribution function in the model, but sampling algorithms have then suffered from the countably infinite representation these distributions have. The key to the algorithm detailed in this paper, which also keeps the random distribution functions, is the introduction of a latent variable which allows a known, finite number of objects to be sampled within each iteration of a Gibbs sampler.

Keywords: Bayesian Nonparametrics, Density estimation, Dirichlet process, Gibbs sampler, Slice sampling.

Acknowledgements: The author is an EPSRC Advanced Research Fellow and the paper was written during a visit to the University of Turin funded by ICER.

1. Introduction. The aim of this paper is to introduce a new method for
sampling the well known and widely used mixture of Dirichlet process (MDP)
model. There have been a number of recent contributions to the literature on
this problem, notably Ishwaran & Zarepour (2000) and Papaspiliopoulos &
Roberts (2005). These papers have been concerned with sampling the MDP
model while retaining the random distribution functions.
The issue, and the cause of the complexity, is the countable infiniteness
of the discrete masses of the random distribution functions chosen from the
Dirichlet process prior. Ishwaran & Zarepour (2000) circumvent this with an
approximate method based on a truncation of the distributions. Motivated
by the work of Ishwaran & Zarepour (2000), Papaspiliopoulos & Roberts
(2005) proposed an exact algorithm based on the notion of retrospective
sampling. However, the algorithm itself becomes non-trivial when applied
to the MDP model, and involves setting up a detailed balance criterion
with connecting proposal moves (Green, 1995). On the other hand, we find a
simple trick, based on the slice sampling schemes (Damien et al., 1999), which
deals with the infiniteness. The introduction of latent variables makes finite
the part of the random distribution function required to iterate through a
Gibbs sampler. Moreover, all the conditional distributions are easy to sample
and no accept/reject methods are needed.
The first sampler for the MDP model, based on a Gibbs sampler, was
given in the PhD thesis of Escobar (1988); see also Escobar (1994). Alternative approaches
have been proposed by MacEachern (1994) and co-authors; for example,
MacEachern & Müller (1998). A recent survey is given in MacEachern (1998);
other relevant papers appear in the book of Dey et al. (1998) and in Müller &
Quintana (2004). Richardson & Green (1997) provide a comparison with more
traditional mixture models and Neal (2000) also discusses ideas for sampling
the MDP model.
Recently, Ishwaran & James (2001) developed a Gibbs sampling scheme
involving more general stick-breaking priors, which is a direct extension of
the Escobar (1994) approach. Escobar's Gibbs sampler makes use of the
Pólya-urn sampling scheme (Blackwell & MacQueen, 1973), and the use of
the Pólya-urn scheme is connected with integrating the random distribution
function of the Dirichlet process out of the model.
Recent attempts have avoided this step and retained the random distribu-
tion functions in the algorithms, notably Ishwaran & Zarepour (2000) and
Papaspiliopoulos & Roberts (2005).
In Section 2 we describe the Dirichlet process mixture model and the latent
variables used in the sampling strategy. In Section 3 we will write
down the algorithm for the Gibbs sampler and Section 4 contains a couple
of illustrative examples. Finally, Section 5 concludes with a brief discussion.

2. The Dirichlet Process Model. Let D(c, P0) denote a Dirichlet process
prior (Ferguson, 1973) with scale parameter c > 0 and prior probability
measure P0. So, for example, E(P) = P0 and

Var(P(A)) = \frac{P_0(A)\{1 - P_0(A)\}}{c + 1}
for all appropriate sets A. The posterior distribution of P given n indepen-
dent and identically distributed samples from P is also a Dirichlet process
with new parameters c + n and
\frac{c P_0 + n P_n}{c + n},
where Pn is the empirical distribution function. However, we will not be
needing this particular result.
It is well known that a random probability measure P can be chosen from
D(c, P0 ) via the following sampling scheme, attributable to Sethuraman &
Tiwari (1982), see also Sethuraman (1994), and involving the so-called stick-
breaking prior (see, for example, Freedman, 1963; Connor & Mosimann,
1969). Take v1 , v2 , . . . to be independent and identically distributed beta(1, c)
variables and take θ1 , θ2 , . . . to be independent and identically distributed
from P0 , which we will assume has density g0 with respect to the Lebesgue
measure. Then define

P = \sum_{j=1}^{\infty} w_j \delta_{\theta_j},

where w_1 = v_1 and, for j > 1,

w_j = v_j \prod_{l < j} (1 - v_l).

Here δθ denotes the measure with a point mass of 1 at θ. The weights are
obtained via what is known as a stick-breaking procedure. Ishwaran & James
(2001) consider a more general model with the vj ∼ beta(aj , bj ) and show
that the sum of weights is 1 almost surely when

\sum_{j=1}^{\infty} \log(1 + a_j/b_j) = +\infty.

While we work with the v’s which lead to the Dirichlet process, our algorithm
for sampling the MDP model can be extended to cover other stick-breaking
prior distributions in a simple way. This will be elaborated on later in the
paper.
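As a concrete illustration of the construction (not the paper's code), the following Python sketch draws the first K weights and atoms; the choice of g0 as a standard normal and the value of K are assumptions made purely for illustration, since the algorithm developed below never requires a fixed truncation.

import numpy as np

def stick_breaking(c, K, rng=None):
    # Draw the first K stick-breaking weights w_j = v_j * prod_{l<j}(1 - v_l)
    # with v_j ~ beta(1, c), and atoms theta_j ~ g0 (here: standard normal,
    # an illustrative choice only).
    rng = np.random.default_rng() if rng is None else rng
    v = rng.beta(1.0, c, size=K)
    w = v * np.concatenate(([1.0], np.cumprod(1.0 - v)[:-1]))
    theta = rng.standard_normal(K)
    return v, w, theta

v, w, theta = stick_breaking(c=1.0, K=50)
print(w.sum())  # strictly below 1, approaching 1 as K grows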
The MDP model is based on the idea of constructing absolutely contin-
uous random distribution functions and was first considered in Lo (1984).
The random distribution function chosen from a Dirichlet process is almost
surely discrete (Blackwell, 1973). Consequently, consider the random density
function
f_P(y) = \int N(y \mid \theta) \, dP(\theta).
Here N(y|θ) denotes a conditional density function, which will typically be a
normal distribution and the parameters of which are represented by θ. So in
the normal case θ = (µ, σ 2 ). Given the form for P , we can write
f_{w,\theta}(y) = \sum_j w_j N(y \mid \theta_j).

The prior distributions for the w and θ have been given earlier.
Our approach to estimating the model, via Gibbs sampling ideas, is to
introduce a latent variable u such that the joint density of (y, u) given
(w, θ) is given by
f_{w,\theta}(y, u) = \sum_j 1(u < w_j) \, N(y \mid \theta_j).

Clearly, integrating over u with respect to the Lebesgue measure returns us
the desired density f_{w,θ}(y). Hence, the joint density exists and so there will
also exist a marginal density for u. Alternatively we can write

f_{w,\theta}(y, u) = \sum_{j=1}^{\infty} w_j \, U(u \mid 0, w_j) \, N(y \mid \theta_j)

and so, with probability wj, y and u are independent and are, respectively,
normally and uniformly distributed. Hence, the marginal density for u is given
by

f_w(u) = \sum_{j=1}^{\infty} w_j \, U(u \mid 0, w_j) = \sum_{j=1}^{\infty} 1(u < w_j).

If we let
Aw (u) = {j : wj > u}

then we can equally write


f_{w,\theta}(y, u) = \sum_{j \in A_w(u)} N(y \mid \theta_j).
Note, it is quite clear that Aw (u) is a finite set for all u > 0. The conditional
density of y given u is given by
f_{w,\theta}(y \mid u) = \frac{1}{f_w(u)} \sum_{j \in A_w(u)} N(y \mid \theta_j),

where f_w(u) = \sum_j 1(u < w_j) is the marginal density for u, defined on
(0, w^*), where w^* is the largest w_j.
The usefulness of the latent variable u will become clear later on. A brief
comment here is that the move from an infinite sum to a finite sum, given u,
is going to make a lot of difference when sampling is involved.
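A small sketch (an illustration, not the paper's code) of the set Aw(u): for any u > 0 only finitely many weights can exceed u, so the set returned below is always finite even though P has infinitely many atoms.

import numpy as np

def slice_set(w, u):
    # A_w(u) = {j : w_j > u}, restricted to the weights sampled so far.
    return np.flatnonzero(w > u)

# With the (w, theta) drawn in the earlier sketch and, say, u = 0.05,
# A_w(u) typically contains only a handful of indices.
print(slice_set(w, 0.05))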
So, given u, we have a finite mixture model with equal weights, all equal
to 1/fw (u). We can now introduce a further indicator latent variable which
will identify the component of the mixture from which y is to be taken.
Therefore, consider the joint density

f_{w,\theta}(y, \delta = k, u) = N(y \mid \theta_k) \, 1(k \in A_w(u)).

The complete data likelihood based on a sample of size n is easily seen to be


l_{w,\theta}(\{y_i, u_i, \delta_i = k_i\}_{i=1}^{n}) = \prod_{i=1}^{n} N(y_i \mid \theta_{k_i}) \, 1(u_i < w_{k_i}).

As has been mentioned, we already know the prior distributions for the w
and θ. Though as it happens, we will use the v’s rather than the w’s when
it comes to sampling.
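For completeness, here is a hedged sketch of the complete data likelihood above, taking the kernel N(y|θ) to be a unit-variance normal purely for illustration; the function name and flat data structures are assumptions, not part of the model.

import numpy as np
from scipy.stats import norm

def complete_data_loglik(y, u, k, w, theta):
    # log of prod_i N(y_i | theta_{k_i}) 1(u_i < w_{k_i}),
    # with a unit-variance normal kernel as an illustrative choice.
    if np.any(u >= w[k]):
        return -np.inf  # some indicator 1(u_i < w_{k_i}) is violated
    return norm.logpdf(y, loc=theta[k]).sum()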

3. The Sampling Algorithm. In order to implement a Gibbs sampler we
require the set of full conditional density functions. For the infinite collection
of variables v and θ, it would seem that we would need to sample the entire
set. But this is not required. We only need to sample a finite set of them at
each stage in order to progress to the next iteration. All un-sampled vj ’s and
θj ’s will be independent samples from the priors; that is beta(1, c) and g0 ,
respectively. Let us proceed to consider the full conditional densities; listed
A to E.

A. We will start with the ui ’s. These are easy to find and are the uniform
distributions on the interval
(0, wki ).
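As a sketch (illustrative names only), step A is a single vectorised uniform draw:

import numpy as np

def sample_u(w, k, rng):
    # u_i ~ U(0, w_{k_i}), drawn elementwise for i = 1, ..., n.
    return rng.uniform(0.0, w[k])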

B. Next we have θj , and this is easily seen to be the density function given
up to a constant of proportionality by
f(\theta_j \mid \cdots) \propto g_0(\theta_j) \prod_{k_i = j} N(y_i \mid \theta_j).

If there are no ki equal to j then f (θj | · · ·) = g0 (θj ).
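A sketch of step B for a conjugate special case, namely a normal kernel with known unit variance and g0 = N(0, 1/s); Section 4 treats the normal case with unknown precision. The function name and arguments are illustrative assumptions.

import numpy as np

def sample_theta_j(y, k, j, s, rng):
    yj = y[k == j]
    if yj.size == 0:
        # no observation allocated to component j: draw from the prior g0
        return rng.normal(0.0, 1.0 / np.sqrt(s))
    prec = s + yj.size          # posterior precision under the conjugate pair
    mean = yj.sum() / prec      # posterior mean
    return rng.normal(mean, 1.0 / np.sqrt(prec))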

C. Slightly harder, but quite do-able, is the sampling of the vj ’s. For the
joint full conditional density we have
f(v \mid \cdots) \propto \pi(v) \prod_{i=1}^{n} 1(w_{k_i} > u_i),

where π(v) denotes the joint prior density, a product of independent beta densities, and we have
already given the relation between the wj ’s and the vj ’s. Hence,
 
f(v \mid \cdots) \propto \pi(v) \prod_{i=1}^{n} 1\left( v_{k_i} \prod_{l < k_i} (1 - v_l) > u_i \right).

It is quite evident from this that only the vj ’s for j ≤ k ∗ , where k ∗ is the
maximum of {k1 , . . . , kn }, will be affected; that is, for j > k ∗ , we have
f (vj | · · ·) = beta(1, c). For j ≤ k ∗ we have

f(v_j \mid v_{-j}, \cdots) \propto \mathrm{beta}(v_j \mid 1, c) \, 1(\alpha_j < v_j < \beta_j),

where

\alpha_j = \max_{\{i : k_i = j\}} \left\{ \frac{u_i}{\prod_{l < j} (1 - v_l)} \right\}

and

\beta_j = 1 - \max_{\{i : k_i > j\}} \left\{ \frac{u_i}{v_{k_i} \prod_{l < k_i,\, l \neq j} (1 - v_l)} \right\}.
Then the distribution function, on αj < vj < βj , is given by

F(v_j) = \frac{(1 - \alpha_j)^c - (1 - v_j)^c}{(1 - \alpha_j)^c - (1 - \beta_j)^c}

and so a sample can be taken via the inverse cdf technique. Clearly, it is
now evident that this approach covers more general stick-breaking models;
it is no more difficult to sample a truncated beta variable when we have
vj ∼ beta(aj , bj ) as the priors.
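Step C can be coded directly from the expressions for αj, βj and F; the sketch below is illustrative (the handling of empty maxima, αj = 0 and βj = 1, is an assumption consistent with the indicator constraints).

import numpy as np

def sample_v_j(j, v, u, k, c, rng):
    # cum[m] = prod_{l<m}(1 - v_l); cum has length len(v) + 1
    cum = np.concatenate(([1.0], np.cumprod(1.0 - v)))
    idx_eq = np.flatnonzero(k == j)
    idx_gt = np.flatnonzero(k > j)
    alpha = np.max(u[idx_eq]) / cum[j] if idx_eq.size else 0.0
    if idx_gt.size:
        # v_{k_i} prod_{l<k_i, l != j}(1 - v_l): remove the (1 - v_j) factor
        denom = v[k[idx_gt]] * cum[k[idx_gt]] / (1.0 - v[j])
        beta = 1.0 - np.max(u[idx_gt] / denom)
    else:
        beta = 1.0
    # inverse-cdf draw from beta(1, c) truncated to (alpha, beta)
    r = rng.uniform()
    a_c, b_c = (1.0 - alpha) ** c, (1.0 - beta) ** c
    return 1.0 - (a_c - r * (a_c - b_c)) ** (1.0 / c)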

D. We now discuss the sampling of the indicator variables. We clearly have

pr(δi = k| · · ·) ∝ 1(k ∈ Aw (ui )) N(yi |θk ).

Clearly Aw (ui ) is not empty; at least ki ∈ Aw (ui ).


Before providing details on how to sample this, we mention that without
the latent variables ui , the possible choices of δi would be infinite and prob-
lems then arise with the normalising constant. Papaspiliopoulos & Roberts
(2005) attempted to circumvent the problem via retrospective sampling and
the use of a detailed-balance criterion, which is non-trivial. Our approach
is quite easy to implement. The choice of δi is from the finite set
{k : wk > ui}. So we sample as many of the wk's as are needed to be sure we
have found all of the wk > ui. How do we know this? We are sure there can be no
further index k with wk > ui once the weights sampled so far satisfy

\sum_{j=1}^{k} w_j > 1 - u_i,

since the remaining weights must then sum to less than ui.

So, to cover all the i's, we find the smallest k∗ such that

\sum_{j=1}^{k^*} w_j > 1 - u^*,
where u∗ = min{u1, . . . , un}. Hence, we now know how many of the wk's we
need to sample in order for the chain to proceed; it is {w1, . . . , wk∗}. It is k∗
that must be found in order to implement the algorithm: one needs to know
how many of the wj are larger than u, and it is only at k∗ that one knows for
sure that all have been found. Hence, k∗ is not a loose approximation; it is
an exact piece of information.
Under the prior model,

k^* \sim 1 + \mathrm{Poisson}(-c \log u^*).

See Muliere & Tardella (1998).
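Step D then amounts to extending the stick-breaking representation until the sampled weights sum to more than 1 − u∗, and drawing each δi from the finite set Aw(ui). The sketch below is illustrative only: the unit-variance normal kernel, standard-normal g0 and the function names are assumptions, not the paper's code.

import numpy as np
from scipy.stats import norm

def extend_sticks(v, theta, w, c, u_star, rng):
    v, theta, w = list(v), list(theta), list(w)
    while sum(w) <= 1.0 - u_star:
        v_new = rng.beta(1.0, c)
        w.append(v_new * np.prod(1.0 - np.array(v)))   # new w_j = v_j prod_{l<j}(1 - v_l)
        v.append(v_new)
        theta.append(rng.standard_normal())            # theta_j ~ g0 (illustrative)
    return np.array(v), np.array(theta), np.array(w)

def sample_delta(y, u, w, theta, rng):
    k = np.empty(len(y), dtype=int)
    for i in range(len(y)):
        A = np.flatnonzero(w > u[i])                   # the finite set A_w(u_i)
        p = norm.pdf(y[i], loc=theta[A])
        k[i] = A[rng.choice(len(A), p=p / p.sum())]
    return k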

E. We can incorporate a prior on c, say π(c). We will sample f(c, w, θ | y, u, δ)
as a block, in two stages: first by sampling from f(c | y, u, δ)
and then from f(w, θ | c, y, u, δ). We have already described how to sample from
the latter of these. For the former, it is equivalent to the full conditional
density that would arise from the marginal model, that is the one in which
the random distribution functions are removed from the model. Therefore,
as is well known, it is only the δ and the sample size that provide information
about c. To elaborate, the conditional distribution of c depends
only on the number of clusters, that is, the number of distinct ki's; call it d. Then

f(c \mid d, n) \propto c^{d} \, \Gamma(c) \, \pi(c) / \Gamma(c + n),

where Γ(·) denotes the usual gamma function. A nice way to sample from
this is given in Escobar & West (1995) when π(c) is a gamma distribution.
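A simple, if less elegant, alternative to the Escobar & West scheme is to evaluate the conditional f(c|d, n) ∝ c^d Γ(c) π(c)/Γ(c + n) on a grid and sample from the normalised values; the sketch below assumes a Ga(a, b) prior and an arbitrary grid, both illustrative choices.

import numpy as np
from scipy.special import gammaln

def sample_c(d, n, a, b, rng, grid=None):
    grid = np.linspace(1e-3, 20.0, 2000) if grid is None else grid
    # log of c^d Gamma(c) / Gamma(c + n) plus the log Ga(a, b) prior density,
    # both up to additive constants
    logf = (d * np.log(grid) + gammaln(grid) - gammaln(grid + n)
            + (a - 1.0) * np.log(grid) - b * grid)
    p = np.exp(logf - logf.max())
    return rng.choice(grid, p=p / p.sum())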

Hence, all the conditional densities are easy to sample and the Markov chain
we have constructed is automatic: it requires neither tuning nor retrospective
steps.

For density estimation we would like to sample from the predictive distribution

f(y_{n+1} \mid y_1, \ldots, y_n).

At each iteration we have (wj, θj) and we sample a θj using the weights. The
idea is to sample a uniform random variable r from the unit interval and
to take that θj for which Wj−1 < r < Wj, where Wj = w1 + · · · + wj and W0 = 0. If more weights
are required than currently exist then it is straightforward to sample more
as we know the additional vj ’s for j > k ∗ are independent and identically
distributed from beta(1, c) and the additional θj ’s are independent and iden-
tically distributed from g0 . Having taken θj , we draw yn+1 from N(·|θj ).
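A sketch of the predictive draw (illustrative names; unit-variance normal kernel and standard-normal g0 assumed, as in the earlier sketches):

import numpy as np

def sample_predictive(v, theta, w, c, rng):
    r = rng.uniform()
    while w.sum() <= r:
        # r falls beyond the weights sampled so far: extend with prior draws
        v_new = rng.beta(1.0, c)
        w = np.append(w, v_new * np.prod(1.0 - v))
        v = np.append(v, v_new)
        theta = np.append(theta, rng.standard_normal())
    j = np.searchsorted(np.cumsum(w), r)   # component j with W_{j-1} < r < W_j
    return rng.normal(theta[j], 1.0)       # y_{n+1} ~ N(. | theta_j)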

4. Illustration. Here we present a normal example in which θ = (µ, σ²) and
we will take λ = σ⁻². The prior for the µj's will be independent N(0, 1/s) and
the prior for the λj's will be independent Ga(ε, ε). To complement Section 3
we now provide the conditional distributions for µj and λj. We have
f(\mu_j \mid \cdots) = N\!\left( \frac{\xi_j \lambda_j}{m_j \lambda_j + s}, \; \frac{1}{m_j \lambda_j + s} \right),

where

\xi_j = \sum_{k_i = j} y_i

and

m_j = \sum_{k_i = j} 1.

We also have

f(\lambda_j \mid \cdots) = \mathrm{Ga}(\varepsilon + m_j/2, \; \varepsilon + d_j/2),

where

d_j = \sum_{k_i = j} (y_i - \mu_j)^2.
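These two updates can be coded as follows (a sketch only; eps and s are the hyperparameters of the text, rng a random number generator, and the function name is an assumption):

import numpy as np

def sample_mu_lambda_j(y, k, j, lam_j, s, eps, rng):
    yj = y[k == j]
    m_j, xi_j = yj.size, yj.sum()
    prec = m_j * lam_j + s
    # mu_j ~ N(xi_j lambda_j / (m_j lambda_j + s), 1 / (m_j lambda_j + s))
    mu_j = rng.normal(xi_j * lam_j / prec, 1.0 / np.sqrt(prec))
    # lambda_j ~ Ga(eps + m_j/2, eps + d_j/2); numpy's gamma uses scale = 1/rate
    d_j = np.sum((yj - mu_j) ** 2)
    lam_j = rng.gamma(eps + m_j / 2.0, 1.0 / (eps + d_j / 2.0))
    return mu_j, lam_j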

In the simulated data example that follows, the code was written in
Scilab, which is freely downloadable from the internet.
We sampled 50 random variables independently from the mixture of nor-
mal distributions given by

f(y) = \tfrac{1}{3} N(y \mid -4, 1) + \tfrac{1}{3} N(y \mid 0, 1) + \tfrac{1}{3} N(y \mid 8, 1).

Choosing non-informative specifications, we took ε = 0.5, s = 0.1 and the
gamma prior for c to be Ga(0.1, 0.1). The Gibbs sampler was run for 20,000
iterations and at each iteration from 10,000 onwards a predictive sample yn+1
was taken. A histogram of the 50 data points with the density estimator
based on the 10,000 samples of yn+1 is provided in Figure 1. The density
estimator was obtained using the R density routine with bandwidth set to
0.3.
Figure 2 presents the running average of the number of clusters sampled
at each iteration. It is clear that 10,000 iterations are sufficient for the
chain to reach stationarity, and hence the samples from iteration 10,000 onwards can
be taken as coming from the predictive distribution.

5. Discussion. We have provided a simple and fast way to sample the MDP
model. The key is the introduction of the latent variables which truncate the
weights of the random Dirichlet distributions. The code is simple to
write and the algorithm is direct in the sense that no accept/reject sampling nor
retrospective sampling is required. It is also remarkably quick to run. It
improves on current approaches in the following way: we know exactly how
many of the wj's and θj's we need to sample at each iteration, namely k∗. This
fundamental result eludes the alternative approaches.
Retaining the random distribution function is useful as it removes the
dependence between the θki's which exists in the Pólya-urn model. However,
retaining the random distributions leads to problems with the countably
infinite representation. In this paper we deal with it by introducing a latent
variable which makes the representation finite for the purposes of proceeding
with the sampling and allowing sampling from the predictive distribution.
The full conditional distribution of the latent variable, given the other variables, is
uniform.
In the non-conjugate case, that is when N(y|θ) and g0(θ) form a non-conjugate
pair and are perhaps difficult to sample, a possibly useful solution
is again provided by the latent variable ideas presented in Damien et al.
(1999, Sections 4 & 5).

References

Blackwell, D. (1973). The discreteness of Ferguson selections. Annals of Statistics 1, 356–358.

Blackwell, D. & MacQueen, J.B. (1973). Ferguson distributions via Pólya-urn schemes. Annals of Statistics 1, 353–355.

Connor, R.J. & Mosimann, J.E. (1969). Concepts of independence for proportions with a generalization of the Dirichlet distribution. Journal of the American Statistical Association 64, 194–206.

Damien, P., Wakefield, J.C. & Walker, S.G. (1999). Gibbs sampling for Bayesian non-conjugate and hierarchical models by using auxiliary variables. Journal of the Royal Statistical Society, Series B 61, 331–344.

Dey, D., Sinha, D. & Müller, P. (1998). Practical Nonparametric and Semiparametric Bayesian Statistics. Lecture Notes in Statistics. Springer, New York.

Escobar, M.D. (1988). Estimating the means of several normal populations by nonparametric estimation of the distribution of the means. Unpublished Ph.D. dissertation, Department of Statistics, Yale University.

Escobar, M.D. (1994). Estimating normal means with a Dirichlet process prior. Journal of the American Statistical Association 89, 268–277.

Escobar, M.D. & West, M. (1995). Bayesian density estimation and inference using mixtures. Journal of the American Statistical Association 90, 577–588.

Ferguson, T.S. (1973). A Bayesian analysis of some nonparametric problems. Annals of Statistics 1, 209–230.

Freedman, D.A. (1963). On the asymptotic behaviour of Bayes estimates in the discrete case I. Annals of Mathematical Statistics 34, 1386–1403.

Green, P.J. (1995). Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika 82, 711–732.

Ishwaran, H. & Zarepour, M. (2000). Markov chain Monte Carlo in approximate Dirichlet and beta two parameter process hierarchical models. Biometrika 87, 371–390.

Ishwaran, H. & James, L. (2001). Gibbs sampling methods for stick-breaking priors. Journal of the American Statistical Association 96, 161–173.

Lo, A.Y. (1984). On a class of Bayesian nonparametric estimates I. Density estimates. Annals of Statistics 12, 351–357.

MacEachern, S.N. (1994). Estimating normal means with a conjugate style Dirichlet process prior. Communications in Statistics: Simulation and Computation 23, 727–741.

MacEachern, S.N. (1998). Computational methods for mixture of Dirichlet process models. In Practical Nonparametric and Semiparametric Bayesian Statistics (D. Dey, P. Müller, D. Sinha, eds.), 23–43. Springer, New York.

MacEachern, S.N. & Müller, P. (1998). Estimating mixtures of Dirichlet process models. Journal of Computational and Graphical Statistics 7, 223–238.

Muliere, P. & Tardella, L. (1998). Approximating distributions of random functionals of Ferguson-Dirichlet priors. Canadian Journal of Statistics 26, 283–297.

Müller, P. & Quintana, F.A. (2004). Nonparametric Bayesian data analysis. Statistical Science 19, 95–110.

Neal, R.M. (2000). Markov chain sampling methods for Dirichlet process mixture models. Journal of Computational and Graphical Statistics 9, 249–265.

Richardson, S. & Green, P.J. (1997). On Bayesian analysis of mixtures with an unknown number of components. Journal of the Royal Statistical Society, Series B 59, 731–792.

Papaspiliopoulos, O. & Roberts, G.O. (2005). Retrospective Markov chain Monte Carlo methods for Dirichlet process hierarchical models. Submitted.

Sethuraman, J. & Tiwari, R. (1982). Convergence of Dirichlet measures and the interpretation of their parameter. In Proceedings of the Third Purdue Symposium on Statistical Decision Theory and Related Topics (S.S. Gupta & J.O. Berger, eds.). Academic Press, New York.

Sethuraman, J. (1994). A constructive definition of Dirichlet priors. Statistica Sinica 4, 639–650.


Figure 1: Histogram of the data and density estimate of the predictive density for (1/3)N(−4, 1) + (1/3)N(0, 1) + (1/3)N(8, 1).


Figure 2: Running average for the number of clusters up to iteration 10000
