Monte Carlo
WWW.SYSTAT.COM
For more information about SYSTAT® software products, please visit our WWW site
at http://www.systat.com or contact
Marketing Department
SYSTAT Software, Inc.
1735 Technology Dr., Ste. 430
San Jose, CA 95110
Phone: (800) 797-7401
Fax: (800) 797-7406
Email: [email protected]
General notice: Other product names mentioned herein are used for identification
purposes only and may be trademarks of their respective companies.
The SOFTWARE and documentation are provided with RESTRICTED RIGHTS. Use,
duplication, or disclosure by the Government is subject to restrictions as set forth in
subdivision (c)(1)(ii) of The Rights in Technical Data and Computer Software clause at
52.227-7013. Contractor/manufacturer is SYSTAT Software, Inc., 1735 Technology
Drive, Suite 430, San Jose, CA 95110. USA.
Contents

Monte Carlo
    Statistical Background
        Random Sampling
        Rejection Sampling
        Adaptive Rejection Sampling (ARS)
        Metropolis-Hastings (M-H) Algorithm
        Gibbs Sampling
        Integration
        Rao-Blackwellized Estimates with Gibbs Samples
        Precautions to be taken in using IID Monte Carlo and MCMC features
    Monte Carlo Methods in SYSTAT
        Random Sampling
        Univariate Discrete Distributions Dialog Box
        Univariate Continuous Distributions Dialog Box
        Multivariate Distributions Dialog Box
        Using Commands
        Distribution Notations used in Random Sampling
        Rejection Sampling Dialog Box
        Adaptive Rejection Sampling Dialog Box
        Using Commands
        M-H Algorithm Dialog Box
        Gibbs Sampling Dialog Box
        Integration Dialog Box
        Using Commands
        Usage Considerations
        Distribution Notations used in IIDMC and MCMC
    Examples
    Computation
        Algorithms
    References
RANDSAMP: Random Sampling
IIDMC: IID Monte Carlo
MCMC: Markov Chain Monte Carlo
Index

List of Examples

Rejection Sampling
Monte Carlo
R. L. Karandikar, T. Krishnan, and M. R. L. N. Panchanana
Monte Carlo methods (Fishman, 1996; Gentle, 1998; Robert and Casella, 2004;
Gamerman and Lopes, 2006) are used to estimate functionals of a distribution function
from generated random samples. SYSTAT provides Random Sampling, IID MC,
and MCMC algorithms to generate random samples from the required target
distribution.
Random Sampling in SYSTAT enables the user to draw a number of samples, each
of a given size, from a distribution chosen from a list of 42 distributions (discrete and
continuous, univariate and multivariate) with given parameters.
To simulate more complex models SYSTAT also provides random sampling from
univariate finite mixtures.
If no method is known for direct generation of random samples from a given
distribution or when the density is not completely specified, then IID Monte Carlo
methods may often be suitable. The IID Monte Carlo algorithms in SYSTAT are
usable only to generate random samples from univariate continuous distributions. IID
Monte Carlo consists of two generic algorithms: Rejection Sampling and Adaptive
Rejection Sampling (ARS). In these methods an envelope (proposal) function for the
target density is used. The proposal density is such that it is feasible to draw a random
sample from it. In Rejection Sampling, the proposal distribution can be selected from
SYSTAT’s list of 20 univariate continuous distributions. In ARS, the algorithm itself
constructs an envelope (proposal) function. The ARS algorithm is applicable only for
log-concave target densities.
A Markov chain Monte Carlo (MCMC) method is used when it is possible to
generate an ergodic Markov chain whose stationary distribution is the required target
distribution. SYSTAT provides two classes of MCMC algorithms: Metropolis−
Hastings (M-H) algorithm and the Gibbs sampling algorithm. With the M-H
algorithm, random samples can be generated from univariate distributions. Three types
of the Metropolis-Hastings algorithm are available in SYSTAT: Random Walk
Metropolis-Hastings algorithm (RWM-H), Independent Metropolis-Hastings
algorithm (IndM-H), and a hybrid Metropolis-Hastings algorithm of the two. The
choice of the proposal distribution in the Metropolis-Hastings algorithms is restricted
to SYSTAT’s list of 20 univariate continuous distributions. The Gibbs Sampling
method provided is limited to the situation where full conditional univariate
distributions are defined from SYSTAT’s library of univariate distributions. It is
advisable for the user to provide a suitable initial value/distribution for the MCMC
algorithms. No convergence diagnostics are provided; it is up to the user to specify
the burn-in period and gap in the MCMC algorithms.
From the generated random samples, estimates of means of user-given functions of
the random variable under study can be computed along with their variance estimates,
relying on the law of large numbers. A Monte Carlo Integration method can be used in
evaluating the expectation of a functional form. SYSTAT provides two Monte Carlo
Integration methods: Classical Monte Carlo integration and Importance Sampling
procedures.
IID MC and MCMC algorithms of SYSTAT generate random samples from
positive functions only. Samples generated by the Random Sampling, IID MC and
MCMC algorithms can be saved.
The user has a large role to play in the use of the IID MC and MCMC features of
SYSTAT and the success of the computations will depend largely on the user’s
judicious inputs.
Statistical Background
Drawing random samples from a given probability distribution is an important
component of any statistical Monte Carlo simulation exercise. This is usually followed
by statistical computations from the drawn samples, which can be described as Monte
Carlo integration. The random samples drawn can be used for the desired Monte Carlo
integration computations using SYSTAT. SYSTAT provides direct random sampling
facilities from a list of 42 univariate and multivariate discrete and continuous
distributions. In statistical practice, however, one often has to draw random samples
from several other distributions, some of which are difficult to draw from directly. The
generic IID Monte Carlo and Markov chain Monte Carlo algorithms provided by
SYSTAT will be of help in these contexts.
Random Sampling
The random sampling procedure can be used to generate random samples from the
distributions that are most commonly used for statistical work. SYSTAT implements,
as far as possible, the most efficient algorithms for generating samples from a given
type of distribution. All of these depend on the generation of uniform random numbers,
based on either the Mersenne-Twister or the Wichmann-Hill algorithm.
Mersenne-Twister (MT) is a pseudorandom number generator developed by
Makoto Matsumoto and Takuji Nishimura (1998). The random seed for the algorithm
can be specified by using RSEED=seed, where seed is any integer from 1 to
4294967295 for the MT algorithm and from 1 to 30000 for the Wichmann-Hill
algorithm. We recommend the MT option, especially if the number of random numbers
to be generated in your Monte Carlo studies is fairly large, say more than 10,000.
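The following minimal sketch (in Python with NumPy, not SYSTAT syntax) illustrates
the point of the seed: two Mersenne-Twister generators started from the same seed
reproduce exactly the same stream.

import numpy as np

# Two Mersenne-Twister generators seeded identically produce identical streams.
rng1 = np.random.Generator(np.random.MT19937(12345))
rng2 = np.random.Generator(np.random.MT19937(12345))
assert np.array_equal(rng1.random(5), rng2.random(5))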
If you would like to reproduce results involving random number generation from
earlier SYSTAT versions, with old command file or otherwise, make sure that your
random number generation option (under Edit => Options => General => Random
Number Generation) is Wichmann-Hill (and, of course, that your seed is the same as
before).
The list of distributions SYSTAT generates from, expressions for associated functions,
notations used and references to their properties are given in the Volume: Data: Chapter
4: Data Transformations: Distribution Functions. Definitions of multivariate
distributions, notations used and, references to their properties can be found later in this
chapter.
Rejection Sampling
Rejection Sampling is used when direct generation of a random sample from the target
density is difficult, or when the density fX(x) is specified only up to a constant, but a
related density gY(y) is available from which it is comparatively easy to generate
random samples. This gY(y) is called the majorizing density function or an
envelope, and it should satisfy the condition that MgY(x) ≥ fX(x) for every x, for some
constant 0 < M < ∞. For the method to work, it should be easy to draw random samples
from the distribution defined by the density function gY(.).
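The following is a minimal sketch of the method in Python/NumPy (not SYSTAT
syntax). The target, envelope, and constant M are illustrative choices: the target
fX(x) = x^1.43 e^(−x) is the gamma(2.43, 1) density up to a constant, the envelope is
an exponential density with mean 2.43, and M = 3 dominates the ratio fX/gY on (0, ∞).

import numpy as np

rng = np.random.default_rng(1)

def rejection_sample(target, proposal_rvs, proposal_pdf, M, size):
    # Accept a draw y from the envelope gY with probability
    # fX(y) / (M * gY(y)); the accepted values follow fX.
    out = []
    while len(out) < size:
        y = proposal_rvs()
        u = rng.uniform()
        if u <= target(y) / (M * proposal_pdf(y)):
            out.append(y)
    return np.array(out)

f = lambda x: x**1.43 * np.exp(-x)        # target, known up to a constant
g = lambda x: np.exp(-x / 2.43) / 2.43    # exponential envelope density
sample = rejection_sample(f, lambda: rng.exponential(2.43), g, M=3.0, size=1000)

Here M·g(x) ≥ f(x) for every x, as the method requires; the closer M·g is to f, the
higher the acceptance probability.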
Adaptive Rejection Sampling (ARS)

Adaptive Rejection Sampling (Gilks, 1992; Gilks and Wild, 1993) constructs the
envelope (proposal) function itself, refining it adaptively as sampling proceeds; these
references may be consulted for details of the ARS algorithm. Since initial points on
the abscissa are essential for the ARS algorithm, SYSTAT requires the user to specify
two starting points to enable it to generate initial points between them for target
distributions whose support is unbounded, left and right bounded, and bounded. For a
target which has unbounded support, the two specified points are starting points; for a
left (right) bounded support, the left (right) point is a bound and the other point is a
starting point used to create initial points between the bound and itself; and for a
bounded support, both the specified points are bounds.
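Since the ARS algorithm requires a log-concave target, it is worth verifying the
condition before use. A minimal sketch of such a check in Python with SymPy (an
illustration, not part of SYSTAT), for the same unnormalized gamma-type target
x^1.43 e^(−x) on x > 0:

import sympy as sp

x = sp.symbols('x', positive=True)
f = x**sp.Rational(143, 100) * sp.exp(-x)   # unnormalized target density
h2 = sp.simplify(sp.diff(sp.log(f), x, 2))  # (log f)'' = -143/(100*x**2)
print(h2, h2.is_negative)                   # negative for x > 0: log-concave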
There are limitations to the applicability of the ARS algorithm. It can generate samples
only from univariate distributions, and only from those with log-concave densities.
However, there is often a need to generate random samples from non-log-concave densities
and from multivariate densities. In the multivariate case, the target distribution is often
defined indirectly and/or incompletely, depending on the way it arises in the statistical
problem on hand. However, so long as the target distribution is uniquely defined from
the given specifications, it is possible to adopt an iterative random sampling (Monte
Carlo) procedure, which at the point of convergence delivers a random draw from the
target distribution. These iterative Monte Carlo procedures generate a random
sequence with the Markovian property such that the Markov chain is ergodic with a
limiting distribution coinciding with the target distribution, if the procedure is suitably
chosen. There is a whole family of such iterative procedures collectively called MCMC
procedures, with different procedures being suitable for different situations. The
Metropolis-Hastings (M-H) algorithm of SYSTAT generates random samples from
only univariate distributions, although the algorithm in general can generate from
multivariate distributions. Thus, SYSTAT’s M-H algorithm is useful when the target
density is not necessarily log-concave.
In MCMC algorithms, a suitable transition kernel K that satisfies the reversibility
condition K(x,y)π(x) = π(y)K(y,x) with a probability density function π(x) is utilized
to generate random variates from a stationary target function π(x). The Metropolis-
Hastings algorithm (Chib and Greenberg, 1995; Gilks, Richardson, and Spiegelhalter,
1998; Liu, 2001) is a type of MCMC algorithm that constructs a Markov chain by
choosing a transition kernel (continuous-time analog of the transition probability
matrix of the discrete case),
K_MH(x, y) = q(y | x) α(x, y) + [1 − ∫_{R^d} q(y | x) α(x, y) dy] δx(y)

where

α(x, y) = min{ π(y) q(x | y) / (π(x) q(y | x)), 1 }

is the acceptance probability, the term 1 − ∫_{R^d} q(y | x) α(x, y) dy is the probability
that the chain remains at x, and δx(y) denotes a point mass at x.
Since the target function π(x) appears both in the numerator and denominator of
α(x, y), knowledge of the normalization constant for the target function is not needed.
By selecting a suitable proposal density for the target function, a Markov chain is
generated by the following Metropolis-Hastings algorithm.
Metropolis-Hastings Algorithm
Step 1. Generate y_t from q(y | x(t)).
Step 2. Generate u from uniform(0, 1).
Step 3. If u ≤ α(x(t), y_t), then x(t+1) = y_t; else x(t+1) = x(t).
Repeat the above three steps until the required number of samples is generated.
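A minimal sketch of this algorithm in Python/NumPy (not SYSTAT syntax) follows;
the standard normal target (known only up to its constant) and the uniform random-walk
proposal are illustrative choices.

import numpy as np

rng = np.random.default_rng(7)

def metropolis_hastings(target, q_rvs, q_pdf, x0, n):
    # `target` need only be known up to a normalization constant.
    chain = np.empty(n)
    x = x0
    for t in range(n):
        y = q_rvs(x)                            # Step 1: y ~ q(. | x)
        u = rng.uniform()                       # Step 2: u ~ uniform(0, 1)
        alpha = min(1.0, target(y) * q_pdf(x, y) / (target(x) * q_pdf(y, x)))
        x = y if u <= alpha else x              # Step 3: accept or stay
        chain[t] = x
    return chain

target = lambda x: np.exp(-x**2 / 2)            # standard normal, up to a constant
chain = metropolis_hastings(target,
                            q_rvs=lambda x: rng.uniform(x - 0.5, x + 0.5),
                            q_pdf=lambda a, b: 1.0,  # symmetric proposal: cancels
                            x0=0.0, n=20000)
sample = chain[500::30]                         # discard burn-in 500, keep gap 30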
A sample produced by the Metropolis-Hastings algorithm may not be an independent,
identically distributed (i.i.d.) sample, because rejection of a proposed variate
produces a repetition of the current variate x(t) at time (t+1). By selecting random
variates from the output with sufficiently large 'gaps', repeated values can be made
infrequent. There are several approaches to selecting proposal functions, resulting in
specific types of M-H algorithms. SYSTAT provides three: the Random Walk
Metropolis-Hastings algorithm, the Independent Metropolis-Hastings algorithm, and
a hybrid of the two.
Gibbs Sampling
Gibbs Sampling (Casella and George, 1992) is a special case of the Metropolis-
Hastings algorithm. It deals with the problem of random sampling from a multivariate
distribution, which is defined in terms of a collection of conditional distributions of its
component random variables or sub-vectors, in such a way that this collection uniquely
defines the target joint distribution. Gibbs sampling can be regarded as a component-
wise Metropolis-Hastings algorithm. The defining marginal and conditional
distributions often arise naturally in Bayesian problems, in terms of the distributions
of the observable variables and the prior distributions. These then give rise to a
random vector whose distribution is the posterior distribution of the parameters given
the observables.
This posterior distribution is often difficult to work out analytically, necessitating the
development of Monte Carlo procedures like Gibbs Sampling. SYSTAT’s Gibbs
Sampling feature only handles the situation called the case of full conditionals,
wherein the defining collection consists of the conditional distribution of each single
component of the random vector given the values of the rest. SYSTAT considers only
those cases where such conditional distributions are standard distributions available in
its list.
The Gibbs sampling procedure draws each component in turn from its full conditional
distribution given the current values of all the other components: at iteration t, x1(t) is
drawn from f(x1 | x2(t−1), …, xp(t−1)), then x2(t) from f(x2 | x1(t), x3(t−1), …, xp(t−1)),
and so on up to xp(t). This procedure, starting from x1(0), x2(0), …, xp(0), generates a
'Gibbs sequence' x1(1), x2(1), …, xp(1), …, x1(n), x2(n), …, xp(n) after n iterations. This
sequence is a realization of a Markov chain with a stationary distribution, which is the
unique multivariate distribution defined by the full conditionals. Thus for large n, x1(n),
x2(n)…, xp(n) can be considered as a random sample from the target multivariate
distribution. The Gibbs Sampling method is also useful to approximate the marginal
density f(xi), i=1,2,…,p of a joint density function f(x1,x2,…,xp) or its parameters by
averaging the full conditional densities over the sequence; the same can be done, for
that matter, for marginal multivariate densities and their parameters.
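A minimal sketch of fixed-scan Gibbs sampling in Python/NumPy (not SYSTAT
syntax), for a standard bivariate normal with correlation ρ = 0.98, whose full
conditionals are X1 | X2 = x2 ~ Normal(ρx2, 1 − ρ²) and X2 | X1 = x1 ~
Normal(ρx1, 1 − ρ²); this is the same target used in Example 7 later in this chapter.

import numpy as np

rng = np.random.default_rng(11)
rho, n, burnin, gap = 0.98, 2000, 500, 10
sd = np.sqrt(1 - rho**2)                 # conditional standard deviation, 0.1990

x1, x2, draws = 0.0, 0.0, []
for t in range(burnin + n * gap):
    x1 = rng.normal(rho * x2, sd)        # draw from f(x1 | x2)
    x2 = rng.normal(rho * x1, sd)        # draw from f(x2 | x1)
    if t >= burnin and (t - burnin) % gap == 0:
        draws.append((x1, x2))
draws = np.array(draws)
print(draws.mean(axis=0), np.corrcoef(draws.T)[0, 1])   # near (0, 0) and 0.98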
Integration
The expectation of a function h(X) with respect to a density f(x),

Ef[h(X)] = ∫ h(x) f(x) dx,

can be estimated by

Î_n = (1/n) ∑_{i=1}^n h(xi)

where x1, …, xn is a random sample from f. This is called the Classical Monte Carlo
Integration estimator. By the strong law of large numbers, Î_n converges almost surely
to Ef[h(X)]. The standard error of the estimate is

√[ (1/(n(n−1))) ∑_{i=1}^n (h(xi) − Î_n)² ]
In Importance Sampling, the sample x1, …, xn is generated from an importance density
g, and the estimator of Ef[h(X)] is

Î_g = (1/n) ∑_{i=1}^n h(xi) w(xi),  where w(x) = f(x)/g(x),

with standard error

√[ (1/(n(n−1))) ∑_{i=1}^n (h(xi) w(xi) − Î_g)² ]
The optimal importance density for minimizing the variance of the integration
estimator is

g*(x) = h(x) f(x) / ∫ h(z) f(z) dz
The integration estimate can also be computed by the Importance Sampling ratio
estimate

Î_w = [ ∑_{i=1}^n h(xi) w(xi) ] / [ ∑_{i=1}^n w(xi) ]

with standard error

√[ ( ∑_{i=1}^n (h(xi) − Î_w)² w(xi)² ) / ( ∑_{i=1}^n w(xi) )² ]
The advantage of using the ratio estimate compared to the integration estimate is that
in using the latter we need to know the weight function (i.e., ratio of target and
importance functions) exactly, whereas in the former case, the ratio needs to be known
only up to a multiplicative constant. If the support of the importance function g(x)
contains the support of the density function f(x), then the generated samples are i.i.d.
and the Importance Sampling estimator converges almost surely to the expectation.
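As a minimal sketch (Python/NumPy, not SYSTAT syntax) of both estimators,
consider estimating I = ∫₀¹ cos(πx/2) dx = 2/π with the Beta(1, 2) density
g(x) = 2(1 − x) as an illustrative importance function; the target density is the
uniform f(x) = 1 on [0, 1], so w(x) = f(x)/g(x).

import numpy as np

rng = np.random.default_rng(3)
n = 100000
x = rng.beta(1.0, 2.0, size=n)           # draws from the importance density g
h = np.cos(np.pi * x / 2)
w = 1.0 / (2.0 * (1.0 - x))              # weight function w = f/g

I_g = np.mean(h * w)                     # importance sampling estimate
I_w = np.sum(h * w) / np.sum(w)          # importance sampling ratio estimate
se = np.sqrt(np.sum((h * w - I_g)**2) / (n * (n - 1)))
print(I_g, I_w, se)                      # both near 2/pi = 0.6366...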
Monte Carlo Integration methods in SYSTAT are invoked only after generating a
random sample using any one of the univariate discrete and univariate continuous
random sampling methods, Rejection Sampling, Adaptive Rejection Sampling, or the
M-H algorithms. SYSTAT computes a Monte Carlo integration estimate and standard
error. There is a choice between the Classical Monte Carlo Integration method and the
two Importance Sampling methods. Classical Monte Carlo Integration estimate is
evaluated with respect to the density function related to the distribution from which
samples are generated. In the Importance Sampling Integration and Importance
Sampling Ratio procedures, the importance function is the density related to the
distribution from which samples are generated.
Rao-Blackwellized Estimates with Gibbs Samples

In the Gibbs Sampling method, the Rao-Blackwellized estimator

δ_rb = (1/n) ∑_{t=1}^n E[h(x1(t)) | x2(t), …, xp(t)]

can be used instead of the empirical estimator

(1/n) ∑_{t=1}^n h(x1(t))

See Liu et al. (1994) and Robert and Casella (2004) for details.
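A minimal sketch of the comparison (Python/NumPy, not SYSTAT syntax), for the
bivariate normal Gibbs chain sketched earlier, where E[X1 | X2 = x2] = ρx2 is
available in closed form:

import numpy as np

rng = np.random.default_rng(13)
rho, sd = 0.98, np.sqrt(1 - 0.98**2)
x1, x2, x1s, x2s = 0.0, 0.0, [], []
for t in range(20000):
    x1 = rng.normal(rho * x2, sd)
    x2 = rng.normal(rho * x1, sd)
    if t >= 500:                          # discard burn-in
        x1s.append(x1); x2s.append(x2)
print(np.mean(x1s))                       # empirical estimator of E[X1] = 0
print(rho * np.mean(x2s))                 # Rao-Blackwellized estimator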
Precautions to be taken in using IID Monte Carlo and MCMC features
You may obtain absurd results if suitable inputs are not given in the case of IID Monte
Carlo and MCMC algorithms. Some of the precautions to be taken are:
- In Rejection Sampling, the output may not be appropriate if the target function's
support is not a subset of the support of the proposal function, or if the target
function is not dominated by a constant times the proposal density function.
- Log-concavity of the target function has to be checked by you before using the
ARS algorithm.
- If you get a sample that does not cover the entire parameter space of a left
bounded, right bounded, or bounded target function using the ARS algorithm, you
should check whether the assigned starting points include the corresponding bounds.
- In M-H algorithms, it is your responsibility to generate an ergodic Markov chain
by selecting a suitable proposal density function.
- You should ensure that the expectation of the integrand exists before doing Monte
Carlo Integration.
- The time required to generate samples using MCMC algorithms depends, among
other factors, on the burn-in period and gap, and in some situations may be quite
large.
While using SYSTAT’s Gibbs Sampling algorithm to generate random samples from
the distribution of a p-dimensional random vector X = (X1, X2, …, Xp), you should
note that:
- The input (defining conditional distributions) consists of only univariate
distributions from SYSTAT's list of distributions.
- The input should give the full conditional distribution of each Xi given the rest of
the components of X.
- The parameters of the conditional distributions have to be specified in the required
syntax.
- It is your responsibility to ensure that the above inputs satisfy the conditions
required for them to define uniquely the joint distribution of the components of X
as your target distribution.
Monte Carlo Methods in SYSTAT
Random Sampling
Before using the Random Sampling feature you should study the list of distributions,
the form of the density functions (especially with respect to the parameters), and the
names and notations for the parameters, from the volume Data: Chapter 4: Data
Transformations: Distribution Functions. It may also be useful to consult the
references therein for the properties of these distributions and the meanings of the
parameters. The distributions are divided into three groups: univariate discrete,
univariate continuous, and multivariate.
To open the Monte Carlo: Random Sampling: Univariate Discrete Distributions dialog
box, from the menus choose:
Addons
Monte Carlo
Random Sampling
Univariate Discrete…
Number of samples. Enter the number of samples that you want to generate.
Sample size. Enter the size of the sample you want to generate.
Random seed. The default random number generator is the Mersenne-Twister
algorithm. For the seed, specify any integer from 1 to 4294967295 for the MT
algorithm and 1 to 30000 for the Wichmann-Hill algorithm; otherwise SYSTAT uses a
seed based on system time.
Distribution. Choose the distribution from the drop-down list. The list consists of nine
univariate discrete distributions: Benford’s Law, Binomial, Discrete uniform,
Geometric, Hypergeometric, Logarithmic series, Negative binomial, Poisson, and
Zipf. Enter the values of the parameters (depending on the distribution selected) in the
box(es).
Save file. You can save the output to a specified file.
To open the Monte Carlo: Random Sampling: Univariate Continuous Distributions
dialog box, from the menus choose:
Addons
Monte Carlo
Random Sampling
Univariate Continuous…
Number of samples. Enter the number of samples that you want to generate.
Sample size. Enter the size of the sample you want to generate.
Random seed. The default random number generator is the Mersenne-Twister
algorithm. For the seed, specify any integer from 1 to 4294967295 for the MT
algorithm and 1 to 30000 for the Wichmann-Hill algorithm; otherwise SYSTAT uses a
seed based on system time.
Distribution. Choose the distribution from the drop-down list. The list consists of
twenty-eight univariate continuous distributions: Beta, Cauchy, Chi-square, Double
exponential, Erlang, Exponential, F, Gamma, Generalized lambda, Gompertz,
Gumbel, Inverse Gaussian (Wald), Logistic, Loglogistic, Logit normal, Lognormal,
Non-central chi-square, Non-central F, Non-central t, Normal, Pareto, Rayleigh, t,
Smallest extreme value, Studentized range, Triangular, Uniform, and Weibull. Enter
the values of the parameters (depending on the distribution selected) in the box(es).
Save file. You can save the output to a specified file.
To open the Monte Carlo: Random Sampling: Multivariate Distributions dialog box,
from the menus choose:
Addons
Monte Carlo
Random Sampling
Multivariate…
Number of samples. Enter the number of samples that you want to generate.
Sample size. Enter the size of the sample you want to generate.
Random seed. The default random number generator is the Mersenne-Twister
algorithm. For the seed, specify any integer from 1 to 4294967295 for the MT
algorithm and 1 to 30000 for the Wichmann-Hill algorithm; otherwise SYSTAT uses a
seed based on system time.
Distribution. Choose the distribution from the drop-down list. The list consists of five
multivariate distributions: Bivariate exponential, Dirichlet, Multinomial, Multivariate
normal, and Wishart. Enter the values of the parameters (depending on the distribution
selected) in the box(es).
Save file. You can save the output to a specified file.
Using Commands

In the distribution notations used in the commands, low is the smallest value and hi,
the largest value; loc is the location parameter and sc, the scale parameter; shp is the
shape parameter and thr, the threshold parameter; nc is the non-centrality parameter
(for univariate non-central distributions); and df is the degrees of freedom.
Note: * indicates multivariate distributions.
Example: Normal random numbers with parameters (0, 1)
RANDSAMP
UNIVARIATE ZRN(0,1)
To open the Monte Carlo: IIDMC: Rejection Sampling dialog box, from the menus
choose:
Addons
Monte Carlo
IIDMC
Rejection Sampling…
Number of samples. Enter the number of samples that you want to generate.
Sample size. Enter the size of the sample that you want to generate.
Random seed. The default random number generator is the Mersenne-Twister
algorithm. For the seed, specify any integer from 1 to 4294967295 for the MT
algorithm and 1 to 30000 for the Wichmann-Hill algorithm; otherwise SYSTAT uses a
seed based on system time.
Target function. Specify your target function in the required syntax.
Constant. Enter a value that is an upper bound to the supremum of the ratio of the
target to the proposal density functions.
Proposal. Select a suitable proposal distribution function. The list consists of twenty
univariate continuous distributions: Beta, Cauchy, Chi-square, Double Exponential
(Laplace), Exponential, F, Gamma, Gompertz, Gumbel, Inverse Gaussian (Wald),
Logistic, Logit normal, Lognormal, Normal, Pareto, Rayleigh, t, Triangular, Uniform,
and Weibull.
To open the Adaptive Rejection Sampling dialog box, from the menus choose:
Addons
Monte Carlo
IIDMC
Adaptive Rejection Sampling…
Number of samples. Enter the number of samples that you want to generate.
Sample size. Enter the size of the sample that you want to generate.
Random seed. The default random number generator is the Mersenne-Twister
algorithm. For the seed, specify any integer from 1 to 4294967295 for the MT
algorithm and 1 to 30000 for the Wichmann-Hill algorithm; otherwise SYSTAT uses a
seed based on system time.
Target function. Specify your target function, which should satisfy the log-concavity
condition.
Support of target. The method at first constructs a proposal function using initial
points on the support of target distribution and extends it depending on the type of the
target function. Bounds and starting points should be given.
Unbounded. Specifies the support of target as unbounded. The two points are
starting points.
Right bounded. Specifies the support of target as right bounded. The left point is a
starting point and the right one is a bound.
Left bounded. Specifies the support of target as left bounded. The right point is a
starting point and the left one is a bound.
Bounded. Specifies the support of target as bounded. Both the left and right
points are bounds.
Left point/bound. Enter a point preferably to the left side of the mode of the target
function.
Right point/bound. Enter a point preferably to the right side of the mode of the
target function.
Save file. You can save the output to a specified file.
Using Commands

For the Rejection Sampling method, the REJECT command with the RJS function is
used; the target expression is its first argument and the proposal expression its second
(see the Language Reference: IIDMC section). The distribution notation for the
proposal expression can be chosen from the notations of Beta, Cauchy, Chi-square,
Exponential, F, Gamma, Gompertz, Gumbel, Double Exponential (Laplace), Logistic,
Logit normal, Lognormal, Normal, Pareto, Rayleigh, t, Triangular, Uniform, Inverse
Gaussian (Wald), and Weibull distributions.
For the Adaptive Rejection Sampling method:
IIDMC
SAVE FILENAME
REJECT ARS(targetexpression, rangeexpression)/SIZE=n1
NSAMPLE= n2 RSEED=n3
Range expressions for the target function in the ARS command specify the type of
support together with the two points, for example BD(left, right) for a bounded
support with both points as bounds.
To open the M-H Algorithm dialog box, from the menus choose:
Addons
Monte Carlo
MCMC
M-H Algorithm…
Number of samples. Enter the number of samples that you want to generate.
Sample size. Enter the size of the sample that you want to generate.
Random seed. The default random number generator is the Mersenne-Twister
algorithm. For the seed, specify any integer from 1 to 4294967295 for the MT
algorithm and 1 to 30000 for the Wichmann-Hill algorithm; otherwise SYSTAT uses a
seed based on system time.
Burn-in. Enter the number of initial observations to be discarded from the chain.
Gap. Enter the difference between the indices of two successive random observations
that can be extracted from the generated sequence.
Target function. Specify your target function.
Algorithm type. Select the algorithm from the following:
Random walk. Generates random sample using RWM-H algorithm.
Independent. Generates random sample using IndM-H algorithm.
Hybrid RWInd. Generates random sample using Hybrid RWInd M-H algorithm.
Support of target. Support of your target distribution can be specified as bounded, left
bounded, right bounded, and unbounded.
Unbounded. Specifies the support of target as unbounded.
Right bounded. Specifies the support of target as right bounded.
Left bounded. Specifies the support of target as left bounded.
Bounded. Specifies the support of target as bounded.
To open the Monte Carlo: MCMC: Gibbs Sampling dialog box, from the menus
choose:
Addons
Monte Carlo
MCMC
Gibbs Sampling…
Number of samples. Enter the number of samples that you want to generate.
Sample size. Enter the size of the multivariate sample that you want to generate.
Random seed. The default random number generator is the Mersenne-Twister
algorithm. For the seed, specify any integer from 1 to 4294967295 for the MT
algorithm and 1 to 30000 for the Wichmann-Hill algorithm; otherwise SYSTAT uses a
seed based on system time.
Gap. Enter the difference between the indices of two successive random observations
that can be extracted from the generated sequence.
Burn-in. Enter the size of random sample to be discarded initially from the chain.
Use file. Open the data file, where variables in the data file are part of the parameter
expressions of full conditionals.
Full conditionals. Specify the full conditional distributions.
Variables. Enter the variable for which you want to generate random sample.
Distribution. Select the required distribution from the list provided. The list
consists of seven univariate discrete and twenty univariate continuous
distributions. They are Binomial, Discrete uniform, Geometric, Hypergeometric,
Poisson, Negative Binomial, Zipf, Beta, Cauchy, Chi-square, Double Exponential
(Laplace), Exponential, F, Gamma, Gompertz, Gumbel, Inverse Gaussian (Wald),
Logistic, Logit normal, Lognormal, Normal, Pareto, Rayleigh, t, Triangular,
Uniform, and Weibull.
To open the Monte Carlo: Integration dialog box, from the menus choose:
Addons
Monte Carlo
Integration…
Density function. Type your density function, which is the numerator of the weight
function in the Importance Sampling method.
Using Commands
Range expressions for the target function in the MH command specify the type of
support of the target distribution, for example UB for an unbounded support and
BD(left, right) for a bounded one.
The distribution notation for the proposal can be chosen from the notations of Beta,
Cauchy, Chi-square, Exponential, F, Gamma, Gompertz, Gumbel, Double Exponential
(Laplace), Logistic, Logit normal, Lognormal, Normal, Pareto, Rayleigh, t, Triangular,
Uniform, Inverse Gaussian (Wald), Weibull distributions.
or
MCMC
USE FILENAME
VARIABLE DECLARATIONS
GVAR VARLIST
GIBBS(fcexpname_1,…, fcexpname_k) /SIZE=n1 NSAMPLE=n2
BURNIN=n3 GAP=n4 RSEED=n5
The distribution notations for full conditionals can be chosen from the notations of
Binomial, Discrete Uniform, Geometric, Hypergeometric, Negative Binomial,
Poisson, Zipf, Beta, Cauchy, Chi-square, Exponential, F, Gamma, Gompertz, Gumbel,
Double Exponential (Laplace), Logistic, Logit normal, Lognormal, Normal, Pareto,
Rayleigh, t, Triangular, Uniform, Inverse Gaussian (Wald) and Weibull distributions.
Monte Carlo Integration methods in SYSTAT can be used only after generating random
samples from any one of univariate discrete and univariate continuous random
sampling methods, Rejection Sampling, Adaptive Rejection Sampling and M-H
algorithms.
Usage Considerations
Types of data. Gibbs Sampling and Monte Carlo Integration use rectangular data only.
For the remaining features no input data are needed.
Print options. There are no PLENGTH options.
Quick Graphs. Monte Carlo produces no Quick Graphs. You can use the generated file
to produce the graphs you want. For more information, refer to the examples.
Saving files. Generated samples can be saved in the file mentioned. For all distributions
(except Wishart), case number refers to observation number. For all univariate
distributions, column names are s1, s2, … (the number after s denotes the sample
number). For multivariate distributions, the column name format is “s*v*”, where the
* after s denotes the sample number and the * after v denotes the variable number. For
Wishart, there is a leading column “OBS_NO” with elements “o*v*”, where the * after
o denotes the observation number and the * after v denotes the variable number. The
output format of the Rejection Sampling, ARS, and M-H algorithms is the same as that
of the univariate distributions. For Gibbs Sampling, the column name is the name of
the variable with the sample number appended.
By groups. By groups is not relevant.
Case frequencies. Case frequency is not relevant.
Case weights. Case weight is not relevant.
Distribution Notations used in IIDMC and MCMC

In the distribution notations, low is the smallest value and hi, the largest value; loc is
the location parameter and sc, the scale parameter; shp is the shape parameter and thr,
the threshold parameter; and df is the degrees of freedom.
Multinomial:
PDF:

P[Ni = ni, i = 1, 2, …, k] = (n! / ∏_{i=1}^k ni!) ∏_{i=1}^k pi^{ni},  ni ≥ 0,  ∑_{i=1}^k ni = n
Bivariate exponential:
PDF: defined through the joint survival function

F(x1, x2) = exp{−λ1x1 − λ2x2 − λ12 max(x1, x2)},  for 0 < x1, x2

Note: λ12 is a positive real number (Failure rate 3), sometimes denoted by λ3.
Dirichlet:
Parameters: k, P
k: positive integer (>2)
P: k-dimensional vector of shape parameters (each component is a positive real number).
PDF: Each xi is in [0, 1] such that ∑_{i=1}^k xi = 1, and

f(x) = [Γ(∑_{j=1}^k pj) / ∏_{j=1}^k Γ(pj)] ∏_{j=1}^k xj^{pj − 1}
Multivariate normal:
Parameters: p, mu, sigma
p: positive integer (>1)
mu: p x 1 vector of reals
sigma: p x p symmetric positive definite matrix.
PDF:

f(x) = (2π)^{−p/2} |Σ|^{−1/2} exp{−(1/2) (x − μ)^T Σ^{−1} (x − μ)},  x ∈ R^p
Wishart:
Parameters: p, m, sigma, c
p: positive integer (>1)
m: positive integer (>= p) (degrees of freedom)
sigma: p x p symmetric positive definite matrix
c: p x p matrix of non-centrality parameters.

If Y1, …, Ym are the rows of an m × p normal data matrix Y, then

W = ∑_{i=1}^m Yi′ Yi

PDF:

fW(S) = w(p, Σ, m, M′) |S|^{(m−p−1)/2} exp[−(1/2) tr(Σ⁻¹S)] ₀F₁(m/2; (1/4) Σ⁻¹M′MΣ⁻¹S)

where M = E(Y) (an m × p matrix),

w(p, Σ, m, M′) = [Γp(m/2)]⁻¹ |2Σ|^{−m/2} exp[−(1/2) tr(Σ⁻¹M′M)],

Γp(m/2) = π^{p(p−1)/4} ∏_{i=1}^p Γ[(m + 1 − i)/2],

and ₀F₁(m/2; ·) is the hypergeometric function.
Expressions in Monte Carlo
For IIDMC and M-H algorithms, the target functions from which random samples are
generated are expressions involving mathematical functions of a single variable.
The integrand in Monte Carlo Integration and the density function in Importance
Sampling procedures are expressions. In the Gibbs Sampling method, the parameters
of full conditionals are expressions, which may involve variables from a data file and
mathematical functions. For construction of expressions you can use all numeric
functions from SYSTAT’s function library.
Examples
Example 1
Sampling Distribution of Double Exponential (Laplace) Median
This example generates 500 samples each of size 20 and investigates the distribution
of the sample median by computing the median of each sample.
The input is:
RANDSAMP
UNIVARIATE DERN(2,1) / SIZE=20 NSAMPLE=500 RSEED=23416
Using the generated (500) samples, the distribution of sample median can be obtained.
The input is:
SSAVE 'STATS'
STATISTICS S1 .. S500 /MEDIAN
USE 'STATS'
TRANSPOSE S1..S500
VARLAB COL(1) / 'MEDIAN'
CSTATISTICS COL(1) / MAXIMUM MEAN MINIMUM SD VARIANCE N
SWTEST
BEGIN
DENSITY COL(1) / HIST XMIN=0 XMAX=4
DENSITY COL(1) / NORMAL XMIN=0 XMAX=4
END
[Histogram of the 500 sample medians (MEDIAN) with a fitted normal curve]
¦ MEDIAN
-----------------------+-------
N of Cases ¦ 500
Minimum ¦ 1.190
Maximum ¦ 2.665
Arithmetic Mean ¦ 1.994
Standard Deviation ¦ 0.248
Variance ¦ 0.062
Shapiro-Wilk Statistic ¦ 0.994
Shapiro-Wilk p-value ¦ 0.050
We observe that the sampling distribution of the double exponential sample median
can be described as approximately normal.
Example 2
Simulation of Assembly System
Consider a system having two parallel subsystems (A and B) connected in series with
another subsystem (C), as shown in the structural diagram. In such a system, work at
"C" can start only after the work at "A" and "B" is completed. The process completion
time for this system is the maximum of the processing times for "A" and "B" plus the
processing time for "C".
Assume that the system is a production line for a specific product, and that the
processing time distributions for the three subsystems are independent, with a
specified distribution for each subsystem.
The production engineer wants to find the distribution of manufacturing time and to
estimate the probability that the manufacturing time is less than 5 units of time.
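As a minimal sketch (Python/NumPy, not the SYSTAT run itself), the computation can
be written as follows; the three processing-time distributions below are purely
illustrative assumptions, not the ones used in the SYSTAT run.

import numpy as np

rng = np.random.default_rng(21)
n = 10000
a = rng.gamma(2.0, 1.0, n)       # hypothetical processing times for A
b = rng.exponential(1.5, n)      # hypothetical processing times for B
c = rng.uniform(0.5, 1.5, n)     # hypothetical processing times for C
t = np.maximum(a, b) + c         # completion time: max(A, B) plus C
print(np.mean(t < 5))            # estimate of P(manufacturing time < 5)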
[Histogram of the 10000 simulated manufacturing times (TIME)]
¦ PROB
-----------------+------
Arithmetic Mean ¦ 0.979
The output shows the histogram of 10000 simulated manufacturing times; the
estimated probability that manufacturing time is less than 5 time units is 0.979.
Example 3
Generation of Random Sample from Bivariate Exponential (Marshall-Olkin
Model) Distribution
An electronics engineer wants to study the joint distribution of two specific electronic
subsystems in her assembly. From her prior knowledge she knows the mean failure
time for the first subsystem as 1.2 (units) and for the second subsystem, as 1.3 (units).
If some strong shock occurs, then both of these subsystems fail. She also knows the
mean occurrence time for this strong shock as 0.1 (units). Assuming the Marshall-Olkin
model, realizations of her assembly failures are generated, and the input is:
RANDSAMP
MULTIVARIATE BERN(1,1,0.1) / SIZE = 10000 NSAMPLE=1
RSEED=542375
CSTATISTICS / MAXIMUM MEAN MINIMUM SD VARIANCE N
PLOT S1V1*S1V2/ BORDER=HIST
GRAPH OFF
CORR
PEARSON S1V1 S1V2
GRAPH ON
[Scatterplot of S1V1 against S1V2 with marginal histograms]
¦ S1V1 S1V2
-----+--------------
S1V1 ¦ 1.000
S1V2 ¦ 0.047 1.000
Example 4
Evaluating an Integral by Monte Carlo Integration Methods
This example explains the evaluation of ∫₀¹ cos(πx/2) dx using Monte Carlo
Integration methods.
Using the Classical Monte Carlo Integration method, the integral can be evaluated by

Î_n = (1/n) ∑_{i=1}^n cos(πxi/2),

where the xi are generated from the uniform distribution on [0, 1].
The input is:
RANDSAMP
UNIVARIATE URN(0,1)/ SIZE=10000 NSAMPLE=1 RSEED=76453782
MCMC
INTEG COS(3.14159*X/2); MC
Importance Sampling, a variance reduction technique, can be used to evaluate the
given integral more accurately. An importance function (3/2)(1 − x²), which is roughly
proportional to the integrand on (0, 1), can be used to estimate the above integral by

Î_g = (1/n) ∑_{i=1}^n [cos(πxi/2) / ((3/2)(1 − xi²))]
Since (3/2)(1 − x²) is a log-concave function on (0, 1), the ARS algorithm can be used
to generate random samples from this density, and the input is:
FORMAT 9,6
IIDMC
REJECT ARS((3/2)*(1-X^2),BD(0.0,1.0)) /SIZE=5000
RSEED=76453782
MCMC
INTEG FUN='COS(PI*X/2)' DENFUN='1' /IMPSAMPI
FORMAT
Example 5
Rejection Sampling
If α is an integer, random samples from gamma(α, β) can be generated by adding α
independent exponential(β) variates. But if α is not an integer, this simple method
is not applicable. Even though we can generate random samples from gamma(α, β)
using SYSTAT's univariate continuous random sampling procedure, this example
illustrates an alternative method, using Rejection Sampling with
uniform(0,15), exponential(2.43), and gamma([2.43], 2.43/[2.43]) distributions
(Robert and Casella, 2004) as proposals in different exercises. Here [2.43] is the
integer part of 2.43.
¦ S1
-------------------+-------
N of Cases ¦ 100000
Minimum ¦ 0.020
Maximum ¦ 14.395
Arithmetic Mean ¦ 2.426
Standard Deviation ¦ 1.557
Variance ¦ 2.424
[Histogram of the generated sample S1]
¦ S1
-------------------+-------
N of Cases ¦ 100000
Minimum ¦ 0.005
Maximum ¦ 15.603
Arithmetic Mean ¦ 2.429
Standard Deviation ¦ 1.556
Variance ¦ 2.422
[Histogram of the generated sample S1]
¦ S1
-------------------+-------
N of Cases ¦ 100000
Minimum ¦ 0.008
Maximum ¦ 15.376
Arithmetic Mean ¦ 2.430
Standard Deviation ¦ 1.560
Variance ¦ 2.434
[Histogram of the generated sample S1]
In each exercise, samples approximating the gamma(2.43, 1) target are generated
using the gamma, exponential, and uniform distributions as proposals. But the
probability of accepting a sample from the target function, 1/M, varies with the
different proposals.
The probability of accepting a proposal sample as a target variate depends on how close
the product of proposal and constant is to the target function. Observe this by plotting
the target function and product of proposal and constant together.
[Plots of the target function together with the product of constant and proposal:
Figure i (uniform), Figure ii (exponential), Figure iii (gamma)]
When the uniform density function (Figure i) is considered as proposal, most of the
generated points are outside the accepted region. In Figure ii (exponential) and Figure
iii (gamma) the means of both target and proposal functions are the same, but when the
gamma density function is taken as proposal (Figure iii), the product of constant and
proposal is closer to the target function; thus, a generated point from proposal is
accepted as a sample from target function with high probability and hence simulated
values converge to theoretical values (mean=2.43, variance =2.43, and E[X2]=8.3349)
quickly.
Example 6
Estimating Mean and Variance of a Bounded Posterior Density Function
using RWM-H Algorithm and IndM-H Algorithm
Let the observations {1,1,1,1,1,1,2,2,2,3} be from the (discrete) logarithmic series
distribution with density
p(x | θ) = θ^x / [x(−log(1 − θ))],  x = 1, 2, 3, … and 0 < θ < 1
From the sample of size 10, the logarithmic series distribution with parameter θ leads
to the (unnormalized) likelihood function

L(θ) = θ^15 / [−log(1 − θ)]^10

Let the prior be π(θ) = 6θ(1 − θ). Then the posterior, up to a multiplicative constant, is

π(θ | x) ∝ θ^16 (1 − θ) / [−log(1 − θ)]^10

This example, extracted from Monahan (2001), illustrates the generation of random
samples from the specified posterior using the Random Walk Metropolis-Hastings
algorithm and the Independent Metropolis-Hastings algorithm.
To generate a random sample using the RWM-H algorithm, the selected proposal is
uniform(-0.1, 0.1), which is symmetric around zero with small steps. Since the target
function is bounded between 0 and 1, the value generated by the initial distribution
should lie between 0 and 1 and thus the initial distribution is chosen as uniform(0,1).
For getting samples from the posterior and computing its basic statistics, the input is:
MCMC
MH MHRW((X^16*(1-X))/((-LOG(1-X))^10),BD(0,1), U(-0.1,0.1),
U(0.0,1.0)) / SIZE=100000,
NSAMPLE=1 BURNIN=500 GAP=30 RSEED=237465
[Histogram of the sample S1 from the RWM-H algorithm]
The mean and variance from the simulated data are 0.528 and 0.019 respectively.
When IndM-H is used, the support of the proposal should contain the support of
the target function; hence the selected proposal in this example is uniform(0,1). For
generating random samples from the posterior and getting its mean and variance,
the input is:
MCMC
MH MHIND((X^16*(1-X))/((-LOG(1-X))^10), BD(0,1), U(0.0,1.0),
U(0.0,1.0)) / SIZE=100000,
NSAMPLE=1 BURNIN=500 GAP=30 RSEED=65736736
CSTATISTICS S1/ MAXIMUM MEAN MINIMUM SD VARIANCE N
DENSITY S1 / KERNEL
[Histogram of the sample S1 from the IndM-H algorithm]
The mean and variance of the posterior from simulated data obtained by RWM-H
algorithm and IndM-H algorithm are approximately 0.527 and 0.019 respectively.
Example 7
Generating Bivariate Normal Random Samples by Gibbs Sampling Method
This example explains the generation of a random sample from a bivariate normal
distribution by iteratively generating univariate normal random samples.
The sample generated by the Gibbs Sampling method can be visualized through
different SYSTAT graphs. The input is:
USE GIBBSBVN
PLOT x21*x11 / BORDER=HIST
[Scatterplot of X21 against X11 with marginal histograms]
It can be noticed that the Rao-Blackwellized estimates are close to the true values and
also that their variances are smaller than those of the naïve estimates.
Example 8
Gene Frequency Estimation
Rao (1973) illustrated maximum likelihood estimation of gene frequencies of O, A,
and B blood groups through the method of scoring. McLachlan and Krishnan (1997)
used the EM algorithm for the same problem. This example illustrates Bayesian
estimation of these gene frequencies by the Gibbs Sampling method. Consider the
following multinomial model with four cell frequencies and their probabilities with
parameters p, q, r with p+q+r=1. Let n = no+ nA+ nB+nAB.
Data          Model
nO  = 176     r²
nA  = 182     p² + 2pr
nB  = 60      q² + 2qr
nAB = 17      2pq
Let us consider hypothetical augmented data for this problem to be nO, nAA, nAO,
nBB, nBO, nAB with a multinomial model {n; (1−p−q)², p², 2p(1−p−q), q²,
2q(1−p−q), 2pq}. With respect to the latter full model, nAA and nBB could be
considered as missing data.
MODEL:
X ~ Multinomial6(435; (1−p−q)², p², 2p(1−p−q), q², 2q(1−p−q), 2pq)
With a Dirichlet-type prior on (p, q, r) with parameters (α, β, γ), the full conditional
distributions are:

nAA ~ Binomial( nA, p² / (p² + 2p(1−p−q)) )

nBB ~ Binomial( nB, q² / (q² + 2q(1−p−q)) )

p ~ (1−q) Beta(2nAA + nAO + nAB + α, 2nOO + nAO + nBO + γ)

q ~ (1−p) Beta(2nBB + nBO + nAB + β, 2nOO + nAO + nBO + γ)
For generating random samples from p and q, the generated value from the beta
distribution is to be multiplied with (1-q) and (1-p) respectively. Since it is not possible
in our system to implement this, let us consider
p ~ Beta(2n AA + n AO + n AB + α , 2nOO + n AO + nBO + γ )
q ~ Beta(2nBB + n BO + n AB + β , 2nOO + n AO + nBO + γ )
and whenever p and q appear in other full conditionals, p is replaced by (1-q)p and q is
replaced by (1-p)q. By taking α=2, β=2, and γ =2, the input is:
FORMAT 10, 5
MCMC
NAA=40
NBB=5
P=0.1
Q=0.5
N1=182
N2=60
GVAR NAA,NBB,P,Q
FUNCTION FC1()
{
P1=(((1-Q)*P)^2)/((((1-Q)*P)^2)+(2*((1-Q)*P)*(1-((1-Q)*P)-((1-P)*Q))))
NAA=NRN(N1,P1)
}
FUNCTION FC2()
{
P2=(((1-P)*Q)^2)/((((1-P)*Q)^2)+(2*((1-P)*Q)*(1-((1-P)*Q)-((1-Q)*P))))
NBB= NRN(N2,P2)
}
FUNCTION FC3()
{
B1=NAA+182+17+1
B2=(2*176)+182+60-NAA-NBB+1
P=BRN(B1,B2)
}
FUNCTION FC4()
{
D1=NBB+60+17+1
D2=(2*176)+182+60-NAA-NBB+1
Q= BRN(D1,D2)
}
SAVE GIBBSGENETIC
GIBBS(FC1(),FC2(),FC3(),FC4()) / SIZE=10000 NSAMPLE=1
BURNIN=1000 GAP=1 RSEED=1783
USE GIBBSGENETIC
LET PP=(1-Q1)*P1
LET QQ=(1-P1)*Q1
LET RR=1-PP-QQ
LET RBEP=(1-QQ)*((NAA1+182+17+2)/((NAA1+182+17+2)+((2*176)+182+60-NAA1-NBB1+2)))
LET RBEQ=(1-PP)*((NBB1+60+17+2)/((NBB1+60+17+2)+((2*176)+182+60-NAA1-NBB1+2)))
LET RBER=1-RBEP-RBEQ
CSTATISTICS PP QQ RR RBEP RBEQ RBER/ MAXIMUM MEAN,MEDIAN MINIMUM
SD VARIANCE N PTILE=2.5 50 97.5
BEGIN
DENSITY PP RBEP/HIST XMIN=0.20 XMAX=0.35 LOC=0,0
DENSITY QQ RBEQ/HIST XMIN=0.05 XMAX=0.13 LOC=0,-3
DENSITY RR RBER/HIST XMIN=0.60 XMAX=0.75 LOC=0,-6
END
FORMAT
¦ RBER
-------------------+--------
N of Cases ¦ 10000
Minimum ¦ 0.61193
Maximum ¦ 0.66619
Median ¦ 0.63965
Arithmetic Mean ¦ 0.63966
Standard Deviation ¦ 0.00732
Variance ¦ 0.00005
Method = CLEVELAND ¦
2.50000% ¦ 0.62487
50.00000% ¦ 0.63965
97.50000% ¦ 0.65386
[Histograms of PP and RBEP, QQ and RBEQ, and RR and RBER]
Example 9
Fitting Poisson Gamma Hierarchical Model
This example concerns the finding of failure rates of 10 pumps in a nuclear power
plant. The data set consists of the number of failures and the times of operation for 10
pump systems at the nuclear power plant. The data set is from Gaver and
O’Muircheartaigh (1987).
MODEL:
Assume that the number of failures Fi ~ Poisson(λi ti), where λi is the failure rate and
ti is the time of operation for each pump. The prior densities are λi | β ~ Gamma(α, β⁻¹)
and β ~ Gamma(γ, δ⁻¹), with α = 1.8, γ = 0.01 and δ = 1. The full conditional densities
take the form

λi | β, ti, F ~ Gamma(Fi + α, (ti + β)⁻¹)

β | λ1, …, λ10 ~ Gamma(10α + γ, (δ + ∑_{i=1}^{10} λi)⁻¹)
For getting random sample from the full conditionals and basic statistics, the input is
FORMAT 10, 5
USE PUMPFAILURES
MCMC
LAM1=0.5
LAM2=0.5
LAM3=0.5
LAM4=0.5
LAM5=0.5
LAM6=0.5
LAM7=0.5
LAM8=0.5
LAM9=0.5
LAM10=0.5
BETA=1.0
GVAR LAM1, LAM2, LAM3, LAM4, LAM5, LAM6, LAM7, LAM8, LAM9,
LAM10, BETA
FUNCTION FC1()
{
LAM1=GRN(DATA(F,1)+1.8 , 1/(DATA(T,1)+BETA))
}
FUNCTION FC2()
{
LAM2=GRN(DATA(F,2)+1.8 , 1/(DATA(T,2)+BETA))
}
FUNCTION FC3()
{
LAM3=GRN(DATA(F,3)+1.8 , 1/(DATA(T,3)+BETA))
}
FUNCTION FC4()
{
LAM4=GRN(DATA(F,4)+1.8 , 1/(DATA(T,4)+BETA))
}
FUNCTION FC5()
{
LAM5=GRN(DATA(F,5)+1.8 , 1/(DATA(T,5)+BETA))
}
FUNCTION FC6()
{
LAM6=GRN(DATA(F,6)+1.8 , 1/(DATA(T,6)+BETA))
}
FUNCTION FC7()
{
LAM7=GRN(DATA(F,7)+1.8 , 1/(DATA(T,7)+BETA))
}
FUNCTION FC8()
{
LAM8=GRN(DATA(F,8)+1.8 , 1/(DATA(T,8)+BETA))
}
FUNCTION FC9()
{
LAM9=GRN(DATA(F,9)+1.8 , 1/(DATA(T,9)+BETA))
}
FUNCTION FC10()
{
LAM10=GRN(DATA(F,10)+1.8 , 1/(DATA(T,10)+BETA))
}
FUNCTION FC11()
{
BETA=GRN((10*1.8)+0.01,1/(1.0+SUM(LAM1,LAM2,LAM3,LAM4,LAM5,LAM6,LAM7,LAM8,LAM9,LAM10)))
}
SAVE GIBBSNUCLEARPUMPS
GIBBS(FC1(),FC2(),FC3(),FC4(),FC5(),FC6(),FC7(),FC8(),FC9(),FC10(),FC11()) / SIZE=10000 NSAMPLE=1 BURNIN=500 GAP=30,
RSEED=746572365
USE GIBBSNUCLEARPUMPS
CSTATISTICS / MAXIMUM MEAN, MEDIAN MINIMUM SD VARIANCE N PTILE=2.5
50 97.5
¦ BETA1
-------------------+--------
N of Cases ¦ 10000
Minimum ¦ 0.73295
Maximum ¦ 6.42968
Median ¦ 2.39187
Arithmetic Mean ¦ 2.46662
Standard Deviation ¦ 0.71612
Variance ¦ 0.51283
Method = CLEVELAND ¦
2.50000% ¦ 1.33448
50.00000% ¦ 2.39187
97.50000% ¦ 4.09004
Example 10
Fitting Linear Regression using Gibbs Sampler
This example taken from Congdon (2001) illustrates a Bayesian Linear Regression of
December rainfall on November rainfall based on data for ten years. The data set is
from Lee (1997), where Y is December rainfall and X is November rainfall.
MODEL:
Assume that Yi ~ Normal(θi, σ²), where θi = α + β(xi − x̄).
Priors:

α ~ Normal(μ1, σ1²)
β ~ Normal(μ2, σ2²)
τ = σ⁻² ~ Gamma(γ, δ⁻¹)
The full conditional distributions are

α | β, σ⁻² ~ Normal( [(μ1/σ1²) + (∑_{i=1}^n yi)/σ²] / [(n/σ²) + (1/σ1²)],
1 / [(n/σ²) + (1/σ1²)] )

β | α, σ⁻² ~ Normal( [(μ2/σ2²) + (∑_i yi(xi − x̄))/σ²] / [(∑_i (xi − x̄)²/σ²) + (1/σ2²)],
1 / [(∑_i (xi − x̄)²/σ²) + (1/σ2²)] )

σ⁻² | α, β ~ Gamma( n/2 + γ, [ (∑_{i=1}^n (yi − (α + β(xi − x̄)))²)/2 + δ ]⁻¹ )
TPLOT ALPHA1

[Series plot of ALPHA1]
TPLOT BETA1
[Series plot of BETA1]
TPLOT SIGSQ
[Series plot of SIGSQ]
By posterior predictive simulation, the December rainfall can be predicted based on a
new November rainfall of 46.1.
Computation
Algorithms
Algorithms used here for random sampling from specified distributions may be found
in Devroye (1986), Bratley et al. (1987), Chhikara and Folks (1989), Fishman (1996),
Gentle (1998), Evans et. al. (2000), Karian and Dudewicz (2000), Ross (2002), and
Hörmann et al. (2004). For some distributions, the inverse CDF method (analytical or
numerical) is used whereas for others special methods are used. The Adaptive
Rejection Sampling method uses the algorithm developed by Gilks (1992), Gilks and
Wild (1993), and Robert and Casella (2004).
References
Athreya, K. B., Delampady, M., and Krishnan, T. (2003). Markov chain Monte Carlo
methods. Resonance, 8(4), 17--26; 8(7), 63--75; 8(10), 8--19; 8(12), 18--32.
Bratley, P., Fox, B. L., and Schrage, L. E. (1987). A guide to simulation. 2nd ed. New York:
Springer-Verlag.
Casella, G. and George, E. I. (1992). Explaining the Gibbs Sampler. The American
Statistician, 46, 167-174.
Chib, S. and Greenberg, E. (1995). Understanding the Metropolis-Hastings algorithm. The
American Statistician, 49, 327-335.
Chhikara, R. S. and Folks, J. L. (1989). The inverse Gaussian distribution: Theory,
methodology, and applications. New York: Marcel Dekker.
Congdon, P. (2001). Bayesian statistical modeling. New York: John Wiley & Sons.
Devroye, L. (1986). Non-uniform random variate generation. New York: Springer-Verlag.
Evans, M., Hastings, N., and Peacock, B. (2000). Statistical distributions. 3rd ed. New
York: John Wiley & Sons.
Fishman, G. S. (1996). Monte Carlo: Concepts, algorithms, and applications. New York:
Springer-Verlag.
Gamerman, D. and Lopes, H. F. (2006). Markov chain Monte Carlo: Stochastic simulation
for Bayesian inference. 2nd ed. Boca Raton, FL: Chapman & Hall/CRC.
Gaver, D. P. and O’Muircheartaigh, I. G. (1987). Robust empirical Bayes analysis of event
rates. Technometrics, 29, 1-15.
Gentle, J. E. (1998). Random number generation and Monte Carlo methods. New York:
Springer-Verlag.
Geweke, J. (1989). Bayesian inference in econometric models using Monte Carlo
integration. Econometrica, 57, 1317-1339.
RANDSAMP:
Random Sampling
UNIVARIATE command
Note: The list of distributions along with distribution notation is given in section
Distribution Notations used in Random Sampling of Monte Carlo.
Examples:
For the normal distribution with parameters location = 0 and scale =1.2
UNIVARIATE ZRN(0,1.2)
MULTIVARIATE command
Examples:
For bivariate exponential distribution with parameters (1,1,0.5)
MULTIVARIATE BERN(1,1,0.5)
Note: The list of distributions along with distribution notation is given in section
Distribution Notations used in Random Sampling of Monte Carlo.
SAVE command
SAVE filename
IIDMC:
IID Monte Carlo
IIDMC consists of two random sampling procedures: Rejection Sampling and Adaptive
Rejection Sampling. The random samples drawn can be used for the desired Monte
Carlo integration computations. The Classical Monte Carlo Integration and Importance
Sampling methods are not random sampling procedures, but they estimate (intractable)
integrals through empirical sampling (refer INTEG command in MCMC).
Setup:
* IIDMC
SAVE
HOT * REJECT
Note: The GENERATE command used in the previous version is no longer required.
RJS and ARS are functions under the REJECT command. The PROPOSAL command
is also not required, since the proposal is the second argument of the RJS function.
REJECT command
Specifies a target function in expression form for which random samples are needed.
'constant' is an upper bound to the supremum of the ratio of the target and proposal
functions and should be a positive real number.
Note: The list of distributions along with distribution notation is given in section
Distribution Notations used in IIDMC and MCMC of Monte Carlo.
or
Specifies a target function in expression form for which random samples are needed
and support of target distribution with starting points. The starting points are to be in
ascending order.
68
Language Reference - Monte Carlo
Range expressions for the target expression in the ARS function specify the type of
support together with the two points, for example BD(left, right) for a bounded
support.
Example:
For adaptive rejection sampling from the unbounded target function "exp(-x^2/2)"
using -1 and 1 as starting values.
SAVE command
SAVE filename
MCMC:
Markov Chain Monte Carlo
MCMC generates random samples from a target distribution by constructing an ergodic
Markov chain. The M-H Algorithm and Gibbs Sampling method are two types of
MCMC algorithms. Three types of M-H Algorithm: RWM-H, IndM-H and Hybrid
RWInd M-H algorithms are provided. Fixed Scan Gibbs Sampling iteratively generates
random samples from full conditionals. Monte Carlo Integration methods are not
applicable to Gibbs Sampling procedure in SYSTAT.
Setup:

M-H Algorithm            Gibbs Sampling
* MCMC                   * MCMC
SAVE                     USE
HOT * MH                 SAVE
HOT * INTEG              * VARIABLE DECLARATIONS
                         * GVAR
                         FUNCTION
                         HOT * GIBBS
MH command
Specifies various algorithms to be used for random sample generation. Function names
indicate various types of M-H algorithms.
target_exp
Specifies a target function in expression form from which random samples are needed.
proposal_exp or initialvalue_exp
The distributions for proposal or initial values can be chosen from Beta, Cauchy, Chi-
square, Exponential, F, Gamma, Gompertz, Gumbel, Double Exponential (Laplace),
Logistic, Logit normal, Lognormal, Normal, Pareto, Rayleigh, t, Triangular, Uniform,
Inverse Gaussian (Wald) and Weibull distributions.
Note: The list of distributions along with distribution notation is given in section
Distribution Notations used in IIDMC and MCMC of Monte Carlo.
Examples:
To generate random samples using M-H random walk algorithm from unbounded
target function "exp(-(x^2)/2)" with Cauchy(0,1) as the proposal distribution and an
initial value from uniform(-1,1) distribution:
MH MHRW(exp(-(x^2)/2),UB,C(0,1),U(-1,1))
To generate random samples using the M-H independent algorithm from the bounded
target function "((x+2)^125)*((1-x)^38)*(x^34)" with uniform(0,1) as the proposal
distribution and an initial value from the uniform(0,1) distribution:
MH MHIND(((x+2)^125)*((1-x)^38)*(x^34),BD( 0, 1), U(0,1), U(0,1))
INTEG command
INTEG expression
Examples:
INTEG x^2 ; MC
Note: Monte Carlo Integration methods in SYSTAT can be used only after generating
random samples from any one of the univariate discrete and univariate continuous
random sampling methods, Rejection Sampling, Adaptive Rejection Sampling and
M-H algorithms.
GVARIABLE command
GVARIABLE varlist
Specifies variables used for Gibbs sampling algorithm. Output will be given for
variables in varlist specified after GVARIABLE.
Examples:
GVARIABLE a, b
VARIABLE DECLARATIONS
Specifies initial values for Gibbs variables declared using GVARIABLE command.
Examples:
For setting initial values for Gibbs variables a = 0 and b = 0
a=0
b=0
FUNCTION command
FUNCTION function_name( )
{
List of statements
}
The function name must start with a letter. The list of statements involves simple
assignment(s) to temporary variables and/or Gibbs variables.
Example:
FUNCTION FC1()
{
mu=0
sigma=1
a = zrn(mu, sigma)
}
GIBBS command
GIBBS( fname_1(),…,fname_k())
Specifies the list of all full conditional distributions specified using FUNCTION
command. Alternatively, you can specify all your expressions as arguments of GIBBS
(see example listed below). SYSTAT assumes a one-to-one correspondence between
the varlist in the GVARIABLE command and the GIBBS arguments.
Note: The list of distributions along with distribution notation is given in section
Distribution Notations used in IIDMC and MCMC of Monte Carlo.
Example:
For generation of a random sample from a bivariate normal distribution with mean
vector [0 , 0] and correlation 0.98 using Gibbs sampling.
MCMC
X1=0.0
X2=0.0
sigma=0.1990
GVAR X1, X2
FUNCTION FC1()
{
X1= ZRN(0.98*X2, sigma )
}
FUNCTION FC2()
{
X2= ZRN(0.98*X1, sigma)
}
GIBBS (FC1(),FC2()) /SIZE=10000 NSAMPLE=1 BURNIN=500 GAP=50
RSEED=231
In the above example, the full conditionals are simple to express as functions, so we
can give them directly as arguments of GIBBS:
MCMC
X1=0.0
X2=0.0
GVAR X1,X2
GIBBS ( ZRN(0.98*X2, 0.1990), ZRN(0.98*X1,
0.1990)) /SIZE=10000 NSAMPLE=1 BURNIN=500 GAP=50,
RSEED=231
USE command
Reads a data file named filename. You do not have to enclose the filename and path in
quotation marks unless the path or name contains spaces. In the absence of a
designated path, the software searches for the file in the directories defined by FPATH
for input data files (USE), temporary data files (WORK), and output data files (SAVE).
The date and time of a file's creation appear in the output when that file is used.
/ NAMES           Suppresses the date and time information, displaying only the
                  names of the variables in the data file.
NONAMES           Neither the variable names nor the file's date and time are
                  displayed.
COMMENT           Displays the file comments after the variable names.
DICTIONARY        Displays the file comments, variable names, and variable
                  comments.
MATRIX=matrixname (or MAT=matrixname)
                  Reads the file as a matrix with the specified name.
MTYPE=NUMERIC or STRING
                  Reads all numeric or string variable(s) as a matrix. By default
                  SYSTAT reads only numeric columns.
ROWNAME=var or var$
                  Uses var (or var$) to name the rows of the matrix.
COLNAME=var or var$
                  Uses var (or var$) to name the columns of the matrix.
SAVE command
SAVE filename
For IIDMC and M-H algorithms, the target functions from which random samples are
generated are expressions involving mathematical functions of a single variable.
The integrand in Monte Carlo Integration and the density function in Importance
Sampling procedures are expressions. In the Gibbs Sampling method, the parameters
of full conditionals are expressions, which may involve variables from a data file and
mathematical functions.