
SYSTAT 12

Monte Carlo 1

WWW.SYSTAT.COM
For more information about SYSTAT® software products, please visit our WWW site
at http://www.systat.com or contact

Marketing Department
SYSTAT Software, Inc.
1735 Technology Dr., Ste. 430
San Jose, CA 95110
Phone: (800) 797-7401
Fax: (800) 797-7406
Email: [email protected]

Windows is a registered trademark of Microsoft Corporation.

General notice: Other product names mentioned herein are used for identification
purposes only and may be trademarks of their respective companies.

The SOFTWARE and documentation are provided with RESTRICTED RIGHTS. Use,
duplication, or disclosure by the Government is subject to restrictions as set forth in
subdivision (c)(1)(ii) of The Rights in Technical Data and Computer Software clause at
52.227-7013. Contractor/manufacturer is SYSTAT Software, Inc., 1735 Technology
Drive, Suite 430, San Jose, CA 95110. USA.

SYSTAT® 12 Monte Carlo-1


Copyright © 2007 by SYSTAT Software, Inc.
SYSTAT Software, Inc.
1735 Technology Dr., Ste. 430
San Jose, CA 95110
All rights reserved.
Printed in the United States of America.

No part of this publication may be reproduced, stored in a retrieval system, or
transmitted, in any form or by any means, electronic, mechanical, photocopying,
recording, or otherwise, without the prior written permission of the publisher.
1234567890 05 04 03 02 01 00
Contents

List of Examples v

Monte Carlo 1

Statistical Background. . . . . . . . . . . . . . . . . . . . . . . . . . 2
Random Sampling . . . . . . . . . . . . . . . . . . . . . . . . . 3
Rejection Sampling . . . . . . . . . . . . . . . . . . . . . . . . . 3
Adaptive Rejection Sampling (ARS) . . . . . . . . . . . . . . . . 4
Metropolis-Hastings (M-H) Algorithm. . . . . . . . . . . . . . . 5
Gibbs Sampling . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
Integration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
Rao-Blackwellized Estimates with Gibbs Samples . . . . . . . .11
Precautions to be taken in using IID Monte Carlo and
MCMC features. . . . . . . . . . . . . . . . . . . . . . . . . . .12
Monte Carlo Methods in SYSTAT . . . . . . . . . . . . . . . . . . .13
Random Sampling . . . . . . . . . . . . . . . . . . . . . . . . .13
Univariate Discrete Distributions Dialog Box . . . . . . . . . . .13
Univariate Continuous Distributions Dialog Box . . . . . . . . .14
Multivariate Distributions Dialog Box . . . . . . . . . . . . . . .15
Using Commands . . . . . . . . . . . . . . . . . . . . . . . . . .16
Distribution Notations used in Random Sampling . . . . . . . . .17
Rejection Sampling Dialog Box . . . . . . . . . . . . . . . . . .19
Adaptive Rejection Sampling Dialog Box . . . . . . . . . . . . .20
Using Commands . . . . . . . . . . . . . . . . . . . . . . . . . .21
M-H Algorithm Dialog Box . . . . . . . . . . . . . . . . . . . .22
Gibbs Sampling Dialog Box . . . . . . . . . . . . . . . . . . . .24
Integration Dialog Box . . . . . . . . . . . . . . . . . . . . . . .27

Using Commands . . . . . . . . . . . . . . . . . . . . . . . . . 28
Usage Considerations . . . . . . . . . . . . . . . . . . . . . . . 29
Distribution Notations used in IIDMC and MCMC . . . . . . . 31
Examples. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
Computation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62

Language Reference - Monte Carlo 65

RANDSAMP:
Random Sampling . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
IIDMC:
IID Monte Carlo . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
MCMC:
Markov Chain Monte Carlo . . . . . . . . . . . . . . . . . . . . . . 69

Index 77

List of Examples

Sampling Distribution of Double Exponential (Laplace) Median . . . . . . . 36

Simulation of Assembly System . . . . . . . . . . . . . . . . . . . . . . . . 37

Generation of Random Sample from Bivariate Exponential (Marshal-Olkin Model) Distribution . . . . . . . . . . . . . 39

Evaluating an Integral by Monte Carlo Integration Methods . . . . . . . . . 40

Rejection Sampling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42

Estimating Mean and Variance of a Bounded Posterior Density Function using RWM-H Algorithm and IndM-H Algorithm . . . . . . . . . 47

Generating Bivariate Normal Random Samples by Gibbs Sampling Method . 49

Gene Frequency Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . 51

Fitting Poisson Gamma Hierarchical Model . . . . . . . . . . . . . . . . . . 55

Fitting Linear Regression using Gibbs Sampler . . . . . . . . . . . . . . . . 58

Monte Carlo
R. L. Karandikar, T. Krishnan, and M. R. L. N. Panchanana

Monte Carlo methods (Fishman, 1996; Gentle, 1998; Robert and Casella, 2004;
Gamerman and Lopes, 2006) are used to estimate functionals of a distribution function
using the generated random samples. SYSTAT provides Random Sampling, IID MC,
and MCMC algorithms to generate random samples from the required target
distribution.
Random Sampling in SYSTAT enables the user to draw a number of samples, each
of a given size, from a distribution chosen from a list of 42 distributions (discrete and
continuous, univariate and multivariate) with given parameters.
To simulate more complex models SYSTAT also provides random sampling from
univariate finite mixtures.
If no method is known for direct generation of random samples from a given
distribution or when the density is not completely specified, then IID Monte Carlo
methods may often be suitable. The IID Monte Carlo algorithms in SYSTAT are
usable only to generate random samples from univariate continuous distributions. IID
Monte Carlo consists of two generic algorithms: Rejection Sampling and Adaptive
Rejection Sampling (ARS). In these methods an envelope (proposal) function for the
target density is used. The proposal density is such that it is feasible to draw a random
sample from it. In Rejection Sampling, the proposal distribution can be selected from
SYSTAT’s list of 20 univariate continuous distributions. In ARS, the algorithm itself
constructs an envelope (proposal) function. The ARS algorithm is applicable only for
log-concave target densities.
A Markov chain Monte Carlo (MCMC) method is used when it is possible to
generate an ergodic Markov chain whose stationary distribution is the required target
distribution. SYSTAT provides two classes of MCMC algorithms: Metropolis−
Hastings (M-H) algorithm and the Gibbs sampling algorithm. With the M-H


algorithm, random samples can be generated from univariate distributions. Three types
of the Metropolis-Hastings algorithm are available in SYSTAT: Random Walk
Metropolis-Hastings algorithm (RWM-H), Independent Metropolis-Hastings
algorithm (IndM-H), and a hybrid Metropolis-Hastings algorithm of the two. The
choice of the proposal distribution in the Metropolis-Hastings algorithms is restricted
to SYSTAT’s list of 20 univariate continuous distributions. The Gibbs Sampling
method provided is limited to the situation where full conditional univariate
distributions are defined from SYSTAT’s library of univariate distributions. It is
advisable for the user to provide a suitable initial value/distribution for the MCMC
algorithms. No convergence diagnostics are provided and it is up to the user to suggest
the burn-in period and gap in the MCMC algorithms.
From the generated random samples, estimates of means of user-given functions of
the random variable under study can be computed along with their variance estimates,
relying on the law of large numbers. A Monte Carlo Integration method can be used in
evaluating the expectation of a functional form. SYSTAT provides two Monte Carlo
Integration methods: Classical Monte Carlo integration and Importance Sampling
procedures.
IID MC and MCMC algorithms of SYSTAT generate random samples from
positive functions only. Samples generated by the Random Sampling, IID MC and
MCMC algorithms can be saved.
The user has a large role to play in the use of the IID MC and MCMC features of
SYSTAT and the success of the computations will depend largely on the user’s
judicious inputs.

Statistical Background
Drawing random samples from a given probability distribution is an important
component of any statistical Monte Carlo simulation exercise. This is usually followed
by statistical computations from the drawn samples, which can be described as Monte
Carlo integration. The random samples drawn can be used for the desired Monte Carlo
integration computations using SYSTAT. SYSTAT provides direct random sampling
facilities from a list of 42 univariate and multivariate discrete and continuous
distributions. Indeed, in statistical practice, one has to draw random samples from
several other distributions, some of which are difficult to draw directly from. The
generic IID Monte Carlo and Markov chain Monte Carlo algorithms that are provided
by SYSTAT will be of help in these contexts. The random sampling facility from the
standard distributions is a significant resource, which can be used effectively in these
generic IID and Markov chain Monte Carlo procedures.

Random Sampling

The random sampling procedure can be used to generate random samples from the
distributions that are most commonly used for statistical work. SYSTAT implements,
as far as possible, the most efficient algorithms for generating samples from a given
type of distribution. All these depend on the generation of uniform random numbers,
based on the Mersenne-Twister and Wichmann-Hill algorithms.
• Mersenne-Twister (MT) is a pseudorandom number generator developed by
Makoto Matsumoto and Takuji Nishimura (1998). The random seed for the algorithm
can be specified by using RSEED=seed, where seed is any integer from 1 to
4294967295 for the MT algorithm and 1 to 30000 for the Wichmann-Hill
algorithm. We recommend the MT option, especially if the number of random
numbers to be generated in your Monte Carlo studies is fairly large, say more than 10,000.
If you would like to reproduce results involving random number generation from
earlier SYSTAT versions, with old command file or otherwise, make sure that your
random number generation option (under Edit => Options => General => Random
Number Generation) is Wichmann-Hill (and, of course, that your seed is the same as
before).
The list of distributions SYSTAT generates from, expressions for the associated functions,
the notations used, and references to their properties are given in the Volume: Data: Chapter
4: Data Transformations: Distribution Functions. Definitions of the multivariate
distributions, the notations used, and references to their properties can be found later in this
chapter.

Rejection Sampling

Rejection Sampling is used when direct generation of a random sample from the target
density is difficult, or when the density fX(x) is specified only up to a constant, but a
related density gY(y) is available from which it is comparatively easy to generate
random samples. This gY(y) is called the majorizing density function or an
envelope, and it should satisfy the condition that M gY(x) ≥ fX(x) for every x, for some
constant 0 < M < ∞. For the method to work, it should be easy to draw random samples
from the distribution defined by the density function gY(.).
Rejection Sampling Algorithm


Step 1. Draw y from gY(.).
Step 2. Draw u from Uniform(0,1).
Step 3. If u ≤ fX(y)/(M gY(y)), then accept y as the desired realization; else return to
Step 1.
Step 4. Repeat until one y is accepted.
Repeat this algorithm to select the desired number of samples. For distributions with
finite support and bounded density, gY(.) can always be chosen as uniform.
A SYSTAT user can select a proposal density from a list of univariate continuous
density functions and a positive number M < ∞. If target and proposal functions are
probability density functions, and M (1 < M < ∞) is the value of a majorizing constant,
then M can be interpreted as the average number of proposal variates required to obtain
one sample unit from the target function. The probability of accepting a chosen
proposal variate is large if the value of M is small. The requirements that must be
satisfied in rejection sampling are:
• the supremum of the ratio of the target and proposal functions should be bounded,
• the constant M should be an upper bound to this supremum, and
• the support of the target distribution must be a subset of the support of the proposal
distribution.
The success of random sample generation from a target function using the Rejection
Sampling algorithm depends mainly on the choice of the proposal density function and
the majorizing constant M.
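As an illustration outside SYSTAT, the algorithm above can be sketched in a few lines of Python. This is an illustrative sketch, not SYSTAT code; the Beta(2,2) target and Uniform(0,1) proposal are assumptions chosen for the example (sup f = 1.5, so M = 1.5 majorizes f):

```python
import random

def rejection_sample(target, proposal_draw, proposal_pdf, M, n):
    """Draw n variates from a target density (known up to a constant)
    by rejection sampling with majorizing constant M."""
    out = []
    while len(out) < n:
        y = proposal_draw()            # Step 1: draw y from g
        u = random.random()            # Step 2: draw u from Uniform(0,1)
        if u <= target(y) / (M * proposal_pdf(y)):
            out.append(y)              # Step 3: accept y; else try again
    return out

# Example: target Beta(2,2) with density f(x) = 6x(1-x) on [0,1];
# proposal Uniform(0,1). sup f = 1.5 at x = 0.5, so M = 1.5 majorizes f.
random.seed(12345)
sample = rejection_sample(lambda x: 6.0 * x * (1.0 - x),
                          random.random, lambda x: 1.0, M=1.5, n=5000)
mean = sum(sample) / len(sample)       # Beta(2,2) has mean 0.5
```

On average about M (here 1.5) proposal draws are needed per accepted variate, matching the interpretation of M given above.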

Adaptive Rejection Sampling (ARS)

When Rejection Sampling is intended to be used to generate random samples from a
target function, finding an appropriate proposal density function and a majorizing
constant M may not be easy. In the Adaptive Rejection Sampling (Gilks, 1992; Gilks
and Wild, 1992; Robert and Casella, 2004) method, the proposal density function is
constructed as a piecewise-linear (polygonal) envelope of the target function (on the log
scale). The target function should be log-concave for this method to work. The envelope
is updated whenever a proposal variate is rejected, so that the envelope moves closer to
the target function, increasing the probability of accepting a candidate variate. The above
references may be consulted for details of the ARS algorithm. Since initial points on
the abscissa are essential for the ARS algorithm, SYSTAT requires the user to specify
two starting points to enable it to generate initial points between them for target
distributions whose support is unbounded, left and right bounded, and bounded. For a
target which has unbounded support, the two specified points are starting points; for a
left (right) bounded support, the left (right) point is a bound and the other point is a
starting point used to create initial points between the bound and itself; and for a
bounded support, both the specified points are bounds.

Metropolis-Hastings (M-H) Algorithm

There are limitations to the applicability of the ARS algorithm: it can generate only
from univariate distributions, and only those with log-concave densities.
However, there is need to generate random samples from non-log-concave densities
and from multivariate densities. In the multivariate case, the target distribution is often
defined indirectly and/or incompletely, depending on the way it arises in the statistical
problem on hand. However, so long as the target distribution is uniquely defined from
the given specifications, it is possible to adopt an iterative random sampling (Monte
Carlo) procedure, which at the point of convergence delivers a random draw from the
target distribution. These iterative Monte Carlo procedures generate a random
sequence with the Markovian property such that the Markov chain is ergodic with a
limiting distribution coinciding with the target distribution, if the procedure is suitably
chosen. There is a whole family of such iterative procedures collectively called MCMC
procedures, with different procedures being suitable for different situations. The
Metropolis-Hastings (M-H) algorithm of SYSTAT generates random samples from
only univariate distributions, although the algorithm in general can generate from
multivariate distributions. Thus, SYSTAT’s M-H algorithm is useful when the target
density is not necessarily log-concave.
In MCMC algorithms, a suitable transition kernel K that satisfies the reversibility
condition K(x,y)π(x) = π(y)K(y,x) with a probability density function π(x) is utilized
to generate random variates from a stationary target function π(x). The Metropolis-
Hastings algorithm (Chib and Greenberg, 1995; Gilks, Richardson, and Spiegelhalter,
1998; Liu, 2001) is a type of MCMC algorithm that constructs a Markov chain by
choosing a transition kernel (the continuous state-space analog of the transition
probability matrix of the discrete case),
K_MH(x, y) = q(y|x) α(x, y) + [1 − ∫_{R^d} q(y|x) α(x, y) dy] δ_x(y)

satisfying the reversibility condition q(y|x)α(x,y)π(x) = q(x|y)α(y,x)π(y) and has the
target function π(.) as its stationary density function, where the acceptance probability is

α(x, y) = min{ π(y)q(x|y) / [π(x)q(y|x)], 1 }

The function q(.) is a proposal density function,

[1 − ∫_{R^d} q(y|x) α(x, y) dy]

is the probability that the chain remains at x, and δ_x(y) denotes a point mass at x.
Since the target function π(x) appears both in the numerator and denominator of
α(x, y), knowledge of the normalization constant for the target function is not needed.
By selecting a suitable proposal density for the target function, a Markov chain is
generated by the following Metropolis-Hastings algorithm.

Metropolis-Hastings Algorithm
Step 1. Generate y_t from q(y | x(t)).
Step 2. Generate u from Uniform(0,1).
Step 3. If u ≤ α(x(t), y_t), then x(t+1) = y_t; else x(t+1) = x(t).
Continue the above three steps until the required number of samples is generated.
A sample produced by the Metropolis-Hastings algorithm may not be an independent,
identically distributed (i.i.d.) sample, because rejection of a proposed variate
produces a repetition of the current variate x(t) at time (t+1). By selecting random
variates from the output with sufficiently large 'gaps', repeated values occur only
infrequently. There are several approaches to selecting
proposal functions, resulting in specific types of M-H algorithms. SYSTAT provides
proposal functions, resulting in specific types of M-H algorithms. SYSTAT provides
three such types: Random Walk Metropolis-Hastings algorithm, Independent
Metropolis-Hastings algorithm, and a Hybrid Random Walk and Independent
Metropolis-Hastings algorithm.
Independent Metropolis-Hastings algorithm. Under a proposal distribution q(y|x)
using an independence chain, the probability of moving to a point y is independent of the
current position x of the chain, i.e., q(y|x) = q(y). The acceptance probability can be
written as α(x,y) = min{w(y)/w(x), 1}, where w(x) = π(x)/q(x) can be considered a
weight function, of the kind used in an importance sampling process, since the generated
variates are from q(y|x). It is suggested that the proposal be selected in such a way that
the weight function is bounded; in that case, the generated Markov chain is uniformly
ergodic. It must be ensured that the support of the proposal distribution covers the
support of the target distribution.
Random Walk Metropolis-Hastings algorithm. Under a proposal distribution q(y|x)
using a random walk chain, the new value y equals the current value plus an increment
ε_t. The random variate ε_t follows the distribution q(.) and is independent of the
current value, i.e., q(y|x) = q(y − x). The acceptance probability can be written as
α(x,y) = min{π(y)/π(x), 1}, where the generated variates are from the proposal
distribution. When the proposal density is continuous and positive around zero, the
generated RWM-H chain is ergodic. It is recommended that a proposal distribution
symmetric around zero be used. The performance of the algorithm depends on the scale
of the proposal distribution. A proposal with small steps has a high acceptance rate, but
the generated chain mixes very slowly. A proposal with large steps does not move
frequently, resulting in a low acceptance rate and slow mixing. The scale of the proposal
should be chosen so that both of the above cases are avoided. If the range of the target
function is finite, then boundary points can be treated as reflecting barriers of the random
walk in the algorithm.
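The RWM-H recipe can be sketched in Python. This is an illustrative sketch, not SYSTAT code; the standard normal target, the Gaussian increments, and the burn-in and gap values are assumptions chosen for the example:

```python
import math, random

def rw_metropolis(log_target, x0, scale, n, burn_in=1000, gap=10):
    """Random Walk Metropolis-Hastings with Gaussian increments eps_t.
    log_target is the log of the target density, known up to a constant."""
    x = x0
    draws = []
    for t in range(burn_in + n * gap):
        y = x + random.gauss(0.0, scale)          # propose y = x + eps_t
        accept_prob = math.exp(min(0.0, log_target(y) - log_target(x)))
        if random.random() < accept_prob:         # alpha = min{pi(y)/pi(x), 1}
            x = y
        if t >= burn_in and (t - burn_in) % gap == 0:
            draws.append(x)                       # keep every gap-th value
    return draws

# Example target: standard normal, log pi(x) = -x^2/2 up to a constant.
random.seed(7)
chain = rw_metropolis(lambda x: -0.5 * x * x, x0=0.0, scale=2.0, n=4000)
mean = sum(chain) / len(chain)
var = sum((v - mean) ** 2 for v in chain) / len(chain)
```

Varying `scale` shows the trade-off described above: very small steps accept almost always but crawl; very large steps rarely accept.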
Hybrid Random Walk and Independent Metropolis-Hastings algorithm. When there
are two Markov chains with the same stationary distribution, a new Markov chain can
be generated by choosing one of the chains with equal probability 0.5 at each step.
Using Markov chains generated by the random walk and independent Metropolis-
Hastings algorithms in this way, a new hybrid chain is generated. This has the
advantage that if, for the given target function, even one of the two chains is well-
behaved, then the hybrid chain is well-behaved.
Gibbs Sampling

Gibbs Sampling (Casella and George, 1992) is a special case of the Metropolis-
Hastings algorithm. It deals with the problem of random sampling from a multivariate
distribution, which is defined in terms of a collection of conditional distributions of its
component random variables or sub-vectors, in such a way that this collection uniquely
defines the target joint distribution. Gibbs sampling can be regarded as a component-
wise Metropolis-Hastings algorithm. The formulation of the defining marginal and
conditional distributions arises naturally in Bayesian problems, in terms of the
distributions of the observable variables and the prior distributions. These then give
rise to the posterior distribution of the parameters given the observables.
This posterior distribution is often difficult to work out analytically, necessitating the
development of Monte Carlo procedures like Gibbs Sampling. SYSTAT’s Gibbs
Sampling feature only handles the situation called the case of full conditionals,
wherein the defining collection consists of the conditional distribution of each single
component of the random vector given the values of the rest. SYSTAT considers only
those cases where such conditional distributions are standard distributions available in
its list.

Gibbs Sampling Algorithm


Let x = (x1, x2, …, xp).
Given x(t) = (x1(t), x2(t), …, xp(t)),
generate x1(t+1) from the conditional distribution f1(x1 | x2(t), …, xp(t)),
generate x2(t+1) from the conditional distribution f2(x2 | x1(t+1), x3(t), …, xp(t)),
…, and
generate xp(t+1) from the conditional distribution fp(xp | x1(t+1), x2(t+1), …, xp−1(t+1)).

The above Gibbs sampling procedure, starting from x1(0), x2(0)…, xp(0), generates a
‘Gibbs sequence’ x1(1), x2(1),…, xp(1),…, x1(n), x2(n),…, xp(n) after n iterations. This
sequence is a realization of a Markov chain with a stationary distribution, which is the
unique multivariate distribution defined by the full conditionals. Thus for large n, x1(n),
x2(n)…, xp(n) can be considered as a random sample from the target multivariate
distribution. The Gibbs Sampling method is also useful to approximate the marginal
density f(xi), i=1,2,…,p of a joint density function f(x1,x2,…,xp) or its parameters by
averaging the final conditional densities of each sequence, and likewise marginal
multivariate densities and their parameters.
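For the standard bivariate normal with correlation ρ (one of the examples listed later in this chapter), the full conditionals are themselves normal, and the Gibbs sequence can be sketched as follows. This is an illustrative Python sketch, not SYSTAT code; ρ = 0.6 and the burn-in length are assumptions for the example:

```python
import math, random

def gibbs_bivariate_normal(rho, n, burn_in=500):
    """Gibbs sampler for a standard bivariate normal with correlation rho.
    Full conditionals: X1 | X2=x2 ~ N(rho*x2, 1-rho^2), and symmetrically."""
    sd = math.sqrt(1.0 - rho * rho)
    x1, x2 = 0.0, 0.0                    # starting point x(0)
    draws = []
    for t in range(burn_in + n):
        x1 = random.gauss(rho * x2, sd)  # x1(t+1) from f1(x1 | x2(t))
        x2 = random.gauss(rho * x1, sd)  # x2(t+1) from f2(x2 | x1(t+1))
        if t >= burn_in:
            draws.append((x1, x2))
    return draws

random.seed(42)
pairs = gibbs_bivariate_normal(rho=0.6, n=20000)
est_corr = sum(a * b for a, b in pairs) / len(pairs)  # estimates E[X1*X2] = rho
```

After the burn-in, the pairs behave approximately as draws from the target joint distribution, as the recovered correlation illustrates.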

Integration

In Monte Carlo Integration, an integral, or equivalently an expectation of a random
variable, is approximated by the sample mean of a function of simulated random
samples. Suppose a random sample {x1, x2, …, xn} is generated from a distribution with
density f(x) of random variable X, and h(X) is a function of X; then

E_f[h(X)] = ∫ h(x) f(x) dx

can be estimated by

Î_n = (1/n) Σ_{i=1}^{n} h(x_i)

This is called the Classical Monte Carlo Integration estimator. By the strong law of
large numbers, Î_n converges almost surely to E_f[h(X)]. The standard error of the
estimate is

√{ [1/(n(n−1))] Σ_{i=1}^{n} [h(x_i) − Î_n]² }
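The estimator and its standard error can be sketched in Python. This is illustrative, not SYSTAT code; the Uniform(0,1) sample and h(x) = x² are assumptions for the example, so the target value is ∫₀¹ x² dx = 1/3:

```python
import math, random

def classical_mc(h, draw, n):
    """Classical Monte Carlo integration: estimate E_f[h(X)] by the sample
    mean of h over n draws from f, together with its standard error."""
    vals = [h(draw()) for _ in range(n)]
    est = sum(vals) / n
    se = math.sqrt(sum((v - est) ** 2 for v in vals) / (n * (n - 1)))
    return est, se

# Example: X ~ Uniform(0,1) and h(x) = x^2, so E[h(X)] = 1/3.
random.seed(1)
est, se = classical_mc(lambda x: x * x, random.random, 100000)
```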

Importance Sampling (Geweke, 1989; Hesterberg, 1995) is another way to estimate
E_f[h(X)]: draw an independent sample x1, x2, …, xn from a distribution with a given
importance density g(x), with g(x) > 0 wherever h(x)f(x) ≠ 0.
The integral can be estimated by

Î_g = (1/n) Σ_{i=1}^{n} h(x_i) w(x_i)

where w(x_i) = f(x_i)/g(x_i), i = 1, 2, …, n, is a weight function, with standard error

√{ [1/(n(n−1))] Σ_{i=1}^{n} [h(x_i) w(x_i) − Î_g]² }

The optimal importance density for minimizing the variance of the integration estimator is

g*(x) = h(x) f(x) / ∫ h(z) f(z) dz

The integration estimate can also be computed by the Importance Sampling ratio estimate

Î_w = [Σ_{i=1}^{n} h(x_i) w(x_i)] / [Σ_{i=1}^{n} w(x_i)]

and the corresponding standard error is

√{ Σ_{i=1}^{n} [h(x_i) − Î_w]² [w(x_i)]² / [Σ_{i=1}^{n} w(x_i)]² }
The advantage of using the ratio estimate compared to the integration estimate is that
in using the latter we need to know the weight function (i.e., the ratio of the target and
importance functions) exactly, whereas in the former case the ratio needs to be known
only up to a multiplicative constant. If the support of the importance function g(x)
contains the support of the density function f(x), then the generated samples are i.i.d.
and the Importance Sampling estimator converges almost surely to the expectation.
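Both the plain and the ratio estimators can be sketched in Python. This is illustrative, not SYSTAT code; the N(0,1) target, the wider N(0,2) importance density, and h(x) = x² are assumptions for the example, so both estimators target E_f[X²] = 1:

```python
import math, random

def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def importance_sampling(h, f_pdf, g_pdf, g_draw, n):
    """Estimate E_f[h(X)] from draws x_i ~ g using weights
    w(x_i) = f(x_i)/g(x_i): plain estimator I_g and ratio estimator I_w."""
    xs = [g_draw() for _ in range(n)]
    ws = [f_pdf(x) / g_pdf(x) for x in xs]
    hw = [h(x) * w for x, w in zip(xs, ws)]
    i_g = sum(hw) / n              # (1/n) * sum h(x_i) w(x_i)
    i_w = sum(hw) / sum(ws)        # sum h(x_i) w(x_i) / sum w(x_i)
    return i_g, i_w

# Target f = N(0,1); importance density g = N(0,2), whose support covers
# that of f.
random.seed(3)
i_g, i_w = importance_sampling(lambda x: x * x,
                               lambda x: normal_pdf(x, 0.0, 1.0),
                               lambda x: normal_pdf(x, 0.0, 2.0),
                               lambda: random.gauss(0.0, 2.0), 50000)
```

Note that the ratio estimator divides by the sample sum of weights, which is why it needs f/g only up to a multiplicative constant.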
Monte Carlo Integration methods in SYSTAT are invoked only after generating a
random sample using any one of the univariate discrete and univariate continuous
random sampling methods, Rejection Sampling, Adaptive Rejection Sampling, or the
M-H algorithms. SYSTAT computes a Monte Carlo integration estimate and standard
error. There is a choice between the Classical Monte Carlo Integration method and the
two Importance Sampling methods. Classical Monte Carlo Integration estimate is
evaluated with respect to the density function related to the distribution from which
samples are generated. In the Importance Sampling Integration and Importance
Sampling Ratio procedures, the importance function is the density related to the
distribution from which samples are generated.

Rao-Blackwellized Estimates with Gibbs Samples

Let us consider the statistical problem of drawing inferences about a parameter θ of the
probability density function f(x|θ), based on a sample {x1, x2, …, xn}. By the sufficiency
principle, a statistic T is sufficient for θ if the conditional distribution of the sample
given the statistic does not involve the parameter θ. By the Rao-Blackwell theorem, the
conditional expectation of an estimator given a sufficient statistic is an improved
estimator; i.e., when δ(x1, x2, …, xn) is an estimator of θ with finite variance,
Var {E[δ(x1, x2, …, xn) | T = t]} ≤ Var {δ(x1, x2, …, xn)}. Using a conditional expectation
given another statistic as an improved estimator is often called Rao-
Blackwellization, even if the conditioning statistic is not a sufficient statistic. This
leads to the use of the Rao-Blackwellized estimator

δ_rb = (1/n) Σ_{t=1}^{n} E[ h(x1(t)) | x2(t), …, xp(t) ]

instead of the empirical estimator

(1/n) Σ_{t=1}^{n} h(x1(t))

in the Gibbs Sampling method. See Liu et al. (1994) and Robert and Casella (2004) for details.
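Continuing the bivariate normal Gibbs illustration (a Python sketch; the correlation ρ = 0.6 and h(x1) = x1 are assumptions for the example), the Rao-Blackwellized estimator averages the conditional means E[X1 | x2(t)] = ρ·x2(t) instead of the draws themselves; both estimate E[X1] = 0 here:

```python
import math, random

def gibbs_rb_estimates(rho, n, burn_in=500):
    """Empirical vs Rao-Blackwellized estimates of E[X1] (= 0) for a
    standard bivariate normal with correlation rho, from one Gibbs run."""
    sd = math.sqrt(1.0 - rho * rho)
    x1 = x2 = 0.0
    emp = rb = 0.0
    for t in range(burn_in + n):
        cond_mean = rho * x2               # E[X1 | X2 = x2(t)]
        x1 = random.gauss(cond_mean, sd)   # Gibbs draw of x1(t+1)
        if t >= burn_in:
            emp += x1                      # empirical term h(x1(t+1))
            rb += cond_mean                # Rao-Blackwellized term
        x2 = random.gauss(rho * x1, sd)    # Gibbs draw of x2(t+1)
    return emp / n, rb / n

random.seed(11)
emp_est, rb_est = gibbs_rb_estimates(rho=0.6, n=20000)
```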
Precautions to be taken in using IID Monte Carlo and MCMC features

You may obtain absurd results if suitable inputs are not given to the IID Monte
Carlo and MCMC algorithms. Some of the precautions to be taken are:
• In Rejection Sampling, the output may not be appropriate if the target function's
support is not a subset of the support of the proposal function, or if the target
function is not dominated by a constant times the proposal density function.
• The log-concavity of the target function has to be checked by you before using the
ARS algorithm.
• If you get a sample that does not cover the entire parameter space of a left or right
bounded, or bounded, target function using the ARS algorithm, you should check
whether the assigned starting points include the corresponding bounds.
• In M-H algorithms, it is your responsibility to generate an ergodic Markov chain
by selecting a suitable proposal density function.
• You should ensure that the expectation of the integrand exists before doing Monte
Carlo Integration.
• The time required to generate samples using MCMC algorithms depends, among
other factors, on the burn-in period and gap, and in some situations may be quite
large.
While using SYSTAT's Gibbs Sampling algorithm to generate random samples from
the distribution of a p-dimensional random vector X = (X1, X2, …, Xp), you should
note that:
• The input (defining conditional distributions) consists of only univariate
distributions from SYSTAT's list of distributions.
• The input should give the conditional distribution of each Xi given the rest of the
components of X.
• The parameters of the conditional distributions have to be specified in the
prescribed syntax.
• It is your responsibility to ensure that the above inputs satisfy the conditions
required for them to define uniquely the joint distribution of the components of X
as your target distribution.
Monte Carlo Methods in SYSTAT

Random Sampling

Before using the Random Sampling feature, you should study the list of distributions,
the forms of the density functions (especially with respect to the parameters), and the
names and notations for the parameters, from the volume Data: Chapter 4: Data
Transformations: Distribution Functions. It may also be useful to consult the references
therein for the properties of these distributions and the meanings of the parameters. The
distributions are divided into three groups: univariate discrete, univariate continuous,
and multivariate.

Univariate Discrete Distributions Dialog Box

To open the Monte Carlo: Random Sampling: Univariate Discrete Distributions dialog
box, from the menus choose:
Addons
Monte Carlo
Random Sampling
Univariate Discrete…
Number of samples. Enter the number of samples that you want to generate.
Sample size. Enter the size of the sample you want to generate.
Random seed. The default random number generator is the Mersenne-Twister
algorithm. For the seed, specify any integer from 1 to 4294967295 for the MT
algorithm and 1 to 30000 for the Wichmann-Hill algorithm; otherwise SYSTAT uses a
seed based on system time.
Distribution. Choose the distribution from the drop-down list. The list consists of nine
univariate discrete distributions: Benford’s Law, Binomial, Discrete uniform,
Geometric, Hypergeometric, Logarithmic series, Negative binomial, Poisson, and
Zipf. Enter the values of the parameters (depending on the distribution selected) in the
box(es).
Save file. You can save the output to a specified file.

Univariate Continuous Distributions Dialog Box

To open the Monte Carlo: Random Sampling: Univariate Continuous Distributions
dialog box, from the menus choose:
Addons
Monte Carlo
Random Sampling
Univariate Continuous…
Number of samples. Enter the number of samples that you want to generate.
Sample size. Enter the size of the sample you want to generate.
Random seed. The default random number generator is the Mersenne-Twister
algorithm. For the seed, specify any integer from 1 to 4294967295 for the MT
algorithm and 1 to 30000 for the Wichmann-Hill algorithm; otherwise SYSTAT uses a
seed based on system time.
Distribution. Choose the distribution from the drop-down list. The list consists of
twenty-eight univariate continuous distributions: Beta, Cauchy, Chi-square, Double
exponential, Erlang, Exponential, F, Gamma, Generalized lambda, Gompertz,
Gumbel, Inverse Gaussian (Wald), Logistic, Loglogistic, Logit normal, Lognormal,
Non-central chi-square, Non-central F, Non-central t, Normal, Pareto, Rayleigh, t,
Smallest extreme value, Studentized range, Triangular, Uniform, and Weibull. Enter
the values of the parameters (depending on the distribution selected) in the box(es).
Save file. You can save the output to a specified file.

Multivariate Distributions Dialog Box

To open the Monte Carlo: Random Sampling: Multivariate Distributions dialog box,
from the menus choose:
Addons
Monte Carlo
Random Sampling
Multivariate…

Number of samples. Enter the number of samples that you want to generate.
Sample size. Enter the size of the sample you want to generate.
Random seed. The default random number generator is the Mersenne-Twister
algorithm. For the seed, specify any integer from 1 to 4294967295 for the MT
algorithm and 1 to 30000 for the Wichmann-Hill algorithm; otherwise SYSTAT uses a
seed based on system time.
Distribution. Choose the distribution from the drop-down list. The list consists of five
multivariate distributions: Bivariate exponential, Dirichlet, Multinomial, Multivariate
normal, and Wishart. Enter the values of the parameters (depending on the distribution
selected) in the box(es).
Save file. You can save the output to a specified file.

Using Commands

For Univariate Discrete and Continuous random sampling:


RANDSAMP
SAVE filename
UNIVARIATE distribution notation (parameterlist)/SIZE=n1
NSAMPLE=n2 RSEED=n3

The distribution notation takes the parameter values as its arguments.



For multivariate random sampling:


RANDSAMP
SAVE filename
MULTIVARIATE distribution notation (parameterlist)/
SIZE=n1 NSAMPLE=n2 RSEED=n3

The distribution notation takes the parameter values as its arguments.

Distribution Notations used in Random Sampling

Distribution Name Distribution Notation Parameter(s)
Benford’s law BLRN (B)
Binomial NRN (n,p)
Discrete uniform DURN (N)
Geometric GERN (p)
Hypergeometric HRN (N,m,n)
Logarithmic series LSRN (theta)
Negative binomial NBRN (k,p)
Poisson PRN (lambda)
Zipf ZIRN (shp)
Beta BRN (shp1,shp2)
Cauchy CRN (loc,sc)
Chi-square XRN (df)
Double exponential (Laplace) DERN (loc,sc)
Erlang ERRN (shp,sc)
Exponential ERN (loc,sc)
F FRN (df1,df2)
Gamma GRN (shp,sc)
Generalized lambda GLRN (lambda1,lambda2,lambda3,lambda4)
Gompertz GORN (b,c)
Gumbel GURN (loc,sc)
Inverse Gaussian (Wald) IGRN (loc,sc)
Logistic LRN (loc,sc)
Logit normal ENRN (loc,sc)
Loglogistic LORN (logsc, shp)
Lognormal LNRN (loc,sc)
Non-central chi-square NXRN (df1,nc)
Non-central F NFRN (df1,df2,nc)
Non-central t NTRN (df,nc)
Normal ZRN (loc,sc)
Pareto PARN (sc,shp)
Rayleigh RRN (sc)
Smallest extreme value SERN (loc,sc)
Studentized range SRN (k,df)
t TRN (df)
Triangular TRRN (a,b,c)
Weibull WRN (sc,shp)
Uniform URN (a,b)
* Bivariate exponential BERN (lambda1,lambda2,lambda3)
* Dirichlet DIRN (k,P)
* Multinomial MRN (k,P,n)
* Multivariate normal ZPRN (p,mu,sigma)
* Wishart WIRN (p,df,sigma,c)

where low is the smallest value and hi the largest value; loc is the location parameter
and sc the scale parameter; shp is the shape parameter and thr the threshold parameter;
nc is the non-centrality parameter (for univariate non-central distributions); and df is
the degrees of freedom.
Note: * indicates multivariate distributions.
Example: Normal random number with parameters (0, 1)
RANDSAMP
UNIVARIATE ZRN(0,1)

For parameter descriptions of the multivariate distributions, please refer to the section
PDFs of Multivariate Distributions, after Usage Considerations.

Rejection Sampling Dialog Box

To open the Monte Carlo: IIDMC: Rejection Sampling dialog box, from the menus
choose:
Addons
Monte Carlo
IIDMC
Rejection Sampling…

Number of samples. Enter the number of samples that you want to generate.
Sample size. Enter the size of the sample that you want to generate.
Random seed. The default random number generator is the Mersenne-Twister
algorithm. For the seed, specify any integer from 1 to 4294967295 for the MT
algorithm and 1 to 30000 for the Wichmann-Hill algorithm; otherwise SYSTAT uses a
seed based on system time.
Target function. Specify your target function in the required syntax.
Constant. Enter a value that is an upper bound on the supremum of the ratio of the
target function to the proposal density.
Proposal. Select a suitable proposal distribution function. The list consists of twenty
univariate continuous distributions: Beta, Cauchy, Chi-square, Double Exponential,
Exponential, F, Gamma, Gompertz, Gumbel, Inverse Gaussian, Logistic, Logit
normal, Lognormal, Normal, Pareto, Rayleigh, t, Triangular, Uniform, and Weibull.
Save file. You can save the output to a specified file.
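The accept-reject logic that this dialog box drives can be sketched outside SYSTAT. The following Python fragment is a hypothetical illustration (the function names and the half-normal example are ours, not SYSTAT's): a draw x from the proposal density g is accepted when a uniform variate falls below f(x)/(M g(x)), where M bounds the supremum of f/g.

```python
import math
import random

def rejection_sample(target, proposal_draw, proposal_pdf, m_const, n, seed=1):
    """Accept-reject sampling: keep a proposal draw x with probability
    target(x) / (m_const * proposal_pdf(x)), where m_const bounds the
    supremum of target/proposal over the support."""
    rng = random.Random(seed)
    out = []
    while len(out) < n:
        x = proposal_draw(rng)
        if rng.random() <= target(x) / (m_const * proposal_pdf(x)):
            out.append(x)
    return out

# Illustration: half-normal target with an Exponential(1) proposal.
# sup f/g is attained at x = 1 and equals sqrt(2/pi)*e^(1/2) < 1.32.
f = lambda x: math.sqrt(2.0 / math.pi) * math.exp(-0.5 * x * x)
g = lambda x: math.exp(-x)
draws = rejection_sample(f, lambda r: r.expovariate(1.0), g, 1.32, 5000)
mean = sum(draws) / len(draws)  # half-normal mean is sqrt(2/pi), about 0.80
```

Because the half-normal target satisfies f(x) <= 1.32 g(x) here, roughly 1/1.32 (about 76%) of the proposal draws are accepted.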

Adaptive Rejection Sampling Dialog Box

To open the Adaptive Rejection Sampling dialog box, from the menus choose:
Addons
Monte Carlo
IIDMC
Adaptive Rejection Sampling…

Number of samples. Enter the number of samples that you want to generate.
Sample size. Enter the size of the sample that you want to generate.
Random seed. The default random number generator is the Mersenne-Twister
algorithm. For the seed, specify any integer from 1 to 4294967295 for the MT
algorithm and 1 to 30000 for the Wichmann-Hill algorithm; otherwise SYSTAT uses a
seed based on system time.
Target function. Specify your target function, which should satisfy the log-concavity
condition.

Support of target. The method first constructs a proposal function using initial points
on the support of the target distribution and extends it depending on the type of the
target function. Bounds and starting points should be given.
• Unbounded. Specifies the support of target as unbounded. The two points are
starting points.
• Right bounded. Specifies the support of target as right bounded. The left point is a
starting point and the right one is a bound.
• Left bounded. Specifies the support of target as left bounded. The right point is a
starting point and the left one is a bound.
• Bounded. Specifies the support of target as bounded. The left and right starting
points are bounds.
• Left point/bound. Enter a point, preferably to the left side of the mode of the target
function.
• Right point/bound. Enter a point, preferably to the right side of the mode of the
target function.
Save file. You can save the output to a specified file.

Using Commands

For the Rejection Sampling method:


IIDMC
SAVE FILENAME
REJECT RJS(targetexpression, proposalexpression, constant)/
SIZE=n1 NSAMPLE= n2 RSEED=n3

The distribution notation for the proposal expression can be chosen from the notations
of the Beta, Cauchy, Chi-square, Exponential, F, Gamma, Gompertz, Gumbel, Double
Exponential (Laplace), Logistic, Logit normal, Lognormal, Normal, Pareto, Rayleigh,
t, Triangular, Uniform, Inverse Gaussian (Wald), and Weibull distributions.
For the Adaptive Rejection Sampling method:
IIDMC
SAVE FILENAME
REJECT ARS(targetexpression, rangeexpression)/SIZE=n1
NSAMPLE= n2 RSEED=n3

Range expressions for target function in ARS command are listed in the following
table:

Range expression Description

UB(a,b) Specifies the target as unbounded. The two points are starting points.
LB(a,b) Specifies the target function as left bounded. The left point is a bound as
well as a starting point, whereas the right point is a starting point only.
RB(a,b) Specifies the target function as right bounded. The right point is a bound
as well as a starting point, whereas the left point is a starting point only.
BD(a,b) Specifies the support of target as bounded. The left and right starting
points are also bounds.

M-H Algorithm Dialog Box

In the Monte Carlo: MCMC: Metropolis-Hastings algorithm dialog box, specify the
particular type of algorithm you want to use: Random walk, Independent, or Hybrid.

To open the M-H Algorithm dialog box, from the menus choose:
Addons
Monte Carlo
MCMC
M-H Algorithm…

Number of samples. Enter the number of samples that you want to generate.
Sample size. Enter the size of the sample that you want to generate.
Random seed. The default random number generator is the Mersenne-Twister
algorithm. For the seed, specify any integer from 1 to 4294967295 for the MT
algorithm and 1 to 30000 for the Wichmann-Hill algorithm; otherwise SYSTAT uses a
seed based on system time.
Burn-in. Enter the number of initial observations to be discarded from the chain.
Gap. Enter the difference between the indices of two successive random observations
that can be extracted from the generated sequence.
Target function. Specify your target function.
Algorithm type. Select the algorithm from the following:
• Random walk. Generates a random sample using the RWM-H algorithm.
• Independent. Generates a random sample using the IndM-H algorithm.
• Hybrid RWInd. Generates a random sample using the Hybrid RWInd M-H algorithm.

Support of target. Support of your target distribution can be specified as bounded, left
bounded, right bounded, and unbounded.
• Unbounded. Specifies the support of target as unbounded.
• Right bounded. Specifies the support of target as right bounded.

• Left bounded. Specifies the support of target as left bounded.
• Bounded. Specifies the support of target as bounded.
• Left bound. Enter the left bound to the target, which has its support as left
bounded/bounded.
• Right bound. Enter the right bound to the target, which has its support as right
bounded/bounded.
Initial value. Specifies the initial value of the Markov chain to be generated. Select a
distribution from the drop-down list that has the same support as that of the target
distribution.
Proposal. An appropriate proposal distribution should be selected from the list of
twenty univariate continuous distributions: Beta, Cauchy, Chi-square, Double
Exponential (Laplace), Exponential, F, Gamma, Gompertz, Gumbel, Inverse Gaussian
(Wald), Logistic, Logit normal, Lognormal, Normal, Pareto, Rayleigh, t, Triangular,
Uniform, and Weibull.
• Random Walk. Select one from the given univariate continuous distributions list,
when Random Walk Metropolis-Hastings algorithm or Hybrid Random Walk
Independent Metropolis-Hastings algorithm is selected. When the proposal density
is continuous and positive around zero, the generated RWM-H chain is ergodic. It
is recommended that a symmetric proposal distribution around zero be used.
• Independent. Select one from the given univariate continuous distributions list,
when Independent Metropolis-Hastings algorithm or Hybrid Random Walk
Independent Metropolis-Hastings algorithm is selected. It is suggested that the
proposal be selected in such a way that the weight function is bounded. In that case,
the generated Markov chain is uniformly ergodic. It must be ensured that the
support of the proposal distribution covers the support of the target distribution.
Save file. You can save the output to a specified file.
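For intuition, the Random walk variant selected above can be sketched in Python as follows (a hypothetical stand-alone illustration, not SYSTAT's implementation): propose y = x + e with e drawn from a distribution symmetric around zero, and accept y with probability min(1, f(y)/f(x)).

```python
import math
import random

def rw_metropolis(target, x0, step, n_keep, burnin, seed=7):
    """Random Walk M-H with a Uniform(-step, step) increment,
    which is symmetric around zero as recommended above."""
    rng = random.Random(seed)
    x = x0
    chain = []
    for i in range(burnin + n_keep):
        y = x + rng.uniform(-step, step)       # symmetric proposal step
        fx, fy = target(x), target(y)
        if fx == 0.0 or rng.random() < min(1.0, fy / fx):
            x = y                              # accept the move
        if i >= burnin:
            chain.append(x)                    # keep post burn-in draws
    return chain

# Target: unnormalized standard normal density.
f = lambda x: math.exp(-0.5 * x * x)
chain = rw_metropolis(f, x0=0.0, step=2.0, n_keep=20000, burnin=500)
mean = sum(chain) / len(chain)
var = sum((c - mean) ** 2 for c in chain) / len(chain)
```

Note that any normalizing constant cancels in the ratio f(y)/f(x), which is why only the target expression, not a normalized density, is needed.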

Gibbs Sampling Dialog Box

To open the Monte Carlo: MCMC: Gibbs Sampling dialog box, from the menus
choose:
Addons
Monte Carlo
MCMC
Gibbs Sampling…

Number of samples. Enter the number of samples that you want to generate.
Sample size. Enter the size of the multivariate sample that you want to generate.
Random seed. The default random number generator is the Mersenne-Twister
algorithm. For the seed, specify any integer from 1 to 4294967295 for the MT
algorithm and 1 to 30000 for the Wichmann-Hill algorithm; otherwise SYSTAT uses a
seed based on system time.
Gap. Enter the difference between the indices of two successive random observations
that can be extracted from the generated sequence.
Burn-in. Enter the number of initial observations to be discarded from the chain.
Use file. Open the data file, where variables in the data file are part of the parameter
expressions of full conditionals.
Full conditionals. Specify the full conditional distributions.
• Variables. Enter the variable for which you want to generate a random sample.
• Distribution. Select the required distribution from the list provided. The list
consists of seven univariate discrete and twenty univariate continuous
distributions: Binomial, Discrete uniform, Geometric, Hypergeometric, Poisson,
Negative binomial, Zipf, Beta, Cauchy, Chi-square, Double Exponential (Laplace),
Exponential, F, Gamma, Gompertz, Gumbel, Inverse Gaussian (Wald), Logistic,
Logit normal, Lognormal, Normal, Pareto, Rayleigh, t, Triangular, Uniform, and
Weibull.
• Parameter. Specify the expression or number for the parameter related to the
distribution.
• Initial Value. Enter the initial value of each variable.

Save file. You can save the output to a specified file.
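The alternating draws that the full conditionals define can be sketched in Python (a hypothetical illustration, not SYSTAT code) for a standard bivariate normal with correlation rho, whose full conditionals are X1 | X2 = x2 ~ N(rho*x2, sqrt(1 - rho^2)) and symmetrically for X2. This is the setting of the Gibbs Sampling example later in this chapter, where sqrt(1 - 0.98^2) is approximately 0.1990.

```python
import math
import random

def gibbs_bvn(rho, n_keep, burnin, gap, seed=11):
    """Gibbs sampler for a standard bivariate normal with correlation rho.
    Alternately draws each coordinate from its full conditional and keeps
    every gap-th pair after discarding a burn-in."""
    rng = random.Random(seed)
    sd = math.sqrt(1.0 - rho * rho)        # conditional standard deviation
    x1 = x2 = 0.0                          # initial values of the chain
    kept = []
    for i in range(burnin + n_keep * gap):
        x1 = rng.gauss(rho * x2, sd)       # X1 from its full conditional
        x2 = rng.gauss(rho * x1, sd)       # then X2 given the new X1
        if i >= burnin and (i - burnin) % gap == 0:
            kept.append((x1, x2))
    return kept

pairs = gibbs_bvn(rho=0.98, n_keep=2000, burnin=500, gap=50)
m1 = sum(p[0] for p in pairs) / len(pairs)            # near 0
cross = sum(p[0] * p[1] for p in pairs) / len(pairs)  # near rho = 0.98
```

With a high correlation such as 0.98 the chain mixes slowly, which is why a large gap is useful for near-independent draws.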



Integration Dialog Box

To open the Monte Carlo: Integration dialog box, from the menus choose:
Addons
Monte Carlo
Integration…

Integrand. Specify the function for which an integration estimate is required.


Method. Select the integration method you want:
• Classical Monte Carlo. Computes the classical Monte Carlo integration estimate.
• Importance sampling integration. Computes the Importance Sampling integration
estimate.
• Importance sampling ratio. Computes the Importance Sampling ratio estimate.

Density function. Type your density function, which is the numerator of the weight
function in the Importance Sampling method.

Using Commands

For the Metropolis-Hastings Algorithm:


MCMC
SAVE FILENAME
MH MHRW(target_exp, range_exp , proposal_exp,initialvalue_exp)/
SIZE= n1 NSAMPLE=n2 BURNIN=n3 GAP=n4 RSEED=n5
or
MHIND(target_exp, range_exp, proposal_exp, initialvalue_exp)/
SIZE= n1 NSAMPLE=n2 BURNIN=n3 GAP=n4 RSEED=n5
or
MHHY(target_exp, range_exp , rw_ proposal_exp ,
ind_ proposal_exp, initialvalue_exp)/
SIZE= n1 NSAMPLE=n2 BURNIN=n3 GAP=n4 RSEED=n5

Range expressions for the target function in the MH command are listed in the following table:

Range expression Description


UB Specifies the target as unbounded.
LB(a) Specifies the target function as left bounded at a.
RB(b) Specifies the target function as right bounded at b.
BD(a,b) Specifies the support of target as bounded between a and b.

The distribution notation for the proposal can be chosen from the notations of the Beta,
Cauchy, Chi-square, Exponential, F, Gamma, Gompertz, Gumbel, Double Exponential
(Laplace), Logistic, Logit normal, Lognormal, Normal, Pareto, Rayleigh, t, Triangular,
Uniform, Inverse Gaussian (Wald), and Weibull distributions.

For the Gibbs Sampling method:


MCMC
USE FILENAME
VARIABLE DECLARATIONS
GVAR VARLIST
FUNCTION fname_1()
{
LIST OF STATEMENTS
}
.
.
.
FUNCTION fname_k()
{
LIST OF STATEMENTS
}
GIBBS(fname_1(),…,fname_k()) /SIZE=n1 NSAMPLE=n2 BURNIN=n3
GAP=n4 RSEED=n5

or
MCMC
USE FILENAME
VARIABLE DECLARATIONS
GVAR VARLIST
GIBBS(fcexpname_1,…, fcexpname_k) /SIZE=n1 NSAMPLE=n2
BURNIN=n3 GAP=n4 RSEED=n5

The distribution notations for full conditionals can be chosen from the notations of
Binomial, Discrete Uniform, Geometric, Hypergeometric, Negative Binomial,
Poisson, Zipf, Beta, Cauchy, Chi-square, Exponential, F, Gamma, Gompertz, Gumbel,
Double Exponential (Laplace), Logistic, Logit normal, Lognormal, Normal, Pareto,
Rayleigh, t, Triangular, Uniform, Inverse Gaussian (Wald) and Weibull distributions.

For the Monte Carlo Integration method:


INTEG expression ; MC
or IMPSAMPI DENFUN=expression
or IMPSAMPR DENFUN=expression

Monte Carlo Integration methods in SYSTAT can be used only after generating
random samples by any one of the univariate discrete and univariate continuous
random sampling methods, Rejection Sampling, Adaptive Rejection Sampling, or the
M-H algorithms.

Usage Considerations
Types of data. Gibbs Sampling and Monte Carlo Integration use rectangular data only.
For the remaining features no input data are needed.
Print options. There are no PLENGTH options.
Quick Graphs. Monte Carlo produces no Quick Graphs. You use the generated file and
produce the graphs you want. For more information, refer to the examples.
Saving files. Generated samples can be saved in the file specified. For all distributions
(except Wishart), the case number refers to the observation number. For all univariate
distributions, the column names are s1, s2, … (the number after s denotes the sample
number). For multivariate distributions, the format of the saved/output file is as
follows: the column name format is "s*v*", where the * after s denotes the sample
number and the * after v denotes the variable number. For Wishart, there is a leading
column "OBS_NO" with elements "o*v*", where the * after o denotes the observation
number and the * after v denotes the variable number. The output format of Rejection
Sampling, ARS, and the M-H algorithms is the same as for the univariate
distributions. For Gibbs Sampling, the column name is the name of the variable with
the sample number.
By groups. By groups is not relevant.
Case frequencies. Case frequency is not relevant.
Case weights. Case weight is not relevant.

Distribution Notations used in IIDMC and MCMC


Distribution Notation Parameter(s)
(The Notation column gives the distribution notation used in Rejection Sampling and
the M-H Algorithm.)

Uniform U (a,b)
Normal Z (loc,sc)
t T (df)
F F (df1,df2)
Chi-square X (df)
Gamma G (shp,sc)
Beta B (shp1,shp2)
Exponential E (loc,sc)
Logistic L (loc,sc)
Studentized Range (k,df)
Weibull W (sc,shp)
Cauchy C (loc,sc)
Double Exponential DE (loc,sc)
Gompertz GO (b,c)
Gumbel GU (loc,sc)
Inverse Gaussian (Wald) IG (loc,sc)
Logit Normal EN (loc,sc)
Lognormal LN (loc,sc)
Pareto PA (sc,shp)
Rayleigh R (sc)
Triangular TR (a,b,c)
Binomial (n,p)
Poisson (lambda)
Discrete Uniform (N)
Geometric (p)
Hypergeometric (N,m,n)
Negative Binomial (k,p)
Zipf (shp)

where low is the smallest value and hi the largest value; loc is the location parameter
and sc the scale parameter; shp is the shape parameter and thr the threshold parameter;
nc is the non-centrality parameter (for univariate non-central distributions); and df is
the degrees of freedom.
Note: In Gibbs Sampling, the distribution notations are the same as in random
sampling (all univariate distributions).

PDFs of Multivariate Distributions


Multinomial:
Parameters: k, P, n
k: number of cells (occurrences)
P: probability vector for the k cells
n: number of independent trials
Note: n is optional; if not specified, it takes the default value 1.
n and k (>2) are positive integers; the cell probabilities p_i, i = 1, 2, …, k, should add
to 1 (each p_i in (0,1)).

PDF:

P[N_i = n_i,\ i = 1, 2, \ldots, k] = \frac{n!}{\prod_{i=1}^{k} n_i!} \prod_{i=1}^{k} p_i^{n_i}, \quad n \ge 1, \quad \sum_{i=1}^{k} n_i = n

Note: The case k = 2 is the binomial distribution, and k = 1 is degenerate.
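One elementary way to realize a multinomial(n, P) observation (a hypothetical Python sketch, not SYSTAT's generator) is to run n independent categorical trials against the cumulative cell probabilities and count the hits in each cell:

```python
import random

def multinomial_draw(n, p, rng):
    """One multinomial(n, p) observation: cell counts of n categorical trials."""
    cum = []
    total = 0.0
    for pj in p:                  # build cumulative cell probabilities
        total += pj
        cum.append(total)
    counts = [0] * len(p)
    for _ in range(n):
        u = rng.random() * total  # scale guards against rounding in sum(p)
        for j, cj in enumerate(cum):
            if u <= cj:
                counts[j] += 1
                break
    return counts

rng = random.Random(3)
draw = multinomial_draw(100, [0.2, 0.3, 0.5], rng)  # counts over three cells
```

The counts always sum to n, and each count has expectation n*p_j.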

Bivariate exponential: (Marshall-Olkin Model)

Parameters: lambda1, lambda2, lambda12
lambda1: positive real (failure rate 1)
lambda2: positive real (failure rate 2)
lambda12: positive real (failure rate 3)

PDF:

f(x_1, x_2) = \begin{cases} \lambda_2(\lambda_1 + \lambda_{12})\, F(x_1, x_2), & \text{for } 0 < x_2 < x_1 \\ \lambda_1(\lambda_2 + \lambda_{12})\, F(x_1, x_2), & \text{for } 0 < x_1 < x_2 \\ \lambda_{12}\, F(x, x), & \text{for } x_1 = x_2 = x > 0 \end{cases}

where

F(x_1, x_2) = \exp\{-\lambda_1 x_1 - \lambda_2 x_2 - \lambda_{12} \max(x_1, x_2)\}, \quad \text{for } 0 < x_1, x_2

Note: lambda12, positive real (failure rate 3), is sometimes denoted by lambda3.

Dirichlet:
Parameters: k, P
k: positive integer (>2)
P: k-dimensional vector of shape parameters (each component a positive real number).

PDF: Each x_i is in [0,1] such that \sum_{i=1}^{k} x_i = 1

f(x) = \frac{\Gamma\left(\sum_{j=1}^{k} p_j\right)}{\prod_{j=1}^{k} \Gamma(p_j)} \prod_{j=1}^{k} x_j^{p_j - 1}

Note: The case k = 2 is the beta distribution.
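A standard construction of a Dirichlet(P) observation (a hypothetical Python sketch, not SYSTAT's routine) draws independent Gamma(p_j, 1) variates and normalizes them by their sum; the jth component then has expectation p_j divided by the sum of the shape parameters:

```python
import random

def dirichlet_draw(p, rng):
    """One Dirichlet(p) observation via normalized Gamma(p_j, 1) draws."""
    g = [rng.gammavariate(pj, 1.0) for pj in p]
    s = sum(g)
    return [gj / s for gj in g]

rng = random.Random(5)
x = dirichlet_draw([2.0, 3.0, 5.0], rng)  # one draw; components sum to 1
# Monte Carlo check of E[x_1] = 2 / (2 + 3 + 5) = 0.2
m0 = sum(dirichlet_draw([2.0, 3.0, 5.0], rng)[0] for _ in range(2000)) / 2000
```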

Multivariate normal:
Parameters: p, mu, sigma
p: positive integer (>1)
mu: p x 1 vector of reals
sigma: p x p symmetric positive definite matrix.

PDF:

f(x) = (2\pi)^{-p/2} \left|\Sigma\right|^{-1/2} \exp\{-\tfrac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu)\}, \quad x \in R^p

Wishart:
Parameters: p, m, sigma, c
p: positive integer (>1)
m: positive integer (>= p) (degrees of freedom)
sigma: p x p symmetric positive definite matrix
c: p x p matrix of non-centrality parameters.

Let Y_1', Y_2', …, Y_m' be independent p-variate normal vectors with mean vectors
(mu)_i, i = 1, 2, …, m, and the same sigma; then

W = \sum_{i=1}^{m} Y_i' Y_i

has the non-central Wishart distribution with parameters (m, sigma, c), where
c = (E(Y))'(E(Y)) \Sigma^{-1} and Y is the (m by p) matrix with ith row Y_i.

PDF:

W = Y'Y. The probability density of the W matrix is given by

f_W(S) = w(p, \Sigma, m, M')\, |S|^{(m-p-1)/2} \exp[-\tfrac{1}{2} \mathrm{tr}(\Sigma^{-1} S)]\; {}_0F_1\!\left(\tfrac{m}{2}; \tfrac{1}{4} \Sigma^{-1} M' M \Sigma^{-1} S\right)

where M = E(Y) (an m by p matrix),

w(p, \Sigma, m, M') = \left(\Gamma_p\!\left(\tfrac{m}{2}\right)\right)^{-1} |2\Sigma|^{-m/2} \exp[-\tfrac{1}{2} \mathrm{tr}(\Sigma^{-1} M' M)],

\Gamma_p\!\left(\tfrac{m}{2}\right) = \pi^{p(p-1)/4} \prod_{i=1}^{p} \Gamma[\tfrac{1}{2}(m + 1 - i)]

and {}_0F_1(\tfrac{m}{2}; \cdot) is the hypergeometric function.
Expressions in Monte Carlo

For the IIDMC and M-H algorithms, the target functions from which random samples
are generated are expressions involving mathematical functions of a single variable.
The integrand in Monte Carlo Integration and the density function in the Importance
Sampling procedures are expressions. In the Gibbs Sampling method, the parameters
of the full conditionals are expressions, which may involve variables from a data file
and mathematical functions. For the construction of expressions you can use all
numeric functions from SYSTAT's function library.

Examples

Example 1
Sampling Distribution of Double Exponential (Laplace) Median
This example generates 500 samples each of size 20 and investigates the distribution
of the sample median by computing the median of each sample.
The input is:
RANDSAMP
UNIVARIATE DERN(2,1) / SIZE=20 NSAMPLE=500 RSEED=23416

Using the generated (500) samples, the distribution of sample median can be obtained.
The input is:
SSAVE 'STATS'
STATISTICS S1 .. S500 /MEDIAN
USE 'STATS'
TRANSPOSE S1..S500
VARLAB COL(1) / 'MEDIAN'
CSTATISTICS COL(1) / MAXIMUM MEAN MINIMUM SD VARIANCE N
SWTEST
BEGIN
DENSITY COL(1) / HIST XMIN=0 XMAX=4
DENSITY COL(1) / NORMAL XMIN=0 XMAX=4
END

The output is as follows; COL(1) contains the sample medians.



[Histogram of the 500 sample medians with a normal curve overlay, titled
SAMPLING DISTRIBUTION OF LAPLACE(2,1) MEDIAN; x-axis: MEDIAN;
y-axes: Count and Proportion per Bar]

¦ MEDIAN
-----------------------+-------
N of Cases ¦ 500
Minimum ¦ 1.190
Maximum ¦ 2.665
Arithmetic Mean ¦ 1.994
Standard Deviation ¦ 0.248
Variance ¦ 0.062
Shapiro-Wilk Statistic ¦ 0.994
Shapiro-Wilk p-value ¦ 0.050

We observe that the sampling distribution of the double exponential sample median
can be described as approximately normal.
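The same experiment can be replicated outside SYSTAT. The Python sketch below is a hypothetical illustration (its random number stream differs from SYSTAT's, so the figures will not match the output above exactly); it draws each Laplace(2,1) variate as the difference of two Exp(1) variates and summarizes the 500 sample medians:

```python
import random
import statistics

def laplace_draw(loc, scale, rng):
    """Laplace(loc, scale) as loc + scale*(E1 - E2), with E1, E2 ~ Exp(1)."""
    return loc + scale * (rng.expovariate(1.0) - rng.expovariate(1.0))

rng = random.Random(23416)
# 500 samples of size 20; keep the median of each sample
medians = [statistics.median(laplace_draw(2.0, 1.0, rng) for _ in range(20))
           for _ in range(500)]
m = statistics.mean(medians)   # close to the location parameter 2
s = statistics.stdev(medians)  # of the same order as the 0.248 reported above
```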

Example 2
Simulation of Assembly System
Consider a system having two parallel subsystems (A and B) connected in series with
another subsystem (C), as shown in the structural diagram. In such a system, work at
"C" can start only after the work at "A" and "B" is completed. The process completion
time for this system is the maximum of the processing times for "A" and "B" plus the
processing time for "C".

Assume that the system is a production line for a specific product, and that the
processing time distributions for the three subsystems are independent. Let us specify
that:

A~ Exponential (0.2, 0.7)


B~ Uniform (0.2, 1.2)
C~ Normal (2, 0.3)

The production engineer wants to find the distribution of manufacturing time and to
estimate the probability that the manufacturing time is less than 5 units of time.

The input is:


RANDSAMP
UNIVARIATE ERN (0.2, 0.7)/SIZE = 10000 NSAMPLE=1
RSEED=123
LET S2=URN (0.2, 1.2)
LET S3=ZRN (2, 0.3)
LET TIME=MAX (S1, S2) +S3
DENSITY TIME / HIST
LET PROB = (TIME <=5)
CSTATISTICS PROB/MEAN

The output is:

[Histogram of the 10000 simulated manufacturing times; x-axis: TIME; y-axes: Count
and Proportion per Bar]

¦ PROB
-----------------+------
Arithmetic Mean ¦ 0.979

The output shows the histogram of 10000 simulated manufacturing times; the
estimated probability that manufacturing time is less than 5 time units is 0.979.
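The simulation is easy to mimic outside SYSTAT. The following Python sketch (a hypothetical illustration with its own random number stream) draws the three processing times, forms max(A, B) + C, and estimates P(time <= 5):

```python
import random

rng = random.Random(123)
n = 10000
count_le5 = 0
for _ in range(n):
    a = 0.2 + rng.expovariate(1.0 / 0.7)  # Exponential(loc 0.2, scale 0.7)
    b = rng.uniform(0.2, 1.2)             # Uniform(0.2, 1.2)
    c = rng.gauss(2.0, 0.3)               # Normal(mean 2, sd 0.3)
    if max(a, b) + c <= 5.0:              # A, B in parallel, then C in series
        count_le5 += 1
prob = count_le5 / n                      # near the 0.979 reported above
```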

Example 3
Generation of Random Sample from the Bivariate Exponential (Marshall-Olkin
Model) Distribution
An electronics engineer wants to study the joint distribution of two specific electronic
subsystems in her assembly. From her prior knowledge she knows the mean failure
time for the first subsystem as 1.2 (units) and for the second subsystem, as 1.3 (units).
If some strong shock occurs, then both of these subsystems fail. She also knows the
mean occurrence time for this strong shock as 0.1 (units). Assuming the
Marshall-Olkin model, realizations of her assembly failures are carried out and the
input is:
RANDSAMP
MULTIVARIATE BERN(1,1,0.1) / SIZE = 10000 NSAMPLE=1
RSEED=542375
CSTATISTICS / MAXIMUM MEAN MINIMUM SD VARIANCE N
PLOT S1V1*S1V2/ BORDER=HIST
GRAPH OFF
CORR
PEARSON S1V1 S1V2
GRAPH ON

The output is:


¦ S1V1 S1V2
-------------------+--------------
N of Cases ¦ 10000 10000
Minimum ¦ 0.000 0.000
Maximum ¦ 8.336 8.163
Arithmetic Mean ¦ 0.908 0.905
Standard Deviation ¦ 0.914 0.907
Variance ¦ 0.835 0.823

[Scatterplot of S1V1 against S1V2 with histograms of each variable on the borders]

Number of Observations: 10000

Pearson Correlation Matrix

¦ S1V1 S1V2
-----+--------------
S1V1 ¦ 1.000
S1V2 ¦ 0.047 1.000

Example 4
Evaluating an Integral by Monte Carlo Integration Methods
This example explains the evaluation of \int_0^1 \cos(\pi x / 2)\, dx using Monte Carlo
Integration methods.

Using the Classical Monte Carlo Integration method, the integral can be estimated by

\hat{I}_n = \frac{1}{n} \sum_{i=1}^{n} \cos\!\left(\frac{\pi x_i}{2}\right),

where the x_i are generated from the uniform distribution on [0,1].
The input is:

RANDSAMP
UNIVARIATE URN(0,1)/ SIZE=10000 NSAMPLE=1 RSEED=76453782
MCMC
INTEG COS(3.14159*X/2); MC

The output is:


Classical Monte-Carlo Integration estimates for S1

Empirical Mean : 0.635


Standard Error: 0.003

Importance Sampling, a variance reduction technique, can be used to evaluate the
given integral more accurately. An optimal importance function, (3/2)(1 - x^2), can be
used to estimate the above integral by

\hat{I}_g = \frac{1}{n} \sum_{i=1}^{n} \left[\cos\!\left(\frac{\pi x_i}{2}\right) \frac{1}{(3/2)(1 - x_i^2)}\right]
Since (3/2)(1 - x^2) is a log-concave function on (0, 1), the ARS algorithm can be used
to generate random samples from this density, and the input is:
FORMAT 9,6
IIDMC
REJECT ARS((3/2)*(1-X^2),BD(0.0,1.0)) /SIZE=5000
RSEED=76453782
MCMC
INTEG FUN='COS(PI*X/2)' DENFUN='1' /IMPSAMPI
FORMAT

The output is:


Importance Sampling Integration estimates for S1

Weighted Mean: 0.636293


Standard Error: 0.000449

The Importance Sampling Integration estimate is an improvement over the Classical
Monte Carlo Integration estimate, as its smaller standard error shows.
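Both estimators can be mimicked in plain Python (a hypothetical sketch, not the SYSTAT implementation; here the importance draws from (3/2)(1 - x^2) are produced by simple rejection against a uniform rather than by ARS). The exact value of the integral is 2/pi, about 0.6366:

```python
import math
import random

rng = random.Random(76453782)
n = 10000
exact = 2.0 / math.pi                 # exact value of the integral

# Classical Monte Carlo: average the integrand at Uniform(0,1) points.
classic = sum(math.cos(0.5 * math.pi * rng.random()) for _ in range(n)) / n

def draw_h(r):
    """Draw from h(x) = (3/2)(1 - x^2) on (0,1) by rejection against
    Uniform(0,1) with bound M = 3/2, i.e. accept w.p. 1 - x^2."""
    while True:
        x = r.random()
        if r.random() <= 1.0 - x * x:
            return x

# Importance sampling: average f(x)/h(x) for x drawn from h.
vals = []
for _ in range(n):
    x = draw_h(rng)
    vals.append(math.cos(0.5 * math.pi * x) / (1.5 * (1.0 - x * x)))
impsamp = sum(vals) / n
```

The importance-sampling average has a much smaller spread because the ratio cos(pi*x/2) / [(3/2)(1 - x^2)] is nearly constant on (0, 1).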

Example 5
Rejection Sampling
If α is an integer, random samples from gamma(α, β) can be generated by adding α
independent exponential(β) variates. But if α is not an integer, this simple method
is not applicable. Even though we can generate random samples from gamma(α, β)
using SYSTAT's univariate continuous random sampling procedure, this example
illustrates an alternative method, using Rejection Sampling by considering
uniform(0,15), exponential(2.43), and gamma([2.43], 2.43/[2.43]) distributions
(Robert and Casella, 2004) as proposals in different exercises. [2.43] is the integer part
of 2.43.

• Generating a random sample from gamma(2.43, 1) using the uniform density
function as proposal, computing basic statistics from this sample, and
approximating E(X^2) using the Monte Carlo Integration method.

The input is:


IIDMC
REJECT RJS((1/(EXP(LGM(2.43))))*EXP(-X)*X^1.43,U(0,15),4.7250) ,
/SIZE=100000 NSAMPLE=1 RSEED=3245425
MCMC
INTEG X^2 ; MC
CSTATISTICS S1 / MAXIMUM MEAN MINIMUM SD VARIANCE N
DENSITY S1 / HIST

The output is:


Classical Monte-Carlo Integration estimates for S1

Empirical Mean: 8.308


Standard Error: 0.036

¦ S1
-------------------+-------
N of Cases ¦ 100000
Minimum ¦ 0.020
Maximum ¦ 14.395
Arithmetic Mean ¦ 2.426
Standard Deviation ¦ 1.557
Variance ¦ 2.424

[Histogram of the 100000 accepted values; x-axis: S1 (0 to 15); y-axes: Count and
Proportion per Bar]

• Generating a random sample from gamma(2.43, 1) using an exponential density
function as proposal, computing basic statistics from this sample, and
approximating E(X^2) using the Monte Carlo Integration method.

The input is:


IIDMC
REJECT RJS((1/(EXP(LGM(2.43))))*EXP(-X)*X^1.43, E(0,2.43), 1.6338),
/ SIZE=100000 NSAMPLE=1 RSEED=534652
MCMC
INTEG X^2;MC
CSTATISTICS S1/ MAXIMUM MEAN MINIMUM SD VARIANCE N
DENSITY S1 / HIST

The output is:


Classical Monte-Carlo Integration estimates for S1

Empirical Mean: 8.323


Standard Error: 0.036

¦ S1
-------------------+-------
N of Cases ¦ 100000
Minimum ¦ 0.005
Maximum ¦ 15.603
Arithmetic Mean ¦ 2.429
Standard Deviation ¦ 1.556
Variance ¦ 2.422

[Histogram of the 100000 accepted values; x-axis: S1 (0 to 20); y-axes: Count and
Proportion per Bar]

• Generating a random sample from gamma(2.43, 1) using a gamma density
function as proposal, computing basic statistics from this sample, and
approximating E(X^2) using the Monte Carlo Integration method.

The input is:


IIDMC
REJECT RJS((1/(EXP(LGM(2.43))))*EXP(-X)*X^1.43, G(2,1.2150),1.1102),
/SIZE=100000 NSAMPLE=1 RSEED=236837468
MCMC
INTEG X^2 ; MC
CSTATISTICS S1/ MAXIMUM MEAN MINIMUM SD VARIANCE N
DENSITY S1 / HIST

The output is:


Classical Monte-Carlo Integration estimates for S1

Empirical Mean: 8.339


Standard Error: 0.036

¦ S1
-------------------+-------
N of Cases ¦ 100000
Minimum ¦ 0.008
Maximum ¦ 15.376
Arithmetic Mean ¦ 2.430
Standard Deviation ¦ 1.560
Variance ¦ 2.434

[Histogram of the 100000 accepted values; x-axis: S1 (0 to 20); y-axes: Count and
Proportion per Bar]

Random samples from the gamma(2.43, 1) density function are generated using
gamma, exponential, and uniform distributions as proposals. But the probability of
accepting a sample from the target function, 1/M, varies with the proposal.

Proposal Acceptance Probability

Uniform 0.21164021
Exponential 0.61207002
Gamma 0.90073861

The probability of accepting a proposal sample as a target variate depends on how
close the product of the proposal density and the constant is to the target function.
Observe this by plotting the target function and the product of proposal and constant
together.

The input is:


BEGIN
FPLOT Y= GDF(X,2.43,1); XMIN=0 XMAX=10 YMIN=0 YMAX=0.4 HEIGHT=2
WIDTH=2 LOC=0,0 TITLE='UNIFORM PROPOSAL'
FPLOT Y= 4.7250*UDF(X,0,15); XMIN=0 XMAX=10 YMIN=0 YMAX=0.4
COLOR=1 HEIGHT=2 WIDTH=2 LOC=0,0
FPLOT Y= GDF(X,2.43,1); XMIN=0 XMAX=10 YMIN=0 YMAX=0.6 HEIGHT=2
WIDTH=2 LOC=3,0 TITLE='EXPONENTIAL PROPOSAL'
FPLOT Y= 1.63382*EDF(X,0, 2.43); XMIN=0 XMAX=10 YMIN=0 YMAX=0.6
COLOR=1 HEIGHT=2 WIDTH=2 LOC=3,0
FPLOT Y= GDF(X,2.43,1); XMIN=0 XMAX=10 YMIN=0 YMAX=0.4 HEIGHT=2
WIDTH=2 LOC=6,0 TITLE='GAMMA PROPOSAL'
FPLOT Y= 1.1102*GDF(X,2,1.2150); XMIN=0 XMAX=10 YMIN=0 YMAX=0.4
COLOR=1 HEIGHT=2 WIDTH=2 LOC=6,0
END

The output is:

[Three function plots of the gamma(2.43, 1) target density together with the product of
constant and proposal density: Figure i, UNIFORM PROPOSAL; Figure ii,
EXPONENTIAL PROPOSAL; Figure iii, GAMMA PROPOSAL]

When the uniform density function (Figure i) is considered as proposal, most of the
generated points are outside the accepted region. In Figure ii (exponential) and Figure
iii (gamma) the means of both target and proposal functions are the same, but when the
gamma density function is taken as proposal (Figure iii), the product of constant and
proposal is closer to the target function; thus, a point generated from the proposal is
accepted as a sample from the target function with high probability, and hence the
simulated values converge to the theoretical values (mean = 2.43, variance = 2.43, and
E[X^2] = 8.3349) quickly.
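The acceptance behavior can be checked directly. The Python sketch below (a hypothetical illustration, not SYSTAT code) reruns the exponential-proposal exercise and records the empirical acceptance rate, which should be near 1/M = 1/1.6338, about 0.612:

```python
import math
import random

rng = random.Random(534652)
M = 1.6338                          # bound used with the E(0, 2.43) proposal

def f(x):                           # gamma(2.43, 1) target density
    return math.exp(-x) * x ** 1.43 / math.gamma(2.43)

def g(x):                           # Exponential(scale 2.43) proposal density
    return math.exp(-x / 2.43) / 2.43

n_prop = 20000
accepted = []
for _ in range(n_prop):
    x = 2.43 * rng.expovariate(1.0)             # one proposal draw
    if rng.random() <= f(x) / (M * g(x)):       # accept-reject step
        accepted.append(x)
acc_rate = len(accepted) / n_prop   # near 1/M, about 0.612
mean = sum(accepted) / len(accepted)  # near the gamma mean 2.43
```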

Example 6
Estimating Mean and Variance of a Bounded Posterior Density Function
using RWM-H Algorithm and IndM-H Algorithm
Let the observations {1,1,1,1,1,1,2,2,2,3} be from the (discrete) logarithmic series
distribution with density

p(x \mid \theta) = \frac{\theta^x}{x(-\log(1-\theta))}, \quad x = 1, 2, 3, \ldots \text{ and } 0 < \theta < 1

From this sample of size 10, the logarithmic series distribution with parameter \theta leads
to the (unnormalized) likelihood function

L(\theta) = \frac{\theta^{15}}{[-\log(1-\theta)]^{10}}

Let the prior be \pi(\theta) = 6\theta(1-\theta). Then the posterior, up to a multiplicative constant, is

\frac{\theta^{16}(1-\theta)}{[-\log(1-\theta)]^{10}} \quad \text{for } 0 < \theta < 1.

This example extracted from Monahan (2001) illustrates the generation of random
samples from the specified posterior using the Random Walk Metropolis-Hastings
algorithm and the Independent Metropolis-Hastings algorithm.
To generate a random sample using the RWM-H algorithm, the selected proposal is
uniform(-0.1, 0.1), which is symmetric around zero with small steps. Since the target
function is bounded between 0 and 1, the value generated by the initial distribution
should lie between 0 and 1 and thus the initial distribution is chosen as uniform(0,1).
For getting samples from the posterior and computing its basic statistics, the input is:

MCMC
MH MHRW((X^16*(1-X))/((-LOG(1-X))^10),BD(0,1), U(-0.1,0.1),
U(0.0,1.0)) / SIZE=100000,
NSAMPLE=1 BURNIN=500 GAP=30 RSEED=237465

CSTATISTICS S1/ MAXIMUM MEAN MINIMUM SD VARIANCE N


DENSITY S1 /KERNEL

The output is:


¦ S1
-------------------+-------
N of Cases ¦ 100000
Minimum ¦ 0.062
Maximum ¦ 0.966
Arithmetic Mean ¦ 0.528
Standard Deviation ¦ 0.136
Variance ¦ 0.019

[Kernel density histogram of S1: Count against S1 over (0, 1).]
The mean and variance from the simulated data are 0.528 and 0.019 respectively.
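The chain itself is only a few lines when written directly; the Python sketch below is an illustration of the RWM-H algorithm of this example, not the SYSTAT implementation, and it omits the thinning gap for brevity (successive draws are therefore correlated, but the long-run moments still agree).

```python
import numpy as np

rng = np.random.default_rng(237465)

def target(theta):
    # Unnormalized posterior; zero outside the bounded support (0, 1)
    if theta <= 0.0 or theta >= 1.0:
        return 0.0
    return theta**16 * (1 - theta) / (-np.log(1 - theta)) ** 10

n, burnin = 100_000, 500
theta = 0.5                                 # any starting point inside (0, 1)
draws = np.empty(n)
for i in range(n + burnin):
    cand = theta + rng.uniform(-0.1, 0.1)   # symmetric random-walk step
    # Symmetric proposal: the M-H ratio reduces to target(cand)/target(theta)
    if rng.uniform() < target(cand) / target(theta):
        theta = cand
    if i >= burnin:
        draws[i - burnin] = theta

print(round(draws.mean(), 2), round(draws.var(), 3))  # near 0.53 and 0.019
```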
When IndM-H is used, the support of the proposal should contain the support of
the target function; hence the selected proposal in this example is uniform(0,1). For
generating random samples from the posterior and getting its mean and variance,
the input is:
MCMC
MH MHIND((X^16*(1-X))/((-LOG(1-X))^10), BD(0,1), U(0.0,1.0),
U(0.0,1.0)) / SIZE=100000,
NSAMPLE=1 BURNIN=500 GAP=30 RSEED=65736736
CSTATISTICS S1/ MAXIMUM MEAN MINIMUM SD VARIANCE N
DENSITY S1 / KERNEL

The output is:


¦ S1
-------------------+-------
N of Cases ¦ 100000
Minimum ¦ 0.066
Maximum ¦ 0.966
Arithmetic Mean ¦ 0.527
Standard Deviation ¦ 0.137
Variance ¦ 0.019

[Kernel density histogram of S1: Count against S1 over (0, 1).]

The mean and variance of the posterior from simulated data obtained by RWM-H
algorithm and IndM-H algorithm are approximately 0.527 and 0.019 respectively.
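The independence chain is even simpler to sketch, because with a uniform(0, 1) proposal the proposal densities cancel in the Metropolis-Hastings ratio. Again, this Python fragment is an illustration of the algorithm, not SYSTAT's code:

```python
import numpy as np

rng = np.random.default_rng(65736736)

def target(theta):
    if theta <= 0.0 or theta >= 1.0:
        return 0.0
    return theta**16 * (1 - theta) / (-np.log(1 - theta)) ** 10

n = 100_000
theta = 0.5                          # starting point inside (0, 1)
draws = np.empty(n)
for i in range(n):
    cand = rng.uniform(0, 1)         # independence proposal covering (0, 1)
    # Uniform proposal density cancels, leaving the ratio of target values
    if rng.uniform() < min(1.0, target(cand) / target(theta)):
        theta = cand
    draws[i] = theta

print(round(draws.mean(), 2), round(draws.var(), 3))  # near 0.53 and 0.019
```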

Example 7
Generating Bivariate Normal Random Samples by Gibbs Sampling Method
This example explains the generation of a random sample from a bivariate normal
distribution by iteratively generating univariate normal random samples.

The input is:


MCMC
SAVE GIBBSBVN
X1=0.0
X2=0.0
GVAR X1,X2
GIBBS (ZRN(0.98*X2, 0.1990), ZRN(0.98*X1, 0.1990))/SIZE=10000
NSAMPLE=1 BURNIN=500 GAP=50, RSEED=231

The sample generated by the Gibbs Sampling method can be visualized through
different SYSTAT graphs. The input is:
USE GIBBSBVN
PLOT x21*x11 / BORDER=HIST

The output is:

[Scatter plot of X21 against X11 over (-4, 4) on both axes, with marginal histograms on the borders.]

By iteratively generating samples from the full conditional distributions, samples from a
multivariate distribution are generated. The scatter plot shows the bivariate density
function of X1 and X2. The histograms on the borders of the scatter plot are the
univariate marginal distributions of X1 and X2.
Estimates of various parameters associated with the marginal distributions of X1
and X2 are obtained as descriptive statistics from the generated sample and also by
Rao-Blackwellization as follows.

The input is:


LET RBEX11=0.98*X21
LET RBEX21=0.98*X11
CSTATISTICS/ MAXIMUM MEAN MEDIAN MINIMUM SD VARIANCE N,
PTILE=2.5 50 97.5

The output is:


¦ X11 X21 RBEX11 RBEX21
-------------------+----------------------------------
N of Cases ¦ 10000 10000 10000 10000
Minimum ¦ -3.680 -3.615 -3.542 -3.607
Maximum ¦ 3.985 3.726 3.652 3.906
Median ¦ 0.014 0.008 0.007 0.014
Arithmetic Mean ¦ -0.007 -0.010 -0.010 -0.007
Standard Deviation ¦ 0.989 0.988 0.968 0.970
Variance ¦ 0.979 0.976 0.938 0.940
Method = CLEVELAND ¦
2.500% ¦ -1.931 -1.937 -1.899 -1.893
50.000% ¦ 0.014 0.008 0.007 0.014
97.500% ¦ 1.929 1.918 1.880 1.890

Notice that the Rao-Blackwellized estimates are close to the true values and that
their variances are smaller than those of the naïve estimates.
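The sweep and both estimators can be sketched in Python (an illustration of the method, not SYSTAT code; a smaller sample size than the SYSTAT run is used here to keep it quick). The conditional standard deviation 0.1990 is sqrt(1 - 0.98^2), the standard deviation of X1 given X2 under a bivariate normal with correlation 0.98:

```python
import numpy as np

rng = np.random.default_rng(231)
rho = 0.98
sd = np.sqrt(1 - rho**2)            # conditional s.d., about 0.1990

n, burnin, gap = 5_000, 500, 50
x1 = x2 = 0.0
out = np.empty((n, 2))
kept = it = 0
while kept < n:
    x1 = rng.normal(rho * x2, sd)   # full conditional of X1 given X2
    x2 = rng.normal(rho * x1, sd)   # full conditional of X2 given X1
    it += 1
    if it > burnin and (it - burnin) % gap == 0:
        out[kept] = (x1, x2)
        kept += 1

naive = out[:, 0].mean()            # empirical estimate of E[X1]
rb = (rho * out[:, 1]).mean()       # Rao-Blackwellized: average of E[X1 | X2]
print(round(naive, 2), round(rb, 2))                       # both near 0.00
print(round(out[:, 0].var(), 2), round((rho * out[:, 1]).var(), 2))
```

The second print line shows why Rao-Blackwellization helps: the conditional-expectation draws 0.98·X2 have variance 0.98^2 times that of the raw X1 draws, consistent with the smaller variances in the table above.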

Example 8
Gene Frequency Estimation
Rao (1973) illustrated maximum likelihood estimation of gene frequencies of O, A,
and B blood groups through the method of scoring. McLachlan and Krishnan (1997)
used the EM algorithm for the same problem. This example illustrates Bayesian
estimation of these gene frequencies by the Gibbs Sampling method. Consider the
following multinomial model with four cell frequencies and their probabilities in terms
of parameters p, q, r with p + q + r = 1. Let n = nO + nA + nB + nAB.

Data          Model
nO = 176      r^2
nA = 182      p^2 + 2pr
nB = 60       q^2 + 2qr
nAB = 17      2pq

Let us consider hypothetical augmented data for this problem to be nOO, nAA, nAO,
nBB, nBO, nAB with a multinomial model {n; (1-p-q)^2, p^2, 2p(1-p-q), q^2, 2q(1-p-q), 2pq}.
With respect to the latter full model, nAA and nBB could be considered as missing data.

MODEL:
X ~ Multinomial6(435; (1-p-q)^2, p^2, 2p(1-p-q), q^2, 2q(1-p-q), 2pq)
Prior information:

(p, q, r) ~ Dirichlet (α, β, γ )



The full conditional densities take the form:

nAA ~ Binomial(nA, p^2 / (p^2 + 2p(1-p-q)))
nBB ~ Binomial(nB, q^2 / (q^2 + 2q(1-p-q)))
p ~ (1-q) Beta(2nAA + nAO + nAB + α, 2nOO + nAO + nBO + γ)
q ~ (1-p) Beta(2nBB + nBO + nAB + β, 2nOO + nAO + nBO + γ)

For generating random samples of p and q, the value generated from the beta
distribution is to be multiplied by (1-q) and (1-p) respectively. Since it is not possible
to implement this directly in our system, let us consider

p ~ Beta(2nAA + nAO + nAB + α, 2nOO + nAO + nBO + γ)
q ~ Beta(2nBB + nBO + nAB + β, 2nOO + nAO + nBO + γ)

and whenever p and q appear in other full conditionals, p is replaced by (1-q)p and q is
replaced by (1-p)q. By taking α=2, β=2, and γ=2, the input is:
FORMAT 10, 5
MCMC
NAA=40
NBB=5
P=0.1
Q=0.5
N1=182
N2=60
GVAR NAA,NBB,P,Q
FUNCTION FC1()
{
P1=(((1-Q)*P)^2)/((((1-Q)*P)^2)+(2*((1-Q)*P)*(1-((1-Q)*P)-((1-
P)*Q))))
NAA=NRN(N1,P1)
}
FUNCTION FC2()
{
P2=(((1-P)*Q)^2)/((((1-P)*Q)^2)+(2*((1-P)*Q)*(1-((1-P)*Q)-((1-
Q)*P))))
NBB= NRN(N2,P2)
}
FUNCTION FC3()
{
B1=NAA+182+17+1
B2=(2*176)+182+60-NAA-NBB+1
P=BRN(B1,B2)
}
FUNCTION FC4()
{
D1=NBB+60+17+1
D2=(2*176)+182+60-NAA-NBB+1
Q= BRN(D1,D2)
}
SAVE GIBBSGENETIC
GIBBS(FC1(),FC2(),FC3(),FC4()) / SIZE=10000 NSAMPLE=1
BURNIN=1000 GAP=1 RSEED=1783

USE GIBBSGENETIC
LET PP=(1-Q1)*P1
LET QQ=(1-P1)*Q1
LET RR=1-PP-QQ
LET RBEP= (1-
QQ)*((NAA1+182+17+2)/((NAA1+182+17+2)+((2*176)+182+60-NAA1-
NBB1+2)))
LET RBEQ=(1-
PP)*((NBB1+60+17+2)/((NBB1+60+17+2)+((2*176)+182+60-NAA1-
NBB1+2)))
LET RBER=1-RBEP-RBEQ
CSTATISTICS PP QQ RR RBEP RBEQ RBER/ MAXIMUM MEAN,MEDIAN MINIMUM
SD VARIANCE N PTILE=2.5 50 97.5
BEGIN
DENSITY PP RBEP/HIST XMIN=0.20 XMAX=0.35 LOC=0,0
DENSITY QQ RBEQ/HIST XMIN=0.05 XMAX=0.13 LOC=0,-3
DENSITY RR RBER/HIST XMIN=0.60 XMAX=0.75 LOC=0,-6
END
FORMAT

The output is:


¦ PP QQ RR RBEP RBEQ
-------------------+------------------------------------------------
N of Cases ¦ 10000 10000 10000 10000 10000
Minimum ¦ 0.19704 0.06190 0.60542 0.24135 0.08475
Maximum ¦ 0.31144 0.12549 0.70509 0.28952 0.10793
Median ¦ 0.25359 0.08980 0.65584 0.26456 0.09552
Arithmetic Mean ¦ 0.25407 0.09003 0.65589 0.26470 0.09564
Standard Deviation ¦ 0.01622 0.00944 0.01469 0.00700 0.00296
Variance ¦ 0.00026 0.00009 0.00022 0.00005 0.00001
Method = CLEVELAND ¦
2.50000% ¦ 0.22335 0.07230 0.62616 0.25075 0.09009
50.00000% ¦ 0.25359 0.08980 0.65584 0.26456 0.09552
97.50000% ¦ 0.28793 0.10883 0.68477 0.27891 0.10165
¦ RBER
-------------------+--------
N of Cases ¦ 10000
Minimum ¦ 0.61193
Maximum ¦ 0.66619
Median ¦ 0.63965
Arithmetic Mean ¦ 0.63966
Standard Deviation ¦ 0.00732
Variance ¦ 0.00005
Method = CLEVELAND ¦
2.50000% ¦ 0.62487
50.00000% ¦ 0.63965
97.50000% ¦ 0.65386

[Paired histograms of the empirical estimates and the Rao-Blackwellized estimates: PP and RBEP, QQ and RBEQ, RR and RBER, each with Count and Proportion per Bar axes.]

Maximum likelihood estimates of p, q and r evaluated by the scoring method or the
EM algorithm are 0.26444, 0.09317 and 0.64239. With the available prior information,
the estimates of p, q and r are approximated by the Gibbs Sampling method. The
empirical estimates of p, q and r are 0.25407, 0.09003 and 0.65589 respectively. The
Rao-Blackwellized estimates are 0.26470, 0.09564 and 0.63966 respectively.
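The full sampler is compact when the four SYSTAT functions are folded into one sweep. The Python sketch below is an illustration (not SYSTAT code) that uses α = β = γ = 2 directly in the Beta updates, as in the stated conditionals; it reproduces the empirical Gibbs estimates to within simulation error.

```python
import numpy as np

rng = np.random.default_rng(1783)
nO, nA, nB, nAB = 176, 182, 60, 17
alpha = beta = gamma = 2            # Dirichlet prior parameters

nAA, nBB, p, q = 40, 5, 0.1, 0.5    # starting values as in the example
draws = []
for it in range(11_000):
    pp, qq = (1 - q) * p, (1 - p) * q          # actual gene frequencies
    # Missing-data conditionals: split nA into nAA + nAO and nB into nBB + nBO
    nAA = rng.binomial(nA, pp**2 / (pp**2 + 2 * pp * (1 - pp - qq)))
    nBB = rng.binomial(nB, qq**2 / (qq**2 + 2 * qq * (1 - pp - qq)))
    nAO, nBO = nA - nAA, nB - nBB
    p = rng.beta(2 * nAA + nAO + nAB + alpha, 2 * nO + nAO + nBO + gamma)
    q = rng.beta(2 * nBB + nBO + nAB + beta, 2 * nO + nAO + nBO + gamma)
    if it >= 1_000:                            # burn-in
        draws.append(((1 - q) * p, (1 - p) * q))

p_hat, q_hat = np.array(draws).mean(axis=0)
print(round(p_hat, 3), round(q_hat, 3), round(1 - p_hat - q_hat, 3))
```

The printed estimates land near the empirical values 0.254, 0.090 and 0.656 reported above.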

Example 9
Fitting Poisson Gamma Hierarchical Model
This example concerns estimating the failure rates of 10 pumps in a nuclear power
plant. The data set consists of the number of failures and the operating times for 10
pump systems at the plant, and is from Gaver and O'Muircheartaigh (1987).

MODEL:
Assume that the number of failures Fi ~ Poisson(λiti), where λi is the failure rate and
ti is the operating time of the i-th pump. The prior densities are

λi ~ Gamma(α, β) and β ~ Gamma(γ, δ)

with α=1.8, γ=0.01 and δ=1. The full conditional densities take the form

λi | β, ti, F ~ Gamma(Fi + α, (ti + β)^(-1))
β | λ1, λ2, …, λ10 ~ Gamma(10α + γ, (δ + Σ_{i=1}^{10} λi)^(-1))

For getting random samples from the full conditionals and computing basic statistics, the input is:
FORMAT 10, 5
USE PUMPFAILURES
MCMC
LAM1=0.5
LAM2=0.5
LAM3=0.5
LAM4=0.5
LAM5=0.5
LAM6=0.5
LAM7=0.5
LAM8=0.5
LAM9=0.5
LAM10=0.5
BETA=1.0
GVAR LAM1, LAM2, LAM3, LAM4, LAM5, LAM6, LAM7, LAM8, LAM9,
LAM10, BETA
FUNCTION FC1()
{
LAM1=GRN(DATA(F,1)+1.8 , 1/(DATA(T,1)+BETA))
}
FUNCTION FC2()
{
LAM2=GRN(DATA(F,2)+1.8 , 1/(DATA(T,2)+BETA))
}
FUNCTION FC3()
{
LAM3=GRN(DATA(F,3)+1.8 , 1/(DATA(T,3)+BETA))
}
FUNCTION FC4()
{
LAM4=GRN(DATA(F,4)+1.8 , 1/(DATA(T,4)+BETA))
}
FUNCTION FC5()
{
LAM5=GRN(DATA(F,5)+1.8 , 1/(DATA(T,5)+BETA))
}
FUNCTION FC6()
{
LAM6=GRN(DATA(F,6)+1.8 , 1/(DATA(T,6)+BETA))
}
FUNCTION FC7()
{
LAM7=GRN(DATA(F,7)+1.8 , 1/(DATA(T,7)+BETA))
}
FUNCTION FC8()
{
LAM8=GRN(DATA(F,8)+1.8 , 1/(DATA(T,8)+BETA))
}
FUNCTION FC9()
{
LAM9=GRN(DATA(F,9)+1.8 , 1/(DATA(T,9)+BETA))
}
FUNCTION FC10()
{
LAM10=GRN(DATA(F,10)+1.8 , 1/(DATA(T,10)+BETA))
}
FUNCTION FC11()
{
BETA=GRN((10*1.8)+0.01,1/(1.0+SUM(LAM1,LAM2,LAM3,LAM4,LAM5,LAM6,L
AM7,LAM8,LAM9,LAM10)))
}
SAVE GIBBSNUCLEARPUMPS
GIBBS(FC1(),FC2(),FC3(),FC4(),FC5(),FC6(),FC7(),FC8(),FC9(),FC10(
),FC11()) / SIZE=10000 NSAMPLE=1 BURNIN=500 GAP=30,
RSEED=746572365
USE GIBBSNUCLEARPUMPS
CSTATISTICS / MAXIMUM MEAN, MEDIAN MINIMUM SD VARIANCE N PTILE=2.5
50 97.5

The output is:


¦ LAM11 LAM21 LAM31 LAM41 LAM51
-------------------+------------------------------------------------
N of Cases ¦ 10000 10000 10000 10000 10000
Minimum ¦ 0.01045 0.00283 0.01681 0.03883 0.03731
Maximum ¦ 0.20646 0.67833 0.31899 0.27406 2.15602
Median ¦ 0.06688 0.13669 0.09991 0.12071 0.58231
Arithmetic Mean ¦ 0.07024 0.15379 0.10456 0.12331 0.62866
Standard Deviation ¦ 0.02682 0.09199 0.03971 0.03135 0.29212
Variance ¦ 0.00072 0.00846 0.00158 0.00098 0.08533
Method = CLEVELAND ¦
2.50000% ¦ 0.02811 0.02798 0.04097 0.06965 0.18876
50.00000% ¦ 0.06688 0.13669 0.09991 0.12071 0.58231
97.50000% ¦ 0.13234 0.38348 0.19548 0.19103 1.31557

¦ LAM61 LAM71 LAM81 LAM91 LAM101


-------------------+------------------------------------------------
N of Cases ¦ 10000 10000 10000 10000 10000
Minimum ¦ 0.24171 0.01499 0.02261 0.07778 0.72224
Maximum ¦ 1.26120 4.43899 4.42489 4.63179 3.52014
Median ¦ 0.60204 0.71832 0.70962 1.20703 1.81756
Arithmetic Mean ¦ 0.61282 0.84003 0.82617 1.30140 1.84679
Standard Deviation ¦ 0.13603 0.54447 0.52776 0.58168 0.39485
Variance ¦ 0.01850 0.29645 0.27853 0.33836 0.15591
Method = CLEVELAND ¦
2.50000% ¦ 0.37598 0.15513 0.14817 0.43988 1.15991
50.00000% ¦ 0.60204 0.71832 0.70962 1.20703 1.81756
97.50000% ¦ 0.90423 2.20541 2.13166 2.67503 2.69876

¦ BETA1
-------------------+--------
N of Cases ¦ 10000
Minimum ¦ 0.73295
Maximum ¦ 6.42968
Median ¦ 2.39187
Arithmetic Mean ¦ 2.46662
Standard Deviation ¦ 0.71612
Variance ¦ 0.51283
Method = CLEVELAND ¦
2.50000% ¦ 1.33448
50.00000% ¦ 2.39187
97.50000% ¦ 4.09004
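Because the λi are conditionally independent given β, one Gibbs sweep reduces to two vectorized gamma draws. The Python sketch below is an illustration, not SYSTAT code; since the PUMPFAILURES file is not reproduced in this manual, the failure counts and (rounded) operating times are the values commonly reproduced from Gaver and O'Muircheartaigh (1987), stated here as an assumption.

```python
import numpy as np

rng = np.random.default_rng(746572365)
# Assumed data: failures F and operating times t for the 10 pump systems
F = np.array([5, 1, 5, 14, 3, 19, 1, 1, 4, 22])
t = np.array([94.32, 15.72, 62.88, 125.76, 5.24, 31.44, 1.05, 1.05, 2.10, 10.48])

a, g, d = 1.8, 0.01, 1.0              # alpha, gamma, delta of the priors
beta = 1.0
keep = []
for it in range(10_500):
    lam = rng.gamma(F + a, 1.0 / (t + beta))              # lambda_i | beta, data
    beta = rng.gamma(10 * a + g, 1.0 / (d + lam.sum()))   # beta | lambda_1..10
    if it >= 500:                                         # burn-in
        keep.append(np.append(lam, beta))

keep = np.array(keep)
print(np.round(keep.mean(axis=0), 2))   # posterior means of the lambdas and beta
```

With these data the posterior mean of β comes out near 2.5, and each λi mean is roughly (Fi + α)/(ti + E[β]), matching the table above (for example, λ1 ≈ 0.07).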

Example 10
Fitting Linear Regression using Gibbs Sampler
This example, taken from Congdon (2001), illustrates a Bayesian linear regression of
December rainfall on November rainfall based on data for ten years. The data set is
from Lee (1997), where Y is December rainfall and X is November rainfall.

MODEL:
Assume that Yi ~ Normal(θi, σ^2), where θi = α + β(xi − x̄).
Priors:
α ~ Normal(μ1, σ1^2)
β ~ Normal(μ2, σ2^2)
τ = σ^(-2) ~ Gamma(γ, δ^(-1))

The full conditional densities take the form

α | β, σ^(-2) ~ Normal( [μ1/σ1^2 + (Σi yi)/σ^2] / [n/σ^2 + 1/σ1^2],  1 / [n/σ^2 + 1/σ1^2] )

β | α, σ^(-2) ~ Normal( [μ2/σ2^2 + (Σi yi(xi − x̄))/σ^2] / [Σi(xi − x̄)^2/σ^2 + 1/σ2^2],
                        1 / [Σi(xi − x̄)^2/σ^2 + 1/σ2^2] )

σ^(-2) | α, β ~ Gamma( n/2 + γ,  [ (1/2) Σi (yi − α − β(xi − x̄))^2 + δ ]^(-1) )
By taking the prior distribution parameters as μ1=0, μ2=0, σ1^2=10000, σ2^2=1000,
γ=0.001 and δ=0.001, the input for getting random samples from the full conditionals
and computing basic statistics is:
MCMC
USE RAINFALL
ALPHA=29
BETA=0.5
TAU=0.5
GVAR ALPHA, BETA, TAU
FUNCTION CF1()
{
ALPHA=ZRN(((0/(10000))+((CSUM(Y))*(TAU)))/((10*(TAU))+(1/10000)),
SQR(1/((10*(TAU))+(1/10000))))
}
FUNCTION CF2()
{
BETA=ZRN(((0/(1000))+((CSUM(Y*(X-CMEAN(X))))*TAU))/(((CSUM((X-
CMEAN(X))^2))*TAU)+(1/1000) ) ,SQR(1/(((CSUM((X-
CMEAN(X))^2))*TAU)+(1/1000))))
}
FUNCTION CF3()
{
TAU=GRN((10/2)+0.001, 1/(((1/2)*(CSUM((Y-ALPHA-(BETA*(X-
CMEAN(X))))^2)))+0.001))
}
SAVE GIBBSYORKRAIN
GIBBS( CF1(),CF2(),CF3()) /SIZE=10000 NSAMPLE=1 BURNIN=1000 GAP=1
RSEED=53478
USE GIBBSYORKRAIN
LET SIGSQ=1/TAU1
CSTATISTICS ALPHA1 BETA1 SIGSQ/MAXIMUM MEAN, MEDIAN MINIMUM SD
VARIANCE N PTILE=2.5 50 97.5

The output is:


¦ ALPHA1 BETA1 SIGSQ
-------------------+-----------------------------------
N of Cases ¦ 10000 10000 10000
Minimum ¦ 17.23346 -0.86588 42.89717
Maximum ¦ 63.37492 0.63053 2529.65534
Median ¦ 40.57834 -0.16655 209.43770
Arithmetic Mean ¦ 40.56741 -0.16385 254.30380
Standard Deviation ¦ 5.00478 0.13991 172.74856
Variance ¦ 25.04786 0.01958 2.98421E+004
Method = CLEVELAND ¦
2.50000% ¦ 30.43290 -0.44318 88.06745
50.00000% ¦ 40.57834 -0.16655 209.43770
97.50000% ¦ 50.49352 0.11603 723.58007

The estimates of α, β and σ^2 from the simulated values are 40.56741, -0.16385 and
254.30380 respectively. Using time series plots, we can study the behavior of the
simulated samples.

The input is:


SERIES
TPLOT ALPHA1

[Series plot of ALPHA1 against case number.]
TPLOT BETA1

[Series plot of BETA1 against case number.]
TPLOT SIGSQ

[Series plot of SIGSQ against case number.]
By posterior predictive simulation, the December rainfall can be predicted based on a
new November rainfall value of 46.1.

The input is:


LET THETANEW= ALPHA1+BETA1*(46.1-57.8)
LET YNEW= ZRN(THETANEW, SQR(SIGSQ))
CSTATISTICS YNEW/MAXIMUM MEAN MEDIAN MINIMUM SD VARIANCE N
PTILE=2.5 50 97.5

The output is:


¦ YNEW
-------------------+--------
N of Cases ¦ 10000
Minimum ¦ -49.074
Maximum ¦ 137.631
Median ¦ 42.547
Arithmetic Mean ¦ 42.467
Standard Deviation ¦ 16.871
Variance ¦ 284.639
Method = CLEVELAND ¦
2.500% ¦ 8.341
50.000% ¦ 42.547
97.500% ¦ 76.677

The predicted December rainfall is 42.467 with standard deviation 16.871.
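The three conditional draws translate line by line into code. The Python sketch below is illustrative only: the rainfall file is not reproduced here, so it runs the same sampler on simulated data with known coefficients (an assumption, not Lee's data) and checks that the posterior means land near the least-squares fit.

```python
import numpy as np

rng = np.random.default_rng(53478)
# Hypothetical stand-in data: x = November rainfall, y = December rainfall
n = 10
x = rng.uniform(30, 90, n)
xc = x - x.mean()
y = 40.0 - 0.2 * xc + rng.normal(0, 10.0, n)   # true alpha=40, beta=-0.2, sigma=10

s1sq, s2sq, g, d = 10_000.0, 1_000.0, 0.001, 0.001   # vague priors
alpha, beta, tau = y.mean(), 0.0, 1.0
draws = np.empty((10_000, 3))
for it in range(draws.shape[0]):
    prec = n * tau + 1 / s1sq                          # alpha | beta, tau
    alpha = rng.normal(y.sum() * tau / prec, np.sqrt(1 / prec))
    prec = (xc**2).sum() * tau + 1 / s2sq              # beta | alpha, tau
    beta = rng.normal((y * xc).sum() * tau / prec, np.sqrt(1 / prec))
    resid = y - alpha - beta * xc                      # tau | alpha, beta
    tau = rng.gamma(n / 2 + g, 1 / (0.5 * (resid**2).sum() + d))
    draws[it] = alpha, beta, 1 / tau

print(np.round(draws[1000:].mean(axis=0), 2))  # posterior means: alpha, beta, sigma^2
```

Note that the conditional means here use Σyi and Σyi(xi − x̄) directly; subtracting α or β(xi − x̄) changes nothing because Σ(xi − x̄) = 0.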



Computation

Algorithms

Algorithms used here for random sampling from specified distributions may be found
in Devroye (1986), Bratley et al. (1987), Chhikara and Folks (1989), Fishman (1996),
Gentle (1998), Evans et al. (2000), Karian and Dudewicz (2000), Ross (2002), and
Hörmann et al. (2004). For some distributions, the inverse CDF method (analytical or
numerical) is used, whereas for others special methods are used. The Adaptive
Rejection Sampling method uses the algorithm developed by Gilks (1992) and Gilks
and Wild (1992); see also Robert and Casella (2004).

References
Athreya, K. B., Delampady, M., and Krishnan, T. (2003). Markov chain Monte Carlo
methods. Resonance, 8(4), 17--26; 8(7), 63--75; 8(10), 8--19; 8(12), 18--32.
Bratley, P., Fox, B. L., and Schrage, L.E. (1987). A guide to simulation. 2nd ed, New York:
Springer-Verlag.
Casella, G. and George, E. I. (1992). Explaining the Gibbs Sampler. The American
Statistician, 46, 167-174.
Chib, S. and Greenberg, E. (1995). Understanding the Metropolis-Hastings algorithm. The
American Statistician, 49, 327-335.
Chhikara, R. S. and Folks, J. L. (1989). The inverse Gaussian distribution: Theory,
methodology, and applications. New York: Marcel Dekker.
Congdon, P. (2001). Bayesian statistical modeling. New York: John Wiley & Sons.
Devroye, L. (1986). Non-uniform random variate generation. New York: Springer-Verlag.
*Evans, M., Hastings, N., and Peacock, B. (2000). Statistical distributions. 3rd ed. New
York: John Wiley & Sons.
Fishman, G. S. (1996). Monte Carlo: Concepts, algorithms, and applications. New York:
Springer-Verlag.
Gamerman, D. and Lopes, H. F. (2006). Markov chain Monte Carlo: Stochastic simulation
for Bayesian inference, 2nd ed. Boca Raton, FL: Chapman & Hall / CRC.
Gaver, D. P. and O’Muircheartaigh, I. G. (1987). Robust empirical Bayes analysis of event
rates. Technometrics, 29, 1-15.
Gentle, J. E. (1998). Random number generation and Monte Carlo methods. New York:
Springer-Verlag.
Geweke, J. (1989). Bayesian inference in econometric models using Monte Carlo
integration. Econometrica, 57, 1317-1339.


Gilks, W. R. (1992). Derivative-free adaptive rejection sampling for Gibbs sampling. In
Bayesian Statistics 4, (eds.: Bernardo, J., Berger, J., Dawid, A. P., and Smith, A. F. M.)
London: Oxford University Press, 641-649.
Gilks, W. R. and Wild, P. (1992). Adaptive rejection sampling for Gibbs sampling. Applied
Statistics, 41, 337-348.
Gilks, W. R., Richardson, S., and Spiegelhalter, D. J. (1998). Markov chain Monte Carlo
in practice. London: Chapman & Hall / CRC.
Hesterberg, T. (1995). Weighted average importance sampling and defensive mixture
distributions. Technometrics, 37, 185-194.
Hörmann, W., Leydold, J., and Derflinger, G. (2004). Automatic random variate
generation. Berlin: Springer-Verlag.
*Johnson, N. L., Kemp, A. W., and Kotz, S. (2005). Univariate discrete distributions. 3rd
ed. New York: John Wiley & Sons.
*Johnson, N. L., Kotz, S., and, Balakrishnan, N. (1994). Univariate continuous
distributions. Vol. 1, 2nd ed. New York: John Wiley & Sons.
*Johnson, N. L., Kotz, S., and Balakrishnan, N. (1995). Univariate continuous
distributions. Vol. 2, 2nd ed. New York: John Wiley & Sons.
Karian, Z. A. and Dudewicz, E. J. (2000). Fitting statistical distributions: The generalized
lambda distribution and generalized bootstrap methods. Florida: CRC Press.
Lee, P. M. (1997). Bayesian statistics: An introduction, 2nd ed. London: Edward Arnold.
Liu, J. S. (2001). Monte Carlo strategies in scientific computing. New York: Springer-
Verlag.
Liu, J. S., Wong, W. H., and Kong, A. (1994). Covariance structure of the Gibbs sampler
with applications to the comparisons of estimators and augmentation schemes.
Biometrika, 81, 27-40.
Matsumoto, M. and Nishimura, T. (1998). Mersenne Twister: A 623-dimensionally
equidistributed uniform pseudorandom number generator. ACM Transactions on
Modeling and Computer Simulation, 8, 3-30.
McLachlan, G. J. and Krishnan, T. (1997). The EM algorithm and extensions. New York:
John Wiley & Sons.
Monahan, J. F. (2001). Numerical methods of statistics. Cambridge: Cambridge University
Press.
Rao, C. R. (1973). Linear statistical inference and its applications. New York: John Wiley
& Sons.
Robert, C. P. and Casella, G. (2004). Monte Carlo statistical methods, 2nd ed. New York:
Springer-Verlag.
Ross, S. M. (2002). Simulation, 3rd ed. San Diego, CA: Academic Press.
(* indicates additional references)
Language Reference - Monte Carlo


RANDSAMP:
Random Sampling
The RANDSAMP module generates random samples from a specified distribution with
specified parameters. The distribution may be one of 42 UNIVARIATE discrete,
UNIVARIATE continuous, and MULTIVARIATE distributions. The random samples
drawn can be used for the desired Monte Carlo integration computations using INTEG.
The Classical Monte Carlo Integration and Importance Sampling methods are not
random sampling procedures, but they estimate (intractable) integrals through
empirical sampling (see the INTEG command in MCMC).
Setup:
Univariate discrete and continuous Multivariate
* RANDSAMP * RANDSAMP
SAVE SAVE
HOT * UNIVARIATE HOT * MULTIVARIATE

Note: The GENERATE command used in the previous version is no longer required.

UNIVARIATE command

UNIVARIATE distribution notation(parameter list)

Specifies a distribution with corresponding parameter values as arguments.

/ SIZE= n1 size of the random sample to be generated. The default is 1.


NSAMPLE=n2 number of samples (columns) each of a specified size to be generated.
The default is 1.
RSEED=n3 random seed.

Note: The list of distributions along with distribution notation is given in section
Distribution Notations used in Random Sampling of Monte Carlo.

Examples:

For the normal distribution with parameters location = 0 and scale =1.2

UNIVARIATE ZRN(0,1.2)

For the binomial distribution with n=16 and p=0.5

UNIVARIATE NRN(16, 0.5)

MULTIVARIATE command

MULTIVARIATE distribution notation(parameter list)

Specifies a distribution with corresponding parameter values as arguments.

/ SIZE= n1 size of the random sample to be generated. The default is 1.


NSAMPLE=n2 number of samples (columns) each of a specified size to be generated.
The default is 1.
RSEED=n3 random seed.

Examples:
For bivariate exponential distribution with parameters (1,1,0.5)
MULTIVARIATE BERN(1,1,0.5)

Note: The list of distributions along with distribution notation is given in section
Distribution Notations used in Random Sampling of Monte Carlo.

SAVE command

SAVE filename

Saves generated samples to a file specified with name filename.



IIDMC:
IID Monte Carlo
IIDMC consists of two random sampling procedures: Rejection Sampling and Adaptive
Rejection Sampling. The random samples drawn can be used for the desired Monte
Carlo integration computations. The Classical Monte Carlo Integration and Importance
Sampling methods are not random sampling procedures, but they estimate (intractable)
integrals through empirical sampling (see the INTEG command in MCMC).
Setup:
* IIDMC
SAVE
HOT * REJECT

Note: The GENERATE command used in the previous version is no longer required.
The RJS and ARS commands are now functions under the REJECT command. The
PROPOSAL command is also not required, since the proposal is the second argument
of the RJS function.

REJECT command

Specifies rejection sampling (RJS) or adaptive rejection sampling (ARS) methods to be
used for random sample generation.

REJECT RJS(targetexpression, proposalexpression, constant)

Specifies a target function in expression form for which random samples are needed.
'constant' is an upper bound on the supremum of the ratio of the target and proposal
functions and should be a positive real number.

Note: The list of distributions along with distribution notation is given in section
Distribution Notations used in IIDMC and MCMC of Monte Carlo.

or

REJECT ARS(targetexpression, rangeexpression)

Specifies a target function in expression form for which random samples are needed,
and the support of the target distribution with starting points. The starting points must
be in ascending order.

Range expressions for target expression in ARS function are listed in the following
table:

Range expression Description


UB(a,b) Specifies the target as unbounded. The two points are starting
points.
LB(a,b) Specifies the target function as left bounded. The left point is a
bound as well as starting point whereas the right point is a start-
ing point only.
RB(a,b) Specifies the target function as right bounded. The right point is
a bound as well as starting point whereas the left point is a start-
ing point only.
BD(a,b) Specifies the support of target as bounded. The left and right
starting points are also bounds.

/ SIZE= n1 size of the random sample to be generated. The default is 1.


NSAMPLE=n2 number of samples (columns) each of a specified size to be generated.
The default is 1.
RSEED=n3 random seed.

Example:

For rejection sampling from the target function "(x^1.7)*((1-x)^5.3)" using proposal
uniform(0, 1) and constant 0.0206:

REJECT RJS((x^1.7)*((1-x)^5.3), U(0,1), 0.0206)
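The constant is the supremum of the target over the uniform proposal density: x^1.7(1-x)^5.3 is maximized at x = 1.7/7 ≈ 0.243, where it equals about 0.0206. A Python sketch of what RJS then does (an illustration, not the SYSTAT implementation):

```python
import numpy as np

rng = np.random.default_rng(42)

def target(x):
    return x**1.7 * (1 - x)**5.3   # unnormalized Beta(2.7, 6.3) kernel

c = 0.0206                         # bound on target / uniform(0,1) density
n_prop = 200_000
x = rng.uniform(0, 1, n_prop)
u = rng.uniform(0, 1, n_prop)
sample = x[u * c < target(x)]      # accept when u * c * g(x) falls under target

print(round(sample.mean(), 3))     # Beta(2.7, 6.3) mean = 2.7/9 = 0.3
```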

For adaptive rejection sampling from the unbounded target function "exp(-x^2/2)"
using -1 and 1 as starting values:

REJECT ARS(EXP(-x^2/2), UB(-1, 1))

SAVE command

SAVE filename

Saves generated samples to a file specified by name filename.



MCMC:
Markov Chain Monte Carlo
MCMC generates random samples from a target distribution by constructing an ergodic
Markov chain. The M-H algorithm and the Gibbs Sampling method are two types of
MCMC algorithms. Three types of M-H algorithm are provided: RWM-H, IndM-H,
and the hybrid RWInd M-H. Fixed-scan Gibbs Sampling iteratively generates
random samples from full conditionals. Monte Carlo integration methods are not
applicable to the Gibbs Sampling procedure in SYSTAT.
Setup:
M-H Algorithm Gibbs Sampling
* MCMC * MCMC
SAVE USE
HOT * MH SAVE
HOT * INTEG * VARIABLE DECLARATIONS
* GVAR
FUNCTION
HOT * GIBBS

Note: The GENERATE command used in the previous version is no longer required.
The INITSAMP and PROPOSAL commands are now arguments of the M-H algorithm
functions under the MH command. You can use VARIABLE DECLARATIONS, GVAR,
and FUNCTION commands to specify full conditionals in the Gibbs sampling algorithm
(the FULLCOND command in the previous version). The ESTIMATE command used for
Monte Carlo integration in the previous version is also no longer required.

MH command

MH MHRW(target_exp, range_exp , proposal_exp, initialvalue_exp)


or
MHIND(target_exp, range_exp, proposal_exp, initialvalue_exp)
or
MHHY(target_exp, range_exp , rw_ proposal_exp , ind_ proposal_exp, initialvalue_exp)

Specifies various algorithms to be used for random sample generation. Function names
indicate various types of M-H algorithms.

Function name Algorithm


MHRW M-H random walk
MHIND M-H independent
MHHY M-H hybrid of MHRW and MHIND

target_exp

Specifies a target function in expression form from which random samples are needed.

Range expression Description


UB Specifies the target as unbounded.
LB(a) Specifies the target function as left bounded at a.
RB(b) Specifies the target function as right bounded at b.
BD(a,b) Specifies the support of target as bounded between a and b.

proposal_exp or initialvalue_exp

The distributions for proposal or initial values can be chosen from Beta, Cauchy, Chi-
square, Exponential, F, Gamma, Gompertz, Gumbel, Double Exponential (Laplace),
Logistic, Logit normal, Lognormal, Normal, Pareto, Rayleigh, t, Triangular, Uniform,
Inverse Gaussian (Wald) and Weibull distributions.

Note: The list of distributions along with distribution notation is given in section
Distribution Notations used in IIDMC and MCMC of Monte Carlo.

/ SIZE = n1 size of the random sample to be generated. The default is 1.


NSAMPLE = n2 number of samples (columns) each of a specified size to be generated.
The default is 1.
BURNIN = n3 size of random samples to be discarded initially from the chain. The
default is 500.
GAP = n4 the difference between two successive samples that are extracted from the
generated sample. The default is 30.
RSEED = n5 random seed.

Examples:
To generate random samples using M-H random walk algorithm from unbounded
target function "exp(-(x^2)/2)" with Cauchy(0,1) as the proposal distribution and an
initial value from uniform(-1,1) distribution:
MH MHRW(exp(-(x^2)/2),UB,C(0,1),U(-1,1))

To generate random samples using M-H independent algorithm from bounded target
function "((x+2)^125)*((1-x)^38)*(x^34)" with uniform(0,1) the proposal distribution
and an initial value from uniform(-1,1) distribution:
MH MHIND(((x+2)^125)*((1-x)^38)*(x^34),BD( 0, 1), U(0,1), U(0,1))

INTEG command

INTEG expression

Specifies an integrand function for Monte Carlo integration.

; MC classical Monte Carlo integration.
IMPSAMPI DENFUN = expression Importance Sampling integration method.
IMPSAMPR DENFUN = expression Importance Sampling ratio method.

The expression after DENFUN is the numerator of the weight function in the
Importance Sampling method.

Examples:

For classical Monte Carlo integration of integrand “x^2”

INTEG x^2 ; MC

For Importance Sampling integration of integrand “COS((x)/2)” with weight function 1.

INTEG COS((X)/2) ; IMPSAMPI DENFUN=1

Note: Monte Carlo Integration methods in SYSTAT can be used only after generating
random samples from any one of the univariate discrete and univariate continuous
random sampling methods, Rejection Sampling, Adaptive Rejection Sampling and
M-H algorithms.
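The two estimators behind the MC and IMPSAMPR options can be illustrated in Python (a sketch of the general estimators, not of SYSTAT's implementation): classical Monte Carlo averages the integrand over draws from the sampled distribution, while the Importance Sampling ratio method reweights draws from another density g by w = f/g and divides by the total weight.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Classical Monte Carlo: E[X^2] under N(0, 1), estimated from N(0, 1) draws
z = rng.normal(0, 1, n)
mc = (z**2).mean()
print(round(mc, 2))                # near 1.0, the variance of a standard normal

# Importance Sampling ratio estimate of the same quantity from Cauchy draws;
# weights are (standard normal density) / (standard Cauchy density)
x = rng.standard_cauchy(n)
w = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi) * (np.pi * (1 + x**2))
is_ratio = (x**2 * w).sum() / w.sum()
print(round(is_ratio, 2))          # also near 1.0
```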

GVARIABLE command

GVARIABLE varlist
Specifies variables used for Gibbs sampling algorithm. Output will be given for
variables in varlist specified after GVARIABLE.

Examples:
GVARIABLE a, b

VARIABLE DECLARATIONS

Specifies initial values for Gibbs variables declared using GVARIABLE command.

Examples:
For setting initial values for Gibbs variables a = 0 and b = 0
a=0
b=0

FUNCTION command

The FUNCTION command specifies full conditional distributions in the Gibbs sampling
algorithm. Each univariate full conditional distribution for the Gibbs variables must be
defined using its own uniquely named function.

FUNCTION function_name( )
{
List of statements
}
The function name must start with a letter. The list of statements involves simple
assignment(s) to temporary variables and/or Gibbs variables.

Example:
FUNCTION FC1()
{
mu=0
sigma=1
a = zrn(mu, sigma)
}

GIBBS command

GIBBS( fname_1(),…,fname_k())

Specifies the list of all full conditional distributions specified using the FUNCTION
command. Alternatively, you can specify all your expressions as arguments of GIBBS
(see the example listed below). SYSTAT assumes a one-to-one correspondence between
the varlist in the GVARIABLE command and the GIBBS arguments.

/ SIZE = n1 size of the random sample to be generated. The default is 1.


NSAMPLE = n2 number of samples (columns) each of a specified size to be generated.
The default is 1.
BURNIN = n3 size of random samples to be discarded initially from the chain. The
default is 500.
GAP = n4 the difference between two successive samples that are extracted from the
generated sample. The default is 30.
RSEED = n5 random seed.

Note: The list of distributions along with distribution notation is given in section
Distribution Notations used in IIDMC and MCMC of Monte Carlo.

Example:

For generation of a random sample from a bivariate normal distribution with mean
vector [0 , 0] and correlation 0.98 using Gibbs sampling.

MCMC
X1=0.0
X2=0.0
sigma=0.1990
GVAR X1, X2
FUNCTION FC1()
{
X1= ZRN(0.98*X2, sigma)
}
FUNCTION FC2()
{
X2= ZRN(0.98*X1, sigma)
}
GIBBS (FC1(),FC2()) /SIZE=10000 NSAMPLE=1 BURNIN=500 GAP=50,
RSEED=231

In the above example, the full conditionals are simple to express as functions, so we
can give them directly as arguments of GIBBS:
MCMC
X1=0.0
X2=0.0
GVAR X1,X2
GIBBS (ZRN(0.98*X2, 0.1990), ZRN(0.98*X1, 0.1990)) /SIZE=10000,
NSAMPLE=1 BURNIN=500 GAP=50 RSEED=231

USE command

(HOT) USE filename

Reads a data file named filename. You do not have to enclose the filename and path in
quotation marks unless the path or name contains spaces. In the absence of a
designated path, the software searches for the file in the directories defined by FPATH
for input data files (USE), temporary data files (WORK), and output data files (SAVE).
The date and time of a file's creation appear in the output when that file is used.

/ NAMES                suppresses the date and time information, displaying only
                       the names of the variables in the data file.
  NONAMES              neither the variable names nor the file’s date and time
                       are displayed.
  COMMENT              displays the file comments after the variable names.
  DICTIONARY           displays the file comments, variable names, and variable
                       comments.
  MATRIX = matrixname  reads the file as a matrix with the specified name.
  (or MAT = matrixname)
  MTYPE = NUMERIC      reads all numeric or all string variables as a matrix.
  or STRING            By default, SYSTAT reads only numeric columns.
  ROWNAME = var or var$  uses var (or var$) to name the rows of the matrix.
  COLNAME = var or var$  uses var (or var$) to name the columns of the matrix.

SAVE command

SAVE filename

Saves the generated samples to a file named filename.

Expressions in Monte Carlo

For IIDMC and the M-H algorithm, the target functions from which a random sample is
generated are expressions involving mathematical functions of a single variable.
The integrand in Monte Carlo Integration and the density function in the Importance
Sampling procedure are also expressions. In the Gibbs Sampling method, the parameters
of the full conditionals are expressions, which may involve variables from a data file
as well as mathematical functions.
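The role an expression plays in Monte Carlo integration can be sketched in plain Python (not SYSTAT's INTEG syntax; the function name and defaults here are illustrative only): the integrand is an ordinary expression of one variable, and its integral is approximated by averaging the expression at uniform random points.

```python
# Illustrative sketch: Monte Carlo integration of an expression of one
# variable, here exp(-x^2) over [0, 1].
import math
import random

def mc_integrate(integrand, a, b, n=100000, seed=1):
    random.seed(seed)
    total = sum(integrand(random.uniform(a, b)) for _ in range(n))
    # Average value of the expression times the interval length:
    return (b - a) * total / n

estimate = mc_integrate(lambda x: math.exp(-x * x), 0.0, 1.0)
# The exact value of this integral is about 0.7468.
```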
Index

A
adaptive rejection sampling 4
ARS
  in REJECT 67
B
burn-in 12
D
Distributions
  Benford’s law 14, 17
  Beta 15, 17, 19, 24, 25, 28, 29, 31
  Binomial 14, 17, 25, 29, 31
  Bivariate exponential 16, 18, 32
  Cauchy 15, 17, 19, 24, 25, 28, 29, 31
  Chi-square 15, 17, 19, 24, 25, 28, 29, 31
  Dirichlet 16, 18, 33
  Discrete uniform 14, 17, 25, 29, 31
  Double exponential 15, 17, 19, 24, 25, 28, 29, 31
  Erlang 15, 17
  Exponential 15, 17, 20, 24, 25, 28, 29, 31
  F 15, 17, 20, 24, 25, 28, 29, 31
  Gamma 15, 17, 20, 24, 25, 28, 29, 31
  Generalized lambda 15, 17
  Geometric 14, 17, 25, 29, 31
  Gompertz 15, 17, 20, 24, 25, 28, 29, 31
  Gumbel 15, 17, 20, 24, 25, 28, 29, 31
  Hypergeometric 14, 17, 25, 29, 31
  Inverse Gaussian 15, 17, 20, 24, 25, 28, 29, 31
  Logarithmic series 14, 17
  Logistic 15, 17, 20, 24, 26, 28, 29, 31
  Logit normal 15, 17, 20, 24, 26, 28, 29, 31
  Loglogistic 15, 17
  Lognormal 15, 17, 20, 24, 26, 28, 29, 31
  Multinomial 16, 18, 32
  Multivariate normal 16, 18, 33
  Negative binomial 14, 17, 25, 29, 31
  Non-central chi-square 15, 17
  Non-central F 15, 18
  Non-central t 15, 18
  Normal 15, 18, 20, 24, 26, 28, 29, 31
  Pareto 15, 18, 20, 24, 26, 28, 29, 31
  Poisson 14, 17, 25, 29, 31
  Rayleigh 15, 18, 20, 24, 26, 28, 29, 31
  Smallest extreme value 15, 18
  Studentized range 15, 18, 31
  t 15, 18, 20, 24, 26, 28, 29, 31
  Triangular 15, 18, 20, 24, 26, 28, 29, 31
  Uniform 15, 18, 20, 24, 26, 28, 29, 31
  Weibull 15, 18, 20, 24, 26, 28, 29, 31
  Wishart 16, 18, 34


  Zipf 14, 17, 25, 29, 31
F
full conditionals
  in Gibbs Sampling 8
FUNCTION command 72
G
gap 12
GIBBS command 73
GVARIABLE command 72
I
IIDMC 67
INTEG command 71
integration 9
M
Markov Chain Monte Carlo
  Gibbs sampling 8
  M-H algorithm 5
MCMC 69
MH command 69
Monte Carlo 1
  adaptive rejection sampling 4
  commands 16, 21, 28
  examples 36, 39, 40, 47, 49, 51, 55, 58
  expressions 35
  Gibbs sampling 8
  integration 9
  Metropolis-Hastings 5
  overview 1
  precautions 12
  Quick Graphs 29
  random sampling 3
  rejection sampling 3
  usage 29
Multivariate 33
MULTIVARIATE command 66
R
random sampling 3
  Mersenne-Twister 3
  multivariate 16
  univariate continuous 13
  univariate discrete 13
RANDSAMP command 65
REJECT command 67
rejection sampling 3
RJS
  in REJECT 67
S
SAVE command 68, 75
SAVE command 66
U
UNIVARIATE command 65
V
VARIABLE DECLARATIONS 72
