Lec 28: Stratified Sampling
Email: [email protected]
URL: https://www.zabaras.com/
Following closely:
C. Robert, G. Casella, Monte Carlo Statistical Methods (Ch. 1, 2, 3.1 & 3.2) (google books, slides, video)
J. S. Liu, MC Strategies in Scientific Computing (Chapters 1 & 2)
J-M Marin and C. P. Robert, Bayesian Core (Chapter 2)
A. Doucet, Statistical Computing & Monte Carlo Methods (course notes, 2007)
Statistical Computing and Machine Learning, Fall 2020, N. Zabaras 2
Conditional Monte Carlo
Let
$$\ell = \mathbb{E}[H(\boldsymbol{X})] = \int H(\boldsymbol{x})\, p(\boldsymbol{x})\, d\boldsymbol{x}$$
be some expected performance measure of a computer simulation model, where $\boldsymbol{X}$ is the input random variable (vector) with pdf $p(\boldsymbol{x})$ and $H(\boldsymbol{X})$ is the sample performance measure (the output random variable).
Suppose that there is a random variable (or vector), 𝒀 ~ 𝑔(𝒚), such that the conditional expectation
𝔼[𝐻(𝑿) | 𝒀 = 𝒚 ] can be computed analytically.
Since $\ell = \mathbb{E}[H(\boldsymbol{X})] = \mathbb{E}\big[\mathbb{E}[H(\boldsymbol{X}) \mid \boldsymbol{Y}]\big]$, it follows that $\mathbb{E}[H(\boldsymbol{X}) \mid \boldsymbol{Y}]$ is an unbiased estimator of $\ell$.
Furthermore, it is readily seen that
$$\mathrm{Var}\big(\mathbb{E}[H(\boldsymbol{X}) \mid \boldsymbol{Y}]\big) \le \mathrm{Var}\big(H(\boldsymbol{X})\big).$$
Thus using the random variable 𝔼[𝑯(𝑿) | 𝒀 ] instead of 𝑯(𝑿), leads to variance reduction.
The inequality above is derived from the variance decomposition
$$\mathrm{Var}(U) = \mathbb{E}\big[\mathrm{Var}(U \mid V)\big] + \mathrm{Var}\big(\mathbb{E}[U \mid V]\big),$$
which holds for any pair of random variables $(U, V)$.
The Algorithm requires that a random variable 𝒀 be found, such that 𝔼[𝐻(𝑿) |𝒀 = 𝒚] is
known analytically for all 𝒚. Moreover, for the Algorithm to be of practical use, the following
conditions must be met:
(a) 𝒀 should be easy to generate.
(b) 𝔼[𝐻(𝑿)|𝒀 = 𝒚] should be readily computable for all values 𝒚.
(c) 𝔼[𝑉𝑎𝑟(𝐻(𝑿)|𝒀)] should be large relative to 𝑉𝑎𝑟(𝔼[𝐻(𝑿)|𝒀 ]).
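As a minimal sketch of these conditions in action, consider a hypothetical model chosen only so the conditional expectation is available in closed form: $Y \sim N(0,1)$, $X \mid Y = y \sim N(y, 1)$ and $H(X) = X^2$, so that $\mathbb{E}[H(X) \mid Y] = Y^2 + 1$ and $\ell = \mathbb{E}[X^2] = 2$.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100_000

# Hypothetical model: Y ~ N(0,1), X | Y=y ~ N(y,1), H(X) = X^2,
# so E[H(X) | Y] = Y^2 + 1 analytically and l = E[X^2] = 2.
Y = rng.standard_normal(N)
X = Y + rng.standard_normal(N)

crude = X**2              # crude Monte Carlo samples of H(X)
conditioned = Y**2 + 1.0  # conditional (Rao-Blackwellized) samples E[H(X)|Y]

print("crude estimate      :", crude.mean(), " sample variance:", crude.var())
print("conditional estimate:", conditioned.mean(), " sample variance:", conditioned.var())
```

Both sample means estimate $\ell = 2$ unbiasedly, but the conditional samples have variance $\mathrm{Var}(Y^2 + 1) = 2$ versus $\mathrm{Var}(X^2) = 8$ for the crude ones.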
Example (random sums). Let $S_R = \sum_{i=1}^{R} X_i$, where $R$ is a random variable with a given distribution and the $\{X_i\}$ are i.i.d. with $X_i \sim F$ and independent of $R$. Let $F_r$ be the cdf of the random variable $S_r$ for fixed $R = r$, and suppose we wish to estimate $\ell = \Pr(S_R \le x)$ for a fixed $x$.
Noting that, conditional on $X_2, \dots, X_r$,
$$F_r(x) = \Pr\Big(\sum_{i=1}^{r} X_i \le x\Big) = \mathbb{E}\Big[F\Big(x - \sum_{i=2}^{r} X_i\Big)\Big],$$
We obtain
$$\ell = \Pr(S_R \le x) = \mathbb{E}\Big[\Pr\big(S_R \le x \mid R, X_2, \dots, X_R\big)\Big] = \mathbb{E}\Big[F\Big(x - \sum_{i=2}^{R} X_i\Big)\Big].$$
As an estimator of $\ell$ based on conditioning, we can take
$$\hat{\ell}_c = \frac{1}{N} \sum_{k=1}^{N} F\Big(x - \sum_{i=2}^{R_k} X_{ki}\Big),$$
where $(R_k;\, X_{k2}, \dots, X_{k R_k})$, $k = 1, \dots, N$, are independent replicates of $(R;\, X_2, \dots, X_R)$.
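A small sketch of $\hat{\ell}_c$ under assumed (hypothetical) choices: $R$ uniform on $\{1,2,3\}$ and $X_i \sim \mathrm{Exp}(1)$, so $F(t) = 1 - e^{-t}$ for $t \ge 0$. The conditional estimator is compared against the crude indicator estimator on the same draws:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 20_000
x = 3.0

def F(t):
    """cdf of Exp(1): F(t) = 1 - exp(-t) for t >= 0, else 0 (assumed law of X_i)."""
    return np.where(t > 0, 1.0 - np.exp(-np.maximum(t, 0.0)), 0.0)

crude = np.empty(N)
cond = np.empty(N)
for k in range(N):
    r = rng.integers(1, 4)             # R uniform on {1, 2, 3} (assumption)
    xs = rng.exponential(1.0, size=r)  # X_1, ..., X_R
    crude[k] = float(xs.sum() <= x)    # crude sample: indicator 1{S_R <= x}
    cond[k] = F(x - xs[1:].sum())      # conditional sample: F(x - sum_{i>=2} X_i)

print("crude      :", crude.mean(), " variance:", crude.var())
print("conditional:", cond.mean(), " variance:", cond.var())
```

Both estimators are unbiased for $\Pr(S_R \le x)$; the conditional one averages a smooth function of $X_2, \dots, X_R$ instead of an indicator, which lowers the variance.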
Stratified Sampling
Suppose the conditioning variable $Y$ is a stratum indicator taking the values $i = 1, \dots, m$ with known probabilities $p_i = \Pr(Y = i)$. Drawing $N_i$ samples $\boldsymbol{X}_{i1}, \dots, \boldsymbol{X}_{iN_i}$ within each stratum gives the stratified estimator
$$\hat{\ell}_s = \sum_{i=1}^{m} p_i \frac{1}{N_i} \sum_{j=1}^{N_i} H(\boldsymbol{X}_{ij}), \qquad \mathrm{Var}(\hat{\ell}_s) = \sum_{i=1}^{m} \frac{p_i^2 \sigma_i^2}{N_i},$$
where $\sigma_i^2 = \mathrm{Var}\big(H(\boldsymbol{X}) \mid Y = i\big)$.
How the strata should be chosen depends very much on the problem at hand. For a particular choice of the strata, however, the sample sizes $\{N_i\}$ can be chosen optimally.
The optimal-allocation result asserts that the minimal variance of $\hat{\ell}_s$ is attained for sample sizes $N_i$ proportional to $p_i \sigma_i$, namely
$$N_i^* = N \frac{p_i \sigma_i}{\sum_{j=1}^{m} p_j \sigma_j}.$$
Although the 𝑝𝑖’s are assumed to be known, the {𝜎𝑖 } are usually unknown. In practice, one
would estimate the {𝜎𝑖 } from "pilot" runs and then proceed to estimate the optimal sample
sizes, 𝑁𝑖∗ , from the equation above.
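A sketch of this two-stage procedure on a hypothetical three-stratum model (the $p_i$ are known; the within-stratum Normal laws and their spreads are invented for the illustration and would be unknown in practice):

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical strata: Pr(Y = i) = p[i]; within stratum i,
# H(X) | Y = i ~ N(means[i], sigmas[i]^2).  The sigmas are treated as unknown.
p = np.array([0.5, 0.3, 0.2])
means = np.array([0.0, 1.0, 5.0])
sigmas = np.array([0.1, 1.0, 4.0])

def sample_stratum(i, n):
    """Draw n samples of H(X) conditional on Y = i."""
    return rng.normal(means[i], sigmas[i], size=n)

# Stage 1: pilot runs to estimate the unknown sigma_i.
n_pilot = 100
sigma_hat = np.array([sample_stratum(i, n_pilot).std(ddof=1) for i in range(3)])

# Stage 2: allocate N_i* proportional to p_i * sigma_hat_i (optimal allocation).
N = 10_000
N_i = np.maximum(1, np.round(N * p * sigma_hat / (p * sigma_hat).sum())).astype(int)

# Stratified estimate: sum_i p_i * (within-stratum sample mean).
ell_s = sum(p[i] * sample_stratum(i, N_i[i]).mean() for i in range(3))
print("allocations N_i*:", N_i)
print("stratified estimate:", ell_s, " (true value: 1.3)")
```

Note that the high-variance stratum ($i = 2$) receives most of the sampling budget even though its probability $p_2 = 0.2$ is the smallest.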
A simple stratification procedure, which can achieve variance reduction without requiring prior
knowledge of 𝜎𝑖2 and 𝐻(𝑿), is presented next.
Proportional allocation, $N_i = p_i N$, already guarantees
$$\mathrm{Var}(\hat{\ell}_s) \le \mathrm{Var}(\hat{\ell}).$$
Substituting $N_i = p_i N$ in $\mathrm{Var}(\hat{\ell}_s) = \sum_{i=1}^{m} \frac{p_i^2 \sigma_i^2}{N_i}$ yields
$$\mathrm{Var}(\hat{\ell}_s) = \frac{1}{N} \sum_{i=1}^{m} p_i \sigma_i^2.$$
The result now follows from the variance decomposition:
$$\mathrm{Var}(\hat{\ell}) = \frac{\mathrm{Var}\big(H(\boldsymbol{X})\big)}{N} = \frac{1}{N} \Big( \sum_{i=1}^{m} p_i \sigma_i^2 + \mathrm{Var}\big(\mathbb{E}[H(\boldsymbol{X}) \mid Y]\big) \Big) \ge \frac{1}{N} \sum_{i=1}^{m} p_i \sigma_i^2 = \mathrm{Var}(\hat{\ell}_s).$$
In the special case of equal weights ($p_i = 1/m$ and $N_i = N/m$), the estimator
$$\hat{\ell}_s = \sum_{i=1}^{m} p_i \frac{1}{N_i} \sum_{j=1}^{N_i} H(\boldsymbol{X}_{ij})$$
reduces to
$$\hat{\ell}_s = \frac{1}{N} \sum_{i=1}^{m} \sum_{j=1}^{N/m} H(\boldsymbol{X}_{ij}).$$
More generally, partition the domain as $D = \bigcup_{m=1}^{M} D_m$ with the $D_m$ disjoint, and define
$$Z_m = \Pr(\boldsymbol{x} \in D_m) = \int_{D_m} p(\boldsymbol{x})\, d\boldsymbol{x}, \qquad p_m(\boldsymbol{x}) = \frac{p(\boldsymbol{x})\, \mathbb{1}_{D_m}(\boldsymbol{x})}{Z_m} \ \ (\text{conditional pdf}).$$
The stratified sampling estimator is then
$$\hat{\ell}_s = \sum_{m=1}^{M} Z_m \hat{\ell}_m = \sum_{m=1}^{M} Z_m \frac{1}{N_m} \sum_{i=1}^{N_m} H\big(\boldsymbol{x}_i^{(m)}\big), \qquad \boldsymbol{x}_i^{(m)} \sim p_m(\boldsymbol{x}).$$
If (taking $Z_m = 1/M$ and $N_m = N/M$)
$$\frac{1}{M} \sum_{m=1}^{M} \mathrm{Var}_{D_m}\big[H(\boldsymbol{x})\big] < \mathrm{Var}\big[H(\boldsymbol{x})\big]$$
(e.g. $H(\boldsymbol{x})$ is relatively homogeneous within each $D_m$), then we do achieve variance reduction, i.e. $\mathrm{Var}(\hat{\ell}_s) < \mathrm{Var}(\hat{\ell})$.
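A minimal sketch of this partition-based estimator for $p(x)$ uniform on $[0,1]$ with $M$ equal strata $D_m = [(m-1)/M,\, m/M]$, so $Z_m = 1/M$ and $p_m$ is uniform on $D_m$; the test integrand $H(x) = x^2$ (with $\ell = 1/3$) is an arbitrary assumption:

```python
import numpy as np

rng = np.random.default_rng(3)

H = lambda x: x**2   # test integrand (assumption); exact value of l is 1/3
M, N = 10, 10_000
N_m = N // M         # equal allocation N_m = N / M

# l_s = sum_m Z_m * (1/N_m) * sum_i H(x_i^(m)),  with x_i^(m) ~ p_m
ell_s = 0.0
for m in range(M):
    xm = rng.uniform(m / M, (m + 1) / M, size=N_m)   # samples from stratum D_m
    ell_s += (1.0 / M) * H(xm).mean()                # Z_m = 1/M

# Crude Monte Carlo with the same total budget, for comparison.
crude = H(rng.uniform(0.0, 1.0, size=N)).mean()
print("stratified:", ell_s, " crude:", crude, " exact: 1/3")
```

Because $x^2$ varies little within each narrow stratum, the within-stratum variances are small and the stratified estimate is markedly tighter than the crude one.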
Stratified Sampling: Variance Reduction
Consider the following case:
$$p(x) = 1, \quad x \in [0,1], \qquad H(x) = \begin{cases} 1/k, & 0 \le x < 1/2 \\ k, & 1/2 \le x \le 1. \end{cases}$$
It can easily be shown that
$$\mathrm{Var}_{[0,1]}\big[H(x)\big] = \frac{(k^2 - 1)^2}{4 k^2} \to \infty \quad \text{as } k \to \infty,$$
while
$$\mathrm{Var}_{[0,1/2]}\big[H(x)\big] = \mathrm{Var}_{[1/2,1]}\big[H(x)\big] = 0.$$
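A quick numerical check of these two variances (the draws merely confirm the closed-form values above, here with $k = 10$):

```python
import numpy as np

rng = np.random.default_rng(4)
k = 10.0
H = lambda x: np.where(x < 0.5, 1.0 / k, k)

# Crude sampling over [0,1]: variance should match (k^2 - 1)^2 / (4 k^2) = 24.5025.
x = rng.uniform(0.0, 1.0, size=200_000)
print("empirical Var[0,1]  :", H(x).var())
print("(k^2-1)^2 / (4 k^2) :", (k**2 - 1) ** 2 / (4 * k**2))

# H is constant on each half, so the stratified estimator with Z_1 = Z_2 = 1/2
# returns (1/k + k)/2 exactly, with zero variance, even with one sample per stratum.
ell_s = 0.5 * H(rng.uniform(0.0, 0.5, 1)).mean() + 0.5 * H(rng.uniform(0.5, 1.0, 1)).mean()
print("stratified estimate :", ell_s, " exact:", (1.0 / k + k) / 2.0)
```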
Stratifying on $D_1 = [0, 1/2)$ and $D_2 = [1/2, 1]$ therefore yields a zero-variance estimator. In general, however, the only way to select the partition domains $D_m$ is by drawing samples of $\boldsymbol{x}$ and evaluating $H(\boldsymbol{x})$.