
Stratified Sampling

Prof. Nicholas Zabaras

Email: [email protected]
URL: https://www.zabaras.com/

October 11, 2020



Contents

- Conditional Monte Carlo, random sums
- Stratified sampling, systematic sampling

The goals for today's lecture include:

- Understand the conditional Monte Carlo and stratified sampling algorithms

Following closely:
- C. Robert and G. Casella, Monte Carlo Statistical Methods (Ch. 1, 2, 3.1, and 3.2)
- J. S. Liu, Monte Carlo Strategies in Scientific Computing (Chapters 1 and 2)
- J-M Marin and C. P. Robert, Bayesian Core (Chapter 2)
- A. Doucet, Statistical Computing and Monte Carlo Methods (course notes, 2007)
Conditional Monte Carlo

- Let $\ell = \mathbb{E}[H(\boldsymbol{X})] = \int H(\boldsymbol{x})\, p(\boldsymbol{x})\, d\boldsymbol{x}$ be some expected performance measure of a computer simulation model, where $\boldsymbol{X}$ is the input random variable (vector) with pdf $p(\boldsymbol{x})$ and $H(\boldsymbol{X})$ is the sample performance measure (output random variable).

- Suppose that there is a random variable (or vector) $\boldsymbol{Y} \sim g(\boldsymbol{y})$ such that the conditional expectation $\mathbb{E}[H(\boldsymbol{X}) \mid \boldsymbol{Y} = \boldsymbol{y}]$ can be computed analytically.

- Since $\ell = \mathbb{E}[H(\boldsymbol{X})] = \mathbb{E}\big[\mathbb{E}[H(\boldsymbol{X}) \mid \boldsymbol{Y}]\big]$, it follows that $\mathbb{E}[H(\boldsymbol{X}) \mid \boldsymbol{Y}]$ is an unbiased estimator of $\ell$. Furthermore, it is readily seen that

$$\operatorname{Var}\big(\mathbb{E}[H(\boldsymbol{X}) \mid \boldsymbol{Y}]\big) \le \operatorname{Var}\big(H(\boldsymbol{X})\big).$$

- Thus, using the random variable $\mathbb{E}[H(\boldsymbol{X}) \mid \boldsymbol{Y}]$ instead of $H(\boldsymbol{X})$ leads to variance reduction.

- The inequality above follows from the decomposition

$$\operatorname{Var}(U) = \mathbb{E}\big[\operatorname{Var}(U \mid V)\big] + \operatorname{Var}\big(\mathbb{E}[U \mid V]\big),$$

which holds for any pair of random variables $(U, V)$.

- The conditional Monte Carlo idea is referred to as Rao-Blackwellization.



Conditional Monte Carlo

- Step 1: Generate a sample $\boldsymbol{Y}_1, \boldsymbol{Y}_2, \ldots, \boldsymbol{Y}_N$ from $g(\boldsymbol{y})$.

- Step 2: Calculate $\mathbb{E}[H(\boldsymbol{X}) \mid \boldsymbol{Y}_k]$, $k = 1, 2, \ldots, N$, analytically.

- Step 3: Estimate $\ell = \mathbb{E}[H(\boldsymbol{X})] = \mathbb{E}\big[\mathbb{E}[H(\boldsymbol{X}) \mid \boldsymbol{Y}]\big]$ by

$$\hat{\ell}_c = \frac{1}{N} \sum_{k=1}^{N} \mathbb{E}[H(\boldsymbol{X}) \mid \boldsymbol{Y}_k].$$

The algorithm requires that a random variable $\boldsymbol{Y}$ be found such that $\mathbb{E}[H(\boldsymbol{X}) \mid \boldsymbol{Y} = \boldsymbol{y}]$ is known analytically for all $\boldsymbol{y}$. Moreover, for the algorithm to be of practical use, the following conditions must be met:
(a) $\boldsymbol{Y}$ should be easy to generate.
(b) $\mathbb{E}[H(\boldsymbol{X}) \mid \boldsymbol{Y} = \boldsymbol{y}]$ should be readily computable for all values $\boldsymbol{y}$.
(c) $\mathbb{E}[\operatorname{Var}(H(\boldsymbol{X}) \mid \boldsymbol{Y})]$ should be large relative to $\operatorname{Var}(\mathbb{E}[H(\boldsymbol{X}) \mid \boldsymbol{Y}])$.
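A minimal Python sketch of these three steps follows. The model is an illustrative assumption, not one from the slides: $Y \sim N(0,1)$, $X \mid Y \sim N(Y, 1)$, and $H(X) = X^2$, so $\mathbb{E}[H(X) \mid Y = y] = y^2 + 1$ is available in closed form and the exact answer is $\ell = 2$.

```python
import numpy as np

# Illustrative model (an assumption, not from the slides):
# Y ~ N(0,1), X | Y ~ N(Y,1), H(X) = X^2, so E[H(X) | Y = y] = y^2 + 1.
rng = np.random.default_rng(0)
N = 100_000

# Step 1: generate Y_1, ..., Y_N from g(y).
y = rng.normal(size=N)

# Crude MC for comparison: sample X | Y and average H(X).
x = rng.normal(loc=y, scale=1.0)
ell_crude = np.mean(x**2)

# Steps 2-3: replace H(X) by the analytic conditional expectation.
cond = y**2 + 1.0
ell_c = np.mean(cond)

print(f"crude MC:       {ell_crude:.4f}  (var of H(X):      {np.var(x**2):.3f})")
print(f"conditional MC: {ell_c:.4f}  (var of E[H(X)|Y]: {np.var(cond):.3f})")
# Exact value: E[X^2] = Var(X) = 2. The conditional terms have strictly
# smaller variance, consistent with Var(E[H(X)|Y]) <= Var(H(X)).
```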



Conditional Monte Carlo: Example – Random Sums

- Consider the estimation of

$$\ell = \Pr(S_R \le x) = \mathbb{E}\big[\mathbb{1}\{S_R \le x\}\big], \quad \text{where } S_R = \sum_{i=1}^{R} X_i.$$

- $R$ is a random variable with a given distribution and the $\{X_i\}$ are i.i.d. with $X_i \sim F$ and independent of $R$. Let $F_r$ be the cdf of the random variable $S_r$ for fixed $R = r$.

- Noting that

$$F_r(x) = \Pr\left(\sum_{i=1}^{r} X_i \le x\right) = \mathbb{E}\left[F\left(x - \sum_{i=2}^{r} X_i\right)\right],$$

we obtain

$$\mathbb{E}\left[\mathbb{1}\{S_R \le x\} \,\middle|\, R, \sum_{i=2}^{R} X_i\right] = F\left(x - \sum_{i=2}^{R} X_i\right).$$

- As an estimator of $\ell$ based on conditioning, we can take

$$\hat{\ell}_c = \frac{1}{N} \sum_{k=1}^{N} F\left(x - \sum_{i=2}^{R_k} X_{ki}\right).$$



Stratified Sampling

- We wish to estimate

$$\ell = \mathbb{E}[H(\boldsymbol{X})] = \int H(\boldsymbol{x})\, p(\boldsymbol{x})\, d\boldsymbol{x}.$$

- Assume that there exists a random variable $Y$ taking values in $\{1, \ldots, m\}$, say, with known probabilities $\{p_i,\ i = 1, \ldots, m\}$, and that it is easy to sample from the conditional distribution of $\boldsymbol{X}$ given $Y$.

- The events $\{Y = i\}$, $i = 1, \ldots, m$, form disjoint subregions, or strata, of the sample space $\Omega$, hence the name stratification. Using the conditioning formula, we can write

$$\ell = \mathbb{E}\big[\mathbb{E}[H(\boldsymbol{X}) \mid Y]\big] = \sum_{i=1}^{m} p_i\, \mathbb{E}[H(\boldsymbol{X}) \mid Y = i].$$

R. Y. Rubinstein, D. P. Kroese, Simulation and the Monte Carlo Method, 2007



Stratified Sampling Estimator

- This representation suggests that we can estimate $\ell$ via the following stratified sampling estimator:

$$\hat{\ell}_s = \sum_{i=1}^{m} p_i \frac{1}{N_i} \sum_{j=1}^{N_i} H(\boldsymbol{X}_{ij}),$$

where $\boldsymbol{X}_{ij}$ is the $j$-th sample from the conditional distribution of $\boldsymbol{X}$ given $Y = i$, and $N_i$ is the sample size assigned to the $i$-th stratum.

- The variance of the stratified sampling estimator is given by

$$\operatorname{Var}\big(\hat{\ell}_s\big) = \sum_{i=1}^{m} \frac{p_i^2}{N_i} \operatorname{Var}\big(H(\boldsymbol{X}) \mid Y = i\big) = \sum_{i=1}^{m} \frac{p_i^2 \sigma_i^2}{N_i},$$

where $\sigma_i^2 = \operatorname{Var}\big(H(\boldsymbol{X}) \mid Y = i\big)$.

- How the strata should be chosen depends very much on the problem at hand. However, for a given choice of the strata, the $\{N_i\}$ can be obtained in an optimal manner.
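As a concrete sketch, assume (purely for illustration) $X \sim U(0,1)$, $H(x) = e^x$, and $m$ equal-probability strata $Y = i \Leftrightarrow X \in ((i-1)/m, i/m]$; the exact answer is $e - 1$:

```python
import numpy as np

# Illustrative setup: X ~ U(0,1), H(x) = exp(x), and m equal-probability
# strata Y = i  <=>  X in ((i-1)/m, i/m]. Exact value: e - 1.
rng = np.random.default_rng(2)
m, N = 10, 10_000
p_i = 1.0 / m                 # known stratum probabilities
N_i = N // m                  # sample size assigned to each stratum

ell_s = 0.0
for i in range(m):
    # N_i samples from the conditional distribution of X given Y = i + 1.
    X_ij = rng.uniform(i / m, (i + 1) / m, size=N_i)
    ell_s += p_i * np.exp(X_ij).mean()

print(f"stratified estimate: {ell_s:.5f}  (exact: {np.e - 1:.5f})")
```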



Stratified Sampling

- Assuming that a maximum number of $N$ samples can be collected, that is, $\sum_{i=1}^{m} N_i = N$, the optimal value of $N_i$ is given by

$$N_i^* = N \frac{p_i \sigma_i}{\sum_{j=1}^{m} p_j \sigma_j},$$

which gives a minimal variance of

$$\operatorname{Var}\big(\hat{\ell}_s^*\big) = \frac{1}{N} \left( \sum_{i=1}^{m} p_i \sigma_i \right)^2.$$

- This theorem asserts that the minimal variance of $\hat{\ell}_s$ is attained for sample sizes $N_i$ that are proportional to $p_i \sigma_i$.

- Although the $p_i$'s are assumed to be known, the $\{\sigma_i\}$ are usually unknown. In practice, one would estimate the $\{\sigma_i\}$ from "pilot" runs and then estimate the optimal sample sizes $N_i^*$ from the equation above.
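A sketch of this pilot-run procedure, reusing the illustrative $U(0,1)$/$e^x$ setup from above:

```python
import numpy as np

# Pilot-run sketch of the optimal (Neyman) allocation, reusing the
# illustrative U(0,1)/exp(x) setup from above.
rng = np.random.default_rng(3)
m, N_pilot, N = 10, 200, 10_000
p = np.full(m, 1.0 / m)

# Pilot runs: sample standard deviation of H(X) within each stratum.
sigma = np.array([
    np.exp(rng.uniform(i / m, (i + 1) / m, size=N_pilot)).std(ddof=1)
    for i in range(m)
])

# N_i* = N p_i sigma_i / sum_j p_j sigma_j (rounded, at least one sample).
w = p * sigma
N_opt = np.maximum(1, np.round(N * w / w.sum())).astype(int)

# Final run with the estimated optimal allocation.
ell_s = sum(
    p[i] * np.exp(rng.uniform(i / m, (i + 1) / m, size=N_opt[i])).mean()
    for i in range(m)
)
print("allocations:", N_opt)
print(f"estimate: {ell_s:.5f}  (exact: {np.e - 1:.5f})")
```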

- A simple stratification procedure, which can achieve variance reduction without requiring prior knowledge of $\sigma_i^2$ and $H(\boldsymbol{X})$, is presented next.



Stratified Sampling

- Let the sample sizes $N_i$ be proportional to $p_i$, that is, $N_i = p_i N$, $i = 1, \ldots, m$. Then

$$\operatorname{Var}\big(\hat{\ell}_s\big) \le \operatorname{Var}\big(\hat{\ell}\big),$$

where $\hat{\ell}$ is the crude MC estimator of $\ell = \mathbb{E}[H(\boldsymbol{X})]$.

- Substituting $N_i = p_i N$ in $\operatorname{Var}\big(\hat{\ell}_s\big) = \sum_{i=1}^{m} \frac{p_i^2 \sigma_i^2}{N_i}$ yields

$$\operatorname{Var}\big(\hat{\ell}_s\big) = \frac{1}{N} \sum_{i=1}^{m} p_i \sigma_i^2.$$

- The result now follows from

$$N \operatorname{Var}\big(\hat{\ell}\big) = \operatorname{Var}\big(H(\boldsymbol{X})\big) \ge \mathbb{E}\big[\operatorname{Var}(H(\boldsymbol{X}) \mid Y)\big] = \sum_{i=1}^{m} p_i \sigma_i^2 = N \operatorname{Var}\big(\hat{\ell}_s\big),$$

where we used

$$\operatorname{Var}\big(H(\boldsymbol{X})\big) = \mathbb{E}\big[\operatorname{Var}(H(\boldsymbol{X}) \mid Y)\big] + \operatorname{Var}\big(\mathbb{E}[H(\boldsymbol{X}) \mid Y]\big) \ge \mathbb{E}\big[\operatorname{Var}(H(\boldsymbol{X}) \mid Y)\big].$$
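A quick numerical check of the proposition under the same illustrative setup: replications of the proportionally allocated stratified estimator should show no larger variance than crude Monte Carlo with the same budget $N$:

```python
import numpy as np

# Replicated comparison under the same illustrative U(0,1)/exp(x)
# assumptions: proportional allocation N_i = p_i N (here p_i = 1/m)
# versus crude MC with the same total budget N.
rng = np.random.default_rng(4)
m, N, reps = 10, 1_000, 500
n = N // m                    # N_i = p_i N

crude = [np.exp(rng.uniform(size=N)).mean() for _ in range(reps)]
strat = [
    np.mean([np.exp(rng.uniform(i / m, (i + 1) / m, size=n)).mean()
             for i in range(m)])
    for _ in range(reps)
]
print(f"crude var: {np.var(crude):.2e}   stratified var: {np.var(strat):.2e}")
```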



Systematic Sampling Method

- The proposition on the previous slide states that the estimator $\hat{\ell}_s$ is more accurate than the estimator $\hat{\ell}$.

- Proportional allocation effects stratification by favoring those events $\{Y = i\}$ whose probabilities $p_i$ are largest. Intuitively, this cannot, in general, be an optimal assignment, since information on $\sigma_i^2$ and $H(\boldsymbol{X})$ is ignored.

- In the special case of equal weights ($p_i = 1/m$ and $N_i = N/m$), the estimator $\hat{\ell}_s = \sum_{i=1}^{m} p_i \frac{1}{N_i} \sum_{j=1}^{N_i} H(\boldsymbol{X}_{ij})$ reduces to

$$\hat{\ell}_s = \frac{1}{N} \sum_{i=1}^{m} \sum_{j=1}^{N/m} H(\boldsymbol{X}_{ij}),$$

and the method is known as the systematic sampling method.
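A compact sketch of this equal-weight case under the same illustrative assumptions ($X \sim U(0,1)$, $H(x) = e^x$):

```python
import numpy as np

# Systematic sampling for X ~ U(0,1), H(x) = exp(x) (illustrative):
# p_i = 1/m, N_i = N/m, so the estimator is a plain average of all draws.
rng = np.random.default_rng(5)
m, N = 100, 1_000
n = N // m

# Row i holds the n draws from stratum i, i.e. uniforms on (i/m, (i+1)/m).
U = (np.arange(m)[:, None] + rng.uniform(size=(m, n))) / m
ell_s = np.exp(U).mean()      # exact value: e - 1
print(f"systematic estimate: {ell_s:.5f}")
```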



Stratified Sampling

- The stratification process is more obvious when a partition of the stochastic space is possible:

$$\ell = \mathbb{E}[H(\boldsymbol{X})] = \int_D H(\boldsymbol{x})\, p(\boldsymbol{x})\, d\boldsymbol{x} = \sum_{m=1}^{M} Z_m \int_{D_m} H(\boldsymbol{x})\, p_m(\boldsymbol{x})\, d\boldsymbol{x},$$

where

$$p_m(\boldsymbol{x}) = \frac{p(\boldsymbol{x})}{Z_m} \mathbb{1}_{D_m}(\boldsymbol{x}) \ \text{(conditional PDF)}, \qquad Z_m = \Pr(\boldsymbol{x} \in D_m) = \int_{D_m} p(\boldsymbol{x})\, d\boldsymbol{x}, \qquad D = \bigcup_{m=1}^{M} D_m, \ D_m \text{ disjoint}.$$

- The stratified sampling algorithm can be easily implemented assuming that the conditional PDFs and normalization factors $Z_m$ are known.



Stratified Sampling: Algorithm

- Step 1: For $m = 1, \ldots, M$, draw $N_m$ samples $\{\boldsymbol{x}_i^{(m)}\}_{i=1}^{N_m}$ from the conditional PDF $p_m$.

- Evaluate the estimator of $\int_{D_m} H(\boldsymbol{x})\, p_m(\boldsymbol{x})\, d\boldsymbol{x}$ on domain $m$:

$$\hat{\ell}_m = \frac{1}{N_m} \sum_{i=1}^{N_m} H\big(\boldsymbol{x}_i^{(m)}\big).$$

- Then the overall estimator of

$$\ell = \mathbb{E}[H(\boldsymbol{X})] = \sum_{m=1}^{M} Z_m \int_{D_m} H(\boldsymbol{x})\, p_m(\boldsymbol{x})\, d\boldsymbol{x}$$

is as follows:

$$\hat{\ell}_s = \sum_{m=1}^{M} Z_m \hat{\ell}_m = \sum_{m=1}^{M} Z_m \frac{1}{N_m} \sum_{i=1}^{N_m} H\big(\boldsymbol{x}_i^{(m)}\big).$$



Stratified Sampling: Variance Reduction

- The variance of the stratified sampling estimator $\hat{\ell}_s = \sum_{m=1}^{M} Z_m \frac{1}{N_m} \sum_{i=1}^{N_m} H\big(\boldsymbol{x}_i^{(m)}\big)$ is

$$\operatorname{Var}\big(\hat{\ell}_s\big) = \sum_{m=1}^{M} Z_m^2 \operatorname{Var}\big(\hat{\ell}_m\big) = \sum_{m=1}^{M} Z_m^2 \frac{1}{N_m} \operatorname{Var}_{D_m}\big(H(\boldsymbol{x})\big).$$

- Note that for the choices $Z_m = 1/M$ and $N_m = N/M$:

$$\operatorname{Var}\big(\hat{\ell}_s\big) = \frac{1}{MN} \sum_{m=1}^{M} \operatorname{Var}_{D_m}\big(H(\boldsymbol{x})\big).$$

- If we select the $D_m$ such that $\operatorname{Var}_{D_m}\big(H(\boldsymbol{x})\big)$ is small on average, i.e.

$$\frac{1}{M} \sum_{m=1}^{M} \operatorname{Var}_{D_m}\big(H(\boldsymbol{x})\big) < \operatorname{Var}\big(H(\boldsymbol{x})\big)$$

(e.g., $H(\boldsymbol{x})$ is relatively homogeneous within each $D_m$), then we do achieve variance reduction, i.e. $\operatorname{Var}\big(\hat{\ell}_s\big) < \operatorname{Var}\big(\hat{\ell}\big)$.
Stratified Sampling: Variance Reduction

- Consider the following case:

$$p(x) = 1, \ x \in [0,1], \qquad H(x) = \begin{cases} 1/k, & 0 \le x < 1/2, \\ k, & 1/2 \le x \le 1. \end{cases}$$

- It can easily be shown that

$$\operatorname{Var}_{[0,1]}\big(H(x)\big) = \frac{(k^2 - 1)^2}{4k^2} \to \infty \ \text{ as } \ k \to \infty.$$

- However, the variance of the stratified sampling estimator with strata $[0, 1/2]$ and $[1/2, 1]$ is zero, since

$$\operatorname{Var}_{[0,1/2]}\big(H(x)\big) = \operatorname{Var}_{[1/2,1]}\big(H(x)\big) = 0.$$
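A numerical check of this example (illustrative code): with one draw from each half of $[0,1]$, every stratified replicate returns the exact mean $(1/k + k)/2$, so the stratified estimator has zero variance:

```python
import numpy as np

# With one draw from each half and weights Z_1 = Z_2 = 1/2, every
# stratified replicate equals the exact mean (1/k + k)/2 exactly.
rng = np.random.default_rng(7)
k, reps = 10.0, 1_000
H = lambda x: np.where(x < 0.5, 1.0 / k, k)

crude = [H(rng.uniform(size=2)).mean() for _ in range(reps)]
strat = [0.5 * H(rng.uniform(0.0, 0.5)) + 0.5 * H(rng.uniform(0.5, 1.0))
         for _ in range(reps)]
print(f"crude var: {np.var(crude):.4f}   stratified var: {np.var(strat):.2e}")
```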



Stratified Sampling: Example

- Often $H(\boldsymbol{x})$ is not known explicitly.

- Then the only way to select the partition domains $D_m$ is by drawing samples of $\boldsymbol{x}$ and evaluating $H(\boldsymbol{x})$.

- In that respect, the applicability of the stratified sampling method is limited.

