Bootstrap Methods 2020
Class Notes
Manuel Arellano
Revised: February 2, 2020
Introduction
The bootstrap is an alternative method of assessing sampling variability. It is a
mechanical procedure that can be applied in a wide variety of situations.
The bootstrap was invented and given its name by Brad Efron in a paper published in
1979 in the Annals of Statistics.
The idea of the bootstrap
Let $Y_1, \ldots, Y_N$ be a random sample according to some distribution $F$ and let $\hat\theta_N = h(Y_1, \ldots, Y_N)$ be some statistic of interest. We want to estimate the distribution of $\hat\theta_N$:
$$\Pr_F\left[\hat\theta_N \leq r\right] = \Pr_F\left[h(Y_1, \ldots, Y_N) \leq r\right],$$
where the subscript $F$ indicates the distribution of the $Y$'s.

A simple estimator of $\Pr_F[\hat\theta_N \leq r]$ is the plug-in estimator. It replaces $F$ by the empirical cdf $\hat F_N$:
$$\hat F_N(s) = \frac{1}{N}\sum_{i=1}^{N} \mathbf{1}(Y_i \leq s),$$
which assigns probability $1/N$ to each of the observed values $y_1, \ldots, y_N$ of $Y_1, \ldots, Y_N$. The resulting estimator is then
$$\Pr_{\hat F_N}\left[h(Y_1, \ldots, Y_N) \leq r\right] \qquad (1)$$
For example, suppose $Y_1, Y_2, Y_3$ are iid Bernoulli($p$) and $\hat\theta_N$ is the sample mean, with $N = 3$. The possible samples are:

$(y_1, y_2, y_3)$    $\Pr(y_1, y_2, y_3)$    $\hat\theta_N$
$(0, 0, 0)$          $(1-p)^3$               $0$
$(1, 0, 0)$          $p(1-p)^2$              $1/3$
$(0, 1, 0)$          $p(1-p)^2$              $1/3$
$(0, 0, 1)$          $p(1-p)^2$              $1/3$
$(1, 1, 0)$          $p^2(1-p)$              $2/3$
$(1, 0, 1)$          $p^2(1-p)$              $2/3$
$(0, 1, 1)$          $p^2(1-p)$              $2/3$
$(1, 1, 1)$          $p^3$                   $1$
So that $\Pr_F[\hat\theta_N = r]$ is determined by

$r$      $\Pr_F[\hat\theta_N = r]$
$0$      $(1-p)^3$
$1/3$    $3p(1-p)^2$
$2/3$    $3p^2(1-p)$
$1$      $p^3$
The idea of the bootstrap (continued)
Suppose that the observed values $y_1, y_2, y_3$ are $(0, 1, 1)$, so that the observed value of $\hat\theta_N$ is $2/3$. Therefore, our estimate of $\Pr[\hat\theta_N = r]$ is given by

$r$      $\Pr_{\hat F_N}[\hat\theta_N = r]$
$0$      $(1/3)^3 = 1/27 = .037$
$1/3$    $3(2/3)(1/3)^2 = 6/27 = .222$
$2/3$    $3(2/3)^2(1/3) = 12/27 = .444$
$1$      $(2/3)^3 = 8/27 = .296$
The previous example is so simple that the calculation of $\Pr_{\hat F_N}[\hat\theta_N \leq r]$ can be done analytically, but in general this type of calculation is beyond reach.
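In this tiny example the plug-in distribution can even be enumerated by brute force. A minimal sketch in Python (the code and names are illustrative, not part of the notes): it lists all $N^N = 27$ equally likely index draws from the observed sample $(0, 1, 1)$ and tabulates the exact plug-in distribution of the sample mean.

```python
from itertools import product
from collections import Counter
from fractions import Fraction

y = [0, 1, 1]   # observed sample
N = len(y)

# Each resample corresponds to a draw of N indices, each equally likely,
# so all N**N = 27 index combinations have probability 1/27.
counts = Counter(
    Fraction(sum(y[i] for i in idx), N)      # theta_hat for this resample
    for idx in product(range(N), repeat=N)
)

for r in sorted(counts):
    prob = Fraction(counts[r], N ** N)
    print(f"Pr[theta_hat = {r}] = {prob} = {float(prob):.3f}")
```

Running this reproduces the probabilities in the table above: $1/27$, $6/27$, $12/27$, and $8/27$.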
The idea of the bootstrap (continued)
Estimation by simulation
A standard device for (approximately) evaluating probabilities that are too difficult to calculate exactly is simulation.
To calculate the probability of an event, one generates a sample from the underlying
distribution and notes the frequency with which the event occurs in the generated
sample.
If the sample is sufficiently large, this frequency will provide an approximation to the original probability with a negligible error.
Such an approximation to the probability (1) constitutes the second step of the bootstrap method.
A number $M$ of samples $Y_1, \ldots, Y_N$ (the "bootstrap" samples) are drawn from $\hat F_N$, and the frequency with which
$$h(Y_1, \ldots, Y_N) \leq r$$
occurs among the $M$ samples is recorded; this frequency is the bootstrap approximation to (1).
Numerical illustration
[Table: exact plug-in probabilities $\Pr_{\hat F_N}[\hat\theta_N = r]$ alongside the frequencies of each value of $\hat\theta_N$ across $M$ bootstrap samples.]

The discrepancy between the last two columns can be made arbitrarily small by increasing $M$.
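A sketch of such a simulation for the Bernoulli example (Python with numpy; the values of $M$ and the seed are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
y = np.array([0, 1, 1])                     # observed sample
N = len(y)
exact = {0.0: 1/27, 1/3: 6/27, 2/3: 12/27, 1.0: 8/27}

for M in (100, 1_000, 100_000):
    # M bootstrap samples of size N, drawn with replacement from F_N.
    theta = rng.choice(y, size=(M, N), replace=True).mean(axis=1)
    for r, p in exact.items():
        freq = np.mean(np.isclose(theta, r))
        print(f"M={M:>6}  r={r:.3f}  exact={p:.3f}  frequency={freq:.3f}")
```

As $M$ grows, the simulated frequencies converge to the exact plug-in probabilities.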
Bootstrap standard errors
The bootstrap procedure is very flexible and applicable to many different situations, such as estimating the bias and variance of an estimator, calculating confidence intervals, etc.
As a result of resampling we have available $M$ estimates from the artificial samples: $\hat\theta_N^{(1)}, \ldots, \hat\theta_N^{(M)}$. A bootstrap standard error is then obtained as
$$\left[\frac{1}{M-1}\sum_{m=1}^{M}\left(\hat\theta_N^{(m)} - \bar{\hat\theta}_N\right)^2\right]^{1/2} \qquad (2)$$
where $\bar{\hat\theta}_N = \sum_{m=1}^{M} \hat\theta_N^{(m)} / M$.
In the previous example, the bootstrap mean is $\bar{\hat\theta}_N = 0.664$, the bootstrap standard error is $0.271$ calculated as in (2) with $M = 1000$, and the analytical standard error is
$$\left[\frac{\hat\theta_N(1-\hat\theta_N)}{N}\right]^{1/2} = 0.272$$
where $\hat\theta_N = 2/3$ and $N = 3$.
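A sketch of the calculation in (2) for the same example (Python; the seed is illustrative, so the simulated values will differ slightly from those quoted above):

```python
import numpy as np

rng = np.random.default_rng(0)
y = np.array([0, 1, 1])
N, M = len(y), 1_000

# M bootstrap estimates theta_hat^(1), ..., theta_hat^(M)
theta = rng.choice(y, size=(M, N), replace=True).mean(axis=1)

boot_mean = theta.mean()
boot_se = theta.std(ddof=1)                        # formula (2): M - 1 divisor
analytic_se = np.sqrt(y.mean() * (1 - y.mean()) / N)
print(f"bootstrap mean {boot_mean:.3f}, bootstrap se {boot_se:.3f}, "
      f"analytical se {analytic_se:.3f}")
```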
Asymptotic properties of bootstrap methods
Using the bootstrap standard error to construct test statistics cannot be shown to
improve on the approximation provided by the usual asymptotic theory, but the good
news is that under general regularity conditions it does have the same asymptotic
justification as conventional asymptotic procedures.
This is good news because bootstrap standard errors are often much easier to obtain
than analytical standard errors.
Refinements for large-sample pivotal statistics
Even better news is the fact that in many cases the bootstrap does improve the approximation of the distribution of test statistics, in the sense that the bootstrap can provide an asymptotic refinement compared with the usual asymptotic theory.
The key to achieving such refinements (an asymptotic approximation whose errors are of a smaller order of magnitude in powers of the sample size) is that the statistic being bootstrapped is asymptotically pivotal.
An asymptotically pivotal statistic is one whose limiting distribution does not depend
on unknown parameters (like standard normal or chi-square distributions).
This is the case with t-ratios and Wald test statistics, for example.
Note that for a t-ratio to be asymptotically pivotal in a regression with heteroskedasticity, the robust White form of the t-ratio needs to be used.
Asymptotic properties of bootstrap methods (continued)
The upshot of the previous discussion is that the replication of the quantity of interest
(mean, median, etc.) is not always the best way to use the bootstrap if improvements
on asymptotic approximations are sought.
In particular, when we wish to calculate a confidence interval, it is better not to bootstrap the estimate itself but rather to bootstrap the distribution of the t-value.
This is feasible when we have a large-sample estimate of the standard error but are skeptical about the accuracy of the normal probability approximation.
Such a procedure will provide more accurate estimates of confidence intervals than either the simple bootstrap or the asymptotic standard errors.
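A minimal sketch of such a bootstrap-t (percentile-t) confidence interval for a sample mean, assuming an iid sample `y` (Python; the data, $M$, and the level are illustrative):

```python
import numpy as np

def boot_t_ci(y, M=999, level=0.95, rng=None):
    """Percentile-t confidence interval for the mean of an iid sample."""
    rng = rng or np.random.default_rng(0)
    N = len(y)
    theta_hat = y.mean()
    se_hat = y.std(ddof=1) / np.sqrt(N)

    # Bootstrap the t-statistic: studentize each resample around theta_hat.
    boot = y[rng.integers(0, N, size=(M, N))]
    t_star = (boot.mean(axis=1) - theta_hat) / (boot.std(axis=1, ddof=1) / np.sqrt(N))

    lo, hi = np.quantile(t_star, [(1 - level) / 2, (1 + level) / 2])
    # Invert the studentized statistic (note the crossed quantiles).
    return theta_hat - hi * se_hat, theta_hat - lo * se_hat

y = np.random.default_rng(1).exponential(size=50)
print(boot_t_ci(y))
```

Because the t-value is asymptotically pivotal, its bootstrap distribution delivers the refinement discussed above, unlike bootstrapping the estimate itself.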
An example: the sample mean
To illustrate how the bootstrap works, let us consider the estimation of a sample mean:
$$y_i = \beta + u_i,$$
where the $u_i$ are iid with zero mean. The OLS estimate on the original sample is:
$$\hat\beta = \frac{1}{N}\sum_{i=1}^{N} y_i = \bar y.$$
Then, sampling from the original sample is equivalent to selecting $N$ indices with probability $1/N$. Let $W$ be a generic draw from that distribution. We have:
$$E(W \mid y_1, \ldots, y_N) = \frac{1}{N}\sum_{i=1}^{N} y_i, \qquad \operatorname{Var}(W \mid y_1, \ldots, y_N) = \frac{1}{N}\sum_{i=1}^{N} (y_i - \bar y)^2.$$
Let $\tilde\beta$ be the sample mean of a bootstrap sample. It follows from the sample mean theorem that:
$$E(\tilde\beta \mid y_1, \ldots, y_N) = E(W \mid y_1, \ldots, y_N) = \frac{1}{N}\sum_{i=1}^{N} y_i,$$
and
$$\operatorname{Var}(\tilde\beta \mid y_1, \ldots, y_N) = \frac{\operatorname{Var}(W \mid y_1, \ldots, y_N)}{N} = \frac{1}{N^2}\sum_{i=1}^{N} (y_i - \bar y)^2.$$
An example: the sample mean (continued)
This illustrates the bootstrap principle, according to which the relation between $\tilde\beta$ and $\hat\beta$ is the same as the relation between $\hat\beta$ and the true $\beta$.
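A quick numerical check of these two conditional moments (Python; the sample `y` and $M$ are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(size=20)
N, M = len(y), 200_000

# Sample means of M bootstrap samples, each drawn from F_N.
beta_tilde = rng.choice(y, size=(M, N), replace=True).mean(axis=1)

print(beta_tilde.mean(), y.mean())                         # E(beta_tilde | y) = y_bar
print(beta_tilde.var(), ((y - y.mean())**2).sum() / N**2)  # Var = (1/N^2) sum (y_i - y_bar)^2
```

With $M$ this large, the simulated mean and variance match the analytical formulas to several decimals.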
Bootstrap confidence intervals
The connection between confidence intervals and significance tests can be exploited to test certain parametric hypotheses.
Suppose we wish to test $H_0: \theta = \theta_0$ against the two-sided alternative $H_1: \theta \neq \theta_0$.
For a test at the 5% level we first compute a number $\hat\kappa$ such that
$$\Pr_{\hat F_N}\left(\left|\frac{\tilde\theta_N - \hat\theta_N}{\tilde\sigma_N}\right| > \hat\kappa\right) = .05,$$
where $\tilde\theta_N$ and $\tilde\sigma_N$ denote the estimate and its standard error computed from a bootstrap sample. We then reject $H_0$ if
$$\left|\frac{\hat\theta_N - \theta_0}{\hat\sigma_N}\right| > \hat\kappa.$$
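A sketch of this test for a sample mean, where the standard error is recomputed in each bootstrap sample (Python; all names are illustrative):

```python
import numpy as np

def boot_t_test(y, theta0, M=999, alpha=0.05, rng=None):
    """Bootstrap-t test of H0: theta = theta0 for the mean of an iid sample."""
    rng = rng or np.random.default_rng(0)
    N = len(y)
    theta_hat = y.mean()
    se_hat = y.std(ddof=1) / np.sqrt(N)

    # Bootstrap distribution of |t|, centred at theta_hat (not theta0).
    boot = y[rng.integers(0, N, size=(M, N))]
    t_star = np.abs(boot.mean(axis=1) - theta_hat) / (boot.std(axis=1, ddof=1) / np.sqrt(N))
    kappa_hat = np.quantile(t_star, 1 - alpha)

    t_obs = abs(theta_hat - theta0) / se_hat
    return t_obs > kappa_hat            # True => reject H0 at level alpha

y = np.random.default_rng(1).normal(loc=0.2, size=50)
print(boot_t_test(y, theta0=0.0))
```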
Bootstrap hypothesis testing: linear regression example
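As an illustrative sketch of such an exercise (the simulated data, pairs resampling, and White standard errors are assumptions of this sketch, not necessarily the worked example in the original notes), here is a pairs bootstrap of the heteroskedasticity-robust t-ratio, as required for the statistic to be asymptotically pivotal, testing $H_0: \beta_1 = 0$:

```python
import numpy as np

def white_se(X, e):
    """Heteroskedasticity-robust (White) standard errors for OLS."""
    XtX_inv = np.linalg.inv(X.T @ X)
    V = XtX_inv @ (X.T * e**2) @ X @ XtX_inv
    return np.sqrt(np.diag(V))

def ols(X, y):
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    return b, y - X @ b

rng = np.random.default_rng(0)
N = 200
x = rng.normal(size=N)
y = 1.0 + 0.5 * x + np.abs(x) * rng.normal(size=N)   # heteroskedastic errors
X = np.column_stack([np.ones(N), x])

b, e = ols(X, y)
t_obs = abs(b[1] - 0.0) / white_se(X, e)[1]          # robust t-ratio for H0: beta_1 = 0

# Pairs bootstrap of the robust t-ratio, centred at the OLS estimate b[1].
M = 999
t_star = np.empty(M)
for m in range(M):
    idx = rng.integers(0, N, size=N)
    bm, em = ols(X[idx], y[idx])
    t_star[m] = abs(bm[1] - b[1]) / white_se(X[idx], em)[1]

kappa_hat = np.quantile(t_star, 0.95)
print(f"|t| = {t_obs:.2f}, bootstrap critical value = {kappa_hat:.2f}, "
      f"reject H0: {t_obs > kappa_hat}")
```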
Bootstrapping dependent samples
Dealing with time-series models requires taking serial dependence into account. One way of doing this is to sample by blocks, as in the sketch below.
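A minimal sketch of a moving-block bootstrap (Python; the block length is illustrative and would in practice need to grow with the sample size):

```python
import numpy as np

def block_bootstrap(y, block_len=10, rng=None):
    """Moving-block bootstrap: resample overlapping blocks and concatenate."""
    rng = rng or np.random.default_rng(0)
    T = len(y)
    n_blocks = int(np.ceil(T / block_len))
    # Random starting points of (overlapping) blocks of length block_len.
    starts = rng.integers(0, T - block_len + 1, size=n_blocks)
    sample = np.concatenate([y[s:s + block_len] for s in starts])
    return sample[:T]                   # trim to the original length
```

Resampling whole blocks preserves the serial dependence within each block, which an observation-by-observation resample would destroy.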
A similar issue arises with stratified and clustered survey samples, such as household surveys. All we have to do is treat the strata separately and resample, not the basic underlying units (the households), but rather the primary sampling units (the clusters), as in the sketch below.
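A sketch of such stratified cluster resampling (Python; the flat arrays of stratum and cluster labels are an assumed data layout):

```python
import numpy as np

def cluster_bootstrap_indices(stratum, cluster, rng=None):
    """Draw one bootstrap sample by resampling PSUs (clusters) within strata.

    stratum, cluster: arrays with each observation's stratum and cluster labels.
    Returns the row indices of the observations in the bootstrap sample.
    """
    rng = rng or np.random.default_rng(0)
    idx = []
    for s in np.unique(stratum):
        clusters_s = np.unique(cluster[stratum == s])
        # Draw with replacement as many clusters as the stratum contains.
        drawn = rng.choice(clusters_s, size=len(clusters_s), replace=True)
        for c in drawn:
            idx.append(np.flatnonzero((stratum == s) & (cluster == c)))
    return np.concatenate(idx)
```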
Using replicate weights
Taking stratification and clustering sampling features into account, either analytically or by bootstrap, requires the availability of stratum and cluster indicators.
Generally, statistical offices or survey providers do not make them available, for confidentiality reasons.
To enable the estimation of the sampling distribution of estimators and test statistics
without disclosing stratum or cluster information, an alternative is to provide replicate
weights.
For example, the EFF provides 999 replicate weights. Specifically, the EFF provides replicate cross-section weights, replicate longitudinal weights, and multiplicity factors (Bover, 2004).
Multiplicity factors indicate the number of times a given household appears in a
particular bootstrap sample.
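With replicate weights, the user never sees the stratum or cluster labels; each replicate-weight column already encodes one bootstrap sample. A sketch of how a bootstrap standard error would then be computed (Python; the weighted-mean estimator and the array shapes are assumptions, not the EFF's actual file format):

```python
import numpy as np

def replicate_weight_se(y, w, w_rep):
    """Bootstrap standard error of a weighted mean from replicate weights.

    y: (N,) outcomes; w: (N,) main weights; w_rep: (N, M) replicate weights,
    e.g. M = 999 columns in the EFF case.
    """
    theta_hat = np.average(y, weights=w)
    # Recompute the weighted mean once per replicate-weight column.
    theta_rep = (w_rep * y[:, None]).sum(axis=0) / w_rep.sum(axis=0)
    return theta_hat, theta_rep.std(ddof=1)
```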