Bootstrap
1 Basic Problem
Given Y = (Y1, Y2, · · · , Yn) drawn iid from a (partially or even completely unknown) distribution P, estimate some parameter θ of P. The estimator θ̂ is based on a statistic T(Y).
Then produce some measure of statistical accuracy of the estimator, e.g., its mean, variance,
confidence interval, or even full distribution.
For instance, θ could be a moment of P , or the median of P , etc.
For large sample size n and under regularity conditions, asymptotic theory√ for maximum-
likelihood (ML) estimators tells us that the normalized estimation error n(θ̂M L − θ) con-
verges in distribution to N (0, 1/Iθ ), where Iθ is Fisher information. Since P is unknown, the
estimator θ̂ assumed above is generally not the ML estimator. Under regularity conditions,
the estimator θ̂ will often be consistent in probability (like the ML estimator), but won’t be
asymptotically efficient, i.e., its asymptotic variance will exceed 1/Iθ . For instance,
P assume
P has mean θ and finite variance. Then the natural estimator θ̂ = n1 T (Y ) = n1 ni=1 Yi is
consistent but not efficient – unless P is Gaussian with known variance.
In Sections 2 and 3 we consider two approaches to measure the accuracy of the estimator:
one that requires acquiring many new samples, and a modification that doesn’t. The latter
approach is the bootstrapping method.
(Note that $\widehat{\mathrm{Bias}}(\hat{\theta})$ above is not a valid estimator of the bias because evaluating it requires knowledge of θ!) The cumulative distribution function (cdf) of θ̂ is given by

$$Q(x) = P[\hat{\theta} \le x]$$

and is estimated by

$$\hat{Q}(x) = \frac{1}{K} \sum_{k=1}^{K} \mathbf{1}\{\hat{\theta}^{(k)} \le x\}.$$
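For concreteness, here is a minimal Python sketch of this fresh-sample procedure (the function names, the choice of statistic, and the illustrative distribution P are ours, not from the notes): it draws K independent datasets of size n from P, recomputes the estimator on each, and evaluates Q̂ on a grid of points.

```python
import numpy as np

rng = np.random.default_rng(0)

def estimator(y):
    # The statistic T(Y); here the sample mean, matching the example above.
    return y.mean()

def fresh_sample_cdf(sample_from_P, n, K, x_grid):
    """Monte Carlo estimate of Q(x) = P[theta_hat <= x], using K independent
    datasets of size n drawn from P."""
    theta_hats = np.array([estimator(sample_from_P(n)) for _ in range(K)])
    # Q_hat(x) = (1/K) sum_k 1{theta_hat^(k) <= x}
    return np.array([(theta_hats <= x).mean() for x in x_grid])

# Illustration only: P taken to be Exp(1), so theta = E[Y] = 1.
sample_from_P = lambda n: rng.exponential(1.0, size=n)
x_grid = np.linspace(0.5, 1.5, 11)
print(fresh_sample_cdf(sample_from_P, n=30, K=1000, x_grid=x_grid))
```

Of course this procedure is only available when fresh samples from P can be drawn; the bootstrap of the next section removes that requirement.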
3 Bootstrap
The basic idea of bootstrapping is to use the procedure of Section 2 with the following
modification. Instead of drawing Y (k) from P , draw it from the empirical distribution
$$\hat{P}(y) = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}\{Y_i \le y\}$$

based on the n samples Y1, · · · , Yn. In other words, draw the Y_i^{(k)} iid (with replacement) from the
uniform distribution over the observed dataset {Y1 , · · · , Yn }. We denote by Y ∗(k) , 1 ≤ k ≤ K,
the resamples obtained from this procedure and by θ̂∗(k) the estimate computed from Y ∗(k) .
Then we form the following estimators of the bias, variance, and cdf of θ̂:

$$\mathrm{Bias}^*(\hat{\theta}) = \frac{1}{K} \sum_{k=1}^{K} \hat{\theta}^{*(k)} - \hat{\theta},$$

$$\mathrm{Var}^*(\hat{\theta}) = \frac{1}{K} \sum_{k=1}^{K} \left(\hat{\theta}^{*(k)}\right)^2 - \left(\frac{1}{K} \sum_{k=1}^{K} \hat{\theta}^{*(k)}\right)^2,$$

$$\hat{Q}^*(x) = \frac{1}{K} \sum_{k=1}^{K} \mathbf{1}\{\hat{\theta}^{*(k)} \le x\}.$$
Example. Let P be any distribution over the ten digits 0, 1, · · · , 9, and θ the median
of the distribution. The estimator is simply θ̂ = med(Y1 , · · · , Yn ). Say that for n = 5 we
observe Y = (5, 2, 2, 9, 7). We use bootstrapping with K = 2. Say we obtain the resamples
Y ∗(1) = (9, 2, 9, 5, 2) and Y ∗(2) = (2, 7, 7, 5, 9). The corresponding median estimates are
θ̂∗(1) = 5 and θ̂∗(2) = 7. Therefore the bootstrap estimate of the mean of θ̂ is (5 + 7)/2 = 6, and its variance is (5² + 7²)/2 − 6² = 1.
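The resampling procedure of this section is straightforward to implement. The sketch below (Python; the function names and the choice K = 1000 are ours, not from the notes) computes Bias∗(θ̂) and Var∗(θ̂) for the median estimator on the observed sample Y = (5, 2, 2, 9, 7) of the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def bootstrap(y, estimator, K):
    """Nonparametric bootstrap: resample with replacement from the observed
    data, recompute the estimator, and return (theta_hat, Bias*, Var*)."""
    y = np.asarray(y)
    n = len(y)
    theta_hat = estimator(y)
    # theta_hat^{*(k)}, k = 1, ..., K, computed from the resamples Y^{*(k)}
    theta_star = np.array(
        [estimator(rng.choice(y, size=n, replace=True)) for _ in range(K)]
    )
    bias_star = theta_star.mean() - theta_hat
    var_star = theta_star.var()  # (1/K) sum (theta*)^2 - (mean theta*)^2
    return theta_hat, bias_star, var_star

# The observed sample of the example, with the median as the estimator.
y = [5, 2, 2, 9, 7]
print(bootstrap(y, np.median, K=1000))
```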
When does the bootstrap work? For large n, the empirical cdf estimator P̂ converges in the sup norm to the actual P (the Glivenko–Cantelli theorem):

$$\sup_{y} |\hat{P}(y) - P(y)| \xrightarrow{\text{a.s.}} 0 \quad \text{as } n \to \infty.$$
Thus one may heuristically expect that the bootstrap is approximately equivalent to the
procedure of Sec. 2. In fact a little bit more is required, e.g., a sufficient condition is that θ̂
be asymptotically Gaussian.
An example where the bootstrap fails is that of iid uniform Y over the interval [0, θ], using the estimator θ̂(Y) = max_{1≤i≤n} Y_i. Here θ̂ is not asymptotically Gaussian: n(θ − θ̂) converges in distribution to an exponential, and each resample reproduces the observed maximum with probability 1 − (1 − 1/n)^n → 1 − e^{−1}, so the bootstrap distribution of θ̂∗ has a large atom at θ̂ while the true distribution of θ̂ is continuous.
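A quick numerical illustration of this failure (the code and parameter choices are ours, not from the notes): because resampling is done with replacement from the observed data, the bootstrap maximum equals the observed maximum with large probability.

```python
import numpy as np

rng = np.random.default_rng(0)

theta, n, K = 1.0, 100, 2000
y = rng.uniform(0.0, theta, size=n)      # iid Uniform[0, theta]
theta_hat = y.max()                       # the estimator max_i Y_i

# Bootstrap estimates theta_hat^{*(k)}: maximum of each resample of y.
theta_star = np.array(
    [rng.choice(y, size=n, replace=True).max() for _ in range(K)]
)

# The bootstrap distribution has a large atom at theta_hat ...
print((theta_star == theta_hat).mean())  # observed fraction of resamples hitting theta_hat
print(1 - (1 - 1 / n) ** n)              # theoretical value, -> 1 - 1/e ~ 0.63
# ... whereas the true distribution of theta_hat is continuous on [0, theta].
```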
4 Extensions
The bootstrap principle may also be used in problems where the data Yi , 1 ≤ i ≤ n, are not
iid. A modification of the method of Sec. 3 is needed. The approach is best illustrated with
an example.
Example. Consider the model
Yi = a + bsi + Wi , 1≤i≤n
where θ = (a, b) is to be estimated, si is a known signal, and Wi is iid noise with mean zero
and finite variance. Consider the least-squares estimator
$$(\hat{a}, \hat{b}) = \arg\min_{(a,b)} \sum_{i=1}^{n} (Y_i - a - b s_i)^2,$$
which is consistent under regularity assumptions on the signal. Note this estimator coincides
with the ML estimator if the noise distribution is Gaussian with known variance. Define the
residual errors
Ei = Yi − â − b̂si , 1 ≤ i ≤ n,
which follow approximately the same iid distribution as {Wi } if â, b̂ are accurate estimators
of a, b. Define the centered residuals
$$\tilde{E}_i = E_i - \frac{1}{n} \sum_{j=1}^{n} E_j, \qquad 1 \le i \le n.$$
The bootstrap procedure is then modified as follows. For each k = 1, . . . , K:

1. Draw Ei∗(k), 1 ≤ i ≤ n, iid (with replacement) from the centered residuals {Ẽ1, · · · , Ẽn}.

2. Generate pseudo-data Yi∗(k) = â + b̂si + Ei∗(k) for 1 ≤ i ≤ n.

3. Recompute the least-squares estimate (â∗(k), b̂∗(k)) from the pseudo-data Y ∗(k).
Then one may obtain the desired estimates of the bias, variance, etc. of the estimator from
(â∗(k) , b̂∗(k) ), 1 ≤ k ≤ K.
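A minimal Python sketch of this residual-resampling scheme (the function names and the synthetic signal are ours, not from the notes): it fits the line by least squares, resamples the centered residuals, regenerates pseudo-data, and refits, yielding bootstrap bias and variance estimates for (â, b̂).

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_ls(y, s):
    """Least-squares fit of y_i = a + b*s_i + noise; returns [a_hat, b_hat]."""
    X = np.column_stack([np.ones_like(s), s])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def residual_bootstrap(y, s, K):
    a_hat, b_hat = fit_ls(y, s)
    resid = y - (a_hat + b_hat * s)        # residuals E_i
    resid = resid - resid.mean()           # centered residuals E~_i
    n = len(y)
    est_star = np.empty((K, 2))
    for k in range(K):
        e_star = rng.choice(resid, size=n, replace=True)  # step 1
        y_star = a_hat + b_hat * s + e_star               # step 2: pseudo-data
        est_star[k] = fit_ls(y_star, s)                   # step 3: refit
    bias = est_star.mean(axis=0) - np.array([a_hat, b_hat])
    var = est_star.var(axis=0)
    return (a_hat, b_hat), bias, var

# Usage with a synthetic signal (illustrative choices only).
s = np.linspace(0, 1, 50)
y = 1.0 + 2.0 * s + rng.normal(0, 0.3, size=s.size)
print(residual_bootstrap(y, s, K=1000))
```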
Related Problems. The same idea applies to problems such as estimation of parameters
in a Markov model:
Yi = Fθ (Yi−1 , Yi−2 ) + Wi , 3 ≤ i ≤ n
where Wi is iid noise with mean zero and finite variance. The prediction function F is
parameterized by θ, which can be estimated using nonlinear least-squares:
$$\hat{\theta} = \arg\min_{\theta} \sum_{i=3}^{n} \big(Y_i - F_{\theta}(Y_{i-1}, Y_{i-2})\big)^2.$$
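Below is a sketch of the same residual-resampling idea for the Markov model, assuming a particular (made-up) form of Fθ and using a generic least-squares solver; everything here is an illustration of the technique, not the notes' own code. The key difference from the regression example is that the pseudo-series must be regenerated recursively, since each Yi∗ depends on the two previous pseudo-values.

```python
import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(0)

def F(theta, y1, y2):
    # Illustrative nonlinear predictor F_theta(Y_{i-1}, Y_{i-2}).
    return theta[0] * y1 + theta[1] * np.tanh(y2)

def fit(y):
    """Nonlinear least-squares fit of theta from the observed series y."""
    resid = lambda th: y[2:] - F(th, y[1:-1], y[:-2])
    return least_squares(resid, x0=np.zeros(2)).x

def markov_residual_bootstrap(y, K):
    theta_hat = fit(y)
    e = y[2:] - F(theta_hat, y[1:-1], y[:-2])  # residuals, i = 3..n
    e = e - e.mean()                            # centered residuals
    n = len(y)
    theta_star = np.empty((K, 2))
    for k in range(K):
        e_star = rng.choice(e, size=n - 2, replace=True)
        y_star = np.empty(n)
        y_star[:2] = y[:2]                      # keep the first two observations
        for i in range(2, n):                   # regenerate the series recursively
            y_star[i] = F(theta_hat, y_star[i - 1], y_star[i - 2]) + e_star[i - 2]
        theta_star[k] = fit(y_star)
    return theta_hat, theta_star.mean(axis=0) - theta_hat, theta_star.var(axis=0)

# Usage on a synthetic series generated from the same model.
n = 200
y = np.zeros(n)
for i in range(2, n):
    y[i] = F([0.5, 0.8], y[i - 1], y[i - 2]) + rng.normal(0, 0.2)
print(markov_residual_bootstrap(y, K=200))
```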
The bootstrap has also been used to assess the accuracy of spectrum estimators, hypothesis
tests, etc.