A short course based on the book
‘Bootstrap Methods and their Application’,
by A. C. Davison and D. V. Hinkley
© Cambridge University Press, 1997
Outline
Motivation
Basic notions
Confidence intervals
Several samples
Variance estimation
Tests
Regression
AIDS data
◮ UK AIDS diagnoses 1988–1992.
◮ Reporting delay up to several years!
◮ Problem: predict state of epidemic at end 1992, with
realistic statement of uncertainty.
◮ Simple model: number of reports in row j and column k Poisson, mean
  µ_jk = exp(α_j + β_k).
◮ Unreported diagnoses in period j Poisson, mean
  Σ_{k unobs} µ_jk = exp(α_j) Σ_{k unobs} exp(β_k).
◮ Estimate total unreported diagnoses from period j by replacing α_j and β_k by their MLEs.
• How reliable are these predictions?
• How sensible is the Poisson model?
AIDS data
◮ Data (+), fits of simple model (solid), complex model
(dots)
◮ Variance formulae could be derived — painful! but useful?
◮ Effects of overdispersion, complex model, . . .?
[Figure: quarterly AIDS diagnoses (vertical axis: Diagnoses, 0–500), with observed counts (+) and the two fitted curves.]
Basic notions
Handedness data
Figure: Scatter plot of handedness data. The numbers show the multiplicities of the observations.
[Scatter plot: hand (vertical, 1–8) against dnan (horizontal, 15–45).]
Handedness data
Frequentist inference
◮ For r = 1, . . . , R:
  • generate random sample y*_1, . . . , y*_n ~ F (iid);
  • compute θ̂*_r using y*_1, . . . , y*_n.
◮ Output after R iterations: the replicates θ̂*_1, . . . , θ̂*_R.
Figure: Left: original data, with jittered vertical values. Centre and right: two samples generated from the fitted bivariate normal distribution.
[Figure panels: hand against dnan; sample correlations 0.509 (original data), 0.753 and 0.533 (simulated samples). A further panel shows the probability density of the replicates.]
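The parametric recipe above is compact enough to sketch in code. Below is a minimal Python/numpy illustration; the data values, the fitted bivariate-normal parameters and R = 999 are placeholder assumptions for the sake of a runnable example, not the course's actual numbers.

```python
import numpy as np

rng = np.random.default_rng(1)

# Placeholder (dnan, hand) pairs standing in for the n = 37 observations.
y = rng.multivariate_normal([28.5, 2.0], [[70.0, 6.4], [6.4, 2.3]], size=37)

# Parametric fit: bivariate normal with MLEs (sample mean, covariance with divisor n).
mu_hat = y.mean(axis=0)
sigma_hat = np.cov(y, rowvar=False, ddof=0)

def corr(sample):
    """The statistic theta-hat: sample correlation of the two columns."""
    return np.corrcoef(sample, rowvar=False)[0, 1]

R = 999
theta_star = np.empty(R)
for r in range(R):
    # Generate y*_1, ..., y*_n iid from the fitted distribution F-hat.
    y_star = rng.multivariate_normal(mu_hat, sigma_hat, size=len(y))
    theta_star[r] = corr(y_star)
```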
F unknown
◮ Replace unknown F by estimate F̂ obtained
  • parametrically — e.g. maximum likelihood or robust fit of distribution F(y) = F(y; ψ) (normal, exponential, bivariate normal, . . .)
  • nonparametrically — using empirical distribution function (EDF) of original data y_1, . . . , y_n, which puts mass 1/n on each of the y_j
◮ Algorithm: For r = 1, . . . , R:
  • generate random sample y*_1, . . . , y*_n ~ F̂ (iid);
  • compute θ̂*_r using y*_1, . . . , y*_n.
Nonparametric bootstrap
◮ Bootstrap (re)sample y*_1, . . . , y*_n ~ F̂ (iid), where F̂ is the EDF of y_1, . . . , y_n
  • Repetitions will occur!
◮ Compute bootstrap θ̂* using y*_1, . . . , y*_n.
◮ For handedness data: sample n = 37 pairs y* = (dnan, hand)* with replacement from the original pairs (dnan, hand), each pair chosen with probability 1/37
◮ Repeat this R times, to get θ̂*_1, . . . , θ̂*_R (see the sketch below)
◮ See picture
◮ Results quite different from parametric simulation — why?
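A corresponding sketch of the nonparametric scheme, resampling pairs with replacement; the data array is again a stand-in for the real 37 pairs. It also computes the variance estimate v defined on the next slide.

```python
import numpy as np

rng = np.random.default_rng(1)

# Placeholder (dnan, hand) pairs standing in for the n = 37 observations.
y = rng.multivariate_normal([28.5, 2.0], [[70.0, 6.4], [6.4, 2.3]], size=37)

def corr(sample):
    return np.corrcoef(sample, rowvar=False)[0, 1]

R, n = 999, len(y)
theta_star = np.empty(R)
for r in range(R):
    idx = rng.integers(0, n, size=n)   # each pair has probability 1/n; ties occur
    theta_star[r] = corr(y[idx])

v = theta_star.var(ddof=1)             # v = (R-1)^{-1} sum_r (theta*_r - mean)^2
```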
Figure: Left: original data, with jittered vertical values. Centre and
right: two bootstrap samples, with jittered vertical values.
[Figure panels: hand against dnan; sample correlations 0.509 (original data), 0.733 and 0.491 (bootstrap samples).]
◮ Bootstrap variance estimate, computed from the replicates θ̂*_1, . . . , θ̂*_R:
  v = (R − 1)^{−1} Σ_{r=1}^{R} (θ̂*_r − θ̄*)²,  where θ̄* = R^{−1} Σ_{r=1}^{R} θ̂*_r
Handedness data
Figure: Summaries of the θ̂*. Left: histogram, with vertical line showing θ̂. Right: normal Q–Q plot of θ̂*.
Key points
◮ Estimator is algorithm
  • applied to original data y_1, . . . , y_n gives original θ̂
  • applied to simulated data y*_1, . . . , y*_n gives θ̂*
  • θ̂ can be of (almost) any complexity
  • for more sophisticated ideas (later) to work, θ̂ must often be smooth function of data
◮ Sample is used to estimate F
  • F̂ ≈ F — heroic assumption
◮ Simulation replaces theoretical calculation
  • removes need for mathematical skill
  • does not remove need for thought
  • check code very carefully — garbage in, garbage out!
◮ Two sources of error
  • statistical (F̂ ≠ F) — reduce by thought
  • simulation (R ≠ ∞) — reduce by taking R large (enough)
Confidence intervals
Transformed correlation coefficient
[Figure: density of the bootstrap replicates of the transformed correlation coefficient.]
◮ Variance-stabilising transformation:
  ψ̂ = ψ(θ̂) = ½ log{(1 + θ̂)/(1 − θ̂)},
with bias and variance estimates
  b_ψ = R^{−1} Σ_{r=1}^{R} ψ̂*_r − ψ̂,  v_ψ = (R − 1)^{−1} Σ_{r=1}^{R} (ψ̂*_r − ψ̄*)²
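A hedged sketch of working on the transformed scale: the observed correlation 0.509 is taken from the slides, but theta_star here is synthetic stand-in output, so the resulting interval is illustrative only.

```python
import numpy as np

rng = np.random.default_rng(1)

def fisher_z(theta):
    # psi(theta) = (1/2) log{(1 + theta)/(1 - theta)}; its inverse is tanh.
    return 0.5 * np.log((1 + theta) / (1 - theta))

theta_hat = 0.509                                              # from the slides
theta_star = np.clip(rng.normal(0.5, 0.15, 999), -0.99, 0.99)  # stand-in replicates

psi_hat, psi_star = fisher_z(theta_hat), fisher_z(theta_star)
b_psi = psi_star.mean() - psi_hat        # bootstrap bias estimate b_psi
v_psi = psi_star.var(ddof=1)             # bootstrap variance estimate v_psi

# Normal-approximation interval on the transformed scale, mapped back with tanh:
lo, hi = np.tanh(psi_hat - b_psi + np.array([-1.0, 1.0]) * 1.96 * np.sqrt(v_psi))
```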
Pivots
◮ Hope properties of θ̂*_1, . . . , θ̂*_R mimic effect of sampling from original model.
◮ Amounts to faith in ‘substitution principle’: may replace
unknown F with known Fb — false in general, but often
more nearly true for pivots.
◮ Pivot is combination of data and parameter whose
distribution is independent of underlying model.
◮ Canonical example: Y_1, . . . , Y_n ~ N(µ, σ²) (iid). Then
  Z = (Ȳ − µ)/(S²/n)^{1/2} ~ t_{n−1},
for all µ, σ² — so independent of the underlying distribution, provided this is normal
◮ Exact pivot generally unavailable in nonparametric case.
Studentized statistic
◮ Idea: generalize Student t statistic to bootstrap setting
◮ Requires variance estimate V for θ̂, computed from y_1, . . . , y_n
◮ Analogue of Student t statistic:
  Z = (θ̂ − θ)/V^{1/2}
◮ If the quantiles z_α of Z were known, then
  Pr(z_α ≤ Z ≤ z_{1−α}) = Pr{z_α ≤ (θ̂ − θ)/V^{1/2} ≤ z_{1−α}} = 1 − 2α
(z_α no longer denotes a normal quantile!) implies that
  Pr(θ̂ − V^{1/2} z_{1−α} ≤ θ ≤ θ̂ − V^{1/2} z_α) = 1 − 2α
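A minimal studentized-bootstrap sketch for the sample mean, the one case where V = s²/n is available in closed form; the exponential data and R = 999 are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
y = rng.exponential(size=30)                 # illustrative data; statistic = mean

theta_hat, V = y.mean(), y.var(ddof=1) / len(y)

R = 999
z_star = np.empty(R)
for r in range(R):
    ys = rng.choice(y, size=len(y), replace=True)
    v_star = ys.var(ddof=1) / len(ys)        # variance estimate recomputed each time
    z_star[r] = (ys.mean() - theta_hat) / np.sqrt(v_star)

# Estimate z_alpha and z_{1-alpha} from the simulated pivot, then invert:
alpha = 0.025
z_lo, z_hi = np.quantile(z_star, [alpha, 1 - alpha])
ci = (theta_hat - np.sqrt(V) * z_hi, theta_hat - np.sqrt(V) * z_lo)
```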
Why Studentize?
◮ Studentize, so Z →_D N(0, 1) as n → ∞. Edgeworth series:
  Pr(Z ≤ z | F) = Φ(z) + n^{−1/2} a(z) φ(z) + O(n^{−1}),
  Pr(Z* ≤ z | F̂) = Φ(z) + n^{−1/2} â(z) φ(z) + O_p(n^{−1}).
◮ If we don't studentize, Z = θ̂ − θ →_D N(0, ν). Then
  Pr(Z ≤ z | F) = Φ(z/ν^{1/2}) + n^{−1/2} a′(z/ν^{1/2}) φ(z/ν^{1/2}) + O(n^{−1})
and
  Pr(Z* ≤ z | F̂) = Φ(z/ν̂^{1/2}) + n^{−1/2} â′(z/ν̂^{1/2}) φ(z/ν̂^{1/2}) + O_p(n^{−1}),
so the leading terms differ because ν̂ ≠ ν.
◮ Percentile interval: use the ordered replicates directly,
  (θ̂*_{((R+1)α)}, θ̂*_{((R+1)(1−α))}).
◮ Improved percentile intervals (BCa, ABC, . . .)
  • Replace percentile interval with
  (θ̂*_{((R+1)α′)}, θ̂*_{((R+1)(1−α′′))}),
  where the levels α′, α′′ are adjusted to improve coverage.
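In code the percentile limits are just order statistics of the sorted replicates; a small sketch of the (R+1)α indexing convention used above (choose R so that (R+1)α is an integer):

```python
import numpy as np

def percentile_interval(theta_star, alpha=0.025):
    """Basic percentile interval: the (R+1)*alpha-th and (R+1)*(1-alpha)-th
    ordered bootstrap replicates (1-based order statistics)."""
    t = np.sort(theta_star)
    R = len(t)
    k = round((R + 1) * alpha)          # e.g. R = 999, alpha = 0.025 gives k = 25
    return t[k - 1], t[R - k]           # here: the 25th and 975th ordered values
```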
Caution
◮ Edgeworth theory OK for smooth statistics — beware
rough statistics: must check output.
◮ Bootstrap of median theoretically OK, but very sensitive to
sample values in practice.
◮ Role for smoothing?
[Figure: normal Q–Q plot of T* − t for the median against standard normal quantiles.]
Several samples
Gravity data
Series
  1     2     3     4     5     6     7     8
 76    87   105    95    76    78    82    84
 82    95    83    90    76    78    79    86
 83    98    76    76    78    78    81    85
 54   100    75    76    79    86    79    82
 35   109    51    87    72    87    77    77
 46   109    76    79    68    81    79    76
 87   100    93    77    75    73    79    77
 68    81    75    71    78    67    78    80
       75    62          75          79    83
       68                82          82    81
       67                83          76    78
                                     73    78
                                     64    78
Gravity data
[Figure: the measurements plotted by series, 1–8.]
Gravity data
◮ Eight series of measurements of gravitational acceleration g made May 1934 – July 1935 in Washington DC
◮ Data are deviations from 9.8 m/s² in units of 10⁻³ cm/s²
◮ Goal: Estimate g and provide confidence interval
◮ Weighted combination of series averages and its variance estimate:
  θ̂ = Σ_{i=1}^{8} ȳ_i n_i/s_i² / Σ_{i=1}^{8} n_i/s_i²,  V = (Σ_{i=1}^{8} n_i/s_i²)^{−1},
giving
  θ̂ = 78.54,  V = 0.59²,
and 95% confidence interval of θ̂ ± 1.96 V^{1/2} = (77.5, 79.8)
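The weighted combination is direct to compute. The series below are keyed to the table above; the column membership of the trailing rows follows the alignment reconstructed there, so treat the split of the final observations across series as an assumption.

```python
import numpy as np

# Eight series from the table above (unequal lengths).
series = [
    [76, 82, 83, 54, 35, 46, 87, 68],
    [87, 95, 98, 100, 109, 109, 100, 81, 75, 68, 67],
    [105, 83, 76, 75, 51, 76, 93, 75, 62],
    [95, 90, 76, 76, 87, 79, 77, 71],
    [76, 76, 78, 79, 72, 68, 75, 78, 75, 82, 83],
    [78, 78, 78, 86, 87, 81, 73, 67],
    [82, 79, 81, 79, 77, 79, 79, 78, 79, 82, 76, 73, 64],
    [84, 86, 85, 82, 77, 76, 77, 80, 83, 81, 78, 78, 78],
]

n = np.array([len(s) for s in series])
ybar = np.array([np.mean(s) for s in series])
s2 = np.array([np.var(s, ddof=1) for s in series])

w = n / s2                                # weights n_i / s_i^2
theta_hat = np.sum(w * ybar) / np.sum(w)  # weighted combination of series means
V = 1.0 / np.sum(w)                       # its variance estimate

ci = (theta_hat - 1.96 * np.sqrt(V), theta_hat + 1.96 * np.sqrt(V))
```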
[Figures: normal Q–Q plots of the replicates t* (77–81) and the studentized replicates z* (−15 to 5) against standard normal quantiles; scatter plots of √v* against t* and against z*.]
Variance estimation
Delta method
◮ Computation of variance formulae for functions of averages
and other estimators
◮ Suppose ψ̂ = g(θ̂) estimates ψ = g(θ), and θ̂ is approximately N(θ, σ²/n)
◮ Then provided g′(θ) ≠ 0, have (D2)
  E(ψ̂) = g(θ) + O(n^{−1}),
  var(ψ̂) = σ² g′(θ)²/n + O(n^{−3/2})
◮ Then var(ψ̂) ≐ σ̂² g′(θ̂)²/n = V
◮ Example (D3): θ̂ = Ȳ, ψ̂ = log θ̂
◮ Variance stabilisation (D4): if var(θ̂) ≐ S(θ)²/n, find transformation h such that var{h(θ̂)} ≐ constant
◮ Extends to multivariate estimators, and to ψ̂ = g(θ̂_1, . . . , θ̂_d)
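A quick numerical check of the delta method for example (D3), ψ̂ = log Ȳ, where g′(θ) = 1/θ gives V = σ̂²/(n θ̂²); the exponential data are an assumption for illustration, and a bootstrap variance is computed for comparison.

```python
import numpy as np

rng = np.random.default_rng(1)
y = rng.exponential(2.0, size=50)

theta_hat = y.mean()                         # theta-hat = ybar
sigma2_hat = y.var(ddof=1)

# Delta method: var(log ybar) ~ sigma^2 g'(theta)^2 / n with g'(theta) = 1/theta.
V_delta = sigma2_hat / (len(y) * theta_hat ** 2)

# Compare with the nonparametric bootstrap variance of psi* = log(ybar*):
psi_star = np.array([np.log(rng.choice(y, len(y), True).mean()) for _ in range(999)])
V_boot = psi_star.var(ddof=1)
```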
Computation of l_j
◮ Write θ̂ in weighted form, differentiate with respect to ε
◮ Sample average:
  θ̂ = ȳ = n^{−1} Σ_j y_j = Σ_j w_j y_j evaluated at w_j ≡ 1/n
◮ Change weights: w_j ↦ ε + (1 − ε)/n, w_i ↦ (1 − ε)/n for i ≠ j, so (D5)
  ȳ ↦ ȳ_ε = ε y_j + (1 − ε) ȳ = ε(y_j − ȳ) + ȳ,
giving l_j = y_j − ȳ and v_L = n^{−2} Σ_j (y_j − ȳ)² = n^{−1} × (n − 1)/n × s²
◮ Ratio estimator: the sample version is
  θ̂ = t(F̂) = ∫ x dF̂(u, x) / ∫ u dF̂(u, x) = x̄/ū
◮ Correlation coefficient, in terms of averages:
  θ̂ = (\overline{xu} − \bar{x}\bar{u}) / \{(\overline{x^2} − \bar{x}^2)(\overline{u^2} − \bar{u}^2)\}^{1/2}
Jackknife
◮ Approximation to empirical influence values given by
  l_j ≈ l_jack,j = (n − 1)(θ̂ − θ̂_{−j}),
where θ̂_{−j} is the estimate computed from y_1, . . . , y_{j−1}, y_{j+1}, . . . , y_n
◮ Requires n + 1 calculations of θ̂
◮ Corresponds to numerical differentiation of θ̂, with ε = −1/(n − 1)
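A sketch of these jackknife approximations; for stat = np.mean the values l_jack,j reduce exactly to the empirical influence values y_j − ȳ derived above.

```python
import numpy as np

def jackknife_influence(y, stat):
    """l_jack,j = (n - 1)(theta_hat - theta_hat_{-j}), plus the variance
    approximation v_L = sum l_j^2 / n^2; needs n + 1 evaluations of stat."""
    n = len(y)
    theta_hat = stat(y)
    theta_minus = np.array([stat(np.delete(y, j)) for j in range(n)])
    l_jack = (n - 1) * (theta_hat - theta_minus)
    return l_jack, np.sum(l_jack ** 2) / n ** 2
```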
Tests
Examples
◮ Balsam-fir seedlings in 5 × 5 quadrats — Poisson sample?
0 1 2 3 4 3 4 2 2 1
0 2 0 2 4 2 3 3 4 2
1 1 1 1 4 1 5 2 2 3
4 1 2 5 2 0 3 2 1 1
3 1 4 3 1 0 0 2 7 0
◮ Two-way layout: row-column independence?
1 2 2 1 1 0 1
2 0 0 2 3 0 0
0 1 1 1 2 7 3
1 1 2 0 0 0 1
0 1 1 1 1 0 0
Estimation of p_obs
◮ Estimate p_obs by simulation from fitted null hypothesis model M̂_0.
◮ Algorithm: for r = 1, . . . , R:
  • simulate data set y*_1, . . . , y*_n from M̂_0;
  • calculate test statistic t*_r from y*_1, . . . , y*_n.
◮ Calculate simulation estimate
  p̂ = (1 + #{t*_r ≥ t_obs}) / (1 + R)
of
  p̂_obs = Pr(T ≥ t_obs | M̂_0)
(sketched in code below).
◮ Simulation and statistical errors:
  p̂ ≈ p̂_obs ≈ p_obs
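Applied to the seedling counts above, with the dispersion statistic Σ(y − ȳ)²/ȳ and multinomial sampling under the fitted null model (these choices match the dispersion-test figure later in this section; the seed and R = 999 are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)

# Balsam-fir quadrat counts from the example above.
y = np.array([0,1,2,3,4,3,4,2,2,1, 0,2,0,2,4,2,3,3,4,2,
              1,1,1,1,4,1,5,2,2,3, 4,1,2,5,2,0,3,2,1,1,
              3,1,4,3,1,0,0,2,7,0])

def dispersion(counts):
    # Dispersion statistic: sum (y - ybar)^2 / ybar.
    return np.sum((counts - counts.mean()) ** 2) / counts.mean()

t_obs = dispersion(y)
n = len(y)

R = 999
t_star = np.empty(R)
for r in range(R):
    # M0-hat: Poisson conditioned on the total, i.e. multinomial sampling.
    t_star[r] = dispersion(rng.multinomial(y.sum(), np.full(n, 1 / n)))

p_hat = (1 + np.sum(t_star >= t_obs)) / (1 + R)
```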
Handedness data: bootstrap from M̂_0
[Figure: data simulated under the fitted null model M̂_0.]
Pivot tests
◮ Equivalent to use of confidence intervals
◮ Idea: use (approximate) pivot such as Z = (θ̂ − θ)/V^{1/2} as statistic to test θ = θ_0
◮ Observed value of pivot is z_obs = (θ̂ − θ_0)/V^{1/2}
◮ Significance level is
  Pr{(θ̂ − θ)/V^{1/2} ≥ z_obs | M_0} = Pr(Z ≥ z_obs | M_0)
    = Pr(Z ≥ z_obs | F)
    ≐ Pr(Z ≥ z_obs | F̂)
◮ Compare observed z_obs with simulated distribution of Z* = (θ̂* − θ̂)/V*^{1/2}, without needing to construct null hypothesis model M̂_0
◮ Use of (approximate) pivot is essential for success
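In code the comparison is a one-liner once the replicates exist; z_star would come from a studentized resampling loop like the earlier sketch, so this helper is only an assumed wrapper.

```python
import numpy as np

def pivot_test_p(theta_hat, V, theta_0, z_star):
    """One-sided bootstrap p-value from the studentized pivot: compare
    z_obs with the simulated Z*_r = (theta*_r - theta_hat)/V*_r^{1/2}."""
    z_obs = (theta_hat - theta_0) / np.sqrt(V)
    return (1 + np.sum(z_star >= z_obs)) / (1 + len(z_star))
```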
Figure: Simulation results for dispersion test. Left panel: R = 999 values of the dispersion statistic t* obtained under multinomial sampling; the data value is t_obs = 55.15 and p̂ = 0.25. Right panel: chi-squared plot of ordered values of t*; the dotted line shows the χ²_49 approximation to the null conditional distribution.
Contingency table
1 2 2 1 1 0 1
2 0 0 2 3 0 0
0 1 1 1 2 7 3
1 1 2 0 0 0 1
0 1 1 1 1 0 0
◮ Are row and column classifications independent:
Pr(row i, column j) = Pr(row i) × Pr(column j)?
◮ Standard test statistic for independence is
  T = Σ_{i,j} (y_ij − ŷ_ij)²/ŷ_ij,  ŷ_ij = y_i· y_·j / y_··
◮ Get Pr(χ²_24 ≥ 38.53) ≐ 0.048, but is T ∼ χ²_24?
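One bootstrap answer to that question, sketched below: simulate tables from the fitted independence model (a multinomial with cell probabilities given by the products of the estimated margins) and compare the simulated T* with t_obs. The guard against empty simulated margins is an implementation detail, not part of the slides.

```python
import numpy as np

rng = np.random.default_rng(1)

y = np.array([[1, 2, 2, 1, 1, 0, 1],
              [2, 0, 0, 2, 3, 0, 0],
              [0, 1, 1, 1, 2, 7, 3],
              [1, 1, 2, 0, 0, 0, 1],
              [0, 1, 1, 1, 1, 0, 0]])

def chi2_stat(table):
    # T = sum (y_ij - yhat_ij)^2 / yhat_ij with yhat_ij = y_i. y_.j / y_..
    fitted = np.outer(table.sum(axis=1), table.sum(axis=0)) / table.sum()
    with np.errstate(divide="ignore", invalid="ignore"):
        contrib = np.where(fitted > 0, (table - fitted) ** 2 / fitted, 0.0)
    return contrib.sum()

t_obs = chi2_stat(y)                       # 38.53 for these data

# Cell probabilities under fitted independence: (y_i./y_..) * (y_.j/y_..).
p = np.outer(y.sum(axis=1), y.sum(axis=0)).ravel() / y.sum() ** 2
R = 999
t_star = np.array([chi2_stat(rng.multinomial(y.sum(), p).reshape(y.shape))
                   for _ in range(R)])

p_hat = (1 + np.sum(t_star >= t_obs)) / (1 + R)
```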
Regression
Linear regression
◮ Independent data (x_1, y_1), . . . , (x_n, y_n) with
  y_j = x_j^T β + ε_j,  ε_j ~ (0, σ²)
◮ Least squares estimates β̂, leverages h_j, residuals
  e_j = (y_j − x_j^T β̂)/(1 − h_j)^{1/2} ~ (0, σ²), approximately
◮ Design matrix X is experimental ancillary — should be held fixed if possible, as
  var(β̂) = σ² (X^T X)^{−1}
if model y = Xβ + ε correct
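A sketch of model-based resampling for this setup, holding X fixed and resampling the centred modified residuals e_j; the helper name and the use of lstsq/pinv are implementation choices, not prescribed by the slides.

```python
import numpy as np

def resid_bootstrap(X, y, R=999, seed=1):
    """Refit to y* = X beta_hat + e*, with e* resampled from the modified
    residuals e_j = (y_j - x_j^T beta_hat)/(1 - h_j)^{1/2}, X held fixed."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    h = np.einsum("ij,ji->i", X, np.linalg.pinv(X))   # leverages h_j (diag of hat matrix)
    e = (y - X @ beta_hat) / np.sqrt(1 - h)
    e = e - e.mean()                                  # centre the modified residuals
    betas = np.empty((R, X.shape[1]))
    for r in range(R):
        y_star = X @ beta_hat + rng.choice(e, size=n, replace=True)
        betas[r], *_ = np.linalg.lstsq(X, y_star, rcond=None)
    return betas
```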
Cement data
Table: Cement data: y is the heat (calories per gram of cement) evolved while samples of cement set. The covariates are percentages by weight of four constituents: tricalcium aluminate x1, tricalcium silicate x2, tetracalcium alumino ferrite x3 and dicalcium silicate x4.
x1 x2 x3 x4 y
1 7 26 6 60 78.5
2 1 29 15 52 74.3
3 11 56 8 20 104.3
4 11 31 8 47 87.6
5 7 52 6 33 95.9
6 11 55 9 22 109.2
7 3 71 17 6 102.7
8 1 31 22 44 72.5
9 2 54 18 22 93.1
10 21 47 4 26 115.9
11 1 40 23 34 83.8
12 11 66 9 12 113.3
13 10 68 8 12 109.4
Cement data
y = β0 + β1 x1 + β2 x2 + β3 x3 + β4 x4 + ε
Cement data
[Figure: bootstrap coefficients β̂*_1 (left) and β̂*_2 (right), each plotted against the smallest eigenvalue (log scale, 1–500).]
Survival data
dose x         117.5    235.0    470.0    705.0    940.0    1410
survival % y   44.000   16.000   4.000    0.500    0.110    0.700
               55.000   13.000   1.960    0.320    0.015    0.006
                                 6.120                      0.019
[Figure: survival % against dose (left) and log survival % against dose (right).]
Survival data
◮ Case resampling
◮ Replication of outlier: none (0), once (1), two or more (•).
◮ Model-based sampling including residual would lead to
change in intercept but not slope.
[Figure: bootstrap slope estimates from case resampling, roughly −0.012 to −0.004, coded by how often the outlier appears: 0 (none), 1 (once), • (two or more).]
AIDS data
◮ Log-linear model: number of reports in row j and column k follows Poisson distribution with mean
  µ_jk = exp(α_j + β_k)
◮ Log link function
  g(µ_jk) = log µ_jk = α_j + β_k
and variance function
  var(Y_jk) = φ × V(µ_jk) = 1 × µ_jk
◮ Pearson residuals:
  r_jk = (Y_jk − µ̂_jk) / {µ̂_jk (1 − h_jk)}^{1/2}
◮ Model-based simulation:
  Y*_jk = µ̂_jk + µ̂_jk^{1/2} ε*_jk
AIDS data
◮ Poisson two-way model deviance 716.5 on 413 df —
indicates strong overdispersion: φ > 1, so Poisson model
implausible
◮ Residuals highly inhomogeneous — exchangeability
doubtful
[Figure: observed and fitted diagnoses (left, 0–500); Pearson residuals r_P (right, −6 to 6).]
• Prediction error
  (y*_{+,j} − µ̂*_{+,j}) / µ̂*^{1/2}_{+,j}
studentized, so more nearly pivotal.
◮ Form prediction intervals from R replicates.
◮ Resampling schemes:
  • parametric simulation, fitted Poisson model
  • parametric simulation, fitted negative binomial model
  • nonparametric resampling of r_P
  • stratified nonparametric resampling of r_P
◮ Stratification based on skewness of residuals, equivalent to stratifying original data by values of fitted means
◮ Take strata for which
  µ̂_jk < 1,  1 ≤ µ̂_jk < 2,  µ̂_jk ≥ 2
(sketched in code below)
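A sketch of this stratified scheme, assuming flattened arrays r_P (Pearson residuals) and mu_hat (fitted means) from the fitted model:

```python
import numpy as np

def stratified_resample(r_P, mu_hat, rng=np.random.default_rng(1)):
    """Resample Pearson residuals within the strata mu < 1, 1 <= mu < 2,
    mu >= 2, then reconstruct counts via Y* = mu_hat + mu_hat^{1/2} eps*."""
    strata = np.digitize(mu_hat, [1.0, 2.0])      # 0, 1, 2 label the three strata
    eps = np.empty_like(r_P)
    for s in np.unique(strata):
        idx = np.flatnonzero(strata == s)
        eps[idx] = rng.choice(r_P[idx], size=idx.size, replace=True)
    return mu_hat + np.sqrt(mu_hat) * eps
```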
[Figures: bootstrap distributions of deviance/df under the four resampling schemes; predicted diagnoses (0–600) with the observed series (+) and prediction intervals.]
Books
◮ Chernick (1999) Bootstrap Methods: A Practitioner's Guide. Wiley
◮ Davison and Hinkley (1997) Bootstrap Methods and their
Application. Cambridge University Press
◮ Efron and Tibshirani (1993) An Introduction to the
Bootstrap. Chapman & Hall
◮ Hall (1992) The Bootstrap and Edgeworth Expansion.
Springer
◮ Lunneborg (2000) Data Analysis by Resampling: Concepts
and Applications. Duxbury Press
◮ Manly (1997) Randomisation, Bootstrap and Monte Carlo
Methods in Biology. Chapman & Hall
◮ Shao and Tu (1995) The Jackknife and Bootstrap. Springer