

Bootstrap Methods

MAST90083 Computational Statistics and Data Mining

Karim Seghouane
School of Mathematics & Statistics
The University of Melbourne


Outline

§6.1 Introduction

§6.2 Bootstrap principle

§6.3 Parametric vs Nonparametric

§6.4 Bias correction

§6.5 Confidence Interval

§6.6 Bootstrap prediction error estimation


Introduction

- Bootstrap methods use computer simulation to reveal aspects of the sampling distribution of an estimator θ̂ of interest.
- With the power of modern computers, the approach has broad applicability and is now a practical and useful tool for applied statisticians and data scientists.


Introduction

- The bootstrap is a general tool for assessing statistical accuracy.
- It is based on a resampling strategy.
- Having computed some feature of the data from the sample at hand, we want to understand how that estimate would change for a different sample.
- Examples of features: prediction accuracy, the mean value, etc.
- Unfortunately, we cannot draw more than one sample.
- Solution: the bootstrap.


Introduction

- The idea behind the bootstrap is an old one.
- Assume we wish to estimate a functional of a population distribution function F, such as the population mean

      θ = ∫ x dF(x)

- Consider employing the same functional of the sample (or empirical) distribution function F̂, which in this case leads to the sample mean

      θ̂ = ∫ x dF̂(x) = x̄


Introduction

- One can use θ̂ = x̄ to estimate θ.
- Evaluating the variability of this estimate would require the sampling distribution of x̄.


Empirical distribution

- The empirical distribution is the probability measure that assigns to a set a measure equal to the proportion of sample points that lie in that set:

      f̂(x) = (1/n) Σ_{i=1}^n δ(x − xi)

  for a set x1, ..., xn of i.i.d. observations from f, where δ(x − xi) represents a "point mass" at xi (assigning full probability to the point xi and zero to all other points).
- f̂ is the discrete probability density function that assigns mass 1/n to each point xi, 1 ≤ i ≤ n.
- By the L.L.N., f̂ →p f as n → ∞.
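
As a concrete illustration, the empirical distribution can be formed and queried in R with the base function ecdf() (a minimal sketch; the vector x is made-up data):

    x <- c(1.5, 1.7, 1.8, 2.1, 2.4)   # made-up sample of size n = 5
    Fhat <- ecdf(x)                   # empirical cdf: jumps of 1/n at each x_i
    Fhat(1.8)                         # proportion of sample points <= 1.8, here 3/5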


Sample and resample

- A sample X = {x1, ..., xn} is a collection of n numbers (or vectors), without regard to order, drawn at random from the population F.
- The xi's are therefore i.i.d. random variables, each having the population distribution function F.
- A resample X* = {x1*, ..., xn*} is an unordered collection of n items randomly drawn from X with replacement.
- It is known as a bootstrap sample and is the central step of the nonparametric bootstrap method.
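
In R, drawing a single bootstrap resample is one call to sample() with replacement (a minimal sketch; x is the made-up sample from the previous example):

    x <- c(1.5, 1.7, 1.8, 2.1, 2.4)
    xstar <- sample(x, size = length(x), replace = TRUE)  # one resample X*
    xstar  # typically repeats some x_i and omits others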


Resample

- Each xi* has probability 1/n of being equal to any given xj:

      P(xi* = xj | X) = 1/n,   1 ≤ i, j ≤ n

- The xi*'s are i.i.d. conditional on X.
- X* is likely to contain repeats, all of which must be listed in X*.
- Example: X* = {1.5, 1.7, 1.7, 1.8} is different from {1.5, 1.7, 1.8}, but X* is the same as {1.5, 1.7, 1.8, 1.7} and {1.7, 1.5, 1.8, 1.7}.


Population and sample distribution

- F is the population distribution of X, whereas F̂ is its sample distribution.
- F̂, in turn, is the distribution function of the population from which X* is drawn.
- The pair (F, F̂) is generally written (F0, F1) in bootstrap iteration.
- For i ≥ 1, Fi denotes the distribution function of a sample drawn from Fi−1, conditional on Fi−1.
- The i-th application of the bootstrap is termed the i-th iteration, not the (i−1)-th iteration.


Estimation as functional

- An estimate θ̂ is a function of the data and a functional of the sample distribution function F̂.
- Example: the sample mean

      θ̂ = θ[X] = (1/n) Σ_{i=1}^n xi,   θ̂ = θ(F̂) = ∫ x dF̂(x)

- whereas the population mean is

      θ = θ(F) = ∫ x dF(x).


Bootstrap principle

The main idea is to use sampling from the sample to model sampling from the population.


Bootstrap principle

- Assume we cannot observe "level 0 of a multi-level car park" → it represents the population in a sampling scheme.
- We wish to estimate the number n0 of cars on this level.
- Let ni denote the number of cars on level i.
- Assuming the ratio n1/n2 is close to the ratio n0/n1, we have n̂0 ≈ n1²/n2. For example, if n1 = 80 and n2 = 64, then n̂0 = 80²/64 = 100.
- The key feature of this argument is the hypothesis that the relationship between n2 and n1 should closely resemble that between n1 and the unknown n0.


Bootstrap principle

- Statistical inference amounts to describing the relationship between a sample and the population from which the sample is drawn.
- Formally: given a functional ft from a class {ft : t ∈ τ}, we aim to find t0 such that

      E{ft(F0, F1) | F0} = E_{F0}{ft(F0, F1)} = 0

- where F0 = F (population distribution) and F1 = F̂ (sample distribution).
- We want to find t0, the solution of this population equation (so called because we need properties of the population to solve it exactly).


Bootstrap principle
Example:

- Let θ0 = θ(F) = θ(F0) be the true parameter value, such as the r-th power of the mean

      θ0 = (∫ x dF0(x))^r

- Let θ̂ = θ(F1) be the bootstrap estimate of θ0:

      θ̂ = (∫ x dF1(x))^r = x̄^r

- where F1 is the empirical distribution function.


Example: Bias correction

- Correcting θ̂ for bias is equivalent to finding the t0 that solves

      E_{F0}{ft(F0, F1)} = 0

- where

      ft(F0, F1) = θ(F1) − θ(F0) + t

- and the bias-corrected estimate is θ̂ + t0.


Example: Confidence interval


- Constructing a symmetric (1 − α) confidence interval for θ0 is equivalent to using

      ft(F0, F1) = I{θ(F1) − t ≤ θ(F0) ≤ θ(F1) + t} − (1 − α)

- where I(·) denotes the indicator of the event that the true parameter value θ(F0) lies in the interval

      [θ(F1) − t0, θ(F1) + t0] = [θ̂ − t0, θ̂ + t0]

- minus the nominal coverage 1 − α of the interval. Asking that

      E{ft(F0, F1) | F0} = 0

- is equivalent to insisting that t be chosen so that the interval has zero coverage error.

Bootstrap principle

- The equation

      E{ft(F0, F1) | F0} = 0

- provides an explicit description of the relationship between F0 and F1 that we are trying to determine.
- The analogue in the car-counting problem is

      n0 − t·n1 = 0

- where ni is the number of cars on level i.


Bootstrap principle

- If we had the solution t = t0 of this equation, then n0 = t0·n1.
- An estimate of t0 is obtained from the pair (n1, n2), which we know:

      n1 − t·n2 = 0

- We obtain the solution t̂0 = n1/n2 of this equation, and thereby

      n̂0 = t̂0·n1 = n1²/n2

- is the estimate of n0.


Bootstrap principle

- Similarly, the population equation

      E{ft(F0, F1) | F0} = 0

- is solved via the sample equation

      E{ft(F1, F2) | F1} = 0

- where F2, the distribution function of a sample drawn from F1 (conditional on F1), is the analogue of n2.


Bootstrap principle

- The solution t̂0 is a function of the sample values.
- The idea is that the solution of the sample equation should be a good approximation to the solution of the population equation.
- The population equation is not obtainable in practice.
- → This is the bootstrap principle.


Bootstrap principle

- We call t̂0 and E{ft(F1, F2) | F1} "the bootstrap estimates" of t0 and E{ft(F0, F1) | F0}.
- They are obtained by replacing (F0, F1) with (F1, F2) in the formulae for t0.
- The bootstrap version of the bias-corrected estimate is θ̂ + t̂0.
- The bootstrap confidence interval [θ̂ − t̂0, θ̂ + t̂0] is called the symmetric percentile-method confidence interval for θ0.


Parametric vs Nonparametric

- In both parametric and nonparametric problems, inference is based on a sample X of size n (n i.i.d. observations from the population).
- In the nonparametric case, F1 is the empirical distribution function of X.
- Similarly, F2 is the empirical distribution function of a sample drawn at random from the "population" F1.


Nonparametric

- It is the empirical distribution of a sample X* drawn randomly, with replacement, from X.
- If we denote the population by X0, then we have a nest of sampling operations:
  - X is drawn at random from X0;
  - X* is drawn at random from X.


Parametric

- In this case F0 is assumed completely known up to a finite vector λ0 of unknown parameters.
- F0 = F(λ0) is an element of a class {F(λ), λ ∈ Λ} of possible distributions.
- Then F1 = F(λ̂), the distribution function obtained using the sample estimate λ̂ computed from X, often (but not necessarily) by maximum likelihood.
- Let X* denote the sample drawn at random from F(λ̂), and let F2 = F(λ̂*).
- In both cases, X* is obtained by resampling from a distribution determined by the original sample X.


Example

- The MSE

      τ² = E[(θ̂ − θ0)²] = E{[θ(F1) − θ(F0)]² | F0}

- has bootstrap estimate

      τ̂² = E[(θ̂* − θ̂)² | X] = E{[θ(F2) − θ(F1)]² | F1}

- where θ̂* = θ[X*] is the version of θ̂ obtained using X* instead of X.


Bias correction

- Here we have

      ft(F0, F1) = θ(F1) − θ(F0) + t

- and the sample equation

      E{ft(F1, F2) | F1} = E{θ(F2) − θ(F1) + t | F1} = 0

- whose solution is

      t = t̂0 = θ(F1) − E{θ(F2) | F1}


Bias correction

- The bootstrap bias-reduced estimate is thus

      θ̂1 = θ̂ + t̂0 = θ(F1) + t̂0 = 2θ(F1) − E{θ(F2) | F1}

- Note that the estimate θ̂ = θ(F1) is itself a bootstrap functional, since it is obtained by substituting F1 for F0 in the functional formula θ0 = θ(F0).
- The expectation E{θ(F2) | F1} is computed (or approximated) by Monte Carlo simulation.


Bias correction

- Draw B resamples {Xb*, 1 ≤ b ≤ B} independently from the distribution function F1.
- In the nonparametric case, F1 is the empirical distribution of the sample X.
- Let F2b denote the empirical distribution function of Xb*.
- In the parametric case, λ̂b* = λ(Xb*) is the estimate of λ0 obtained from Xb*, and F2b = F(λ̂b*).


Bias correction

- Define θ̂b* = θ(F2b); then

      ûB = (1/B) Σ_{b=1}^B θ(F2b) = B⁻¹ Σ_{b=1}^B θ̂b*

- converges (as B → ∞) to

      û = E{θ(F2) | F1} = E{θ̂* | X}
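
A minimal R sketch of this Monte Carlo approximation in the nonparametric case, taking θ(F) to be the cubed mean (the data vector x and the choice B = 2000 are made up for illustration):

    set.seed(1)
    x <- rnorm(50, mean = 2)           # made-up i.i.d. sample
    theta_hat <- mean(x)^3             # theta(F1)
    B <- 2000
    theta_star <- replicate(B, mean(sample(x, replace = TRUE))^3)  # theta(F2b)
    u_B <- mean(theta_star)            # approximates E{theta(F2) | F1}
    theta_bc <- 2 * theta_hat - u_B    # bootstrap bias-reduced estimate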


Example

- Let

      μ = ∫ x dF0(x),  and assume θ0 = θ(F0) = μ³

- X = {x1, ..., xn} and

      x̄ = (1/n) Σ_{i=1}^n xi

- In the nonparametric approach,

      θ̂ = θ(F1) = x̄³


Example

- In the nonparametric approach,

      E{θ(F1) | F0} = E_{F0}[((1/n) Σ_{i=1}^n xi)³]
                    = E[(μ + (1/n) Σ_{i=1}^n (xi − μ))³]
                    = μ³ + n⁻¹ 3μσ² + n⁻² γ

- where σ² = E(x − μ)² and γ = E(x − μ)³ denote the population variance and skewness.


Example

- In the nonparametric case,

      E{θ(F2) | F1} = x̄³ + n⁻¹ 3x̄σ̂² + n⁻² γ̂

- where σ̂² = n⁻¹ Σ (xi − x̄)² and γ̂ = n⁻¹ Σ (xi − x̄)³ denote the sample variance and skewness.
- Therefore the bootstrap bias-reduced estimate is

      θ̂1 = 2θ(F1) − E{θ(F2) | F1} = 2x̄³ − (x̄³ + n⁻¹ 3x̄σ̂² + n⁻² γ̂)
          = x̄³ − n⁻¹ 3x̄σ̂² − n⁻² γ̂


Example

- If the population is normal N(μ, σ²), then γ = 0 and

      E{θ(F1) | F0} = μ³ + n⁻¹ 3μσ²

- Maximum likelihood can be used to estimate λ̂ = (x̄, σ̂²).
- θ(F2) is the statistic θ̂ computed for a sample from a normal N(x̄, σ̂²) distribution, and in direct analogy we have

      E{θ(F2) | F1} = x̄³ + n⁻¹ 3x̄σ̂²

- Therefore

      θ̂1 = 2θ(F1) − E{θ(F2) | F1} = x̄³ − n⁻¹ 3x̄σ̂²


Example

- The estimate θ̂1 represents an improvement, in the sense of bias reduction, on the basic bootstrap estimate θ̂ = θ(F1).
- To check the bias reduction, observe that for general distributions with finite third moments

      E(x̄³) = μ³ + n⁻¹ 3μσ² + n⁻² γ
      E(x̄σ̂²) = μσ² + n⁻¹ (γ − μσ²) − n⁻² γ
      E(γ̂) = γ (1 − 3n⁻¹ + 2n⁻²)





Example

- For a general population,

      E(θ̂1) − θ0 = n⁻² 3(μσ² − γ) + n⁻³ 6γ − n⁻⁴ 2γ

- For a normal population,

      E(θ̂1) − θ0 = n⁻² 3μσ²


Remarks

- Therefore bootstrap bias reduction has diminished the bias to at most O(n⁻²) in each case.
- This compares with the bias of θ̂, which is of size n⁻¹ unless μ = 0.
- Bootstrap bias correction thus reduces the order of magnitude of the bias by a factor of n⁻¹.


Nonparametric bootstrap

- Finding the ideal bootstrap estimate of E{ft(F1, F2) | F1} requires complete enumeration of the possible F2, which is impractical even for moderate sample sizes n:

      F2(x) = (1/n) Σ_{i=1}^n I(xi* ≤ x)

  where X* = {x1*, ..., xn*} is obtained by sampling randomly with replacement from the original X = {x1, ..., xn}.
- Instead, B i.i.d. samples, each of size n, are drawn from F̂ = F1, producing B nonparametric bootstrap samples. Denote them as Xi* = {xi1*, ..., xin*}, i.i.d. from F1, for i = 1, ..., B.
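
A minimal R sketch of this step (x is a made-up observed sample):

    set.seed(1)
    x <- rnorm(30)     # made-up observed sample, n = 30
    B <- 999
    # each row is one nonparametric bootstrap sample of size n drawn from F1
    boot_samples <- t(replicate(B, sample(x, replace = TRUE)))
    dim(boot_samples)  # B x n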


Nonparametric bootstrap

- The empirical distribution of {θ̂(F2i), i = 1, ..., B} is used to approximate the ideal bootstrap equation E{ft(F1, F2) | F1}, which in turn approximates the population equation E{ft(F0, F1) | F0}, allowing inference.
- The simulation error in approximating the ideal bootstrap quantity E{ft(F1, F2) | F1} can be made arbitrarily small by increasing B.
- A key requirement of bootstrapping is that the data to be resampled must be an i.i.d. sample.


Parametric bootstrap

- When a parametric model is assumed for the data, namely x1, ..., xn i.i.d. from F(x|θ), the cdf F(x|θ) can be estimated parametrically by F(x|θ̂) instead of by the empirical cdf F̂.
- To estimate E{ft(F0, F1) | F0}, one can draw B i.i.d. samples, each of size n, from F(x|θ̂), producing B parametric bootstrap samples. Denote them as Xi* = {xi1*, ..., xin*}, i.i.d. from F(x|θ̂), for i = 1, ..., B.
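
A minimal R sketch for a normal working model (the data and the normality assumption are illustrative):

    set.seed(1)
    x <- rnorm(40, mean = 5, sd = 2)       # made-up data, modeled as N(mu, sigma^2)
    mu_hat <- mean(x)                      # ML estimates of the model parameters
    sd_hat <- sqrt(mean((x - mu_hat)^2))
    B <- 999
    # each row is one parametric bootstrap sample drawn from F(x | theta_hat)
    boot_samples <- t(replicate(B, rnorm(length(x), mu_hat, sd_hat)))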


Parametric bootstrap

 
- The empirical distribution of {ft(Xi*, F(x|θ̂)), i = 1, ..., B} is then used to approximate the ideal bootstrap equation E{ft(F1, F2) | F1}, which in turn approximates the population equation E{ft(F0, F1) | F0}.
- If the parametric model is not a good fit, the parametric bootstrap can give misleading inference.


Bootstrapping samples in regression

- Consider a multiple regression model, Yi = xiᵀβ + εi, for i = 1, ..., n, where ε1, ..., εn are i.i.d. from F with E_F(εi) = 0 and Var_F(εi) = σ².
- The observed data are {z1 = (x1, y1), ..., zn = (xn, yn)}.
- It is wrong to generate bootstrap samples from {y1, ..., yn} and from {x1, ..., xn} independently, because {y1, ..., yn} are not i.i.d.
- Two appropriate ways to construct bootstrap samples from the observed data are to bootstrap the residuals and to bootstrap the cases.


Bootstrapping samples in regression

Bootstrap the residuals (see the R sketch below)

1. Fit the regression model to the observed data. Obtain the fitted responses ŷi = xiᵀβ̂ and residuals ε̂i = yi − ŷi.
2. Resample from {ε̂1, ..., ε̂n} with replacement to get {ε̂1*, ..., ε̂n*}. Note that {ε̂1, ..., ε̂n} are not i.i.d., but roughly so if the regression model is correct.
3. Create a bootstrap sample of responses: Yi* = ŷi + ε̂i* for i = 1, ..., n.
4. Fit the regression model to {(x1, Y1*), ..., (xn, Yn*)} to get a bootstrap estimate (β̂*, σ̂*) of (β, σ).
5. Repeat this process B times to obtain {(β̂1*, σ̂1*), ..., (β̂B*, σ̂B*)}, from which an empirical cdf (F2) can be built for inference.
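
A minimal R sketch of residual bootstrapping for a simple linear model (the data-generating line and all constants are made up):

    set.seed(1)
    n <- 50
    x <- runif(n)
    y <- 1 + 2 * x + rnorm(n, sd = 0.5)      # made-up data
    fit <- lm(y ~ x)
    y_hat <- fitted(fit); e_hat <- resid(fit)
    B <- 999
    beta_star <- t(replicate(B, {
      y_star <- y_hat + sample(e_hat, replace = TRUE)  # steps 2-3: resample residuals
      coef(lm(y_star ~ x))                             # step 4: refit on bootstrap responses
    }))
    # beta_star: B x 2 matrix of bootstrap replicates of (intercept, slope)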


Bootstrapping samples in regression

Bootstrap the cases (also called the paired bootstrap; see the R sketch below)

1. Treat the observed data {z1 = (x1, y1), ..., zn = (xn, yn)} as i.i.d. from a cdf F(x, y).
2. Create a bootstrap sample {Z1*, ..., Zn*} by sampling with replacement from {z1, ..., zn}.
3. Fit the regression model to {Z1*, ..., Zn*} to get a bootstrap estimate (β̂*, σ̂*) of (β, σ).
4. Repeat this process B times to obtain {(β̂1*, σ̂1*), ..., (β̂B*, σ̂B*)}, from which an empirical cdf can be built for inference.

Bootstrapping the cases is less sensitive to violations of the regression model assumptions (i.e., adequacy of the model and constancy of σ²) than bootstrapping the residuals.
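
A matching sketch of the paired bootstrap (same made-up setup as above):

    set.seed(2)
    n <- 50
    x <- runif(n)
    y <- 1 + 2 * x + rnorm(n, sd = 0.5)      # made-up data
    B <- 999
    case_star <- t(replicate(B, {
      idx <- sample(n, replace = TRUE)       # step 2: resample whole cases (x_i, y_i)
      coef(lm(y[idx] ~ x[idx]))              # step 3: refit on the resampled cases
    }))
    # case_star: B x 2 matrix of bootstrap replicates of (intercept, slope)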


Bootstrap bias correction: summary

- The population and sample equations are

      E{θ(F1) − θ(F0) + t | F0} = 0
      E{θ(F2) − θ(F1) + t | F1} = 0

- The solution of the latter is

      t = t̂ = θ(F1) − E{θ(F2) | F1} = θ̂ − E{θ̂* | F1}


Bootstrap bias correction: summary

- The bootstrap bias-corrected estimator is thus

      θ̂1 = θ̂bc = θ̂ + t̂ = 2θ̂ − E{θ̂* | F̂}

- The estimate of E{θ̂* | F1} is obtained through numerical approximation.
- Conditional on X, we compute independent values θ̂1*, ..., θ̂B* and take

      (1/B) Σ_{b=1}^B θ̂b*

  to be the numerical approximation to E{θ̂* | F̂}.


Bootstrap estimation of bias(θ̂) and se(θ̂)


- bias(θ̂) = E_F(θ̂) − θ and se(θ̂) = √Var_F(θ̂) are the two basic attributes of the estimator θ̂ that bootstrap analysis can be used to estimate.
- Suppose θ = T(F) = θ(F0), and θ̂ = T(F̂) = θ(F1) or θ̂ = T(F(·|θ̂)), for some functional T. Let R(X, F) = T(F̂) − T(F) = θ̂ − θ (or R(X, F) = T(F(·|θ̂)) − T(F) in the parametric case).
- Then bias(θ̂) = E_F[R(X, F)] and Var(θ̂) = Var_F[R(X, F)] are population moments of R(X, F), which, per the bootstrap principle, can be estimated by the corresponding moments of the ideal bootstrap distribution of R(X*, F̂) or R(X*, F(·|θ̂)).
- These can in turn be estimated by the sample moments of R(X*, F̂) or R(X*, F(·|θ̂)), calculated from the bootstrap samples.

Nonparametric bootstrap estimation of bias(θ̂) and se(θ̂)


The computing steps for obtaining nonparametric bootstrap estimates of bias(θ̂) and se(θ̂) are as follows (see the R sketch below):

1. Compute θ̂ from the observed sample xn = (x1, ..., xn).
2. Generate B (typically B ≥ 999) nonparametric bootstrap samples of size n from the observed sample.
3. For each bootstrap sample, compute an estimate of θ in the same way θ̂ was computed. These new estimates are called the bootstrap replicates of θ̂ and are denoted θ̂1*, ..., θ̂B*.
4. Compute θ̄* = B⁻¹ Σ_{r=1}^B θ̂r* and estimate bias(θ̂) by bB(θ̂) = θ̄* − θ̂; compute seB(θ̂) = √((1/(B−1)) Σ_{r=1}^B (θ̂r* − θ̄*)²) and estimate se(θ̂) by seB(θ̂).
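
A minimal R sketch of these steps, taking θ to be the population median (the data are made up):

    set.seed(1)
    x <- rexp(40)                      # made-up observed sample
    theta_hat <- median(x)             # step 1
    B <- 999
    theta_star <- replicate(B, median(sample(x, replace = TRUE)))  # steps 2-3
    bias_B <- mean(theta_star) - theta_hat   # step 4: bootstrap estimate of bias
    se_B <- sd(theta_star)                   # step 4: bootstrap estimate of se (divisor B - 1)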


Parametric bootstrap estimation of bias(θ̂) and se(θ̂)

Parametric bootstrap estimation proceeds in the same way as nonparametric bootstrap estimation, except that in step 2 the bootstrap samples of size n are generated from F(x|θ̂).

Remark:
- A bootstrap estimate of MSE(θ̂) = E_F[(θ̂ − θ)²] may be obtained as MSE_B(θ̂) = (1/B) Σ_{r=1}^B (θ̂r* − θ̂)².


Confidence interval

Bootstrap replicates of θ̂, generated by the boot() function in the boot package, can be used to construct CIs for θ.
The boot package can compute five types of bootstrap CIs for θ (normal, basic, studentized, percentile, and BCa); we cover three of them:

1. Percentile (or basic percentile)
2. Normal approximation
3. Basic (or residual)

All of them are computed using the boot.ci() function.
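
A minimal sketch of the workflow (the data and the choice of statistic, a sample median, are made up; boot() expects the statistic in (data, indices) form):

    library(boot)
    set.seed(1)
    x <- rexp(40)                                      # made-up data
    med_fun <- function(data, idx) median(data[idx])   # statistic in boot()'s form
    out <- boot(x, statistic = med_fun, R = 999)       # 999 bootstrap replicates
    boot.ci(out, conf = 0.95, type = c("norm", "basic", "perc"))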


Confidence interval

- A symmetric confidence interval for θ0 = θ(F0) may be constructed by applying the resampling principle using

      ft(F0, F1) = I{θ(F1) − t ≤ θ(F0) ≤ θ(F1) + t} − (1 − α)

- This yields a 100(1 − α)% confidence interval for θ0.
- Example: for α = 0.05 we have a 95% confidence interval.
- The sample equation is

      E{ft(F1, F2) | F1} = 0


Confidence interval

- This leads to the equation

      P{θ(F2) − t ≤ θ(F1) ≤ θ(F2) + t | F1} − (1 − α) = 0

- and

      t̂0 = inf{t : P{θ(F2) − t ≤ θ(F1) ≤ θ(F2) + t | F1} − (1 − α) ≥ 0}

- is a solution.
- [θ̂ − t̂0, θ̂ + t̂0] is a bootstrap confidence interval for θ0 = θ(F0), called a two-sided symmetric percentile interval.

Confidence interval

- Other nominal 100(1 − α)% percentile intervals include the two-sided equal-tailed interval [θ̂ − t̂01, θ̂ + t̂02], where t̂01 and t̂02 solve

      P{θ(F1) ≤ θ(F2) − t | F1} − α/2 = 0
      P{θ(F1) ≥ θ(F2) + t | F1} − α/2 = 0

- It is called equal-tailed because it attempts to place equal probability in each tail:

      P(θ0 ≤ θ̂ − t̂01) ≈ P(θ0 ≥ θ̂ + t̂02) ≈ α/2


Confidence interval

- The ideal form of this interval, obtained by solving the population equation rather than the sample equation, does place exactly equal probability in each tail.
- The one-sided interval (−∞, θ̂ + t̂03], where t̂03 solves

      P{θ(F1) ≤ θ(F2) + t | F1} − (1 − α) = 0

- is also a nominal 100(1 − α)% percentile interval.


Confidence interval

- Other 100(1 − α)% percentile intervals are Î2 = [θ̂ − t̂02, θ̂ + t̂01] and Î1 = (−∞, θ̂ + t̂04], where t̂04 solves

      P{θ(F1) ≤ θ(F2) − t | F1} − α = 0

- Define θ̂* = θ(F2), Ĥ(x) = P(θ̂* ≤ x | X), and Ĥ⁻¹(α) = inf{x : Ĥ(x) ≥ α}.
- Then Î2 = [Ĥ⁻¹(α/2), Ĥ⁻¹(1 − α/2)] and Î1 = (−∞, Ĥ⁻¹(1 − α)].


Normal approximation

- In many cases, (θ̂ − θ)/se(θ̂) ≈d N(0, 1), e.g., when θ̂ is the MLE.
- Then an approximate 100(1 − α)% CI for θ is θ̂ ± z_{1−α/2} · se(θ̂), where z_{1−α/2} = Φ⁻¹(1 − α/2).
- If bootstrap replicates are available, we use seB(θ̂) to estimate se(θ̂) (useful when it is otherwise difficult to estimate), and estimate θ by θ̂ − bB(θ̂) = 2θ̂ − θ̄* (θ̂ − bias(θ̂) is an "unbiased estimator" of θ). This suggests the 100(1 − α)% normal-approximation bootstrap CI for θ:

      [(2θ̂ − θ̄*) − z_{1−α/2} · seB(θ̂), (2θ̂ − θ̄*) + z_{1−α/2} · seB(θ̂)]
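
A minimal R sketch, reusing the made-up median example (all choices are illustrative):

    set.seed(1)
    x <- rexp(40)
    theta_hat <- median(x)
    theta_star <- replicate(999, median(sample(x, replace = TRUE)))
    center <- 2 * theta_hat - mean(theta_star)   # bias-corrected center 2*theta_hat - theta_bar*
    z <- qnorm(0.975)                            # z_{1 - alpha/2} for alpha = 0.05
    ci_norm <- c(center - z * sd(theta_star), center + z * sd(theta_star))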


Percentile bootstrap confidence intervals

      [θ̂*_{([B+1]α/2)}, θ̂*_{([B+1](1−α/2))}]

- Uses the distribution of the bootstrap replicates directly to approximate the percentiles.
- Tends to be highly asymmetric.
- It is prone to bias and inaccurate coverage probabilities.
- Works better if θ is a location parameter.
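
In R, the percentile interval is just a pair of empirical quantiles of the replicates (a sketch, again using the made-up median example):

    set.seed(1)
    x <- rexp(40)
    theta_star <- replicate(999, median(sample(x, replace = TRUE)))
    ci_perc <- quantile(theta_star, c(0.025, 0.975))   # 95% percentile bootstrap CI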


Percentile bootstrap confidence intervals

A justification of the percentile-method bootstrap CI for θ

- Assume the existence of a continuous and strictly increasing transformation ψ, and a continuous cdf H with symmetric pdf (implying H(z) = 1 − H(−z)), such that ψ(θ̂) − ψ(θ) ∼ H.
- This assumption is likely to be reasonable, although it may be difficult to find such ψ and H. It turns out, however, that we do not need an explicit specification of ψ and H.
- Now we know that

      P[h_{α/2} ≤ ψ(θ̂) − ψ(θ) ≤ h_{1−α/2}] = 1 − α    (1)

  where hα is the α quantile of H.


Percentile bootstrap confidence intervals


A justification of the percentile-method CI for θ (continued)

- Applying the bootstrap principle to (1), we have

      1 − α ≈ P*[h_{α/2} ≤ ψ(θ̂*) − ψ(θ̂) ≤ h_{1−α/2}]
            = P*[h_{α/2} + ψ(θ̂) ≤ ψ(θ̂*) ≤ h_{1−α/2} + ψ(θ̂)]
            = P*[ψ⁻¹(h_{α/2} + ψ(θ̂)) ≤ θ̂* ≤ ψ⁻¹(h_{1−α/2} + ψ(θ̂))].    (2)

- Hence ψ⁻¹(h_{α/2} + ψ(θ̂)) ≈ ξ_{α/2} and ψ⁻¹(h_{1−α/2} + ψ(θ̂)) ≈ ξ_{1−α/2}, with ξα the α quantile of the ideal bootstrap distribution P*(·) of θ̂* = θ(F2), which can be estimated by θ̂*α ≈ θ̂*_{([B+1]α)}, the sample quantile (order statistic) from the B bootstrap replicates of θ̂*.


Percentile bootstrap confidence intervals

A justification of the percentile-method CI for θ (continued)

- On the other hand, (1) can be rewritten as

      P[ψ⁻¹(h_{α/2} + ψ(θ̂)) ≤ θ ≤ ψ⁻¹(h_{1−α/2} + ψ(θ̂))] = 1 − α    (3)

  noting that H has a symmetric pdf, so that h_{α/2} = −h_{1−α/2}.
- Therefore, comparing (2) and (3), we see that

      [θ̂*_{α/2}, θ̂*_{1−α/2}] ≈ [θ̂*_{([B+1]α/2)}, θ̂*_{([B+1](1−α/2))}]

  can serve as an approximate 100(1 − α)% CI for θ, which is called the (basic) percentile bootstrap CI.


Basic (or residual)


- Taking ψ to be the identity transformation, eq. (1) becomes

      P[h_{α/2} ≤ θ̂ − θ ≤ h_{1−α/2}] = 1 − α    (4)

  We call θ̂ − θ the residual of the estimator θ̂.
- By the bootstrap principle, hα ≈ (θ̂* − θ̂)α = θ̂*α − θ̂, where θ̂*α is the α sample quantile of θ̂*. Using this approximation, (4) becomes P[θ̂*_{α/2} − θ̂ ≤ θ̂ − θ ≤ θ̂*_{1−α/2} − θ̂] ≈ 1 − α, which is

      P[2θ̂ − θ̂*_{1−α/2} ≤ θ ≤ 2θ̂ − θ̂*_{α/2}] ≈ 1 − α.

- This suggests the following approximate 100(1 − α)% basic (or residual) bootstrap CI for θ:

      [2θ̂ − θ̂*_{1−α/2}, 2θ̂ − θ̂*_{α/2}] ≈ [2θ̂ − θ̂*_{([B+1](1−α/2))}, 2θ̂ − θ̂*_{([B+1]α/2)}]
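
A sketch in R, reusing the made-up median replicates (illustrative only):

    set.seed(1)
    x <- rexp(40)
    theta_hat <- median(x)
    theta_star <- replicate(999, median(sample(x, replace = TRUE)))
    q <- quantile(theta_star, c(0.025, 0.975))
    ci_basic <- c(2 * theta_hat - q[2], 2 * theta_hat - q[1])  # basic (residual) 95% CI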


Bootstrap prediction error estimation

Training error: we observe D = (xi, yi), i = 1, ..., N, with

      yi = f(xi) + εi,  where the εi are i.i.d. with E[εi] = 0 and E[εi²] = σε²

The training error

      err̄ = (1/N) Σ_{i=1}^N (yi − f̂(xi))²

is not a true reflection of

      Err = E[(y − f̂(x))²]

where both y and x are drawn randomly from the population (or y only, when considering x fixed).


Bootstrap prediction error estimation


Generally, err̄ < Err. Define the in-sample error

      Err_in = E[(1/N) Σ_{i=1}^N (yi′ − f̂(xi))²]

where we observe N new responses yi′, one at each of the training points xi, i = 1, ..., N.

- Err_in = E[err̄] + op, where the optimism is op = Err_in − E[err̄].
- The optimism satisfies

      Err_in − E[err̄] = (2/N) Σ_{i=1}^N cov(ŷi, yi)

- The estimate of the error is then

      Êrr = err̄ + ôp

Bootstrap prediction error estimation

From D = (xi, yi), i = 1, ..., N, obtain B bootstrap training samples Db = (xi*b, yi*b), i = 1, ..., N, b = 1, ..., B (see the R sketch below).

- For each Db, estimate f̂*b and evaluate its error on the original sample:

      err_b = (1/N) Σ_{i=1}^N (yi − f̂*b(xi))²

- The sample bootstrap estimate of the error is

      err_B = (1/B) Σ_{b=1}^B err_b = (1/(NB)) Σ_{b=1}^B Σ_{i=1}^N (yi − f̂*b(xi))²
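
A minimal R sketch for a linear fit (the data-generating curve is made up; each f̂*b is fit on a bootstrap sample and evaluated on the original data):

    set.seed(1)
    N <- 100
    x <- runif(N)
    y <- sin(2 * pi * x) + rnorm(N, sd = 0.3)   # made-up data
    B <- 200
    err_b <- replicate(B, {
      idx <- sample(N, replace = TRUE)          # bootstrap training sample D_b
      fit <- lm(y[idx] ~ x[idx])
      pred <- coef(fit)[1] + coef(fit)[2] * x   # f-hat*b evaluated at the original x_i
      mean((y - pred)^2)                        # err_b on the original sample
    })
    err_B <- mean(err_b)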


Bootstrap prediction error estimation

- Another estimate of the error is given by

      err_b* = (1/N) Σ_{i=1}^N (yi*b − f̂*b(xi*b))²

- and

      err_B* = (1/(NB)) Σ_{b=1}^B Σ_{i=1}^N (yi*b − f̂*b(xi*b))²


Bootstrap prediction error estimation

- Define

      ôp_b = err_b − err_b*  and  ôp_B = (1/B) Σ_{b=1}^B ôp_b

- Then

      Êrr = err̄ + ôp_B

- However, err_b underestimates Err, because the bootstrap training sample Db and the evaluation sample D share common observations.


Bootstrap prediction error estimation

The chance that the i-th observation (xi, yi) from D is selected at least once for Db is

      P[(xi, yi) ∈ Db] = 1 − (1 − 1/N)^N → 1 − e⁻¹ ≈ 0.632

- Thus, on average, about 37% of the observations in D are left out of each bootstrap sample.


Bootstrap prediction error estimation

- Each bootstrap sample contains about 0.632·N distinct observations. This motivates the out-of-bag error

      err_OOB,i = (1/Bi) Σ_{b∈Ci} (yi − f̂*b(xi))²

  where Ci is the set of indices of the bootstrap samples that do not contain (xi, yi), and Bi = |Ci| is the number of such bootstrap samples.


Bootstrap prediction error estimation

- Averaging over the observations gives

      err_OOB = (1/N) Σ_{i=1}^N err_OOB,i

- The 0.632 estimate of the optimism is

      ôp_B^(0.632) = 0.632 (err_OOB − err̄)

- giving

      Êrr^(0.632) = err̄ + ôp_B^(0.632) = 0.368 err̄ + 0.632 err_OOB
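
A minimal R sketch of the out-of-bag error and the 0.632 estimate (same made-up linear-fit setup as before):

    set.seed(1)
    N <- 100; B <- 200
    x <- runif(N)
    y <- sin(2 * pi * x) + rnorm(N, sd = 0.3)   # made-up data
    pred <- matrix(NA, B, N)   # pred[b, i] = f-hat*b(x_i), kept only when i is out of bag
    for (b in 1:B) {
      idx <- sample(N, replace = TRUE)
      fit <- lm(y[idx] ~ x[idx])
      oob <- setdiff(1:N, idx)                  # observations left out of D_b
      pred[b, oob] <- coef(fit)[1] + coef(fit)[2] * x[oob]
    }
    sq_err <- sweep(pred, 2, y)^2               # (f-hat*b(x_i) - y_i)^2 where defined
    err_oob_i <- colMeans(sq_err, na.rm = TRUE) # err_OOB,i; NaN if i is never out of bag
    err_oob <- mean(err_oob_i, na.rm = TRUE)    # err_OOB
    err_bar <- mean(resid(lm(y ~ x))^2)         # training error of the full-data fit
    err_632 <- 0.368 * err_bar + 0.632 * err_oob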
